Post

Notes on nju-pa

Updating

Notes on nju-pa

pa1: Perform value-based checks for macros, using a macro

If we want to check whether a macro exists, we can simply use #ifdef. Here, we want to test whether a macro exists and has a specific value using a function-like macro.

Let’s design a macro with the return value of 0 or 1:

1
2
// test if a boolean macro is defined
ISDEF(macro)

If we have:

1
2
3
4
5
6
#define FOO x
if(ISDEF(FOO)) {
	printf("FOO is a boolean value!");
}else { // otherwise
    printf("FOO is either undefined or not a boolean value");
}

The definition of ISDEF will be:

1
2
3
4
5
6
7
8
9
10
#define __P_DEF_0 X,
#define __P_DEF_1 X,

#define concat_temp(x, y) x ## y
#define concat(x, y) concat_temp(x, y)

#define choose_2nd(a, b, ...) b
#define MUX_TMP(p, a, b) choose_2nd(p a, b)
#define MUXDEF(macro, a, b) MUX_TMP(concat(__P_DEF_, macro), a, b)
#define ISDEF(macro) MUXDEF(macro, 1, 0)

The prerequisites for understanding the snippet are some rules as per the C standard:

  1. Arguments passed to the “function” (function-like macro) are expanded first if the argument is a macro at the call site.
  2. However, arguments will not be expanded if they are the operand of # (stringizing) or ## (token-pasting).

A macro for conditional compilation can also be implemented — “if a certain macro is defined, then keep a certain piece of code”:

1
2
3
#define __IGNORE(...)
#define __KEEP(...) __VA_ARGS__
#define IFDEF(macro, ...) MUXDEF(macro, __KEEP, __IGNORE)(__VA_ARGS__)

Note that __KEEP and __IGNORE won’t get expanded during the invocation of MUXDEF unless there are arguments passed to __KEEP and __IGNORE.

pa1: The CPU execution process

The core CPU execution process in NEMU is fetch, decode, and execute.

image-20250806164216764

This consequently involves two structs: Decode and CPU_state:

1
2
3
4
5
6
7
8
// 'nemu/include/cpu/decode.h'
typedef struct Decode {
  vaddr_t pc;
  vaddr_t snpc; // static next pc
  vaddr_t dnpc; // dynamic next pc
  ISADecodeInfo isa;
  IFDEF(CONFIG_ITRACE, char logbuf[128]);
} Decode;
1
2
3
4
5
6
7
8
// '/nemu/src/isa/riscv32/include/isa-def.h'
typedef struct {
  word_t gpr[MUXDEF(CONFIG_RVE, 16, 32)]; // general purpose registers
  vaddr_t pc;
} MUXDEF(CONFIG_RV64, riscv64_CPU_state, riscv32_CPU_state);

// '/nemu/include/isa.h'
typedef concat(__GUEST_ISA__, _CPU_state) CPU_state;
  • Decode is the temporary context for each instruction; it defines snpc as the static next PC (pc + 4) and dnpc as the actual address of the next instruction—mainly due to branches and jumps.
  • CPU_state is the overall architectural state that gets changed after each call to exec_once().

pa1: Recursively evaluate an expression

This is a familiar topic: the key idea of recursion is to divide a big problem into smaller ones and solve them. The following steps are involved when orchestrating a recursive function:

  1. Tackle the error cases

  2. Tackle the basic problem cases

  3. Tackle the non-basic problem cases; this is commonly done by calling the function itself to perform the division of the problem.

Put succinctly, the fundamental role of a recursive function is to handle different cases.

The cleverness of check_parentheses

1
2
3
4
5
6
7
8
9
10
static bool check_parentheses(int p, int q) {
  if (tokens[p].type != '(' || tokens[q].type != ')') return false;
  int balance = 0;
  for (int i = p + 1; i < q; i++) {
    if (tokens[i].type == '(') balance++;
    else if (tokens[i].type == ')') balance--;
    if (balance < 0) return false;
  }
  return balance == 0;
}

We assume the first and last tokens form a matching pair and check only the interior tokens from p + 1 to q - 1. If they are balanced (i.e., first ( then )), the assumption holds and the function returns true. If they are imbalanced (i.e., first ) then (), for example in (4 + 3) * (2 - 1), the function returns false. We therefore maintain a balance variable to implement the checking function. In the balanced case, balance will always be non-negative thourgh out the scan.

Skip over parentheses

We should find the main operator (the operator with the lowest precedence) only outside the parentheses. Tokens inside () are not considered:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
static int find_main_op(int p, int q) {
  int main_op = -1;
  int min_precedence = 100;
  int balance = 0;
  // Traverse all the tokens
  for (int i = p; i <= q; i++) {
    // Jump through the parentheses
    if (tokens[i].type == '(') {
      balance++;
      continue;
    }
    if (tokens[i].type == ')') {
      balance--;
      continue;
    }
    if (balance != 0) continue;

    // Find the operator with the lowest precedence
    int precedence;
    switch(tokens[i].type) {
      case '+': case '-': precedence = 1; break;
      case '*': case '/': precedence = 2; break;
      default: continue;
    }

    if (precedence <= min_precedence) {
      min_precedence = precedence;
      main_op = i;
    }
  }
  return main_op;
}

Handle errors in the recursion of eval()

The eval() function takes the boundaries of the expression and returns the evaluation result, but what if an error occurs?

A success boolean variable is used as a global error flag. eval() passes its pointer on each recursive call:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
static word_t eval(int p, int q, bool *success){
    if( p > q ){ ... }
    else if ( p == q ) { ... }
    else if(check_parentheses(p, q) == true) { ... }
    else{
        int op = find_main_op(p, q);
        [ ... ]
        word_t val1 = eval(p, op - 1, success);
        if (!*success) return 0;
        word_t val2 = eval(op + 1, q, success);
        if (!*success) return 0;
        switch(tokens[op].type){
          case '+': return val1 + val2;
          case '-': return val1 - val2;
          case '*': return val1 * val2;
          [ ... ]
        }
    }
}

For instance, given the input 5 + (3 * zxcvbn):

1
2
3
4
5
6
eval(0, 6)  // "5 + (3 * zxcvbn)"
├── eval(0, 0)  // "5" → success = true, returns 5
└── eval(2, 6)  // "(3 * zxcvbn)"
    └── eval(3, 5)  // "3 * zxcvbn"
        ├── eval(3, 3)  // "3" → success = true, returns 3
        └── eval(5, 5)  // "zxcvbn" → success = false, returns 0

eval(3, 5) and eval(0, 6) terminate before completing evaluation. The error is successfully propagated to cmd_x:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
word_t expr(char *e, bool *success) {
  [ ... ]
  *success = true;
  return eval(0, nr_token - 1, success);
}

static int cmd_x(char *args){
  [ ... ]
  bool success = true;
  paddr_t addr = expr(arg2, &success);
  if (!success) {
    printf("Invalid expression for address.\n");
    return 0;
  }
  [ ... ]
}

Conclusion

Of the three subsections above, two of them showed how a meticulously defined variable balance determines matching parentheses, and the last illustrated an error‑handling method in recursive calls that avoids issues like silent failures, cascading errors, and unnecessary computation.

References

Utilities for debugging

1
2
3
4
5
6
7
8
// The upgraded version of `printf`
Log("Processing expression starting at position %d. Current substring: '%s'", position, e + position);
// The upgraded version of `assert`
Assert(len < TOKEN_BUF_SIZE, "Token string too long! Length: %d, Max: %d", len, TOKEN_BUF_SIZE - 1);
assert(len < TOKEN_BUF_SIZE);
// The upgraded version of `Assert` that implies this line of code should not be executed
panic("Unexpected token type %d in expression evaluation. This should not happen!", type);

This post is licensed under CC BY 4.0 by the author.