riscv64/mmu: cache identity effective-address state#1007

Open
xiaokamikami wants to merge 1 commit into master from pr1003/ea-identity-cache

@xiaokamikami
Member

Before this change, get_effective_address() re-derived the same common-case identity mapping from privilege and CSR state on every guest memory access. Most of those inputs only change at MMU or privilege state transitions.

After this change, the MMU caches whether data effective-address translation is currently an identity mapping and refreshes that cache only when the relevant state changes. The hot path can now return after a cached boolean check.
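
The caching pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `ea_state_t` type, the `masked_bits` field, and all function names are hypothetical stand-ins, since the description does not list which privilege and CSR inputs are involved. The point is only that the boolean is recomputed in the state setters, so the per-access path is a single check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical MMU state; field names are assumptions for illustration. */
typedef struct {
    int priv;                /* current privilege level, 0..3 */
    unsigned masked_bits[4]; /* hypothetical CSR-derived input: upper address
                                bits ignored at each privilege level */
    bool ea_identity;        /* cached: true when translation is a no-op */
} ea_state_t;

/* Recompute the cached flag. Called only when an input changes. */
static void ea_refresh(ea_state_t *s) {
    s->ea_identity = (s->masked_bits[s->priv] == 0);
}

/* State transitions refresh the cache. */
static void ea_set_priv(ea_state_t *s, int priv) {
    assert(priv >= 0 && priv < 4);
    s->priv = priv;
    ea_refresh(s);
}

static void ea_set_masked_bits(ea_state_t *s, int priv, unsigned bits) {
    assert(priv >= 0 && priv < 4 && bits < 64);
    s->masked_bits[priv] = bits;
    ea_refresh(s);
}

/* Per-access path: return early on the cached boolean. */
static uint64_t ea_translate(const ea_state_t *s, uint64_t vaddr) {
    if (s->ea_identity) {
        return vaddr; /* common case: nothing to re-derive */
    }
    /* Slow path of this sketch: clear the upper n bits of the address. */
    unsigned n = s->masked_bits[s->priv];
    return (vaddr << n) >> n;
}
```

In this sketch, correctness depends on every write to an input going through a setter that calls `ea_refresh()`; a missed transition would leave the cached flag stale.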

On linux/coremark-pro with the xs quick-profile flow, this change improved simulation frequency from 532,225,645 instr/s to 550,262,185 instr/s (+3.39%).
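
For reference, the quoted percentage is the relative change between the two frequencies above:

$$
\frac{550{,}262{,}185 - 532{,}225{,}645}{532{,}225{,}645} = \frac{18{,}036{,}540}{532{,}225{,}645} \approx 0.0339 = 3.39\%
$$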

Workload: ./build/riscv64-nemu-interpreter -b ./workloads/linux/coremark-pro/fw_payload.bin

@github-actions

NEMU Performance Results

| Test | Instructions Executed | Estimated Host Throughput (instr/s) | Actual NEMU Throughput (instr/s) |
| --- | --- | --- | --- |
| bitmanip.bin | 5.157e+07 | 2.686e+07 | 2.826e+07 |
| coremark-riscv64-xs-rv64gc-o2.bin | 1.690e+08 | 1.985e+08 | 1.820e+08 |
| coremark-riscv64-xs-rv64gc-o3.bin | 1.685e+08 | 2.015e+08 | 1.936e+08 |
| coremark-riscv64-xs-rv64gcb-o3.bin | 1.656e+08 | 1.832e+08 | 1.778e+08 |
| amtest-riscv64-xs.bin | 8.668e+06 | 1.831e+07 | 1.646e+07 |
| aliastest-riscv64-xs.bin | 7.700e+06 | 1.787e+06 | 3.024e+06 |
| softprefetchtest-riscv64-xs.bin | 7.748e+06 | 3.411e+06 | 5.416e+06 |
| zacas-riscv64-xs.bin | 1.211e+07 | 5.345e+07 | 1.909e+07 |
| linux-hello | 1.841e+10 | 4.097e+07 | 4.801e+07 |

  • Host throughput is estimated based on 4GHz CPU frequency and IPC=2.5.
  • Actual throughput may vary based on the host CPU performance.

@xiaokamikami force-pushed the pr1003/ea-identity-cache branch from 041f529 to a879a9e on April 27, 2026 07:05
@github-actions

NEMU Performance Results

| Test | Instructions Executed | Estimated Host Throughput (instr/s) | Actual NEMU Throughput (instr/s) | Master Instructions Executed | Master Actual NEMU Throughput (instr/s) | Change vs Master (Instructions) | Change vs Master (Throughput) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bitmanip.bin | 5.157e+07 | 2.686e+07 | 3.075e+07 | 5.157e+07 | 3.029e+07 | -0.00% | +1.53% |
| coremark-riscv64-xs-rv64gc-o2.bin | 1.690e+08 | 1.985e+08 | 1.823e+08 | 1.731e+08 | 1.912e+08 | +2.41% | -4.65% |
| coremark-riscv64-xs-rv64gc-o3.bin | 1.685e+08 | 2.015e+08 | 1.904e+08 | 1.726e+08 | 1.877e+08 | +2.40% | +1.40% |
| coremark-riscv64-xs-rv64gcb-o3.bin | 1.656e+08 | 1.832e+08 | 1.775e+08 | 1.698e+08 | 1.715e+08 | +2.44% | +3.51% |
| amtest-riscv64-xs.bin | 8.670e+06 | 1.831e+07 | 1.522e+07 | 8.691e+06 | 1.680e+07 | +0.24% | -9.40% |
| aliastest-riscv64-xs.bin | 7.700e+06 | 1.787e+06 | 4.287e+06 | 7.704e+06 | 6.062e+06 | +0.05% | -29.28% |
| softprefetchtest-riscv64-xs.bin | 7.748e+06 | 3.411e+06 | 1.070e+07 | 7.749e+06 | 1.335e+07 | +0.01% | -19.84% |
| zacas-riscv64-xs.bin | 1.210e+07 | 5.346e+07 | 2.077e+07 | 1.218e+07 | 1.989e+07 | +0.60% | +4.43% |
| linux-hello | 1.841e+10 | 4.097e+07 | 4.820e+07 | 1.855e+10 | 4.780e+07 | +0.75% | +0.83% |

  • Instructions Executed is the current branch host instruction count measured by DynamoRIO.
  • Master Instructions Executed is the PR base commit baseline host instruction count measured by DynamoRIO.
  • Actual NEMU Throughput and Master Actual NEMU Throughput are single-run native measurements.
  • Host throughput is estimated based on 4GHz CPU frequency and IPC=2.5.
  • Change vs Master (Instructions) is computed from host instruction count; positive means fewer host instructions than master.
  • Change vs Master (Throughput) is computed from native throughput; positive means faster than master.
  • Master columns are populated on pull_request runs when the PR base commit baseline is built.
  • Actual throughput may vary based on the host CPU performance.

Co-authored-by: Yinan Xu <xuyinan@ict.ac.cn>
@xiaokamikami force-pushed the pr1003/ea-identity-cache branch from a879a9e to d8e47c2 on May 8, 2026 01:43
@xiaokamikami requested a review from poemonsense as a code owner on May 8, 2026 01:43
@github-actions

github-actions Bot commented May 8, 2026

NEMU Performance Results - XS Interpreter

| Test | Guest Instructions | Host Instructions | Estimated Host Throughput (instr/s) | Actual NEMU Throughput (instr/s) | Baseline Host Instructions | Baseline Actual NEMU Throughput (instr/s) | Change vs Baseline (Instructions) | Change vs Baseline (Throughput) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bitmanip.bin | 1.385e+05 | 5.142e+07 | 2.694e+07 | 3.681e+07 | 5.142e+07 | 4.183e+07 | +0.00% | -12.01% |
| coremark-riscv64-xs-rv64gc-o2.bin | 3.354e+06 | 1.697e+08 | 1.977e+08 | 2.298e+08 | 1.738e+08 | 2.332e+08 | +2.40% | -1.45% |
| coremark-riscv64-xs-rv64gc-o3.bin | 3.394e+06 | 1.691e+08 | 2.007e+08 | 2.354e+08 | 1.733e+08 | 2.371e+08 | +2.39% | -0.72% |
| coremark-riscv64-xs-rv64gcb-o3.bin | 3.035e+06 | 1.663e+08 | 1.825e+08 | 2.236e+08 | 1.704e+08 | 2.144e+08 | +2.43% | +4.32% |
| amtest-riscv64-xs.bin | 1.587e+04 | 8.672e+06 | 1.830e+07 | 1.800e+07 | 8.691e+06 | 2.113e+07 | +0.22% | -14.85% |
| aliastest-riscv64-xs.bin | 1.376e+03 | 7.696e+06 | 1.788e+06 | 4.932e+06 | 7.697e+06 | 3.965e+06 | +0.01% | +24.37% |
| softprefetchtest-riscv64-xs.bin | 2.643e+03 | 7.747e+06 | 3.412e+06 | 1.218e+07 | 7.746e+06 | 7.342e+06 | -0.02% | +65.90% |
| zacas-riscv64-xs.bin | 6.470e+04 | 1.211e+07 | 5.342e+07 | 2.373e+07 | 1.218e+07 | 2.120e+07 | +0.58% | +11.92% |
| linux-hello | 7.545e+07 | 1.843e+10 | 4.094e+07 | 6.113e+07 | 1.857e+10 | 6.392e+07 | +0.74% | -4.36% |

NEMU Performance Results - XS Ref Shared Object

| Test | Guest Instructions | Host Instructions | Estimated Host Throughput (instr/s) | Actual NEMU Throughput (instr/s) | Baseline Host Instructions | Baseline Actual NEMU Throughput (instr/s) | Change vs Baseline (Instructions) | Change vs Baseline (Throughput) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bitmanip.bin | 1.385e+05 | 1.057e+09 | 1.311e+06 | 9.034e+05 | 1.057e+09 | 9.589e+05 | +0.00% | -5.79% |
| coremark-riscv64-xs-rv64gc-o2.bin | 3.354e+06 | 1.131e+10 | 2.967e+06 | 2.170e+06 | 1.130e+10 | 3.341e+06 | -0.01% | -35.07% |
| coremark-riscv64-xs-rv64gc-o3.bin | 3.394e+06 | 1.272e+10 | 2.669e+06 | 2.079e+06 | 1.272e+10 | 3.000e+06 | -0.01% | -30.71% |
| coremark-riscv64-xs-rv64gcb-o3.bin | 3.035e+06 | 1.051e+10 | 2.888e+06 | 2.328e+06 | 1.051e+10 | 3.227e+06 | -0.01% | -27.87% |
| amtest-riscv64-xs.bin | 1.588e+04 | 6.751e+07 | 2.352e+06 | 1.268e+06 | 6.750e+07 | 1.191e+06 | -0.00% | +6.44% |
| aliastest-riscv64-xs.bin | 1.379e+03 | 8.865e+06 | 1.556e+06 | 6.737e+05 | 8.865e+06 | 6.282e+05 | -0.00% | +7.23% |
| softprefetchtest-riscv64-xs.bin | 2.646e+03 | 1.083e+07 | 2.443e+06 | 1.106e+06 | 1.083e+07 | 1.124e+06 | +0.00% | -1.55% |
| zacas-riscv64-xs.bin | 6.471e+04 | 2.672e+08 | 2.421e+06 | 1.228e+06 | 2.672e+08 | 1.442e+06 | -0.01% | -14.84% |
| linux-hello | 7.522e+07 | 1.029e+12 | 7.311e+05 | 8.686e+05 | 1.029e+12 | 8.917e+05 | -0.00% | -2.59% |

  • Host Instructions is measured by DynamoRIO's inscount client.
  • Estimated Host Throughput assumes a fixed 4GHz CPU and IPC=2.5.
  • Actual NEMU Throughput is a single native NEMU run and may vary with host CPU performance.
  • Baseline columns are populated on pull_request runs when the PR base contains the same defconfig.
  • Change vs Baseline (Instructions) is computed from host instruction count; positive means fewer host instructions than baseline.
  • Change vs Baseline (Throughput) is computed from native throughput; positive means faster than baseline.
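
The derived columns can be reproduced approximately with the sketch below. It is not the CI script: the function names are hypothetical, and the denominator used for the instruction-count change is an assumption (the notes above give only the sign convention). The estimate and throughput formulas do match the rounded table values, for example 1.385e+05 guest instructions over 5.142e+07 host instructions gives about 2.694e+07 instr/s.

```c
#include <assert.h>

/* Fixed host model stated in the notes above: 4 GHz, IPC = 2.5. */
#define HOST_FREQ_HZ 4.0e9
#define HOST_IPC     2.5

/* Estimated Host Throughput: guest instructions per second, assuming the
 * host retires HOST_FREQ_HZ * HOST_IPC instructions per second. */
static double estimated_throughput(double guest_instrs, double host_instrs) {
    assert(host_instrs > 0.0);
    double host_seconds = host_instrs / (HOST_FREQ_HZ * HOST_IPC);
    return guest_instrs / host_seconds;
}

/* Change vs Baseline (Throughput), in percent; positive means faster. */
static double throughput_change_pct(double current, double baseline) {
    assert(baseline > 0.0);
    return (current - baseline) / baseline * 100.0;
}

/* Change vs Baseline (Instructions), in percent; positive means fewer host
 * instructions. Dividing by the baseline is an assumption of this sketch. */
static double instr_change_pct(double current, double baseline) {
    assert(baseline > 0.0);
    return (baseline - current) / baseline * 100.0;
}
```

Because Actual NEMU Throughput comes from a single native run, the throughput change can swing widely between runs, while the instruction-count change is the more stable signal.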
