Benchmarks on RISC-V OoO Simulator

We built a simulator for NaxRiscv, an out-of-order (OoO) RISC-V CPU, and ran the CoreMark and Dhrystone benchmarks on it.

The NaxRiscv repository is still a work in progress (WIP), but the simulator can already run Linux. NaxRiscv is also integrated into LiteX, so you can use it to generate gateware for FPGA boards.

The feature image shows the simulator's log output visualized with Konata, an instruction pipeline visualizer.

Note: The content was updated on July 16, 2022.

NaxRiscv

NaxRiscv is a RISC-V CPU being developed by Charles Papon, the developer of the 32-bit RISC-V CPU VexRiscv. Like VexRiscv, it is written in SpinalHDL, a hardware description language.

We initially assumed that VexRiscv and NaxRiscv differ in the same way as Rocket and BOOM (Berkeley Out-of-Order Machine) from the University of California, Berkeley (UCB): an in-order scalar core versus an out-of-order superscalar core. However, NaxRiscv also explores directions that VexRiscv did not, such as 64-bit (RV64) support.

The performance of NaxRiscv's default RV32IMA configuration is as follows.

  • CoreMark: 5.00 CoreMark/MHz (-O3 plus many additional optimization flags)
  • Dhrystone: 2.94 DMIPS/MHz (-O3 -fno-common -fno-inline)

Benchmarks on NaxRiscv Simulator using Verilator

The NaxRiscv repository on GitHub describes how to build an RV32IMA simulator using Verilator. Since the CoreMark and Dhrystone binaries are built as well, we ran both to try to reproduce the figures above.
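If you want to repeat the reproduction test, the two runs can be scripted. The following is a minimal Python sketch, not part of the NaxRiscv repository: it rebuilds the same simulator command lines used in this article and treats a run as passed when the output contains a `SUCCESS <name>` line, as in the logs shown here. The helper names (`sim_command`, `passed`, `run_benchmark`), the assumption that it is run from the directory containing `sim/`, and the choice to scan both stdout and stderr are all assumptions of this sketch.

```python
import os
import subprocess

# Assumption: run from the directory where the Verilator simulator was built,
# as in the commands used in this article.
SIMULATOR = "./sim/VNaxRiscv32ima"


def sim_command(name: str, naxsoftware: str) -> list[str]:
    """Hypothetical helper: build the simulator command line for one benchmark."""
    elf = f"{naxsoftware}/baremetal/{name}/build/rv32ima/{name}.elf"
    return [SIMULATOR, "--name", name, "--load-elf", elf, "--pass-symbol", "pass"]


def passed(output: str, name: str) -> bool:
    """Assumption: a run succeeded if the output has a 'SUCCESS <name>' line."""
    return any(line.strip() == f"SUCCESS {name}" for line in output.splitlines())


def run_benchmark(name: str) -> bool:
    """Run one benchmark; raises KeyError if NAXSOFTWARE is not set."""
    cmd = sim_command(name, os.environ["NAXSOFTWARE"])
    result = subprocess.run(cmd, capture_output=True, text=True)
    # We do not assume which stream the simulator writes to, so check both.
    return passed(result.stdout + "\n" + result.stderr, name)
```

For example, `run_benchmark("coremark")` and `run_benchmark("dhrystone")` should return `True` when the runs end as in the logs below.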

CoreMark

The following shows the console output when running CoreMark.

$ ./sim/VNaxRiscv32ima --name coremark \
  --load-elf $NAXSOFTWARE/baremetal/coremark/build/rv32ima/coremark.elf \
  --pass-symbol pass
2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 2001972
Total time (secs): 2001972.000000
Iterations/Sec   : 0.000005
Iterations       : 10
Compiler version : GCC11.1.0
Compiler flags   : -DPERFORMANCE_RUN=1  -march=rv32ima -mabi=ilp32 -mcmodel=medany -Wno-pointer-to-int-cast -Wno-int-to-pointer-cast -I../driver -O3 -fno-common -funroll-loops -finline-functions -falign-functions=16 -falign-jumps=4 -falign-loops=4 -finline-limit=1000 -fno-if-conversion2 -fselective-scheduling -fno-crossjumping -freorder-blocks-and-partition -DCORE_DEBUG=0  -lgcc -lc -nostartfiles -ffreestanding -Wl,-Bstatic,-T,../common/app.ld,-Map,coremark.map,--print-memory-usage
Memory location  : STACK
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[0]crcfinal      : 0xfcaf
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 0.000005 / GCC11.1.0 -DPERFORMANCE_RUN=1  -march=rv32ima -mabi=ilp32 -mcmodel=medany -Wno-pointer-to-int-cast -Wno-int-to-pointer-cast -I../driver -O3 -fno-common -funroll-loops -finline-functions -falign-functions=16 -falign-jumps=4 -falign-loops=4 -finline-limit=1000 -fno-if-conversion2 -fselective-scheduling -fno-crossjumping -freorder-blocks-and-partition -DCORE_DEBUG=0  -lgcc -lc -nostartfiles -ffreestanding -Wl,-Bstatic,-T,../common/app.ld,-Map,coremark.map,--print-memory-usage / STACK
5.00 Coremark/MHz
SUCCESS coremark

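The CoreMark/MHz of the simulator is 5.00, matching the figure listed above. The "Total time (secs)" and "Iterations/Sec" lines are not meaningful in this run: the reported time in seconds equals the raw tick count, so Iterations/Sec is just 10 / 2001972 ≈ 0.000005.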
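The CoreMark/MHz score can be recomputed from two lines of the log. Assuming that "Total ticks" counts CPU clock cycles (the log does not state this explicitly), iterations per second divided by the clock frequency in MHz reduces to iterations × 10^6 / cycles, so the actual clock frequency cancels out. This minimal Python sketch plugs in the logged values.

```python
# Values copied from the CoreMark console output.
TOTAL_TICKS = 2001972  # assumption: ticks are CPU clock cycles
ITERATIONS = 10


def coremark_per_mhz(iterations: int, cycles: int) -> float:
    """(iterations / seconds) / MHz == iterations * 1e6 / cycles."""
    return iterations * 1_000_000 / cycles


score = coremark_per_mhz(ITERATIONS, TOTAL_TICKS)
print(f"{score:.2f} CoreMark/MHz")  # 5.00 CoreMark/MHz
```

The unrounded value is about 4.995, which rounds to the 5.00 printed by the simulator.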

Dhrystone

The following shows the console output when running Dhrystone.

$ ./sim/VNaxRiscv32ima --name dhrystone \
  --load-elf $NAXSOFTWARE/baremetal/dhrystone/build/rv32ima/dhrystone.elf \
  --pass-symbol pass

Dhrystone Benchmark, Version C, Version 2.2
Program compiled without 'register' attribute
Using time(), HZ=12000000

...

Microseconds for one run through Dhrystone: 16
Dhrystones per Second:                      62169
User_Time : 965109
Number_Of_Runs : 5000
HZ : 12000000
DMIPS per Mhz:                              2.94
SUCCESS dhrystone

The DMIPS/MHz of the simulator is 2.94, matching the figure listed above.
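The Dhrystone figures can be cross-checked from User_Time, Number_Of_Runs and HZ. This Python sketch uses the standard DMIPS definition (Dhrystones per second divided by 1757, the VAX 11/780 score) and assumes the timer runs at the CPU clock, so HZ (12 MHz) is also the frequency used for the per-MHz normalization. Under that assumption, the result equals Number_Of_Runs × 10^6 / (1757 × User_Time), independent of HZ. It reproduces 16 µs per run and 62169 Dhrystones per second. The exact DMIPS/MHz is about 2.9486, so the 2.94 in the log appears to be truncated rather than rounded; that is an assumption about how the benchmark formats the value.

```python
import math

# Values copied from the Dhrystone console output.
USER_TIME = 965109        # elapsed timer ticks
NUMBER_OF_RUNS = 5000
HZ = 12_000_000           # timer rate reported by the benchmark
VAX_DHRYSTONES_PER_SEC = 1757  # 1 DMIPS == 1757 Dhrystones per second

dhrystones_per_sec = NUMBER_OF_RUNS * HZ / USER_TIME
us_per_run = 1_000_000 / dhrystones_per_sec

# Assumption: the CPU clock equals HZ, i.e. the timer counts CPU cycles.
clock_mhz = HZ / 1_000_000
dmips_per_mhz = dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC / clock_mhz

print(int(us_per_run))                        # 16
print(int(dhrystones_per_sec))                # 62169
print(f"{dmips_per_mhz:.4f}")                 # 2.9486
print(math.floor(dmips_per_mhz * 100) / 100)  # 2.94 (truncated)
```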

Summary

We built an RV32IMA NaxRiscv simulator using Verilator, ran CoreMark and Dhrystone on it, and reproduced the scores of 5.00 CoreMark/MHz and 2.94 DMIPS/MHz.