Benchmarks on RV64GC RISC-V Out-of-Order Simulator


Now that the out-of-order RISC-V core NaxRiscv supports RV[32|64]GC, we built an RV64GC simulator and ran the CoreMark, Dhrystone, and Whetstone benchmarks on it.


NaxRiscv

NaxRiscv is a superscalar, out-of-order RISC-V core that supports RV[32|64]IMAFDCSU.

See the previous article for an overview of NaxRiscv, and the NaxRiscv documentation for more details.

The performance of the default RV32IMA and RV64IMA configurations is as follows.

RV32IMA

  • CoreMark: 5.00 CoreMark/MHz (-O3 and so many more random flags)
  • Dhrystone: 2.94 DMIPS/MHz (-O3 -fno-common -fno-inline)

RV64IMA

  • CoreMark: 4.91 CoreMark/MHz (-O3, u32 as s32 and so many more random flags)
  • Dhrystone: 2.97 DMIPS/MHz (-O3 -fno-common -fno-inline)

Benchmarks on NaxRiscv RV64GC Simulator

This time we built a NaxRiscv RV64GC (RV64IMAFDC) simulator with Verilator and ran CoreMark, Dhrystone, and Whetstone on it, each ported to NaxRiscv.

CoreMark

The following shows the console output when running CoreMark.

$ ./sim/VNaxRiscv64gc --name coremark \
  --load-elf $NAXSOFTWARE/baremetal/coremark/build/rv64imafdc/coremark.elf \
  --pass-symbol pass
2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 2178284
Total time (secs): 2178284.000000
Iterations/Sec   : 0.000005
Iterations       : 10
Compiler version : GCC11.1.0
Compiler flags   : -DPERFORMANCE_RUN=1  -march=rv64imafdc -mabi=lp64d -mcmodel=medany -Wno-pointer-to-int-cast -Wno-int-to-pointer-cast -I../driver -O3 -fno-common -funroll-loops -finline-functions -falign-functions=16 -falign-jumps=4 -falign-loops=4 -finline-limit=1000 -fno-if-conversion2 -fselective-scheduling -fno-crossjumping -freorder-blocks-and-partition -DCORE_DEBUG=0  -lgcc -lc -nostartfiles -ffreestanding -Wl,-Bstatic,-T,../common/app.ld,-Map,coremark.map,--print-memory-usage
Memory location  : STACK
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[0]crcfinal      : 0xfcaf
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 0.000005 / GCC11.1.0 -DPERFORMANCE_RUN=1  -march=rv64imafdc -mabi=lp64d -mcmodel=medany -Wno-pointer-to-int-cast -Wno-int-to-pointer-cast -I../driver -O3 -fno-common -funroll-loops -finline-functions -falign-functions=16 -falign-jumps=4 -falign-loops=4 -finline-limit=1000 -fno-if-conversion2 -fselective-scheduling -fno-crossjumping -freorder-blocks-and-partition -DCORE_DEBUG=0  -lgcc -lc -nostartfiles -ffreestanding -Wl,-Bstatic,-T,../common/app.ld,-Map,coremark.map,--print-memory-usage / STACK
4.59 CoreMark/MHz
SUCCESS coremark

The RV64GC simulator scores 4.59 CoreMark/MHz. As noted above, RV64IMA scores 4.91 CoreMark/MHz, so the score has dropped by about 6.5%.
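
As a sanity check, the CoreMark/MHz figure can be reproduced from the tick and iteration counts in the log. The sketch below assumes that one tick corresponds to one core clock cycle. The log does not state this explicitly, but it reports the same number for ticks and for seconds, which suggests the port does not convert ticks to wall-clock time and would also explain the meaningless Iterations/Sec value. The function names are hypothetical and not part of the NaxRiscv software.

```python
def coremark_per_mhz(iterations: int, total_ticks: int) -> float:
    """CoreMark/MHz, assuming one tick equals one core clock cycle."""
    # At 1 MHz the run would take total_ticks / 1e6 seconds,
    # so iterations per second at 1 MHz is iterations * 1e6 / total_ticks.
    return iterations * 1_000_000 / total_ticks


def relative_drop(baseline: float, measured: float) -> float:
    """Percentage by which `measured` falls below `baseline`."""
    return (baseline - measured) / baseline * 100.0


# Values taken from the console output above.
score = coremark_per_mhz(iterations=10, total_ticks=2_178_284)
print(f"{score:.2f} CoreMark/MHz")                       # 4.59 CoreMark/MHz
print(f"{relative_drop(4.91, 4.59):.1f}% below RV64IMA")  # 6.5% below RV64IMA
```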

Dhrystone

The following shows the console output when running Dhrystone.

$ ./sim/VNaxRiscv64gc --name dhrystone \
  --load-elf $NAXSOFTWARE/baremetal/dhrystone/build/rv64imafdc/dhrystone.elf \
  --pass-symbol pass

Dhrystone Benchmark, Version C, Version 2.2
Program compiled without 'register' attribute
Using time(), HZ=12000000

...

Microseconds for one run through Dhrystone: 16
Dhrystones per Second:                      62168
User_Time : 965124
Number_Of_Runs : 5000
HZ : 12000000
DMIPS per MHz:                              2.94
SUCCESS dhrystone

The RV64GC simulator scores 2.94 DMIPS/MHz. As noted above, RV64IMA scores 2.97 DMIPS/MHz, so the score is almost unchanged (about 1% lower).
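
The figures in the Dhrystone log can be cross-checked in the same way. The sketch below assumes that User_Time is counted in ticks of 1/HZ seconds and that HZ is treated as the core clock frequency. Both are assumptions, although they reproduce the logged values. The divisor 1757 is the conventional VAX 11/780 reference score used to convert Dhrystones per second into DMIPS. The function name is hypothetical.

```python
import math

VAX_DHRYSTONES_PER_SEC = 1757  # conventional DMIPS reference (VAX 11/780)


def dhrystone_stats(user_time: int, runs: int, hz: int):
    """Return (microseconds per run, Dhrystones/s, DMIPS/MHz).

    Assumes `user_time` is measured in ticks of 1/hz seconds and that
    `hz` is also the core clock frequency.
    """
    seconds = user_time / hz
    us_per_run = seconds * 1e6 / runs
    dhrystones_per_sec = runs / seconds
    dmips_per_mhz = dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC / (hz / 1e6)
    return us_per_run, dhrystones_per_sec, dmips_per_mhz


# Values taken from the console output above.
us, dps, dmips_mhz = dhrystone_stats(user_time=965_124, runs=5000, hz=12_000_000)
print(int(us))    # 16
print(int(dps))   # 62168
# The unrounded value is about 2.9486; truncating to two decimals gives
# the 2.94 shown in the log (that the port truncates is an inference).
print(math.floor(dmips_mhz * 100) / 100)
```

Because HZ cancels out, the result depends only on the cycles per run: 965124 / 5000 is roughly 193 cycles per Dhrystone iteration.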

Whetstone

The following shows the console output when running the newly ported Whetstone.

$ ./sim/VNaxRiscv64gc --name whetstone \
  --load-elf $NAXSOFTWARE/baremetal/whetstone/build/rv64imafdc/whetstone.elf \
  --pass-symbol pass

Loops: 10, Iterations: 1, Duration: 1023732 cycles.
C Converted Double Precision Whetstones: 976 KIPS/MHz
SUCCESS whetstone

The log reports 976 KIPS/MHz, so the RV64GC simulator scores 0.976 WMIPS/MHz.
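
The Whetstone figure can also be derived from the cycle count. The sketch below assumes the port keeps the classic Whetstone formula, KIPS = 100 * loops * iterations / elapsed seconds, and normalizes it to a 1 MHz clock by treating the cycle count as microseconds. This is an assumption about the port, not something the log confirms, but it reproduces the logged value. The function name is hypothetical.

```python
def whetstone_kips_per_mhz(loops: int, iterations: int, cycles: int) -> float:
    """KIPS/MHz, assuming the classic formula 100 * loops * iterations / seconds
    evaluated at a 1 MHz clock (one cycle = one microsecond)."""
    seconds_at_1mhz = cycles / 1e6
    return 100.0 * loops * iterations / seconds_at_1mhz


# Values taken from the console output above.
kips = whetstone_kips_per_mhz(loops=10, iterations=1, cycles=1_023_732)
print(int(kips))             # 976, matching the logged KIPS/MHz
print(f"{kips / 1000:.3f}")  # value in millions of Whetstone instructions per second per MHz
```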

Summary

We built a NaxRiscv RV64GC simulator with Verilator and ran the CoreMark, Dhrystone, and Whetstone benchmarks on it.

The performance of the RV64GC configuration is as follows.

RV64GC

  • CoreMark: 4.59 CoreMark/MHz (-O3, u32 as s32 and so many more random flags)
  • Dhrystone: 2.94 DMIPS/MHz (-O3 -fno-common -fno-inline)
  • Whetstone: 0.976 WMIPS/MHz (-O3 -fno-common -fno-inline)