Running CoreMark on SonicBOOM Simulator


We built a simulator of SonicBOOM, an out-of-order superscalar RISC-V CPU, with the short-forwards branch (SFB) optimization enabled, and ran CoreMark on it.
The best results were 6.45 CoreMark/MHz with the default ee_u32 (unsigned int) and 6.89 CoreMark/MHz after changing ee_u32 to signed int.
These results show that SonicBOOM can reach its nominal value of 6.2 CoreMark/MHz.

BOOM

The Berkeley Out-of-Order Machine (BOOM) is one of the RTL generators included in Chipyard, which was introduced in the previous article. It generates out-of-order superscalar RISC-V CPUs.
The current version is BOOM version 3 (BOOMv3), also known as SonicBOOM.
The SonicBOOM nominal CoreMark/MHz is 6.2.

SFB optimization

The short-forwards branch (SFB) optimization is described in SonicBOOM: The 3rd Generation Berkeley Out-of-Order Machine as follows:

As an example, SonicBOOM achieves 6.15 CoreMark/MHz with the SFB optimization enabled, compared to 4.9 CoreMark/MHz without.

However, this SFB optimization is not enabled by default.

BOOM Simulator

We built the following two BOOM simulators with Verilator.

  • simulator-chipyard-MegaBoomConfig: Default (SFB Optimization Disabled)
  • simulator-chipyard-MegaBoomConfig-SFB: SFB Optimization Enabled

BOOM comes in several configurations. Both simulators use MegaBoom, a 4-wide BOOM configuration.

CoreMark

The CoreMark used here is based on the riscv-coremark included in Chipyard.
As shown in the table below, we built four CoreMark binaries: every combination of two CFLAGS settings, -O2 and -O3, and two definitions of ee_u32, the default unsigned int and signed int.

                ee_u32
                unsigned int       signed int
CFLAGS   -O2    coremark.o2-u32    coremark.o2-s32
         -O3    coremark.o3-u32    coremark.o3-s32

In addition, ITERATIONS is set to 10, and CoreMark is built with GCC 11.1.0.
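The four builds can be enumerated mechanically. The sketch below is illustrative only: it reproduces the binary names from the table, while the dictionaries and the function name build_matrix are assumptions made for this example and are not part of riscv-coremark.

```python
from itertools import product

# Short tags and their meaning, taken from the table above.
CFLAGS = {"o2": "-O2", "o3": "-O3"}
EE_U32 = {"u32": "unsigned int", "s32": "signed int"}


def build_matrix():
    """Return (binary name, optimization flag, ee_u32 type) for every combination."""
    return [
        (f"coremark.{opt}-{typ}", flag, ctype)
        for (opt, flag), (typ, ctype) in product(CFLAGS.items(), EE_U32.items())
    ]


for name, flag, ctype in build_matrix():
    print(f"{name}: CFLAGS {flag}, ee_u32 = {ctype}")
```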

Running CoreMark on BOOM Simulator

Default ee_u32

The table below shows the results for the default CoreMark build, in which ee_u32 is unsigned int.

CoreMark/MHz    SFB Optimization
                Disabled    Enabled
CFLAGS   -O2    5.49        6.45
         -O3    5.94        5.97

With SFB optimization, the BOOM simulator achieves 6.45 CoreMark/MHz when running the CoreMark built with -O2, which exceeds the nominal value of 6.2 CoreMark/MHz.
Without SFB optimization, the best result is 5.94 CoreMark/MHz, obtained with the CoreMark built with -O3, which falls short of the nominal value.
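To put the effect of the SFB optimization into numbers, the following sketch computes the relative gain for each CFLAGS setting from the table values above. The function name sfb_gain_percent is chosen for this example only.

```python
# CoreMark/MHz from the table above (default ee_u32, unsigned int).
RESULTS = {
    "-O2": {"disabled": 5.49, "enabled": 6.45},
    "-O3": {"disabled": 5.94, "enabled": 5.97},
}


def sfb_gain_percent(disabled, enabled):
    """Relative CoreMark/MHz improvement from enabling SFB, in percent."""
    return (enabled / disabled - 1.0) * 100.0


for flag, score in RESULTS.items():
    gain = sfb_gain_percent(score["disabled"], score["enabled"])
    print(f"{flag}: {gain:.1f}% gain with SFB enabled")
```

For these measurements the gain is about 17.5% at -O2 and about 0.5% at -O3, so the benefit of SFB depends strongly on the code the compiler generates.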

The following shows the output of the BOOM simulator with SFB optimization when running the CoreMark built with -O2.

$ ./simulator-chipyard-MegaBoomConfig-SFB coremark.o2-u32
This emulator compiled with JTAG Remote Bitbang client. To enable, use +jtag_rbb_enable=1.
Listening on port 37541
[UART] UART0 is here (stdin/stdout).
2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 1551464
Total time (secs): 1551464
Iterations/Sec   : 0
Iterations       : 10
Compiler version : GCC11.1.0
Compiler flags   : -O2 -fno-builtin -mcmodel=medany -static -std=gnu99 -fno-common -nostdlib -nostartfiles -lm -lgcc -T ../riscv64-baremetal/link.ld   
Memory location  : Please put data memory location here
			(e.g. code in flash, data on heap etc)
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[0]crcfinal      : 0xfcaf
Correct operation validated. See README.md for run and reporting rules.
mcycle = 1589202
minstret = 3569373

Since 10 iterations of CoreMark take 1,551,464 cycles (the Total ticks line), CoreMark/MHz is 10 / 1,551,464 × 1,000,000 ≈ 6.45, as shown in the table above.
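The calculation can be reproduced from the simulator output. The sketch below assumes, as the text does, that one tick equals one clock cycle. The helper names parse_run and coremark_per_mhz are hypothetical and exist only for this example. The score is derived from Total ticks because the Iterations/Sec line reads 0 in this run.

```python
import re

# Excerpt of the simulator output shown above.
LOG = """\
Total ticks      : 1551464
Total time (secs): 1551464
Iterations/Sec   : 0
Iterations       : 10
"""


def parse_run(log):
    """Extract (iterations, total ticks) from CoreMark output."""
    ticks = int(re.search(r"Total ticks\s*:\s*(\d+)", log).group(1))
    # Anchor at line start so "Iterations/Sec" is not matched by mistake.
    iterations = int(
        re.search(r"^Iterations\s*:\s*(\d+)", log, re.MULTILINE).group(1)
    )
    return iterations, ticks


def coremark_per_mhz(iterations, total_ticks):
    """Iterations per one million clock cycles (assumes 1 tick = 1 cycle)."""
    return iterations * 1_000_000 / total_ticks


iterations, ticks = parse_run(LOG)
print(f"{coremark_per_mhz(iterations, ticks):.2f} CoreMark/MHz")
```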

Modified ee_u32

The table below shows the results for the modified CoreMark build, in which ee_u32 is signed int.

CoreMark/MHz    SFB Optimization
                Disabled    Enabled
CFLAGS   -O2    5.84        6.89
         -O3    6.27        6.31

When running the CoreMark built with -O2, the CoreMark/MHz of the BOOM simulator with SFB optimization is 6.89, which exceeds the nominal value of 6.2.
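The two result tables can also be compared directly. The sketch below uses only the values reported in this article. It computes the gain from changing ee_u32 to signed int for each configuration and checks each signed-int result against the nominal 6.2 CoreMark/MHz. The names used are illustrative.

```python
NOMINAL = 6.2  # nominal SonicBOOM CoreMark/MHz quoted in this article

# (CFLAGS, SFB setting) -> CoreMark/MHz, copied from the two tables.
UNSIGNED = {("-O2", "off"): 5.49, ("-O2", "on"): 6.45,
            ("-O3", "off"): 5.94, ("-O3", "on"): 5.97}
SIGNED = {("-O2", "off"): 5.84, ("-O2", "on"): 6.89,
          ("-O3", "off"): 6.27, ("-O3", "on"): 6.31}


def signed_gain_percent(config):
    """Relative gain from switching ee_u32 to signed int, in percent."""
    return (SIGNED[config] / UNSIGNED[config] - 1.0) * 100.0


for config in UNSIGNED:
    flag, sfb = config
    verdict = "meets" if SIGNED[config] >= NOMINAL else "misses"
    print(f"{flag}, SFB {sfb}: +{signed_gain_percent(config):.1f}%, "
          f"{verdict} nominal {NOMINAL}")
```

With these numbers the signed int change adds between about 5.6% and 6.8% in every configuration, and only the -O2 build without SFB stays below the nominal value.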

The figure below shows the output of the BOOM simulator with SFB optimization when running the CoreMark built with -O2.

[Figure: benchmark-boom-simulator-2]

Since 10 iterations of CoreMark take 1,451,480 cycles (the Total ticks line), CoreMark/MHz is 10 / 1,451,480 × 1,000,000 ≈ 6.89, as shown in the table above.

Summary

We built a SonicBOOM simulator with the short-forwards branch (SFB) optimization enabled and ran CoreMark on it.
The best results were 6.45 CoreMark/MHz with the default ee_u32 (unsigned int) and 6.89 CoreMark/MHz after changing ee_u32 to signed int.
These results show that SonicBOOM can reach its nominal value of 6.2 CoreMark/MHz.