Benchmarks in LiteX/Rocket on FPGA boards

[Featured image: benchmark-linux-litex-rocket]

We measured the performance of the multi-core 64-bit Rocket Chip SoCs introduced in the previous article using the CoreMark benchmark.
We ran CoreMark on two FPGA boards: a Qmtech Wukong board and a Digilent Nexys Video.
The SoC for the Wukong board is dual-core, and the SoC for the Nexys Video is quad-core.
Both SoCs run Linux as the OS.

CoreMark on Wukong board

The SoC for the Wukong board is a dual-core 64-bit Rocket Chip.
The table below shows the benchmark results.

No. of Threads   CoreMark   CoreMark/MHz   CoreMark/MHz/Thread
1                107.0      2.14           2.14
2                208.8      4.18           2.09

The single-thread CoreMark/MHz is 2.14.
Rocket's CoreMark/MHz is reported to be 2.32, so this single-threaded result (OS: Linux, -O2 option) reaches 92% of that figure.
Building with -O3 -funroll-loops instead of -O2 raises this to 95%.
With 2 threads, the CoreMark/MHz is 4.18.
The 2-thread score is 1.95 times the single-thread score, which shows that the second core is being used effectively.
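The arithmetic behind these figures can be reproduced with a short Python sketch. The scores and the 2.32 reference come from the text above; the 50 MHz clock is not stated in the article and is only inferred here from 107.0 / 2.14, so treat it as an assumption.

```python
# Scores from the Wukong table: number of threads -> CoreMark.
scores = {1: 107.0, 2: 208.8}

# ASSUMPTION: clock inferred from 107.0 CoreMark / 2.14 CoreMark/MHz,
# not stated explicitly in the article.
CLOCK_MHZ = 50.0

# Reference CoreMark/MHz for Rocket quoted in the article.
ROCKET_REFERENCE = 2.32


def per_mhz(score, clock_mhz=CLOCK_MHZ):
    """Normalize a CoreMark score by the clock frequency in MHz."""
    return score / clock_mhz


single = per_mhz(scores[1])            # single-thread CoreMark/MHz
relative = single / ROCKET_REFERENCE   # fraction of the reference figure
speedup = scores[2] / scores[1]        # 2-thread score vs. 1-thread score

print(f"{single:.2f} CoreMark/MHz, {relative:.0%} of reference, {speedup:.2f}x speedup")
# prints: 2.14 CoreMark/MHz, 92% of reference, 1.95x speedup
```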

The output of the 2-thread CoreMark run is shown below.

2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 19151
Total time (secs): 19.151000
Iterations/Sec : 208.866378
Iterations : 4000
Compiler version : GCC10.2.0
Compiler flags : -O2 -DMULTITHREAD=2 -DUSE_PTHREAD -pthread -DPERFORMANCE_RUN=1 -lrt
Parallel PThreads : 2
Memory location : Please put data memory location here
(e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[1]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[1]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[1]crcstate : 0x8e3a
[0]crcfinal : 0x4983
[1]crcfinal : 0x4983
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 208.866378 / GCC10.2.0 -O2 -DMULTITHREAD=2 -DUSE_PTHREAD -pthread -DPERFORMANCE_RUN=1 -lrt / Heap / 2:PThreads
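As a cross-check on the log above, the reported Iterations/Sec should equal Iterations divided by Total time. The following Python sketch parses a few lines copied from that log and recomputes the rate. The helper name parse_fields is hypothetical and not part of CoreMark.

```python
# A few "key : value" lines copied from the 2-thread log above.
LOG = """\
Total ticks : 19151
Total time (secs): 19.151000
Iterations/Sec : 208.866378
Iterations : 4000
Parallel PThreads : 2
"""


def parse_fields(text):
    """Hypothetical helper: split each 'key : value' line at the first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


fields = parse_fields(LOG)
iterations = int(fields["Iterations"])
seconds = float(fields["Total time (secs)"])
ticks = int(fields["Total ticks"])

rate = iterations / seconds       # recomputed Iterations/Sec
ticks_per_sec = ticks / seconds   # timer resolution implied by the log

print(f"recomputed rate: {rate:.4f} iterations/sec")
print(f"implied timer:   {ticks_per_sec:.0f} ticks/sec")
```

The recomputed rate agrees with the reported 208.866378 to within rounding, and the tick count implies a timer that counts about 1000 ticks per second in this run.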

CoreMark on Nexys Video

The SoC for the Nexys Video is a quad-core 64-bit Rocket Chip.
The featured image above and the table below show the benchmark results.

No. of Threads   CoreMark   CoreMark/MHz   CoreMark/MHz/Thread
1                106.2      2.12           2.12
2                210.1      4.20           2.10
3                311.6      6.23           2.08
4                415.4      8.31           2.08

The single-thread CoreMark/MHz is 2.12, which is about the same as on the Wukong board.
With 4 threads, the CoreMark/MHz is 8.31.
The 4-thread score is 3.91 times the single-thread score.
In addition, the multi-thread CoreMark/MHz/Thread stays between 2.08 and 2.10, so nothing appears to become a bottleneck up to 4 threads.
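One way to make the scaling claim concrete is to compute the speedup and the parallel efficiency (speedup divided by thread count) for each row of the Nexys Video table. The sketch below uses only the CoreMark scores from that table; the function name scaling is hypothetical.

```python
# Scores from the Nexys Video table: number of threads -> CoreMark.
scores = {1: 106.2, 2: 210.1, 3: 311.6, 4: 415.4}


def scaling(scores):
    """Hypothetical helper: map thread count to (speedup, efficiency).

    speedup    = score with N threads / score with 1 thread
    efficiency = speedup / N (1.0 would be perfectly linear scaling)
    """
    base = scores[1]
    result = {}
    for threads, score in sorted(scores.items()):
        speedup = score / base
        result[threads] = (speedup, speedup / threads)
    return result


results = scaling(scores)
for threads, (speedup, efficiency) in results.items():
    print(f"{threads} thread(s): speedup {speedup:.2f}x, efficiency {efficiency:.1%}")
```

For these scores the efficiency stays at roughly 98 to 99 percent from 2 to 4 threads, which matches the nearly flat CoreMark/MHz/Thread column.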

The output of the 4-thread CoreMark run is shown below.

2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 19258
Total time (secs): 19.258000
Iterations/Sec : 415.411777
Iterations : 8000
Compiler version : GCC10.2.0
Compiler flags : -O2 -DMULTITHREAD=4 -DUSE_PTHREAD -pthread -DPERFORMANCE_RUN=1 -lrt
Parallel PThreads : 4
Memory location : Please put data memory location here
(e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[1]crclist : 0xe714
[2]crclist : 0xe714
[3]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[1]crcmatrix : 0x1fd7
[2]crcmatrix : 0x1fd7
[3]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[1]crcstate : 0x8e3a
[2]crcstate : 0x8e3a
[3]crcstate : 0x8e3a
[0]crcfinal : 0x4983
[1]crcfinal : 0x4983
[2]crcfinal : 0x4983
[3]crcfinal : 0x4983
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 415.411777 / GCC10.2.0 -O2 -DMULTITHREAD=4 -DUSE_PTHREAD -pthread -DPERFORMANCE_RUN=1 -lrt / Heap / 4:PThreads

Summary

We measured the performance of the multi-core 64-bit Rocket Chip SoCs introduced in the previous article using the CoreMark benchmark.
The dual-core SoC on the Wukong board reached 1.95 times the single-thread score, and the quad-core SoC on the Nexys Video reached 3.91 times the single-thread score.