Benchmarks in LiteX/Rocket on FPGA boards


We measured the performance of the multi-core 64-bit Rocket Chip SoCs introduced in the previous article using the CoreMark benchmark.
We ran CoreMark on two FPGA boards: a Qmtech Wukong board and a Digilent Nexys Video.
The SoC for the Wukong board is dual-core, and the SoC for the Nexys Video is quad-core.
All benchmarks run under Linux.

CoreMark on Wukong board

The SoC for the Wukong board is a dual-core 64-bit Rocket Chip.
The table below shows the benchmark results.

Threads  CoreMark  CoreMark/MHz  Speedup factor
1        107.0     2.14          1.00
2        208.8     4.18          1.95

The single-threaded CoreMark/MHz is 2.14.
Rocket's CoreMark/MHz is reported as 2.32, so this single-threaded result (Linux, compiled with -O2) reaches 92% of that figure (2.14 / 2.32 ≈ 0.92).
Compiling with -O3 -funroll-loops instead of -O2 raises this to 95%.
With 2 threads, the CoreMark/MHz is 4.18.
The resulting speedup factor of 1.95 over a single thread shows that the second core delivers nearly its full benefit.
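The figures above follow from simple ratios. The Python sketch below reproduces them, assuming a 50 MHz core clock (inferred from the table, since 107.0 / 2.14 = 50; the clock is not stated explicitly here) and the 2.32 CoreMark/MHz Rocket reference quoted above.

```python
# Assumption: 50 MHz core clock, inferred from CoreMark / (CoreMark/MHz) in the table.
CLOCK_MHZ = 50.0
# Reference single-core Rocket score quoted in the text.
ROCKET_REF_PER_MHZ = 2.32

# Wukong (dual-core) CoreMark scores, keyed by thread count.
wukong = {1: 107.0, 2: 208.8}


def per_mhz(score: float, clock_mhz: float = CLOCK_MHZ) -> float:
    """CoreMark/MHz: CoreMark score (iterations/sec) divided by the clock in MHz."""
    return score / clock_mhz


single = per_mhz(wukong[1])
dual = per_mhz(wukong[2])

# Fraction of the reference score reached by the single-threaded run.
print(f"1 thread : {single:.2f} CoreMark/MHz "
      f"({single / ROCKET_REF_PER_MHZ:.0%} of reference)")
# Speedup factor is measured relative to the single-threaded run.
print(f"2 threads: {dual:.2f} CoreMark/MHz, speedup {dual / single:.2f}")
```

Because the clock is the same for both runs, the speedup is identical whether it is computed from raw CoreMark scores or from CoreMark/MHz.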

The output of CoreMark running with 2 threads is shown below.

2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 19151
Total time (secs): 19.151000
Iterations/Sec : 208.866378
Iterations : 4000
Compiler version : GCC10.2.0
Compiler flags : -O2 -DMULTITHREAD=2 -DUSE_PTHREAD -pthread -DPERFORMANCE_RUN=1 -lrt
Parallel PThreads : 2
Memory location : Please put data memory location here
(e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[1]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[1]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[1]crcstate : 0x8e3a
[0]crcfinal : 0x4983
[1]crcfinal : 0x4983
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 208.866378 / GCC10.2.0 -O2 -DMULTITHREAD=2 -DUSE_PTHREAD -pthread -DPERFORMANCE_RUN=1 -lrt / Heap / 2:PThreads
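The reported Iterations/Sec is Iterations divided by Total time (4000 / 19.151 ≈ 208.87). As a sketch, the Python helper below (the name `check_coremark_log` is hypothetical, not part of CoreMark) parses an excerpt of the log above and cross-checks that field. It also checks the per-thread final CRCs, which are all identical in these runs.

```python
import re

# Excerpt of the 2-thread log above (only the fields this sketch reads).
LOG = """\
Total time (secs): 19.151000
Iterations/Sec : 208.866378
Iterations : 4000
[0]crcfinal : 0x4983
[1]crcfinal : 0x4983
"""


def check_coremark_log(text: str) -> dict:
    """Hypothetical helper: cross-check derived fields in a CoreMark log."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a "key : value" layout
            fields[key.strip()] = value.strip()

    total_time = float(fields["Total time (secs)"])
    iterations = int(fields["Iterations"])
    reported = float(fields["Iterations/Sec"])

    # Iterations/Sec should equal Iterations divided by Total time.
    recomputed = iterations / total_time

    # Collect the final CRC reported by each thread ([0]crcfinal, [1]crcfinal, ...).
    finals = {v for k, v in fields.items() if re.fullmatch(r"\[\d+\]crcfinal", k)}
    return {
        "recomputed": recomputed,
        "matches": abs(recomputed - reported) < 1e-3,
        "crc_consistent": len(finals) == 1,
    }


print(check_coremark_log(LOG))
```

The same helper applies unchanged to the 4-thread log, since it reads only the named fields and any number of `[N]crcfinal` lines.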

CoreMark on Nexys Video

The SoC for the Nexys Video is a quad-core 64-bit Rocket Chip.
The featured image above and the table below show the benchmark results.

Threads  CoreMark  CoreMark/MHz  Speedup factor
1        106.2     2.12          1.00
2        210.1     4.20          1.98
3        311.6     6.23          2.94
4        415.4     8.31          3.92

The single-threaded CoreMark/MHz is 2.12, about the same as on the Wukong board.
With 4 threads, the CoreMark/MHz reaches 8.31, a speedup factor of 3.92 (8.31 / 2.12).
This near-linear scaling suggests that no shared resource becomes a significant bottleneck up to 4 threads.
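Parallel efficiency (speedup divided by thread count) makes the scaling claim concrete: at 4 threads it is about 3.92 / 4 ≈ 98%. A minimal Python sketch, assuming the 50 MHz clock implied by the table (106.2 / 2.12 ≈ 50), computes it for every row.

```python
CLOCK_MHZ = 50.0  # assumption: inferred from CoreMark / (CoreMark/MHz) in the table

# Nexys Video (quad-core) CoreMark scores, keyed by thread count.
nexys = {1: 106.2, 2: 210.1, 3: 311.6, 4: 415.4}

# Single-thread CoreMark/MHz, rounded to two decimals as in the table.
base = round(nexys[1] / CLOCK_MHZ, 2)

results = {}
for threads, score in nexys.items():
    per_mhz = round(score / CLOCK_MHZ, 2)  # CoreMark/MHz column
    speedup = per_mhz / base               # speedup relative to one thread
    efficiency = speedup / threads         # 1.0 would be perfect linear scaling
    results[threads] = (per_mhz, speedup, efficiency)
    print(f"{threads} threads: {per_mhz:.2f} CoreMark/MHz, "
          f"speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

Rounding CoreMark/MHz before dividing reproduces the speedup column exactly, which suggests that is how the table was computed. Dividing the raw scores instead (for example 415.4 / 106.2) gives 3.91 rather than 3.92, a last-digit difference only.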

The output of CoreMark running with 4 threads is shown below.

2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 19258
Total time (secs): 19.258000
Iterations/Sec : 415.411777
Iterations : 8000
Compiler version : GCC10.2.0
Compiler flags : -O2 -DMULTITHREAD=4 -DUSE_PTHREAD -pthread -DPERFORMANCE_RUN=1 -lrt
Parallel PThreads : 4
Memory location : Please put data memory location here
(e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[1]crclist : 0xe714
[2]crclist : 0xe714
[3]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[1]crcmatrix : 0x1fd7
[2]crcmatrix : 0x1fd7
[3]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[1]crcstate : 0x8e3a
[2]crcstate : 0x8e3a
[3]crcstate : 0x8e3a
[0]crcfinal : 0x4983
[1]crcfinal : 0x4983
[2]crcfinal : 0x4983
[3]crcfinal : 0x4983
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 415.411777 / GCC10.2.0 -O2 -DMULTITHREAD=4 -DUSE_PTHREAD -pthread -DPERFORMANCE_RUN=1 -lrt / Heap / 4:PThreads

Summary

We measured the performance of the multi-core 64-bit Rocket Chip SoCs introduced in the previous article using the CoreMark benchmark.
The speedup factors of the dual-core SoC on the Wukong board and the quad-core SoC on the Nexys Video are 1.95 and 3.92, respectively, which is close to linear scaling in both cases.