Vortex: OpenCL Compatible RISC-V Based GPGPU (Part 1)

[Featured image: riscv-gpgpu-vortex-part1]

This article gives an overview of Vortex, an open-source RISC-V based GPGPU, and shows how to use the Vortex simulator to run an OpenCL program.

Vortex

Vortex is a GPGPU processor that follows the single instruction, multiple threads (SIMT) execution model and extends the RISC-V ISA with custom GPGPU instructions. The README.md of the Vortex repository lists the following specifications.

Specifications

  • Support RISC-V RV32IMF ISA
  • Performance:
    • 1024 total threads running at 250 MHz
    • 128 Gflops of compute bandwidth
    • 16 GB/s of memory bandwidth
  • Scalability: up to 64 cores with optional L2 and L3 caches
  • Software: OpenCL 1.2 Support
  • Supported FPGAs:
    • Intel Arria 10
    • Intel Stratix 10

Microarchitecture

The docs/microarchitecture.md file in the Vortex repository presents the microarchitecture with the following diagram. The upper part of the figure below shows a Vortex core. Each pipeline stage is extended with support for threads and warps, and the GPGPU unit in the Execute stage handles the GPGPU instructions.

[Figure: vortex_microarchitecture_v2]

A group of Vortex cores forms a Vortex cluster, and a group of Vortex clusters forms a Vortex processor. The cores in a cluster can share an L2 cache, and the clusters can share an L3 cache.

Vortex Simulation Methods

Vortex offers four simulation methods: vlsim and rtlsim for RTL simulation using Verilator, simx for cycle-approximate simulation, and fpga for running on an FPGA board.

Simulation runs are driven by the shell script blackbox.sh in the ci folder. The four methods above are selected with a command-line argument (--driver).

The command-line arguments of blackbox.sh also set the configuration of the Vortex processor: the number of clusters, cores, warps, and threads, and whether the L2 and L3 caches are enabled. The default configuration is 1 cluster, 4 cores, 4 warps, and 4 threads, with the L2 and L3 caches disabled.
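To make these configuration parameters concrete, the total number of hardware threads can be estimated as the product of the four counts. The sketch below assumes the counts are hierarchical (cores per cluster, warps per core, threads per warp), which fits the cluster/core structure described earlier but is not stated explicitly for blackbox.sh. The function name total_threads is purely illustrative.

```shell
#!/bin/sh
# total_threads CLUSTERS CORES WARPS THREADS
# Hypothetical helper: multiplies the four configuration counts, assuming
# cores are counted per cluster, warps per core, and threads per warp.
total_threads() {
    echo $(( $1 * $2 * $3 * $4 ))
}

# Default configuration from the text: 1 cluster, 4 cores, 4 warps, 4 threads.
total_threads 1 4 4 4   # prints 64
```

Under that assumption, the default configuration exposes 64 hardware threads.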

Running sgemm on Vortex RTL Simulator

We ran the OpenCL program sgemm from the tests/opencl folder with several configurations of the Vortex processor. sgemm is a simplified version of single-precision GEMM (GEneral Matrix-to-matrix Multiply).

$ cd $VORTEX
$ ./ci/blackbox.sh --driver=rtlsim --cores=[1|2|4|8] [--l2cache] \
  --app=sgemm --args="-n[4|8|16|32|64|128]"
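The bracketed values in the command above denote alternatives, not literal syntax. One way to read the notation is as a full sweep over every combination. The sketch below expands it into concrete command lines using only the flags shown above. The function name sweep_commands is illustrative, and whether every combination was actually run is an assumption.

```shell
#!/bin/sh
# Expand the bracket notation into one concrete command per combination.
# The commands are only printed (dry run); nothing is executed here.
sweep_commands() {
    for cores in 1 2 4 8; do
        for l2 in "" "--l2cache"; do
            for n in 4 8 16 32 64 128; do
                # ${l2:+ $l2} adds the flag (with a leading space) only when set.
                echo "./ci/blackbox.sh --driver=rtlsim --cores=${cores}${l2:+ $l2} --app=sgemm --args=\"-n${n}\""
            done
        done
    done
}

sweep_commands
```

The printed list can be reviewed first and then run from $VORTEX, for example by piping it to sh.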

The featured image shows performance (FLOP/cycle) calculated from the simulation results.
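The article does not spell out how FLOP/cycle is derived. One plausible reconstruction is sketched below. It assumes that -n sets the dimension of square n x n matrices, that the conventional count of 2n^3 floating-point operations (n^3 multiplications and n^3 additions) applies to this sgemm, and that the cycle count is read from the simulator output. The function name flop_per_cycle is hypothetical.

```shell
#!/bin/sh
# flop_per_cycle N CYCLES
# Hypothetical helper: 2*N^3 floating-point operations divided by the
# cycle count. awk is used because POSIX sh arithmetic is integer-only.
flop_per_cycle() {
    awk -v n="$1" -v cycles="$2" 'BEGIN { printf "%.3f\n", 2 * n * n * n / cycles }'
}

# Made-up cycle count, for illustration only (not a measured result).
flop_per_cycle 16 4096   # 2*16^3 = 8192 FLOP over 4096 cycles, prints 2.000
```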

Summary

This article gave an overview of Vortex, an open-source RISC-V based GPGPU, and showed how to use the Vortex simulator to run the OpenCL program sgemm from the tests/opencl folder.