Vortex: OpenCL Compatible RISC-V Based GPGPU (Part 1)

2023-01-14 / Last updated : 2023-03-11 admin Simulator


This article gives an overview of Vortex, an open-source RISC-V-based GPGPU, and shows how to run an OpenCL program on the Vortex simulator.

Vortex

Vortex is a GPGPU processor built on the single instruction, multiple threads (SIMT) execution model; it extends the RISC-V ISA with custom GPGPU instructions. The README.md of the Vortex repository lists the following specifications.

Specifications

Support RISC-V RV32IMF ISA
Performance:
- 1024 total threads running at 250 MHz
- 128 Gflops of compute bandwidth
- 16 GB/s of memory bandwidth
Scalability: up to 64 cores with optional L2 and L3 caches
Software: OpenCL 1.2 Support
Supported FPGAs:
- Intel Arria 10
- Intel Stratix 10

Microarchitecture

The docs/microarchitecture.md file in the Vortex repository presents the following microarchitecture diagram. The upper part of the figure shows the Vortex core: each stage is extended with support for threads and warps, and the GPGPU unit in the Execute stage handles the GPGPU instructions.

A group of Vortex cores forms a Vortex cluster, and a group of Vortex clusters forms a Vortex processor. The cores in a cluster can share an L2 cache, and the clusters in a processor can share an L3 cache.
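As a rough illustration of this hierarchy, the total number of hardware threads is the product of the counts at each level. The sketch below is not code from the Vortex repository; the function name `total_threads` is hypothetical, and it treats the core count as per cluster, which is an assumption. The example configuration (1 cluster, 64 cores, 4 warps, 4 threads) is also assumed, chosen because it multiplies out to the 1024 threads quoted in the specifications.

```shell
#!/bin/sh
# Hypothetical helper: total hardware threads in a Vortex-style hierarchy.
# Arguments: clusters, cores per cluster, warps per core, threads per warp.
total_threads() {
  echo $(( $1 * $2 * $3 * $4 ))
}

# Assumed example: 1 cluster x 64 cores x 4 warps x 4 threads.
total_threads 1 64 4 4   # prints 1024
```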

Vortex Simulation Methods

Vortex supports four simulation methods: vlsim and rtlsim, which run RTL simulation with Verilator; simx, a cycle-approximate simulator; and fpga, which runs on an FPGA board.

Simulation runs are launched through the shell script blackbox.sh in the ci directory. The four methods above are selected with a command-line argument.

The command-line arguments of blackbox.sh also set the configuration of the Vortex processor: the number of clusters, cores, warps, and threads, and whether the L2 and L3 caches are enabled. The default configuration is 1 cluster, 4 cores, 4 warps, and 4 threads, with the L2 and L3 caches disabled.
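To make the configuration options concrete, here is a minimal sketch of a wrapper that assembles a blackbox.sh command line. The function `vortex_cmd` is hypothetical. The flags `--driver`, `--cores`, `--l2cache`, and `--app` appear in the command shown later in this article; `--clusters`, `--warps`, and `--threads` are assumed names by analogy, so check the script itself before relying on them. The function only prints the command and does not run anything.

```shell
#!/bin/sh
# Hypothetical wrapper: print a blackbox.sh command for a given configuration.
# Arguments: driver, clusters, cores, warps, threads, l2 (1 = enable L2 cache).
# NOTE: --clusters, --warps and --threads are ASSUMED flag names.
vortex_cmd() {
  cmd="./ci/blackbox.sh --driver=$1 --clusters=$2 --cores=$3 --warps=$4 --threads=$5"
  if [ "$6" = "1" ]; then
    cmd="$cmd --l2cache"
  fi
  echo "$cmd --app=sgemm"
}

# Configuration described above (1 cluster, 4 cores, 4 warps, 4 threads, no L2),
# with rtlsim chosen as the driver for this example.
vortex_cmd rtlsim 1 4 4 4 0
```

Printing the command instead of executing it makes it easy to review a configuration before starting a long RTL simulation.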

Running `sgemm` on Vortex RTL Simulator

We ran sgemm, an OpenCL program in the tests/opencl directory, on several configurations of the Vortex processor. sgemm is a simplified version of single-precision GEMM (GEneral Matrix-to-matrix Multiply). In the command below, [a|b|...] lists the values we tried, and [--l2cache] is optional.

$ cd $VORTEX
$ ./ci/blackbox.sh --driver=rtlsim --cores=[1|2|4|8] [--l2cache] \
  --app=sgemm --args="-n[4|8|16|32|64|128]"
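The bracketed alternatives above can be expanded into a loop. The following is a sketch, assuming every combination of core count, matrix size, and L2 setting is wanted (the article does not state that all combinations were run). The `DRY_RUN` variable and the `sweep` function are hypothetical names; by default the script only prints each command.

```shell
#!/bin/sh
# Enumerate the sgemm sweep: core counts x matrix sizes, without and with L2.
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to
# execute them from the Vortex repository root ($VORTEX).
DRY_RUN=${DRY_RUN:-1}

sweep() {
  for cores in 1 2 4 8; do
    for n in 4 8 16 32 64 128; do
      for l2 in "" "--l2cache"; do
        # ${l2:+ $l2} appends the flag (with a leading space) only when set.
        cmd="./ci/blackbox.sh --driver=rtlsim --cores=${cores}${l2:+ $l2} --app=sgemm --args=-n${n}"
        if [ "$DRY_RUN" = "1" ]; then
          echo "$cmd"
        else
          $cmd
        fi
      done
    done
  done
}

sweep
```

With 4 core counts, 6 sizes, and 2 cache settings, the dry run lists 48 commands.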

The featured image shows performance (FLOP/cycle) calculated from the simulation results.
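One plausible way to derive FLOP/cycle from a run is sketched below. It assumes the conventional count of 2n³ floating-point operations for an n×n matrix multiply (n³ multiplications and n³ additions) and a cycle count copied by hand from the simulator output; the article does not state the exact formula used. The helper name `flop_per_cycle` is hypothetical, and the cycle count in the example is made up purely for illustration, not a measured result.

```shell
#!/bin/sh
# Hypothetical helper: FLOP/cycle for an n x n sgemm.
# ASSUMPTION: the kernel performs 2 * n^3 floating-point operations.
# Arguments: n (matrix dimension), cycles (from the simulator output).
flop_per_cycle() {
  awk -v n="$1" -v cycles="$2" 'BEGIN { printf "%.3f\n", (2 * n * n * n) / cycles }'
}

# Illustrative only: n = 64 with a made-up count of 262144 cycles.
flop_per_cycle 64 262144   # prints 2.000
```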

Summary

This article gave an overview of Vortex, an open-source RISC-V-based GPGPU, and showed how to run the OpenCL program sgemm from the tests/opencl directory on the Vortex simulator.

Categories: Simulator

Tags: GPGPU Multi-core RISC-V Verilator
