TensorFlow Lite for Microcontrollers on RISC-V Out-of-Order Core


We have successfully run Google’s TensorFlow Lite for Microcontrollers on an FPGA board implementing NaxRiscv, a RISC-V Out-of-Order core.


TensorFlow Lite for Microcontrollers

The TensorFlow Lite for Microcontrollers (hereafter, TFLite Micro) repository is described as follows.

TensorFlow Lite for Microcontrollers is a port of TensorFlow Lite designed to run machine learning models on DSPs, microcontrollers and other devices with limited memory.

Simply put, TFLite Micro is a bare-metal version of TensorFlow Lite that runs without an OS.

NaxRiscv

NaxRiscv is an out-of-order, superscalar RISC-V core that is integrated into LiteX, an SoC builder. For an overview of NaxRiscv, see the related article "Benchmarks on RISC-V Out-of-Order Simulator".

For this work, we used the 32-bit NaxRiscv gateware for Digilent’s Nexys Video board, introduced in the article "Running 32-bit Linux on FPGAs with RISC-V Out-of-Order Core".

TFLite Micro on FPGA with NaxRiscv

Figure: Result of the golden tests for the MobileNetV2 model

We loaded the 32-bit NaxRiscv gateware onto the Nexys Video FPGA board and ran TFLite Micro’s Keyword Spotting, Person Detection, and MobileNetV2 models.

The featured image and the table below show the results compared to VexRiscv, an in-order execution scalar RISC-V core.

ML model            VexRiscv (Mega cycles)    NaxRiscv (Mega cycles)    Speedup factor
Keyword Spotting    87                        33                        2.64
Person Detection    215                       79                        2.72
MobileNetV2         1079                      413                       2.61

Measured in cycle counts, the speedup over VexRiscv ranges from 2.61x to 2.72x across the three models, or roughly 2.6x to 2.7x.
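The speedup factors in the table can be reproduced from the cycle counts alone. The following is a minimal sketch of that arithmetic, not part of the original measurement setup: the names `MEGA_CYCLES` and `speedup` are hypothetical and introduced only for illustration. It assumes the speedup factor is the VexRiscv cycle count divided by the NaxRiscv cycle count, which matches the published figures.

```python
# Cycle counts in mega cycles, copied from the results table.
# Each entry is (VexRiscv, NaxRiscv). The dictionary name is hypothetical.
MEGA_CYCLES = {
    "Keyword Spotting": (87, 33),
    "Person Detection": (215, 79),
    "MobileNetV2": (1079, 413),
}


def speedup(vex_cycles, nax_cycles):
    """Return the ratio of VexRiscv cycles to NaxRiscv cycles.

    Assumption: the speedup factor is defined as this cycle-count ratio.
    """
    return vex_cycles / nax_cycles


# Round to two decimals, as in the table.
speedups = {
    model: round(speedup(vex, nax), 2)
    for model, (vex, nax) in MEGA_CYCLES.items()
}

for model, factor in speedups.items():
    print(f"{model}: {factor:.2f}x")
```

Because the comparison is based on cycle counts, it translates into the same wall-clock speedup only if both cores run at the same clock frequency.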

Summary

We have successfully run Google’s TFLite Micro on an FPGA board implementing NaxRiscv, an out-of-order, superscalar RISC-V core. Compared to VexRiscv, an in-order, scalar RISC-V core, NaxRiscv runs the three tested models in roughly 2.6x to 2.7x fewer cycles.