Vector Processing Unit (VPU) — FPGA Implementation

Overview

This project implements a compact, efficient INT8 Vector Processing Unit (VPU) on the DE10-Lite board (Intel MAX 10 FPGA).

The VPU is built around an 8×8 systolic array that accelerates matrix multiplication, the core operation of dense and convolutional layers in neural networks.
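
As a point of reference for what the array computes, here is a minimal software model of an 8×8 INT8 matrix multiply. It is a sketch, not the hardware implementation: the function names `check_int8` and `matmul_int8` are hypothetical, and the accumulator width is an assumption (Python integers stand in for a wide hardware accumulator).

```python
N = 8  # array dimension, taken from the 8x8 systolic array above

def check_int8(matrix):
    """Raise ValueError if any element falls outside the signed INT8 range."""
    for row in matrix:
        for v in row:
            if not -128 <= v <= 127:
                raise ValueError(f"{v} does not fit in INT8")

def matmul_int8(a, b, n=N):
    """Reference C = A x B; Python ints stand in for a wide accumulator."""
    check_int8(a)
    check_int8(b)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Worst case: every product is (-128) * (-128) = 16384, summed 8 times.
worst = matmul_int8([[-128] * N for _ in range(N)], [[-128] * N for _ in range(N)])
print(worst[0][0])  # 131072
```

The worst-case sum of 131072 does not fit in 16 bits, so each accumulator must be wider than the INT8 inputs. A model like this can also serve as a golden reference when checking results read back from the FPGA.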

Target Applications:

CNN Inference (Edge AI)

Digital Signal Processing

Lightweight ML models on custom hardware

Block	Function
Systolic Array	Performs pipelined matrix multiplication (MAC grid)
Micro-FSM (Controller)	Controls instruction flow: load, compute, reset
Dual-Port RAM	Buffers input matrices A and B and stores the result matrix
UART Interface	Enables host-device communication for data transfer
Memory Map	Standardized layout for matrix storage
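
The controller's load, compute, and reset steps can be pictured as a small state machine. The sketch below is illustrative only: the `IDLE` state, the event names, and the transition order are assumptions that the table does not specify.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()     # hypothetical resting state, not named in the table
    LOAD = auto()     # matrices A and B are written into the RAM
    COMPUTE = auto()  # the systolic array runs
    RESET = auto()    # accumulators are cleared

# Hypothetical transition table: (current state, event) -> next state
TRANSITIONS = {
    (State.IDLE, "start"): State.LOAD,
    (State.LOAD, "loaded"): State.COMPUTE,
    (State.COMPUTE, "done"): State.RESET,
    (State.RESET, "cleared"): State.IDLE,
}

def step(state, event):
    """Return the next state; events with no matching entry leave it unchanged."""
    return TRANSITIONS.get((state, event), state)

def run(events, state=State.IDLE):
    """Apply a sequence of events and return every state visited, in order."""
    trace = [state]
    for event in events:
        state = step(state, event)
        trace.append(state)
    return trace
```

Calling `run(["start", "loaded", "done", "cleared"])` walks one full load, compute, reset cycle and ends back in `State.IDLE`.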

Wavefront Pattern: Computation advances through the array as a diagonal wavefront. This diagonal scheduling keeps the MAC pipeline busy and avoids operand conflicts between PEs.
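
One way to see the wavefront is to simulate it cycle by cycle. The sketch below assumes an output-stationary dataflow in which PE(i, j) accumulates the product a[i][k] * b[k][j] at cycle t = i + j + k. That dataflow and the function name `systolic_wavefront` are assumptions made for illustration, not details taken from this design.

```python
N = 8  # array dimension

def systolic_wavefront(a, b, n=N):
    """Simulate an n x n output-stationary array one cycle at a time.

    Returns the accumulated result matrix and the number of PEs that
    performed a MAC in each cycle.
    """
    acc = [[0] * n for _ in range(n)]
    active_per_cycle = []
    # The last MAC happens in PE(n-1, n-1) with k = n-1, at t = 3n - 3.
    for t in range(3 * n - 2):
        active = 0
        for i in range(n):
            for j in range(n):
                k = t - i - j  # operand index reaching PE(i, j) this cycle
                if 0 <= k < n:
                    acc[i][j] += a[i][k] * b[k][j]
                    active += 1
        active_per_cycle.append(active)
    return acc, active_per_cycle

ones = [[1] * N for _ in range(N)]
result, active = systolic_wavefront(ones, ones)
print(len(active))   # 22 cycles for an 8x8 array
print(active[:4])    # [1, 3, 6, 10]
```

The growing counts at the start show the wavefront spreading from PE(0, 0) one anti-diagonal per cycle. Each PE handles exactly one operand pair per cycle, which is why no two operands contend for the same PE under this schedule.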

Each PE (Processing Element):