This project implements a compact, efficient INT8 Vector Processing Unit (VPU) on the DE10-Lite MAX 10 FPGA board.
The VPU is built around an 8×8 systolic array that accelerates matrix multiplication, the core operation of dense and convolutional layers in neural networks.
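As a software reference for what the array computes, the sketch below multiplies two 8×8 INT8 matrices and accumulates each dot product in a wider integer. The function name `int8_matmul` and the test matrices are illustrative assumptions, not part of the hardware design; the accumulator width used on the FPGA is not specified here.

```python
def int8_matmul(a, b):
    """Reference C = A x B for square INT8 matrices (hypothetical helper)."""
    n = len(a)
    # Reject operands that would not fit in a signed 8-bit value.
    for matrix in (a, b):
        for row in matrix:
            for v in row:
                if not -128 <= v <= 127:
                    raise ValueError(f"{v} is outside the INT8 range")
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0  # wide accumulator: products of INT8 values overflow 8 bits
            for k in range(n):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc
    return c


N = 8
A = [[(i + k) % 5 - 2 for k in range(N)] for i in range(N)]
IDENTITY = [[1 if i == j else 0 for j in range(N)] for i in range(N)]
C = int8_matmul(A, IDENTITY)  # multiplying by the identity returns A
```

An 8-term dot product of INT8 values has magnitude at most 8 × 128 × 128 = 2^17, so a 32-bit accumulator holds it with ample headroom.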
Target Applications:
- CNN Inference (Edge AI)
- Digital Signal Processing
- Lightweight ML models on custom hardware
| Block | Function |
|---|---|
| Systolic Array | Performs pipelined matrix multiplication (MAC grid) |
| Micro-FSM (Controller) | Controls instruction flow: load, compute, reset |
| Dual-Port RAM | Buffers matrices A and B and stores the result |
| UART Interface | Enables host-device communication for data transfer |
| Memory Map | Standardized layout for matrix storage |
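Wavefront Pattern: Diagonal scheduling staggers the operands so that computation advances across the array one anti-diagonal at a time, which keeps the pipeline full and avoids data conflicts between PEs.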
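The wavefront can be illustrated with a small cycle-level model. This is a minimal sketch that assumes an output-stationary dataflow (each PE keeps its own accumulator, rows of A enter from the left, columns of B enter from the top, and each row or column is delayed by its index); the actual dataflow of this design may differ, and `systolic_matmul` is a hypothetical name.

```python
def systolic_matmul(a, b):
    """Cycle-level model of an n x n output-stationary systolic array (assumed dataflow)."""
    n = len(a)
    acc = [[0] * n for _ in range(n)]         # one accumulator per PE
    a_reg = [[None] * n for _ in range(n)]    # A operands moving left to right
    b_reg = [[None] * n for _ in range(n)]    # B operands moving top to bottom
    start = [[None] * n for _ in range(n)]    # first cycle in which each PE does a MAC
    cycles = 3 * n - 2                        # last PE finishes at cycle 3n - 3
    for t in range(cycles):
        next_a = [[None] * n for _ in range(n)]
        next_b = [[None] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:
                    k = t - i                 # row i is skewed by i cycles
                    next_a[i][0] = a[i][k] if 0 <= k < n else None
                else:
                    next_a[i][j] = a_reg[i][j - 1]
                if i == 0:
                    k = t - j                 # column j is skewed by j cycles
                    next_b[0][j] = b[k][j] if 0 <= k < n else None
                else:
                    next_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = next_a, next_b
        for i in range(n):
            for j in range(n):
                x, y = a_reg[i][j], b_reg[i][j]
                if x is not None and y is not None:   # None marks a pipeline bubble
                    acc[i][j] += x * y
                    if start[i][j] is None:
                        start[i][j] = t
    return acc, start, cycles


N = 8
A = [[(i + k) % 5 - 2 for k in range(N)] for i in range(N)]
B = [[(k * j) % 7 - 3 for j in range(N)] for k in range(N)]
result, start, cycles = systolic_matmul(A, B)
expected = [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]
```

In this model PE(i, j) performs its first MAC at cycle i + j, so all PEs on the same anti-diagonal start together, and every PE receives its operands only from its left and upper neighbours.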
Each PE (Processing Element):