Assembly Instruction Guide

Detailed Instruction RTL List


Overview

This project designs a compact, efficient INT8 Vector Processing Unit (VPU) for the MAX10 FPGA on the DE10-Lite board.

The unit is built around an 8×8 systolic array that accelerates matrix multiplication, the core operation of dense layers and convolutions in neural networks.
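A bit-exact software reference is useful for checking whatever the array produces. The sketch below assumes signed INT8 operands (-128 to 127) and a wide signed accumulator. This guide does not state the accumulator width, so the hypothetical helper `acc_bits_needed` computes the worst case for a k-term dot product. For the 8-term dot products of an 8×8 tile, the largest possible sum is 8 × (-128 × -128) = 131072, which needs a 19-bit signed accumulator.

```python
def int8_matmul_ref(A, B):
    """Golden-model matrix product C = A @ B for signed INT8 inputs.

    Python ints are exact, so this stands in for an accumulator that
    never overflows (the real hardware width is an assumption).
    """
    for M in (A, B):
        for row in M:
            for v in row:
                if not -128 <= v <= 127:
                    raise ValueError(f"{v} is outside the INT8 range")
    k = len(B)
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions do not match")
    m = len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(len(A))]


def acc_bits_needed(k):
    """Smallest signed accumulator width that cannot overflow for a
    k-term INT8 dot product. The largest magnitude is (-128) * (-128)
    = 16384 summed k times; the most negative sum is smaller in size."""
    worst = k * 128 * 128
    width = 1
    while (1 << (width - 1)) - 1 < worst:
        width += 1
    return width


print(acc_bits_needed(8))  # 19
print(int8_matmul_ref([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

A 32-bit accumulator, for example, would leave ample headroom if results are later summed across several 8×8 tiles, but that choice is not confirmed by this guide.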

Target Applications:


System Architecture Overview

Top-Level Blocks

Block                     Function
Systolic Array            Performs pipelined matrix multiplication (MAC grid)
Micro-FSM (Controller)    Controls instruction flow: load, compute, reset
Dual-Port RAM             Buffers matrices A and B and stores the result
UART Interface            Enables host-device communication for data transfer
Memory Map                Standardized layout for matrix storage
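The memory map's concrete addresses are not given in this section, so the following is a hypothetical byte-addressed layout for one 8×8 job: A and B stored row-major as one INT8 byte per element, followed by the result C as 32-bit words. `A_BASE`, `B_BASE`, `C_BASE`, `C_WORD_BYTES`, and `pack_int8` are illustrative names and values, not part of the actual design.

```python
N = 8                 # array dimension from this guide
C_WORD_BYTES = 4      # assumption: 32-bit result words

# Hypothetical byte addresses, chosen only for illustration
A_BASE = 0x000
B_BASE = A_BASE + N * N                   # 0x040
C_BASE = B_BASE + N * N                   # 0x080
MAP_END = C_BASE + N * N * C_WORD_BYTES   # 0x180, first unused byte


def addr_a(i, j):
    """Byte address of A[i][j] (row-major, one INT8 per byte)."""
    return A_BASE + i * N + j


def addr_b(i, j):
    """Byte address of B[i][j] (row-major, one INT8 per byte)."""
    return B_BASE + i * N + j


def addr_c(i, j):
    """Byte address of the first byte of C[i][j] (one word each)."""
    return C_BASE + (i * N + j) * C_WORD_BYTES


def pack_int8(M):
    """Serialize a matrix row-major as two's-complement bytes."""
    return bytes(v & 0xFF for row in M for v in row)


print(hex(addr_b(0, 0)), hex(addr_c(7, 7)), hex(MAP_END))  # 0x40 0x17c 0x180
```

With a row-major layout, a host could send each matrix over UART as 64 consecutive bytes (the output of `pack_int8`) and the FPGA side could write them to sequential RAM addresses without any reordering.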

Data Flow in Systolic Array

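Wavefront Pattern: row i of A and column j of B enter the array delayed (skewed) by i and j cycles respectively, so the two operands that must be multiplied together reach the same PE in the same cycle. This diagonal scheduling lets the array compute in a pipelined, conflict-free way.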
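The wavefront timing can be checked with a small cycle-level model. This sketch assumes an output-stationary dataflow (A values move right, B values move down, and each PE keeps its own accumulator for one element of C); the guide does not state which dataflow the RTL uses. With row i skewed by i cycles and column j by j cycles, PE(i, j) sees A[i][k] and B[k][j] together at cycle i + j + k, so an n×n product finishes after 3n - 2 cycles. `systolic_matmul` is a hypothetical name.

```python
import random


def systolic_matmul(A, B):
    """Cycle-level model of an n x n output-stationary systolic array."""
    n = len(A)
    a_reg = [[0] * n for _ in range(n)]  # A operand latched in each PE
    b_reg = [[0] * n for _ in range(n)]  # B operand latched in each PE
    acc = [[0] * n for _ in range(n)]    # stationary partial sums (C)
    cycles = 3 * n - 2  # last operand pair meets PE(n-1, n-1) at t = 3n - 3
    for t in range(cycles):
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:
                    k = t - i  # skew: row i enters i cycles late
                    new_a[i][j] = A[i][k] if 0 <= k < n else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]  # from left neighbor
                if i == 0:
                    k = t - j  # skew: column j enters j cycles late
                    new_b[i][j] = B[k][j] if 0 <= k < n else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]  # from upper neighbor
                # Both operands now share the same k = t - i - j
                acc[i][j] += new_a[i][j] * new_b[i][j]
        a_reg, b_reg = new_a, new_b  # all registers update on the same edge
    return acc, cycles


if __name__ == "__main__":
    n = 8
    rng = random.Random(0)
    A = [[rng.randint(-128, 127) for _ in range(n)] for _ in range(n)]
    B = [[rng.randint(-128, 127) for _ in range(n)] for _ in range(n)]
    C, cycles = systolic_matmul(A, B)
    ref = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]
    print("match:", C == ref, "cycles:", cycles)
```

For the 8×8 array this model needs 22 cycles per tile, including pipeline fill and drain. Zero padding outside the valid k range means idle PEs simply add 0, which is one way the schedule stays conflict-free.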


Processing Element (PE)

Each PE (Processing Element):