• Designed and implemented a pipelined Matrix-Vector Multiplication (MVM) engine in SystemVerilog, inspired by Microsoft Brainwave’s deep learning inference accelerator.
• Optimized for throughput and latency using Xilinx DSP48E1 slices; the engine computes 27 outputs in parallel at 280 MHz.
• Built a fully pipelined hyperbolic tangent (tanh) unit for nonlinear activation, based on a Taylor series approximation, and raised its clock frequency from 170 MHz to 320 MHz.