TPDS2020: "O3BNN-R: An Out-Of-Order Architecture for High-Performance and Regularized BNN Inference"
Updated: Jul 12, 2020
Binarized Neural Networks (BNNs), which significantly reduce computational complexity and memory demand, have shown potential in cost- and power-constrained domains such as IoT and smart edge devices, where reaching a certain accuracy bar is sufficient and real-time performance is highly desired. In this work, we demonstrate that the already highly-condensed BNN model can be shrunk further by dynamically pruning irregular, redundant edges. Based on two new observations of BNN-specific properties, we propose an out-of-order (OoO) architecture, O3BNN-R, which curtails edge evaluation whenever the binary output of a neuron can be determined early at runtime during inference. As with Instruction-Level Parallelism (ILP), such fine-grained, irregular, runtime pruning opportunities are traditionally presumed to be difficult to exploit. To further enhance the pruning opportunities, we take an algorithm-architecture co-design approach, augmenting the loss function during training with specialized regularization terms that favor edge pruning. We evaluate our design on an embedded FPGA using VGG-16 and AlexNet for ImageNet, and a VGG-like network for Cifar-10. Results show that O3BNN-R without regularization can prune, on average, 30\% of the operations without any accuracy loss, yielding a 2.2$\times$ inference speedup and, on average, a 34$\times$ energy-efficiency improvement over state-of-the-art BNN implementations on FPGA/GPU/CPU. With regularization at training time, performance improves by a further 15\% on average.
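The early-termination idea can be illustrated in software. The sketch below is a minimal, hypothetical model of a single binarized neuron (not the paper's O3BNN-R hardware logic): the neuron fires iff the number of XNOR agreements between weights and inputs reaches a threshold, so evaluation can stop as soon as the partial count either reaches the threshold or can no longer possibly reach it.

```python
def bnn_neuron_early_exit(weights, inputs, threshold):
    """Evaluate a binarized neuron with early termination.

    weights, inputs: sequences of +1/-1 values (one per edge).
    threshold: number of XNOR agreements needed for the neuron to output 1.
    Returns (output_bit, edges_evaluated).

    Illustrative sketch only; the actual O3BNN-R pruning is done
    out-of-order in hardware, not sequentially as shown here.
    """
    n = len(weights)
    matches = 0  # running popcount of XNOR(w, x) agreements
    for k, (w, x) in enumerate(zip(weights, inputs), start=1):
        if w == x:
            matches += 1
        remaining = n - k
        if matches >= threshold:
            # Output is already guaranteed to be 1: prune remaining edges.
            return 1, k
        if matches + remaining < threshold:
            # Even if every remaining edge matched, the threshold is
            # unreachable: output is guaranteed to be 0.
            return 0, k
    return (1 if matches >= threshold else 0), n
```

For example, with all four edges agreeing and a threshold of 2, the output is decided after only two edge evaluations; the other two are pruned, which is the source of the operation savings reported above.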