XNOR-Popcount, an Alternative Solution to the Accumulation Multiplication Method for Approximate Computations, to Improve Latency and Power Efficiency
Corresponding author's email: khoapv@hcmute.edu.vn
DOI: https://doi.org/10.54644/jte.2025.1537
Keywords: Multiply–accumulate operation, XNOR-popcount, Adder, Latency, Power consumption
Abstract
Convolution operations in neural networks are computationally intensive and time-consuming because they rely on multiplier circuits. In binarized neural networks, XNOR-popcount is a hardware technique that replaces the conventional multiply-accumulate (MAC) operation, which requires complex multipliers. XNOR-popcount reduces design area and power consumption and increases processing speed. This study implements and evaluates an XNOR-popcount design at the transistor level in the Cadence circuit design software using 90 nm CMOS technology. Simulation results show that, for the same computational function, a MAC operation implemented with XNOR-popcount reduces power consumption, processing time, and design complexity by up to 69%, 50%, and 48%, respectively, compared with the method using conventional multipliers. The XNOR-popcount design is therefore well suited to edge-computing platforms with minimalist hardware, small memory, and a limited power supply.
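The equivalence that lets XNOR-popcount stand in for a MAC in a binarized network can be illustrated in software. The sketch below is not the transistor-level circuit evaluated in the study; it only demonstrates the arithmetic identity, assuming that values are restricted to +1 and -1 and that +1 is encoded as bit 1 and -1 as bit 0. The function names (`mac_dot`, `pack_bits`, `xnor_popcount_dot`) are hypothetical and chosen for illustration. Under that encoding, the dot product of two n-element vectors equals 2 * popcount(XNOR(a, w)) - n, because each matching bit pair contributes +1 and each mismatching pair contributes -1.

```python
def mac_dot(a, w):
    """Reference multiply-accumulate over vectors with entries in {-1, +1}."""
    return sum(x * y for x, y in zip(a, w))


def pack_bits(v):
    """Pack a {-1, +1} vector into an integer (assumed encoding: +1 -> 1, -1 -> 0).

    Element i of the vector is stored in bit i of the result.
    """
    word = 0
    for i, x in enumerate(v):
        if x == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, w_bits, n):
    """Dot product of two packed n-element {-1, +1} vectors without multiplication."""
    mask = (1 << n) - 1                  # keep only the n valid bit positions
    agree = ~(a_bits ^ w_bits) & mask    # XNOR: 1 wherever the two bits match
    matches = bin(agree).count("1")      # popcount of the XNOR result
    # Each match contributes +1 and each mismatch -1:
    # matches - (n - matches) = 2 * matches - n
    return 2 * matches - n


# Illustrative input values (not taken from the paper).
a = [1, -1, 1, 1, -1, -1, 1, -1]
w = [1, 1, -1, 1, -1, 1, 1, -1]

# Both methods must agree on the same inputs.
assert xnor_popcount_dot(pack_bits(a), pack_bits(w), len(a)) == mac_dot(a, w)
```

In hardware, the per-element multiplier is replaced by a single XNOR gate and the accumulation by a popcount built from adders, which is consistent with the area, power, and latency savings described in the abstract.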
License
Copyright (c) 2025 Journal of Technical Education Science

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


