OPTIMIZATION AND PERFORMANCE ANALYSIS OF THE 30-BIT FIXED-POINT DIGITAL FORMAT

Milan Dinčić, Zoran Perić, Dragan Denić

DOI Number
https://doi.org/10.22190/FUACR220626008D
First page
095
Last page
105

Abstract


The 32-bit floating-point format (FP32) is the standard format for digital representation of data in computers, providing high-quality representation over a very wide dynamic range of data. However, the FP32 format has very high computational complexity, requiring expensive, powerful hardware and high energy consumption. Implementing the FP32 format on devices with limited hardware resources, such as smart sensors and embedded and edge devices, is therefore very problematic. The fixed-point format, by contrast, has significantly lower computational complexity, consumes less power, requires less chip area and provides faster calculations than the floating-point format, making it much more suitable for devices with limited hardware resources.

The main goal of this paper is to find a fixed-point format that can replace the FP32 format, in the sense that it provides the same performance as the FP32 format while significantly reducing computational complexity. To that end, the paper considers the 30-bit fixed-point format, optimizes its parameters and evaluates its performance, using the analogy between fixed-point digital representation and uniform quantization. As the main result, the paper shows that the 30-bit fixed-point format can achieve a quality of digital representation, measured by the signal-to-quantization-noise ratio (SQNR), that is 3.352 dB higher than that of the FP32 format, while saving 2 bits per data item (a significant saving for large amounts of data) and significantly reducing implementation complexity. The proposed 30-bit fixed-point format can therefore be used successfully as a replacement for the FP32 format on devices with limited resources.
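The analogy between fixed-point representation and uniform quantization can be illustrated with a minimal Python sketch. It is not the paper's method or its optimized parameters: the split of the 30 bits (here 24 fractional bits), the unit-variance Laplacian test source, and all function names are assumptions chosen only for illustration. The sketch treats a signed 30-bit fixed-point word as a uniform quantizer with step size 2^-24 and estimates its SQNR empirically.

```python
import math
import random

TOTAL_BITS = 30   # word length considered in the abstract
FRAC_BITS = 24    # hypothetical choice; the paper optimizes this kind of parameter


def to_fixed(x, total_bits=TOTAL_BITS, frac_bits=FRAC_BITS):
    """Map a real value to a signed integer code (round to nearest, saturate)."""
    step = 2.0 ** -frac_bits                 # quantization step size
    code = round(x / step)
    lo = -(1 << (total_bits - 1))            # most negative representable code
    hi = (1 << (total_bits - 1)) - 1         # most positive representable code
    return max(lo, min(hi, code))


def from_fixed(code, frac_bits=FRAC_BITS):
    """Map an integer code back to the real value it represents."""
    return code * 2.0 ** -frac_bits


def laplacian_sample(rng, sigma=1.0):
    """Draw one zero-mean Laplacian sample with standard deviation sigma."""
    b = sigma / math.sqrt(2.0)               # Laplacian scale parameter
    magnitude = rng.expovariate(1.0 / b)     # exponential with mean b
    return magnitude if rng.random() < 0.5 else -magnitude


def sqnr_db(samples, total_bits=TOTAL_BITS, frac_bits=FRAC_BITS):
    """Empirical SQNR in dB: signal power over quantization-noise power."""
    signal = sum(x * x for x in samples)
    noise = sum(
        (x - from_fixed(to_fixed(x, total_bits, frac_bits), frac_bits)) ** 2
        for x in samples
    )
    return 10.0 * math.log10(signal / noise)


rng = random.Random(0)                        # fixed seed for repeatability
samples = [laplacian_sample(rng) for _ in range(20000)]
measured = sqnr_db(samples)

# High-resolution approximation for a uniform quantizer with negligible
# overload: noise power ~ step^2 / 12, here with unit signal variance.
predicted = 10.0 * math.log10(12.0) + 20.0 * FRAC_BITS * math.log10(2.0)

print(f"measured SQNR:  {measured:.2f} dB")
print(f"predicted SQNR: {predicted:.2f} dB")
```

Under these assumptions the measured value should lie close to the high-resolution approximation, because almost no samples fall outside the representable range. Moving bits between the integer and fractional parts trades overload distortion against granular noise, which is the kind of trade-off a parameter optimization has to balance.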

Keywords

Fixed-point digital format, floating-point digital format, uniform quantization, piecewise uniform quantization, smart sensors, resource-constrained devices









Print ISSN: 1820-6417
Online ISSN: 1820-6425