# Design of an Efficient Rounding-Based Approximate Multiplier (RoBA)

M. Srinivasachary<sup>1</sup>, P. Ashok<sup>2</sup>, Dr. P. Bala Murali Krishna<sup>3</sup>

mschary90@gmail.com<sup>1</sup>, ashokp25@gmail.com<sup>2</sup>, pbmk05@gmail.com<sup>3</sup>

<sup>1</sup>PG Scholar, VLSI & ES, Sri Mittapalli College of Engineering, Tummalapalem, Guntur, AP, India

<sup>2</sup>Associate Professor, Dept of ECE, Sri Mittapalli College of Engineering, Tummalapalem, Guntur, AP, India

<sup>3</sup>Professor & HOD, Dept of ECE, Sri Mittapalli College of Engineering, Tummalapalem, Guntur, AP, India

ABSTRACT: In this paper, we propose an approximate multiplier that is high speed yet energy efficient. The approach is to round the operands to the nearest power of two. In this way, the computationally intensive part of the multiplication is omitted, improving speed and energy consumption at the price of a small error. The proposed approach is applicable to both signed and unsigned multiplications. We propose three hardware implementations of the approximate multiplier: one for unsigned and two for signed operations. The efficiency of the proposed multiplier is evaluated by comparing its performance with that of some approximate and accurate multipliers using different design parameters. In addition, the efficacy of the proposed approximate multiplier is studied in two image processing applications, i.e., image sharpening and smoothing. As an extension, the RoBA multiplier is used in the convolution process of an FIR filter.

INDEX TERMS— Accuracy, approximate computing, energy efficient, error analysis, high speed, multiplier.

## 1. INTRODUCTION

Energy minimization is one of the main design requirements in almost any electronic system, especially portable ones such as smartphones, tablets, and other gadgets. It is highly desirable to achieve this minimization with a minimal performance (speed) penalty. Digital signal processing (DSP) blocks are key components of these portable devices for realizing various multimedia applications. The computational core of these blocks is the arithmetic logic unit, where multiplications have the greatest share among all arithmetic operations performed in these DSP systems. Therefore, improving the speed and power/energy-efficiency characteristics of multipliers plays a key role in improving the efficiency of processors.

Many of the DSP cores implement image and video processing algorithms where the final outputs are either images or videos prepared for human consumption. This fact enables us to use approximations for improving the speed/energy efficiency, because human beings have limited perceptual abilities when observing an image or a video. In addition to image and video processing applications, there are other areas where the exactness of the arithmetic operations is not critical to the functionality of the system. The ability to use approximate computing lets the designer make tradeoffs between accuracy and speed as well as power/energy consumption.

Applying the approximation to the arithmetic units can be performed at different design abstraction levels including circuit, logic, and architecture levels, as well as algorithm and software layers. The approximation may be performed using different techniques such as allowing some timing violations (e.g., voltage over scaling or over clocking) and function approximation methods (e.g., modifying the Boolean function of a circuit) or a combination of them. In the category of function approximation methods, a number of approximating arithmetic building blocks, such as adders and multipliers, at different design levels have been suggested.

In this paper, we focus on proposing a high-speed, low power/energy, yet approximate multiplier appropriate for error-resilient DSP applications. The proposed approximate multiplier, which is also area efficient, is constructed by modifying the conventional multiplication approach at the algorithm level, assuming rounded input values. We call this the rounding-based approximate (RoBA) multiplier. The proposed multiplication approach is applicable to both signed and unsigned multiplications, for which three optimized architectures are presented. The efficiencies of these structures are assessed by comparing the delays, power and energy consumptions, energy-delay products (EDPs), and areas with those of some approximate and accurate (exact) multipliers. The contributions of this paper can be summarized as follows:

1) Presenting a new scheme for RoBA multiplication by modifying the conventional multiplication approach;

2) Describing three hardware architectures of the proposed approximate multiplication scheme for signed and unsigned operations.

## 2. LITERATURE SURVEY

### Low-power digital signal processing using approximate adders:

Low power is an imperative requirement for portable multimedia devices employing various signal processing algorithms and architectures. In most multimedia applications, human beings can gather useful information from slightly erroneous outputs, so exactly correct numerical outputs are not required. Previous research in this context exploits error resiliency primarily through voltage overscaling, utilizing algorithmic and architectural techniques to mitigate the resulting errors. The authors of [2] propose logic complexity reduction at the transistor level as an alternative approach to take advantage of the relaxation of numerical accuracy. They demonstrate this concept by proposing various imprecise or approximate full adder cells with reduced complexity at the transistor level, and use them to design approximate multi-bit adders. In addition to the inherent reduction in switched capacitance, these techniques result in significantly shorter critical paths, enabling voltage scaling. Architectures for video and image compression algorithms are designed using the proposed approximate arithmetic units and evaluated to demonstrate the efficacy of the approach. Simple mathematical models for the error and power consumption of these approximate adders are also derived. Furthermore, the utility of these approximate adders is demonstrated in two digital signal processing architectures (discrete cosine transform and finite impulse response filter) with specific quality constraints. Simulation results indicate up to 69% power savings using the proposed approximate adders, compared to existing implementations using accurate adders.

### Energy-efficient approximate multiplication for digital signal processing and classification applications:

The need to support various digital signal processing (DSP) and classification applications on energy-constrained devices has steadily grown. Such applications often perform matrix multiplications extensively using fixed-point arithmetic while exhibiting tolerance for some computational errors. Hence, improving the energy efficiency of multiplications is critical. This brief proposes multiplier architectures that can trade off computational accuracy against energy consumption at design time. Compared with a precise multiplier, the proposed multiplier can consume 58% less energy/op with an average computational error of ~1%. Finally, the authors demonstrate that such a small computational error does not notably impact the quality of DSP or the accuracy of classification applications.

## 3. EXISTING SYSTEM

Energy minimization is one of the main design requirements in almost any electronic system, especially portable ones such as smartphones, tablets, and other gadgets. It is highly desirable to achieve this minimization with a minimal performance (speed) penalty. Digital signal processing (DSP) blocks are key components of these portable devices for realizing various multimedia applications. Many of the DSP cores implement image and video processing algorithms where the final outputs are either images or videos prepared for human consumption. This fact enables us to use approximations for improving the speed/energy efficiency.

## 4. PROPOSED SYSTEM

We focus on proposing a high-speed, low power/energy, yet approximate multiplier appropriate for error-resilient DSP applications. The proposed approximate multiplier, which is also area efficient, is constructed by modifying the conventional multiplication approach at the algorithm level, assuming rounded input values. We call this the rounding-based approximate (RoBA) multiplier. The proposed multiplication approach is applicable to both signed and unsigned multiplications, for which three optimized architectures are presented. The efficiencies of these structures are assessed by comparing the delays, power and energy consumptions, energy-delay products (EDPs), and areas with those of some approximate and accurate (exact) multipliers.

### Multiplication Algorithm of RoBA Multiplier:

The main idea behind the proposed approximate multiplier is to exploit the ease of operation when the numbers are powers of two ($2^n$). To elaborate on the operation of the approximate multiplier, let us first denote the rounded values of the inputs $A$ and $B$ by $A_r$ and $B_r$, respectively. The multiplication of $A$ by $B$ may be rewritten as

$$A \times B = (A_r - A) \times (B_r - B) + A_r \times B + B_r \times A - A_r \times B_r \tag{1}$$

The key observation is that the multiplications $A_r \times B_r$, $A_r \times B$, and $B_r \times A$ may be implemented just by shift operations, because $A_r$ and $B_r$ are powers of two. The hardware implementation of $(A_r - A) \times (B_r - B)$, however, is rather complex. The weight of this term in the final result, which depends on the differences between the exact numbers and their rounded values, is typically small. Hence, we propose to omit this term from (1) to simplify the multiplication operation. To perform the multiplication, the following expression is used:

$$A \times B \cong A_r \times B + B_r \times A - A_r \times B_r \tag{2}$$

Thus, one can perform the multiplication operation using three shift and two addition/subtraction operations. In this approach, the nearest values for $A$ and $B$ in the form of $2^n$ must be determined.
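The steps above can be illustrated in software. The following Python sketch is a behavioral model for unsigned integers only, not the hardware architecture; the function names `round_to_power_of_two` and `roba_multiply` are hypothetical, and rounding a value that lies exactly halfway between two powers of two (such as 3 or 6) upward is an assumption made here, since the text does not state the tie-breaking rule.

```python
def round_to_power_of_two(value: int) -> int:
    """Return the power of two nearest to a positive integer.

    Assumption: values exactly halfway between two powers of two round up.
    """
    if value <= 0:
        raise ValueError("value must be a positive integer")
    lower = 1 << (value.bit_length() - 1)  # largest power of two <= value
    # value is at or above the midpoint 1.5 * lower exactly when 2*value >= 3*lower
    return lower << 1 if 2 * value >= 3 * lower else lower


def roba_multiply(a: int, b: int) -> int:
    """Approximate a * b as Ar*B + Br*A - Ar*Br, following expression (2)."""
    if a == 0 or b == 0:
        return 0  # zero has no nearest power of two; handled separately (assumption)
    shift_a = round_to_power_of_two(a).bit_length() - 1  # Ar = 2**shift_a
    shift_b = round_to_power_of_two(b).bit_length() - 1  # Br = 2**shift_b
    # Three shifts (Ar*B, Br*A, Ar*Br), one addition, one subtraction.
    return (b << shift_a) + (a << shift_b) - (1 << (shift_a + shift_b))
```

For instance, `roba_multiply(10, 7)` rounds both operands to 8 and returns 72, whereas the exact product is 70. When either operand is already a power of two, the omitted term of (1) is zero and the result is exact.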



Fig. 4.1: Block diagram for the hardware implementation of the proposed multiplier.
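The accuracy cost of omitting the first term of (1) can be written out explicitly. The following is a short derivation sketch rather than a result reported in this paper; it assumes unsigned, nonzero operands and that each operand is rounded to the nearest power of two, in which case a rounded value never deviates from the exact one by more than one third of the exact value. The symbol $E$ for the absolute error is introduced here for illustration.

$$
\begin{aligned}
E &= (A_r \times B + B_r \times A - A_r \times B_r) - A \times B = -(A_r - A) \times (B_r - B), \\
\frac{|E|}{A \times B} &= \frac{|A_r - A|}{A} \cdot \frac{|B_r - B|}{B} \le \frac{1}{3} \cdot \frac{1}{3} = \frac{1}{9} \approx 11.1\%.
\end{aligned}
$$

As a worked example, $A = 10$ and $B = 7$ give $A_r = B_r = 8$, so the approximate product is $56 + 80 - 64 = 72$ against the exact value of 70, a relative error of about 2.9%. The bound is reached at $A = B = 3$, where the approximation gives 8 instead of 9.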


Applications: 1. Multimedia 2. Digital signal processing

Advantages: Reduced area and reduced delay (higher speed).

## **EXTENSION:**

### FIR IMPLEMENTATION USING ROBA MULTIPLIER:

### FIR FILTER:

A filter is a device or process that removes some unwanted component or feature from a signal. Filtering is a class of signal processing, the defining feature of filter being the complete or partial suppression of some aspect of the signal. There are two main kinds of filter, analog and digital. Filters can be classified in several different groups, depending on what criteria are used for classification. The two major types of digital filters are finite impulse response digital filters (FIR filters) and infinite impulse response digital filters (IIR).



Fig 4.2: Finite Impulse Response Filter Realization

FIR filters are one of the primary types of filters used in digital signal processing. FIR filters are called finite because they have no feedback, i.e., they are nonrecursive. Therefore, if an impulse (a single spike) is sent through the system, the output becomes zero once the impulse has passed through all the taps of the filter. The FIR filter realization is shown in Fig. 4.2.

The finite impulse response (FIR) digital filter is widely used in several digital signal processing applications, such as speech processing, loudspeaker equalization, echo cancellation, adaptive noise cancellation, and various communication applications, including software-defined radio (SDR). Many of these applications require FIR filters of large order to meet stringent frequency specifications. Very often these filters need to support a high sampling rate for high-speed digital communication. The number of multiplications and additions required for each filter output, however, increases linearly with the filter order. Since there is no redundant computation available in the FIR filter algorithm, the real-time implementation of a large-order FIR filter in a resource-constrained environment is a challenging task. In signal processing applications, the filter coefficients very often remain constant and are known a priori. An N-tap FIR filter is defined by the following input-output equation:

$$out(n) = \sum_{i=0}^{N-1} x(n-i) h(i)$$

where $\{h(i) : i = 0, \ldots, N-1\}$ are the filter coefficients.
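The input-output equation can be illustrated with a direct-form software model. The following Python sketch is an illustration, not the Verilog design used in this work; the function name `fir_filter` is hypothetical, and input samples before the start of the signal, $x(k)$ for $k < 0$, are assumed to be zero.

```python
def fir_filter(x, h):
    """Direct-form N-tap FIR filter: out(n) = sum over i of x(n - i) * h(i).

    Assumption: samples before the start of the input are treated as zero.
    """
    out = []
    for n in range(len(x)):
        acc = 0
        for i, coeff in enumerate(h):  # i runs over the N taps, 0 .. N-1
            if n - i >= 0:
                acc += x[n - i] * coeff
        out.append(acc)
    return out
```

Feeding a unit impulse `[1, 0, 0, 0]` into a filter with coefficients `[3, 2, 1]` returns `[3, 2, 1, 0]`: the output reproduces the coefficients and then falls to zero, which is the finite impulse response property. Each output sample costs N multiplications, which is why the cost of the multiplier dominates for large filter orders.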

An FIR filter implements a convolution operation [1], which is often built on the assumption of infinite-length signals. Finite-length signals (e.g., images), on the other hand, have discontinuities at the boundaries. This raises the problem of which values to use in these regions. A commonly recommended solution is to extend each row by reflection at the signal edges. The number of extra samples introduced at the signal boundaries is equal to $N-1$. They can be partitioned unequally between the left and the right side of the signal. We will refer by $\alpha$ and $\mu$ to the number of samples introduced at the left and the right side of the input signal, respectively ($\alpha+\mu=N-1$).

The RoBA multiplier is used for the multiplications in the convolution process of the FIR filter.
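The boundary extension can be sketched as follows. This Python example is illustrative only; the function names `reflect_extend` and `filter_with_reflection` are hypothetical, the reflection is assumed to repeat the edge sample (the text does not say whether the edge sample itself is mirrored), and the near-even split of the $N-1$ extra samples between the two sides is likewise an assumption.

```python
def reflect_extend(x, alpha, mu):
    """Extend x by alpha reflected samples on the left and mu on the right.

    Assumption: the reflection repeats the edge sample itself.
    """
    x = list(x)
    if not (0 <= alpha <= len(x) and 0 <= mu <= len(x)):
        raise ValueError("alpha and mu must lie between 0 and len(x)")
    left = x[:alpha][::-1]            # first alpha samples, mirrored
    right = x[len(x) - mu:][::-1]     # last mu samples, mirrored
    return left + x + right


def filter_with_reflection(x, h):
    """Filter x with taps h after reflection; output has the same length as x."""
    n_taps = len(h)
    alpha = (n_taps - 1) // 2         # assumption: near-even split
    mu = (n_taps - 1) - alpha         # so that alpha + mu = N - 1
    ext = reflect_extend(x, alpha, mu)
    return [
        sum(h[i] * ext[n + n_taps - 1 - i] for i in range(n_taps))
        for n in range(len(x))
    ]
```

With `x = [1, 2, 3, 4]`, `alpha = 2` and `mu = 1`, the extended signal is `[2, 1, 1, 2, 3, 4, 4]`. Because exactly $N-1$ samples are added, convolving the extended signal yields one fully defined output per original input sample.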
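How the approximate multiplier slots into the convolution can be sketched in software. The following Python model is a hedged illustration, not the synthesized design: it assumes unsigned integer samples and coefficients, zero-valued samples before the start of the input, and upward rounding for values exactly halfway between two powers of two. The names `roba_multiply` and `fir_roba` are hypothetical.

```python
def roba_multiply(a: int, b: int) -> int:
    """Approximate a * b as Ar*B + Br*A - Ar*Br for unsigned integers."""
    if a == 0 or b == 0:
        return 0  # zero operands handled separately (assumption)

    def nearest_pow2_exponent(v: int) -> int:
        k = v.bit_length() - 1  # 2**k is the largest power of two <= v
        # Round up when v is at or above the midpoint 1.5 * 2**k (tie rule: assumption).
        return k + 1 if 2 * v >= 3 * (1 << k) else k

    sa, sb = nearest_pow2_exponent(a), nearest_pow2_exponent(b)
    return (b << sa) + (a << sb) - (1 << (sa + sb))


def fir_roba(x, h):
    """N-tap FIR filter in which every product is computed by roba_multiply."""
    out = []
    for n in range(len(x)):
        acc = 0
        for i, coeff in enumerate(h):
            if n - i >= 0:  # samples before the start are taken as zero
                acc += roba_multiply(x[n - i], coeff)
        out.append(acc)
    return out
```

When all coefficients are powers of two, for example `h = [2, 1]`, every product is exact and `fir_roba([1, 2, 3, 4], [2, 1])` returns `[2, 5, 8, 11]`, the same as exact filtering. With other coefficients, each product carries the small error of the omitted term, and these errors accumulate over the N taps of each output sample.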

## 5. RESULTS

The blocks of the proposed system are coded in Verilog HDL and simulated in the ISim simulator; Xilinx ISE is the software tool used for FPGA synthesis.

[Figure: simulation waveform of the RoBA multiplier, including the 64-bit output Final_Out(63:0); time axis in ps, ending at 2,000,000 ps.]

## **RTL SCHEMATIC:**



# **TECHNOLOGY SCHEMATIC:**



## **DESIGN SUMMARY:**

Device Utilization Summary (estimated values):

| Logic Utilization          | Used | Available | Utilization |
|----------------------------|------|-----------|-------------|
| Number of Slices           | 789  | 4656      | 16%         |
| Number of Slice Flip Flops | 10   | 9312      | 0%          |
| Number of 4 input LUTs     | 1411 | 9312      | 15%         |
| Number of bonded IOBs      | 130  | 232       | 56%         |
| Number of GCLKs            | 1    | 24        | 4%          |

## **TIMING REPORT:**

Timing Summary:

Speed Grade: -5

Minimum period: No path found

Minimum input arrival time before clock: 16.024 ns

Maximum output required time after clock: 34.867 ns

Maximum combinational path delay: 38.386 ns

## **EXTENSION RESULT:**

## **SIMULATION:**

[Figure: simulation waveform of the FIR filter using the RoBA multiplier; time axis in ps, ending at 4,000,000 ps.]

## **RTL SCHEMATIC:**



## **DESIGN SUMMARY:**

Device Utilization Summary (estimated values):

| Logic Utilization          | Used | Available | Utilization |
|----------------------------|------|-----------|-------------|
| Number of Slices           | 2256 | 4656      | 43%         |
| Number of Slice Flip Flops | 424  | 9312      | 4%          |
| Number of 4 input LUTs     | 4307 | 9312      | 45%         |
| Number of bonded IOBs      | 226  | 232       | 97%         |
| Number of GCLKs            | 1    | 24        | 4%          |

## **TIMING REPORT:**

Timing Summary:

Speed Grade: -5

Minimum period: 19.790 ns (Maximum Frequency: 50.530 MHz)

Minimum input arrival time before clock: 21.539 ns

Maximum output required time after clock: 24.403 ns

Maximum combinational path delay: 24.194 ns

## 6. CONCLUSION

In this paper, we proposed a high-speed yet energy-efficient approximate multiplier called the RoBA multiplier. The proposed multiplier, which has high accuracy, is based on rounding the inputs to the form $2^n$. In this way, the computationally intensive part of the multiplication is omitted, improving speed and energy consumption at the price of a small error. The proposed approach is applicable to both signed and unsigned multiplications. Three hardware implementations of the approximate multiplier, one for unsigned and two for signed operations, were discussed. The efficiencies of the proposed multipliers were evaluated by comparing them with those of some accurate and approximate multipliers using different design parameters.

## REFERENCES

[1] M. Alioto, "Ultra-low power VLSI circuit design demystified and explained: A tutorial," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 59, no. 1, pp. 3–29, Jan. 2012.

[2] V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy, "Low-power digital signal processing using approximate adders," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 32, no. 1, pp. 124–137, Jan. 2013.

[3] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas, "Bio-inspired imprecise computational blocks for efficient VLSI implementation of soft-computing applications," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 4, pp. 850–862, Apr. 2010.

[4] R. Venkatesan, A. Agarwal, K. Roy, and A. Raghunathan, "MACACO: Modeling and analysis of circuits for approximate computing," in Proc. Int. Conf. Comput.-Aided Design, Nov. 2011, pp. 667–673.

[5] F. Farshchi, M. S. Abrishami, and S. M. Fakhraie, "New approximate multiplier for low power digital signal processing," in Proc. 17th Int. Symp. Comput. Archit. Digit. Syst. (CADS), Oct. 2013, pp. 25–30.

[6] P. Kulkarni, P. Gupta, and M. Ercegovac, "Trading accuracy for power with an underdesigned multiplier architecture," in Proc. 24th Int. Conf. VLSI Design, Jan. 2011, pp. 346–351.

[7] D. R. Kelly, B. J. Phillips, and S. Al-Sarawi, "Approximate signed binary integer multipliers for arithmetic data value speculation," in Proc. Conf. Design Archit. Signal Image Process., 2009, pp. 97–104.

[8] K. Y. Kyaw, W. L. Goh, and K. S. Yeo, "Low-power high-speed multiplier for error-tolerant application," in Proc. IEEE Int. Conf. Electron Devices Solid-State Circuits (EDSSC), Dec. 2010, pp. 1–4.

[9] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, "Design and analysis of approximate compressors for multiplication," IEEE Trans. Comput., vol. 64, no. 4, pp. 984–994, Apr. 2015.

[10] K. Bhardwaj and P. S. Mane, "ACMA: Accuracy-configurable multiplier architecture for error-resilient system-on-chip," in Proc. 8th Int. Workshop Reconfigurable Commun.-Centric Syst.-Chip, 2013, pp. 1–6.

[11] K. Bhardwaj, P. S. Mane, and J. Henkel, "Power- and area-efficient approximate Wallace tree multiplier for error-resilient systems," in Proc. 15th Int. Symp. Quality Electron. Design (ISQED), 2014, pp. 263–269.

[12] J. N. Mitchell, "Computer multiplication and division using binary logarithms," IRE Trans. Electron. Comput., vol. EC-11, no. 4, pp. 512–517, Aug. 1962.

## **BIOGRAPHIES:**

## GUIDE DETAILS:

P. ASHOK received his B.E. in Electronics & Communication Engineering and his M.Tech degree in Electronics and Communication Engineering (VLSI Design). He has been dedicated to the teaching field for the past few years. At present, he is working as an Associate Professor at Sri Mittapalli College of Engineering, Tummalapalem, Guntur, AP, India.

## STUDENT DETAILS:

M. SRINIVASACHARY is pursuing an M.Tech in Electronics & Communication Engineering (VLSI and Embedded Systems) at Sri Mittapalli College of Engineering, Tummalapalem, Guntur, AP, India.