Two-Rate Based Structures for Computationally Efficient Wide-Band FIR Systems



Introduction
Many digital signal processing (DSP) systems tend to have a very high computational complexity when they target a large part of the Nyquist band. This corresponds to a wide-band system with one or several so called don't-care bands approaching zero. Examples of such systems include frequency selective filters, fractional-delay filters, and differentiators. This chapter considers finite-length impulse response (FIR) filters due to their attractive implementation features. In particular, they can be implemented with non-recursive structures. In contrast to infinite-length impulse response (IIR) filters, they are therefore always automatically stable and have no bound on the maximal sampling rate, see [1,2].
For frequency-selective wide-band FIR filters, the frequency-response masking (FRM) technique can be employed for complexity reductions due to its use of sparse (namely periodic) subfilters, see [3][4][5][6][7][8][9][10]. For other functions, the FRM technique cannot be used directly, and one therefore has to seek other methods to reduce the complexity. This chapter discusses such a method which utilizes a two-rate technique, but only for the derivation of efficient single-rate structures. The basic two-rate approach was originally introduced in [11] and has since then been exploited and extended for various contexts as detailed in [12][13][14][15][16][17][18][19] and to be reviewed in this chapter. For single-function systems, it is however necessary to combine the two-rate technique with the FRM approach in order to achieve an overall complexity reduction. For multi-function realizations, complexity savings may be obtained without incorporating the FRM approach but it offers further complexity savings in such cases, as exemplified in [19]. Recent results have shown that the two-rate approach offers dramatic complexity reductions for wide-band systems, especially when combined with the FRM approach.

Chapter outline
Following this introduction, Section 2 considers the two-rate based structure that is appropriate for so called left-band and right-band systems which have don't-care bands at the low-frequency and high-frequency regions, respectively. Section 3 discusses the extension to so called mid-band systems which have don't-care bands at both the low-frequency and high-frequency regions. In Section 4, multi-function system realizations are considered, whereas Section 5 gives more implementation details. Finally, Section 6 concludes the chapter.

Two-rate based structure for left-band and right-band systems
This section will first revisit FIR filters and their computational complexity. After that, the two-rate based structure for left-band and right-band systems will be discussed.

Complexity of FIR filters
Consider a causal FIR filter with an impulse response $h(n)$ and transfer function

$$H(z) = \sum_{n=0}^{N_H} h(n) z^{-n}.$$

The order of the system is $N_H$ and the impulse response duration (length) is $N_H + 1$. A direct-form implementation of the filter, corresponding directly to the convolution

$$y(n) = \sum_{k=0}^{N_H} h(k)\, x(n-k),$$

where $x(n)$ is the input and $y(n)$ the output, requires $N_H + 1$ multiplications and $N_H$ additions to compute each output sample $y(n)$. In the case of a linear-phase frequency response, $h(n)$ is symmetric or anti-symmetric, which reduces the number of multiplications to roughly $N_H/2$.
The filter order required is determined by the application and specification. For example, for frequency selective filters, the order is inversely proportional to the transition band (don't-care band) ∆ = ω s − ω c , where ω c and ω s denote the passband and stopband edges, respectively, see [20,21]. Hence, when the don't-care band decreases towards zero, the order increases rapidly. Then, using a direct-form realization, the computational complexity may become intolerable as it follows the filter order. The same trend exists also for other functions that are not frequency selective filters, like differentiation and integration, as seen in [22].
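The inverse proportionality between order and transition width can be illustrated with Kaiser's classical order estimate for equiripple linear-phase lowpass filters, $N \approx (-20\log_{10}\sqrt{\delta_p \delta_s} - 13)/(14.6\,\Delta/2\pi)$. The sketch below is only an illustration: this formula is not necessarily the one used in [20,21], and the function name and the ripple values are assumptions chosen for the example.

```python
import math


def kaiser_order_estimate(delta_p, delta_s, transition_width):
    """Kaiser's order estimate for an equiripple linear-phase lowpass FIR filter.

    delta_p, delta_s: passband and stopband ripples (linear scale).
    transition_width: Delta = omega_s - omega_c in radians.
    """
    attenuation_db = -20.0 * math.log10(math.sqrt(delta_p * delta_s))
    normalized_width = transition_width / (2.0 * math.pi)
    return (attenuation_db - 13.0) / (14.6 * normalized_width)


if __name__ == "__main__":
    # Hypothetical ripples of 0.01 in both bands; only the trend matters here.
    for fraction in (0.2, 0.1, 0.05, 0.02):
        order = kaiser_order_estimate(0.01, 0.01, fraction * math.pi)
        print(f"Delta = {fraction:.2f}*pi -> estimated order {math.ceil(order)}")
```

Halving the transition width doubles the estimated order, which is the trend that makes direct-form wide-band designs expensive.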

Two-rate based structure
To reduce the complexity, we consider here a structure that is derived via a two-rate approach, seen in Fig. 1. This structure is efficient for left-band systems (like a differentiator) targeting the frequency region ω ∈ [0, ω c ], 0 < ω c < π. The same structure can also be used for right-band systems targeting the band ω ∈ [ω c , π], 0 < ω c < π. The only difference will appear in the design, and we will therefore focus on the left-band case in this chapter, and only comment upon the right-band case in the design section.
For a left-band specification, the basic idea is to first interpolate the input signal $x(n)$ by two, through upsampling by two followed by a lowpass filter with transfer function $F(z)$ (see footnote 2). Then, a subsequent filter with transfer function $G(z)$ follows that performs the actual function. Finally, downsampling by two takes place to retain the original sampling rate. Using multi-rate theory, see [23], it is readily shown that this scheme corresponds to a linear and time-invariant (LTI) system with a transfer function $H(z)$ that equals the 0th polyphase component of the cascaded filter $F(z)G(z)$, i.e.,

$$H(z) = F_0(z)G_0(z) + z^{-1}F_1(z)G_1(z),$$

where

$$F(z) = F_0(z^2) + z^{-1}F_1(z^2)$$

and

$$G(z) = G_0(z^2) + z^{-1}G_1(z^2).$$

The final realization is thus a single-rate structure. A two-rate technique is only used to derive efficient structures. It is noted here that the order and delay of the overall filter $H(z)$ are $N_H = (N_F + N_G)/2$ and $D_H = (D_F + D_G)/2$, respectively. This can be understood by noting that $F(z)$ and $G(z)$ can be viewed as operating (in principle) at two times the input rate, because the structure is derived by sandwiching $F(z)G(z)$ between upsampling and downsampling by two.

Footnote 2: The same function can be achieved by sampling the underlying analog signal with a higher sampling rate instead of sampling it slower and then using interpolation in the digital domain. However, this also increases the requirements on the analog-to-digital converters, which are power-hungry components and in many cases one of the bottlenecks in overall systems. It is therefore often preferred to perform interpolation in the digital domain.

Figure 1. [...] when realized with the FRM approach for an even-order (c) and odd-order (d) masking filter.
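The equivalence between the two-rate scheme and a single-rate filter given by the 0th polyphase component of F(z)G(z) can be checked numerically. The following is a minimal sketch with integer-valued example coefficients chosen only for illustration (a scaled half-band filter and a short anti-symmetric filter); the helper names are hypothetical.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def upsample(x, m):
    """Insert m - 1 zeros after every sample of x."""
    out = [0] * (len(x) * m)
    out[::m] = x
    return out


def two_rate_output(x, f, g):
    """Upsample by two, filter with F(z) and then G(z), downsample by two."""
    return convolve(convolve(upsample(x, 2), f), g)[::2]


def single_rate_output(x, f, g):
    """Filter x with the 0th polyphase component of F(z)G(z)."""
    h = convolve(f, g)[::2]
    return convolve(x, h)


if __name__ == "__main__":
    f = [-1, 0, 9, 16, 9, 0, -1]  # half-band example, scaled by 16 to stay integer
    g = [1, 0, -1]                # even-order anti-symmetric (Type III) example
    x = [3, 1, 4, 1, 5]
    print(two_rate_output(x, f, g))
    print(single_rate_output(x, f, g))
```

Both functions return the same sequence, so the zeros inserted by the upsampler never need to be processed in an actual implementation.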

Filter Types
It is possible and efficient to let $F(z)$ be a linear-phase half-band (HB) FIR filter. Such a filter has a symmetric impulse response and every second impulse response value is zero, except the center tap which equals unity for an interpolation filter that preserves the signal energy. This corresponds to a pure-delay polyphase component

$$F_1(z) = z^{-(D_F - 1)/2},$$

where $D_F$ is the delay of $F(z)$, which is always an odd integer. When $F(z)$ is a linear-phase filter, $G(z)$ is of the same type as that of the overall filter $H(z)$, i.e., a linear-phase filter (nonlinear-phase filter) when $H(z)$ is a linear-phase filter (nonlinear-phase filter). In this section, we focus on linear-phase filters. In the next section, nonlinear-phase applications are considered.

Table 1.

| Overall filter H(z) | Half-band filter F(z) | G(z) |
|---|---|---|
| Type I, even order 2(m + p + 1) | Type I, order 4m + 2 | Type I, order 4p + 2 |
| Type II, odd order 2(m + p) + 1 | Type I, order 4m + 2 | Type I, order 4p |
| Type III, even order 2(m + p + 1) | Type I, order 4m + 2 | Type III, order 4p + 2 |
| Type IV, odd order 2(m + p) + 1 | Type I, order 4m + 2 | Type III, order 4p |
When F(z) is a linear-phase HB filter, it is a symmetric Type I filter with the odd-integer delay D F . Its delay contribution, D F /2, to the overall delay D H is therefore an integer plus a half. It is consequently the delay contribution D G /2 of G(z) that determines whether the overall delay is an integer or an integer plus a half. As D G /2 then must be an integer or an integer plus a half to obtain an overall linear-phase filter, D G must be an integer. Consequently, G(z) is either a Type I or Type III linear-phase FIR filter, i.e., an even-order filter with a symmetric or anti-symmetric impulse response. In other words, the type and order of G(z) determines the type and order of the overall filter, as summarized in Table 1. A formal proof of these facts is given in [17].
The order of G(z) is thus somewhat restricted as it cannot take on all even orders. However, the effective order of G(z) can be reduced by two by setting its first and last impulse response value to zero. In this way, two multiplications and additions may be saved in some cases. For the HB filter F(z), it does not make sense to try to reduce the effective order in this way, as its impulse response is always zero for odd indexes of n (except for the center tap).

Complexity reduction
Assume that H(e jω ) is to approximate a desired function D(jω) in the band ω ∈ [0, ω c ]. Due to the principle of interpolation by two in the two-rate based scheme, the effective bandwidth of G(z) is ω c /2, and thus always less than π/2. The complexity of G(z) alone will therefore be substantially lower than that of a regular direct-form realization of H(z). (This will be discussed in more detail in the design example considered later in Section 2.4). However, the overall complexity is also determined by the filter F(z). The requirement on this filter is roughly the same as that of the overall filter H(z) and its complexity is therefore relatively high. In other words, a major part of the overall complexity is moved to the filter F(z) and thus to F 0 (z) in Fig. 1. Therefore, for a single-function system, there will not be any computational savings using this approach straightforwardly. This is because we can equally well combine the three subfilters into one single conventional filter.
Nevertheless, overall savings can indeed be obtained by utilizing additional complexity saving techniques for the lowpass frequency selective HB filter $F(z)$. Specifically, by realizing $F(z)$ as an FRM filter, see [3][4][5][6][7][8][9][10], we can express the transfer function as

$$F(z) = A(z^L)B_0(z) + \left[z^{-LD_A} - A(z^L)\right]B_1(z),$$

where $A(z^L)$ is a periodic model filter and $[z^{-LD_A} - A(z^L)]$ is its complement, whereas $B_0(z)$ and $B_1(z)$ are masking filters. Specifically, in the case of an HB filter, as detailed in [5,7], $A(z)$ is given as [...] with $D_A$ being the delay of $A(z)$, whereas the masking filters are related according to [...] with $D_B$ being the delay of $B_0(z)$. One then finds that $F_0(z)$ becomes, for $D_B$ even, [...] and, for $D_B$ odd, [...], where $B_{00}(z)$ and $B_{01}(z)$ are the polyphase components of $B_0(z)$; see Fig. 1(c) and (d). More details can be found in [7].
As seen, F 0 (z) makes use of three subfilters, of which A 0 (z L ) is periodic for an integer L > 1. A periodic filter is a sparse filter, meaning it has many zero-valued filter coefficients. Specifically, only every Lth impulse response value of A 0 (z L ) is non-zero. Consequently, a linear-phase filter A(z L ) of order N A requires roughly only N A /(2L) multiplications and N A /L additions. In this way, substantial overall savings can be obtained as compared to the conventional direct-form structures.

Design
Filters are typically designed in the minimax (Chebyshev) sense or least-squares sense, or possibly combinations thereof, see [24][25][26]. The goal of this chapter is to demonstrate that the complexity (number of multiplications and additions) can be reduced when using the two-rate based structures instead of regular structures. This will be done by designing both filter classes to meet the same specification and then comparing the resulting complexities. To this end, the selection of approximation type is irrelevant, as long as one uses the same for both filter classes. In this chapter, we use minimax design, but other designs can of course be used as well after some minor appropriate modifications.
For minimax design, the maximum of the modulus of an error function $E(j\omega)$ is minimized. The error function is typically given as

$$E(j\omega) = W(\omega)\left[H(e^{j\omega}) - D(j\omega)\right], \quad \omega \in \Omega,$$

where $D(j\omega)$ is a desired function to be approximated in the frequency band $\Omega$ by the filter frequency response $H(e^{j\omega})$, whereas $W(\omega)$ is a positive weighting function. A conventional FIR filter, with the frequency response in the form of (2), is then designed by solving the following approximation problem.

Approximation problem: Given $N_H$, find the unknowns $h(n)$ and $\delta$ to minimize $\delta$ subject to

$$|E(j\omega)| \le \delta, \quad \omega \in \Omega.$$

For a linear-phase filter, we also have the additional symmetry constraints

$$h(n) = \pm h(N_H - n), \quad n = 0, 1, \ldots, N_H,$$

with the plus sign for a symmetric and the minus sign for an anti-symmetric impulse response.

For a conventional filter, the problem above is a convex optimization problem which has a unique global optimum. It can be found using linear programming, see [27], or the more efficient McClellan-Parks-Rabiner algorithm given in [28]. In practice, one usually has a specification on the desired approximation error $\delta$, say $\delta_e$. The filter will meet this specification if $\delta$ after the optimization satisfies $\delta \le \delta_e$.
For the two-rate based filters, the design becomes more intricate because it contains cascaded and parallel subfilters. This means that the unknowns are not h(n) but instead f (n) and g(n), in general, and a(n), b 0 (n), and g(n) when F(z) is realized as an FRM filter. Hence, conventional design methods can no longer be used. Moreover, due to the cascaded subfilters, we are now facing a nonlinear (nonconvex) optimization problem, which means that an overall globally optimum solution cannot be guaranteed. Nevertheless, if carefully designed, even a locally optimum solution for a two-rate based structure can be substantially less complex than the corresponding globally optimum direct-form structure. To ensure a good local optimum, the overall two-rate based filters are designed in three steps as explained below. Although F(z) should here be an FRM HB filter in order to achieve any savings, we will first explain the essential design steps in terms of a regular HB filter for the sake of simplicity. After that, the necessary modifications required for an FRM design will be pointed out.

Basic Three-Step Design Procedure
Given the desired function and bandwidth $\omega \in [0, \omega_c]$, $\omega_c < \pi$, as well as a targeted approximation error $\delta_e$, perform the following three-step procedure for each combination of filter orders $N_G$ and $N_F$ around the estimated required orders $\hat{N}_G$ and $\hat{N}_F$:

(1) Design the regular FIR filter $G(z)$, which gives $G_0(z)$ and $G_1(z)$ after polyphase decomposition. This is done by minimizing the maximum of [...].

(2) Design a regular lowpass HB FIR filter $F(z)$, which gives $F_0(z)$ and [...].

(3) Use $F_0(z)$, $G_0(z)$, and $G_1(z)$ obtained above as the initial solution in a further nonlinear optimization routine that solves the approximation problem stated in (13). If the resulting approximation error $\delta$ is smaller than $\delta_e$ after the optimization, store the result.
The estimated required orders, $\hat{N}_G$ and $\hat{N}_F$, can be found by separately designing $G(z)$ and $F(z)$ to approximate their respective desired functions (as given in Steps 1 and 2, respectively) with the same tolerance as the overall targeted error, i.e., $\delta_e$ (or similar, as in [17][18][19]). As the bandwidth of $G(z)$ is always below $\pi/2$, its order is typically below 12 for approximation errors down to some −100 dB, provided a smooth function is targeted, like a differentiator or integrator. Hence, the value of $\hat{N}_G$ is readily found by designing $G(z)$ for all low orders, using conventional techniques, and then setting $\hat{N}_G$ to the lowest order for which the approximation error $|E_G(j\omega)|$ is below $\delta_e$. As to the lowpass HB filter $F(z)$, the value $\hat{N}_F$ can be found via well-known formulas for order estimation, see [20,21], and a few designs around the estimated value.
Regarding the designs, the problems in Steps 1 and 2 are convex, and thus have unique global optima, provided they are formulated in accordance with the approximation problem stated earlier in this section. These problems can be solved using any regular solver for such problems. As F(z) is a linear-phase filter, it can alternatively be designed using the efficient McClellan-Parks-Rabiner algorithm given in [28]. The problem in Step 3 is nonlinear because of the cascaded subfilters.
In the examples of this chapter, we use the general-purpose nonlinear-optimization routine fminimax in MATLAB together with the real-rotation theorem, see [29], to solve the problem. The real-rotation theorem states that $|f| = \max_{\Theta \in [0, 2\pi]} \Re\{f e^{j\Theta}\}$, so that minimizing $|f|$ is equivalent to minimizing the maximum of $\Re\{f e^{j\Theta}\}$ over all $\Theta \in [0, 2\pi]$. The optimization problem is then solved with $\omega$ and $\Theta$ discretized to dense enough grids. A few hundred points for $\omega$ and 10-20 points for $\Theta$ are typically sufficient in practice.
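The effect of discretizing Θ in the real-rotation theorem can be illustrated numerically. The sketch below is not the MATLAB routine used for the designs; it only evaluates the maximum of the rotated real part on a uniform grid of angles, with a hypothetical function name and test value. For a grid of K angles, the result lies between |f| cos(π/K) and |f|.

```python
import cmath
import math


def rotated_real_max(f, num_angles):
    """Maximum of Re{f * exp(j*theta)} over a uniform grid of num_angles angles."""
    return max(
        (f * cmath.exp(1j * 2.0 * math.pi * k / num_angles)).real
        for k in range(num_angles)
    )


if __name__ == "__main__":
    f = 3.0 + 4.0j  # arbitrary complex error value with |f| = 5
    for num_angles in (4, 8, 16, 64):
        approx = rotated_real_max(f, num_angles)
        lower = abs(f) * math.cos(math.pi / num_angles)
        print(f"K = {num_angles:3d}: {approx:.4f} (guaranteed >= {lower:.4f})")
```

With 16 angles the modulus is underestimated by at most about 2%, which is consistent with the small Θ grids mentioned above.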

Modifications When Using an FRM Filter F(z)
When F(z) is an FRM filter, we can use essentially the same design steps as outlined above. However, a difference is that F(z) is now realized in terms of the two subfilters A(z L ) and B 0 (z) or, equivalently, F 0 (z) is now realized in terms of the three subfilters A 0 (z L ), B 00 (z) and B 01 (z). This means that three parameters, N A , N B , and L, instead of only one parameter, N F , need to be estimated. Given the same approximation error and band edges as before, F(z) as well as N A , N B , and L, can be obtained as outlined in [7]. It is noted here that the design of F(z) in Step 2 now corresponds to a nonconvex problem due to the cascaded subfilters in the FRM approach. In [7], this is solved via initial linear optimizations and further nonlinear optimization, similar to the approach given above for the two-rate based structure.

Examples
Consider a first-degree differentiator with the desired function [4]

$$D(j\omega) = e^{-j\omega(N_G + N_F)/4}\, j\omega \qquad (17)$$

in the frequency region $\omega \in [0, \omega_c]$, $0 < \omega_c < \pi$. This function can be approximated by a Type III linear-phase FIR filter, i.e., by a filter of even order and with an anti-symmetric impulse response, see [4].
Example 1: ω c = 0.95π, and δ e = 0.01 (−40 dB). Using a conventional differentiator, the specification is met by a 60th-order filter which requires 30 multiplications and 59 additions in an implementation. Using instead the two-rate and FRM based approach, with L = 5, we can meet the specification with filter orders 22, 18, and 2, for A(z), B 0 (z), and G(z), respectively. The corresponding overall realization requires 17 multiplications and 31 additions. Thus, multiplication and addition savings of 43% and 47%, respectively, are achieved. The savings are however dependent on the bandwidth ω c , as will be illustrated below in Example 2. As always when using linear-phase FRM filters, the price to pay is a somewhat increased delay, and a few more delay elements. In this example, the delay is increased from 30 to 32 samples whereas the number of delay elements is increased from 60 to 64. The increase is thus only about 7%. The overall filter frequency response is plotted in Fig. 2.
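The figures quoted in Example 1 can be reproduced with a few lines of arithmetic. The sketch below assumes the usual cost of an even-order Type III direct-form filter (zero center tap, remaining taps paired with opposite signs); the function names are hypothetical, and the costs of the two-rate realization are taken directly from the text.

```python
def type3_direct_form_cost(order):
    """Multiplications and additions per output sample for an even-order Type III filter."""
    if order % 2:
        raise ValueError("Type III filters have even order")
    # order/2 coefficient pairs; order/2 pre-subtractions plus order/2 - 1 summations.
    return order // 2, order - 1


def relative_saving(conventional, proposed):
    """Fractional cost reduction when going from conventional to proposed."""
    return 1.0 - proposed / conventional


def relative_increase(conventional, proposed):
    """Fractional cost increase when going from conventional to proposed."""
    return proposed / conventional - 1.0


if __name__ == "__main__":
    mults, adds = type3_direct_form_cost(60)
    print(f"conventional: {mults} multiplications, {adds} additions")
    print(f"multiplication saving: {100 * relative_saving(mults, 17):.1f}%")
    print(f"addition saving:       {100 * relative_saving(adds, 31):.1f}%")
    print(f"delay increase:        {100 * relative_increase(30, 32):.1f}%")
    print(f"delay-element increase: {100 * relative_increase(60, 64):.1f}%")
```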
Example 2: Figure 3 shows the number of multiplications required for the conventional direct-form filter and the two-rate based filter, both approximating first-degree Type III differentiators with approximation errors of δ = 0.01, 0.001, 0.0001 (−40, −60, −80 dB).
As the plots reveal, the complexity savings of the two-rate based filter increase substantially when the bandwidth approaches π. The break-even point is somewhere around ω c = 0.8π, from which the savings increase approximately linearly with increasing bandwidth. In the region between 0.8π and 0.98π, the savings go from around zero up to some 65%. Similar savings are obtained also for the number of additions, as it is proportional to the number of multiplications. Again, a price to pay for the arithmetic complexity reductions is a moderate increase of the delay and number of delay elements, typically between some 5% and 20%.
From the results in [17,22], the number of multiplications required for a regular Type III differentiator can be estimated as [...]. For the two-rate based differentiators, we have instead from [17] [...]. Comparing the two expressions, we see that the main difference is the multiplicative constant 0.956 in front of ω c in the latter expression. This explains why the savings increase with increasing bandwidth, as illustrated in Fig. 3.

Generalization to M > 2
The two-rate based scheme can readily be extended to the one depicted in Fig. 4(a), where the interpolation factor is an arbitrary integer $M$. Here, the basic principle is thus to first interpolate by $M$ via the interpolation filter $F(z)$. Then the actual function is again approximated by $G(z)$. Finally, downsampling by $M$ occurs. Using multi-rate theory, one finds again that this structure has the LTI system equivalent seen in Fig. 4(b). That is, the overall transfer function is

$$H(z) = F_0(z)G_0(z) + z^{-1}\sum_{m=1}^{M-1} F_m(z)G_{M-m}(z),$$

where $F_m(z)$ and $G_m(z)$ are the polyphase components of $F(z)$ and $G(z)$ in the polyphase representations

$$F(z) = \sum_{m=0}^{M-1} z^{-m}F_m(z^M), \qquad G(z) = \sum_{m=0}^{M-1} z^{-m}G_m(z^M).$$

Using an $M$th-band interpolation filter $F(z)$, the generalized scheme is also appropriate for left-band and right-band systems. It has turned out, though, that the case with $M = 2$ typically is the most efficient choice, which is why that case has been considered in detail in this section. This is because the additional cost of $F(z)$ exceeds the additional savings of $G(z)$ when going from $M = 2$ to $M > 2$. This in turn is due to the fact that the complexity of $G(z)$ is already very low for $M = 2$. A more detailed discussion on this is found in [17].
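For a general M, the LTI equivalent is the 0th polyphase component of F(z)G(z), which can be assembled from the individual polyphase components of F(z) and G(z). The sketch below assumes the standard (Type-1) polyphase decomposition, in which the m-th component collects the samples h(nM + m); the helper names and the example coefficients are hypothetical.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def polyphase(h, m):
    """Type-1 polyphase components: component k holds h(n*m + k)."""
    return [h[k::m] for k in range(m)]


def add_sequences(a, b):
    """Sample-wise sum of two sequences, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [u + v for u, v in zip(a, b)]


def overall_impulse_response(f, g, m):
    """h(n) of H(z) = F_0 G_0 + z^-1 * sum_{k=1}^{m-1} F_k G_{m-k}."""
    fp, gp = polyphase(f, m), polyphase(g, m)
    h = convolve(fp[0], gp[0])
    for k in range(1, m):
        h = add_sequences(h, [0] + convolve(fp[k], gp[m - k]))  # [0] + ... is z^-1
    return h


if __name__ == "__main__":
    f = [1, 2, 3, 4, 3, 2, 1]
    g = [2, -1, 0, 1, 5]
    print(overall_impulse_response(f, g, 3))
    print(convolve(f, g)[::3])
```

Both printed sequences agree, which confirms that only products of polyphase components at the low rate are needed.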

Two-rate based structure for mid-band systems
This section extends the results to mid-band systems which target the region $\omega \in [\omega_{c1}, \omega_{c2}]$, $0 < \omega_{c1} < \omega_{c2} < \pi$. Example applications include fractional-degree differentiators and integrators, see [30][31][32][33]. For later discussions, we define the don't-care bands $\Delta_1$ and $\Delta_2$ as

$$\Delta_1 = \omega_{c1}, \qquad \Delta_2 = \pi - \omega_{c2}.$$

In principle, we can again make use of the scheme in Fig. 4 with a lowpass filter $F(z)$, but it is not efficient for mid-band systems. This is because the filter $G(z)$ then needs to approximate the desired function in the band between $\omega_{c1}/M = \Delta_1/M$ and $\omega_{c2}/M = (\pi - \Delta_2)/M$. Although this implies that the width of the upper don't-care band of $G(z)$ is increased substantially to roughly $(M - 1)\pi/M$ instead of the original $\Delta_2 = \pi - \omega_{c2}$, its lower don't-care band, $\Delta_1/M = \omega_{c1}/M$, becomes $M$ times narrower. This means that the complexity of $G(z)$ may thereby even increase, not decrease. In the left-band case, this is not a problem as there is no don't-care band to the left.
The width of both the lower and the upper don't-care bands of G(z) can be increased by using a bandpass filter F(z) instead of a lowpass filter. This also means that we have to use M > 2. Again, it appears that the most efficient case is for the lowest possible M, which is here M = 3. The reason for this is two-fold. First, odd values of M make it possible to center the passband of G(z) around π/2, which maximizes the minimum of its lower and upper don't-care bands. Second, the complexity of F(z) alone reduces with reduced M, in accordance with the discussion in [17] for the left-band case. In addition, the use of M = 3 instead of M > 3 makes it possible to double the amount of sparsity of F(z), and thus its efficiency, by expressing it as a periodic filter. Here, [...] with K being an appropriately chosen odd integer. For M = 3, one should use K = −1.
After the downsampling by $M$, the above region is mapped to the targeted region $\omega \in [\omega_{c1}, \omega_{c2}]$. Further, $F(z)$ is to approximate $M$ in the same region as that of $G(z)$ and zero in the corresponding image bands created in the upsampling. Hence, $F(z)$ is here a bandpass filter with passband and stopband edges at [...] and [...], respectively. Moreover, with $M = 3$ and $K = -1$, $F(z)$ is a symmetric bandpass filter centered on $\pi/2$. Consequently, it can be expressed as

$$F(z) = 3P(z^2),$$

where $P(z)$ is a unity-gain-passband third-band highpass filter. The polyphase decomposition of $P(z)$ is then

$$P(z) = \frac{1}{3} + z^{-1}P_1(z^3) + z^{-2}P_2(z^3),$$

which leads to

$$F(z) = 1 + 3z^{-2}P_1(z^6) + 3z^{-4}P_2(z^6)$$

and the polyphase components

$$F_0(z) = 1, \qquad F_1(z) = 3z^{-1}P_2(z^2), \qquad F_2(z) = 3P_1(z^2).$$

A filter $F(z)$ of the form above requires roughly only one third of the complexity of a general filter of the same order.
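The sparsity implied by the expansion F(z) = 1 + 3z^{-2}P_1(z^6) + 3z^{-4}P_2(z^6) can be made concrete with a small coefficient-level sketch. The values chosen for the components P_1 and P_2 below are arbitrary placeholders rather than a designed filter, and the helper names are hypothetical; the sketch only shows where the non-zero taps end up when F(z) is split into three polyphase branches.

```python
from fractions import Fraction


def third_band_filter(p1, p2):
    """Impulse response of P(z) = 1/3 + z^-1 P1(z^3) + z^-2 P2(z^3)."""
    p = [Fraction(0)] * (3 * max(len(p1), len(p2)))
    p[0] = Fraction(1, 3)
    for n, c in enumerate(p1):
        p[3 * n + 1] = Fraction(c)
    for n, c in enumerate(p2):
        p[3 * n + 2] = Fraction(c)
    return p


def bandpass_from_third_band(p):
    """Impulse response of 3*P(z^2): scale by 3 and insert a zero between taps."""
    f = [Fraction(0)] * (2 * len(p) - 1)
    f[::2] = [3 * c for c in p]
    return f


if __name__ == "__main__":
    p1, p2 = [1, 2], [3, 4]          # placeholder coefficients
    f = bandpass_from_third_band(third_band_filter(p1, p2))
    for m in range(3):
        print(f"branch {m}:", [str(c) for c in f[m::3]])
    nonzero = sum(1 for c in f if c != 0)
    print(f"{nonzero} non-zero taps out of {len(f)}")
```

Branch 0 reduces to a single unit tap, while branches 1 and 2 carry the coefficients of P_2 and P_1 interleaved with zeros.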

Complexity savings
As opposed to the case of linear-phase overall filters considered in Section 2, we can here achieve complexity savings without using additional FRM techniques. The reason is two-fold. First, as seen above, F(z) = 3P(z 2 ) is already sparse. Second, as a mid-band system is often a nonlinear-phase system, the filter coefficients are not symmetric. By using the two-rate based structure, symmetry can partially be utilized as F(z) is a symmetric filter whereas only the low-order G(z) is unsymmetric. As to the sparsity, the degree of sparseness can be increased by realizing P(z) as an FRM third-band filter. Details are given in [18].

Examples
Example 3: Consider the approximation of a fractional-degree differentiator with the desired function

$$D(j\omega) = e^{-j\omega(N_G + N_F)/4}\,(j\omega)^{0.5}$$

in the frequency band $\omega \in [0.02\pi, 0.98\pi]$ and for an approximation error of $\delta_e = 0.01$. Figure 5 shows the frequency response and approximation error of the two-rate based design. The filter has been designed using essentially the same three-step procedure described earlier, after minor appropriate modifications, as detailed in [18]. Table 2 gives the results for the conventional direct-form realization and for the two-rate based realizations, both with a sparse regular bandpass filter and a sparse FRM bandpass filter. The quantity $D_H$ denotes the integer part of the group delay whereas DE denotes the number of delay elements. As seen from the table, substantial savings are achieved using the two-rate based structures, especially when the FRM technique is also utilized. As usual when using the FRM technique, one has to pay a price in a somewhat increased delay. It is also noted that the savings increase/decrease with increased/decreased bandwidth (decreased/increased width of the don't-care bands). This is in line with the basic two-rate based scheme and it was exemplified earlier in Example 2.

Multi-function systems
In this section, we will discuss the extension to the realization of multifunction systems. The two-rate based approach is even more efficient for such systems as the same F(z), and thus the same F 0 (z), is shared between all functions. We will illustrate this for Farrow-structure based (see [34]) variable fractional-delay (VFD) filters. As an example will reveal, the two-rate based structure offers dramatic complexity reductions in this application, even without using the additional FRM approach. However, incorporating the FRM approach, further complexity savings are obtained.
The ideal frequency response of a VFD filter is

$$D(j\omega, d) = e^{-j\omega(D_H + d)},$$

where $D_H$ is a fixed delay which usually is an integer or an integer plus a half. Further, $d$ is the fractional delay. The ideal response should be approximated in the band $\omega \in [0, \omega_c]$, $0 < \omega_c < \pi$, and for all fractional delays $d \in [-0.5, 0.5]$, meaning that a whole sampling period (interval) is covered. In general, the sampling period is $T$, but we have used $T = 1$ in this chapter for simplicity.
Using the Farrow structure, $H(z, d)$ is expressed in the form

$$H(z, d) = \sum_{k=0}^{L} d^k H_k(z), \qquad (29)$$

where $H_k(z)$ are fixed subfilters which, essentially, realize the weighted differentiators $e^{-j\omega D_H}(-j\omega)^k/k!$. This follows immediately by truncating the Taylor series expansion of $e^{-j\omega d}$, see [41]. Further, when there are no restrictions on the fixed part of the delay, it is possible and efficient to use linear-phase subfilters $H_k(z)$, thus with symmetric or anti-symmetric impulse responses. We then have $D_H = N_H/2$, and the following two different cases. When $H_k(z)$ are of even order $N_H$, they are of Type I (Type III) for even (odd) values of $k$. This results in an integer $D_H$. In the odd-order case, $H_k(z)$ are instead of Type II (Type IV) for even (odd) values of $k$. In this case, $D_H$ is an integer plus a half. In both cases, the impulse responses are symmetric (anti-symmetric) for even (odd) values of $k$, thus $h_k(n) = (-1)^k h_k(N_H - n)$.

Figure 6. Farrow structure realizing the VFD filter transfer function in (29).

Figure 6 shows the regular Farrow structure realizing (29). As seen, the problem amounts to realizing the $L + 1$ differentiator functions with ideal responses $(-j\omega)^k/k!$. In other words, it essentially corresponds to the realization of a multi-function system, although, in this case, the partial outputs are finally combined via the FD multiplications to form only one output.
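The role of the Taylor series truncation can be illustrated by evaluating the ideal branch responses (−jω)^k/k! with Horner's rule in d, which mirrors the multiply-and-add chain of the Farrow structure. This is only a sketch of the ideal responses: the common delay factor is omitted, designed subfilters are optimized rather than being exact Taylor terms, and the function name is hypothetical.

```python
import cmath
import math


def truncated_delay_response(omega, d, order):
    """Evaluate sum_{k=0}^{order} d**k * (-1j*omega)**k / k! with Horner's rule in d."""
    coeffs = [(-1j * omega) ** k / math.factorial(k) for k in range(order + 1)]
    result = coeffs[-1]
    for c in reversed(coeffs[:-1]):
        result = result * d + c  # one multiplication by d per Farrow branch
    return result


if __name__ == "__main__":
    omega, d = 0.9 * math.pi, 0.5  # band edge and largest fractional delay
    ideal = cmath.exp(-1j * omega * d)
    for order in (1, 3, 5, 7):
        error = abs(ideal - truncated_delay_response(omega, d, order))
        print(f"L = {order}: |error| = {error:.2e}")
```

The error shrinks rapidly as the number of branches grows, which is why a moderate L suffices in practice.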
Using now the two-rate based approach introduced in Section 2.2, each $H_k(z)$ is realized as

$$H_k(z) = F_0(z)G_{k0}(z) + z^{-(D_F + 1)/2}G_{k1}(z), \qquad (30)$$

where $F_0(z)$ and $z^{-(D_F - 1)/2}$ are again the polyphase components of a linear-phase HB interpolation filter $F(z)$ with a passband gain of two and delay $D_F$, whereas $G_{k0}(z)$ and $G_{k1}(z)$ are the polyphase components of the subfilters $G_k(z)$. This follows from sandwiching the filter $F(z)G(z, d)$ between the upsampler and downsampler, where $G(z, d)$ approximates an FD filter in the region $[0, \omega_c/2]$. That is, [...]. The overall realization is shown in Fig. 7. It is noted that $F(z)$ again can be realized using the FRM approach in order to further reduce the complexity, as demonstrated in [19]. In this case, $F_0(z)$ is again realized as in Fig. 1(c) or (d).

Design examples
Example 4: We consider the design of a VFD filter with a bandwidth of ω c = 0.9π. The filter has been designed using essentially the same three-step procedure described earlier.
More details are given in [19]. Tables 3 and 4 summarize the results, where the number of multiplications and additions covers all fixed subfilters assuming appropriate use of direct-form and transposed direct-form realizations. In addition, L general multipliers and adders are needed for implementing the FD multiply-and-add chain, but this is required in all VFD filter structures. Further, the NRMS and δ gd values given in the tables indicate the normalized root-mean-square error and maximum group-delay error as defined by (33) and (36), respectively, in [42], whereas δ e denotes the maximum of the modulus of the complex error. Further, DE denotes the number of delay elements. It is seen from the tables that the two-rate based structure is considerably more efficient than the regular Farrow structure.

Figure 7. Two-rate based structure realizing the VFD filter transfer function in (29) with H k (z) as in (30).

Linear-Phase
| Structure | L | N_H | N_G | N_F | N_A | N_B | P |
|---|---|---|---|---|---|---|---|
| Reg. Farrow, WLS [44] | 7 | 73 | n/a | n/a | n/a | n/a | n/a |
| Hybrid, WLS [42] | 7 | 117 | n/a | n/a | n/a | n/a | n/a |
| Simplified [43] | 9 | 73 | n/a | n/a | n/a | n/a | n/a |
| Two-rate based | 7 | 75 | 12 | 138 | n/a | n/a | n/a |
| Two-rate based with FRM | 7 | 87 | 12 | 162 | 46 | 26 | 3 |

It is also more efficient than two alternative approaches whose results are also included in the table, namely the hybrid structure in [42] and the structure in [43], which meets roughly the same specification. It is also seen that the extended structure that utilizes the FRM technique offers further complexity reductions. The price to pay in this case is however a slight increase of the delay and delay elements, but the figures are still considerably smaller than for the structure in [42]. Compared with the regular Farrow structure in [44] and the one in [43], one has to pay the moderate price of a delay and delay element increase of some 3% using the basic structure in Fig. 7(b) and 19% using the extended structure incorporating the FRM approach.

Multiple-constant multiplication techniques for the subfilter implementations
This section will discuss implementation details, design trade-offs, and comparisons when the multiplications in the filters are implemented using multiple constant multiplication (MCM) techniques, which realize a number of constant multiplications using only shifts, adders and subtracters. The situation is different here though, than for the most commonly considered transposed direct-form filter realization, as the proposed structures consist of cascaded subfilters. This section will therefore elaborate on these issues and provide design examples. The focus here is on VFD filters using the two-rate based structure without the additional FRM approach.
For dedicated hardware implementations, one can take advantage of MCM techniques to reduce the implementation cost. Multiplications by constant coefficients can be performed using adders, subtracters, and shifts. As adders and subtracters have approximately the same implementation complexity, we will refer to both as adders. Efficient realization of constant multiplications is an active research area, and much effort has been focused on the case where a single input is multiplied by several constant coefficients. This problem has mainly been motivated by single-rate FIR filters, where for a transposed direct form FIR filter the input is multiplied by several coefficients, see [45][46][47][48][49]. The resulting implementation of several multiplications is denoted a multiplier block, as in [45].
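To make the idea of a multiplier block concrete, the following minimal sketch multiplies one integer input by three constants using only shifts, additions, and subtractions. The constants 5, 7, and 35 and the function name `multiplier_block` are hypothetical and chosen purely for illustration; the adder graph is written by hand and is not the output of RAG-n or any of the cited algorithms.

```python
def multiplier_block(x):
    """Multiply the integer x by 5, 7 and 35 using shifts and adders only.

    Hypothetical constants, hand-written adder graph (illustration only).
    Shifts are free in dedicated hardware, so the cost is three adders.
    """
    x5 = (x << 2) + x      # 5x  = 4x + x          (adder 1)
    x7 = (x << 3) - x      # 7x  = 8x - x          (adder 2, a subtracter)
    x35 = (x5 << 3) - x5   # 35x = 8*(5x) - (5x)   (adder 3, reuses 5x)
    return {5: x5, 7: x7, 35: x35}


if __name__ == "__main__":
    print(multiplier_block(3))
```

Reusing the intermediate product 5x is what distinguishes a multiplier block from three independent constant multipliers: the product 35x costs a single extra adder here because 35 = 5 · 7 and 5x is already available.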
Work has also been done for sampling rate change with an integer factor in [50] and a rational factor in [51], where it was shown that FIR filters in parallel can be implemented either using one multiplier block or by using a constant matrix multiplication block, as in [52][53][54][55], with the first approach requiring more delay elements than the latter. As a Farrow filter is also composed of several FIR filters in parallel, we have the same implementation alternatives here, not only the single multiplier block case as reported in [56]. This has been extensively discussed in [57]. In Fig. 9, the approach to implement the subfilters proposed in [56] is shown. This approach typically requires few additions for the multiplier block. However, a separate set of registers is required for each subfilter and the number of structural adders is high. Alternatively, in Fig. 10, an approach based on the observation in [50] and further discussed in [57] is shown. Here, only one set of registers is required and the structural adders of the subfilters are merged into the matrix-vector multiplication.
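The single-register-set alternative can be sketched behaviorally as follows: several parallel FIR subfilters read from one shared delay line, and each output vector is a constant matrix times the register contents. This is only a functional model under the assumption of equal-length subfilters; the names `fir_direct_form` and `parallel_fir_shared_registers` are hypothetical, and the generic multiplications stand in for the shift-and-add network that an MCM algorithm would produce.

```python
def fir_direct_form(h, x):
    """Reference: filter the sequence x with impulse response h (zero initial state)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def parallel_fir_shared_registers(H, x):
    """Filter x with every row of H using a single shared delay line.

    H is a list of equal-length impulse responses, one per subfilter.
    For each input sample, the outputs are the matrix-vector product of H
    and the register contents, so only one set of registers is needed.
    """
    n_taps = len(H[0])
    state = [0] * n_taps                 # one set of registers for all subfilters
    outputs = [[] for _ in H]
    for sample in x:
        state = [sample] + state[:-1]    # shift the shared delay line
        for row, out in zip(H, outputs):
            out.append(sum(c * s for c, s in zip(row, state)))
    return outputs


if __name__ == "__main__":
    H = [[1, 2, 3], [4, -5, 6]]          # two hypothetical subfilters
    x = [1, 0, 2, -1, 3]
    print(parallel_fir_shared_registers(H, x))
```

In this model the number of registers depends only on the subfilter length and not on the number of subfilters, which is the register saving described above.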
The Farrow filter part of the two-rate based structure in Fig. 7 can be implemented similarly to what is shown in Figs. 9 or 10. For transposed direct form subfilters, as in Fig. 9, the corresponding structure would have two inputs, and, hence, result in a constant matrix multiplication. Using direct form subfilters, as in Fig. 10, requires two sets of registers, one for each input. For the HB filter it is convenient to use a direct form subfilter as the delayed input values are easily obtained from the registers. We note that the input to the lower branch subfilters in Fig. 7 is just a delayed version of the input, which is available from the upper branch subfilter F 0 (z). Therefore, it is possible to use the registers of the HB filter as registers for direct form subfilters. The resulting structure is illustrated in Fig. 11. Naturally, it is also possible to use a transposed direct form HB filter and/or transposed direct form subfilters in the Farrow filter part. From a complexity point of view, a transposed direct form HB filter will have the same number of adders and registers. However, it will not be possible to share registers as shown in Fig. 11.
Figure 11. Realization of the filter structure in Fig. 7 using direct form subfilters resulting in a sum-of-products block and a constant matrix multiplication.

Example and comparisons
Example 5: We consider a specification where the bandwidth is 0.9π and the modulus of the complex error should be below 0.0042. To meet this specification, the Farrow structure in Fig. 6, with subfilters jointly optimized as outlined in detail in [41], requires 45 fixed multipliers, 88 fixed adders, and 5 variable multipliers. The two-rate based structure in Fig. 7, with subfilters jointly optimized as detailed in [14], requires 30 fixed multipliers, 53 fixed adders, and 5 variable multipliers. Thus, in terms of the number of multiplications and additions, the two-rate based structure is superior.
To refine the comparison when MCM techniques are applied we must quantize the filter coefficients. For a relative comparison, one can use simple rounding. We found that the original Farrow structure requires 11 bits to fulfil the requirements whereas the structure in Fig. 7 requires 13 fractional bits. The slightly larger number of bits for the two-rate approach is explained by the fact that a cascaded filter must meet the requirements which leads to a somewhat more stringent requirement on the subfilters, at least when simple rounding is used.
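As a minimal sketch of what simple rounding means here, the helper below rounds each coefficient to the nearest multiple of 2^(-B), where B is the number of fractional bits. The function name `quantize` and the example coefficients are hypothetical and are not the filter coefficients of Example 5. Note that Python's built-in `round` resolves exact ties to the nearest even integer, which is assumed to be immaterial for this illustration.

```python
def quantize(coeffs, frac_bits):
    """Round each coefficient to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return [round(c * scale) / scale for c in coeffs]


if __name__ == "__main__":
    h = [0.3, -0.7]                      # hypothetical coefficients
    for bits in (3, 8, 13):
        q = quantize(h, bits)
        worst = max(abs(a - b) for a, b in zip(h, q))
        # The rounding error per coefficient is at most half a quantization step.
        print(bits, q, worst <= 2.0 ** -(bits + 1))
```

In practice, the number of fractional bits would be increased until the complete filter, evaluated with the quantized coefficients, meets the specification again.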
For the regular Farrow structure in Fig. 6, together with the realization in Fig. 9, a total of 33 adders is required for the multiplier block using the RAG-n algorithm in [45]. This is an optimal result since there are 33 different (odd) coefficients as discussed in [58], and, hence, there is no need to apply the slightly more efficient algorithms in [47][48][49]. Furthermore, 80 structural adders and 118 registers are required for the FIR subfilters. In addition, five general multipliers and four additional adders are required (for both the regular Farrow and two-rate based filters). Alternatively, using the proposed structure in Fig. 10, the constant matrix multiplication can be realized with 96 adders using the algorithm in [53]. In addition, 26 structural adders are required. One observation is that separating the symmetric and anti-symmetric subfilters may reduce the complexity, as some algorithms work better for fewer columns. If this is utilized, the number of adders can be reduced to 107 by applying the algorithm in [52] to the resulting two matrices and adding and subtracting the results. The number of registers is now decreased to 30, whereas the number of general multipliers and additional adders is constant.
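The optimality argument for the multiplier block rests on counting distinct odd coefficients: each adder produces one new value, so every distinct odd value other than 1 costs at least one adder. The sketch below computes this lower bound for a list of integer coefficients, assuming that sign changes and powers of two are free (realized by subtracters and wiring). The function names and the example coefficients are hypothetical.

```python
def odd_fundamental(value):
    """Strip the sign and all factors of two from a nonzero integer."""
    value = abs(value)
    while value % 2 == 0:
        value //= 2
    return value


def adder_lower_bound(int_coeffs):
    """Count the distinct odd fundamentals other than 1 among nonzero coefficients.

    Assumption: shifts and sign changes are free, so this count is a lower
    bound on the number of adders in a multiplier block.
    """
    fundamentals = {odd_fundamental(c) for c in int_coeffs if c != 0}
    fundamentals.discard(1)              # multiplying by 1 needs no adder
    return len(fundamentals)


if __name__ == "__main__":
    # Hypothetical integer coefficients (after scaling by 2**frac_bits).
    print(adder_lower_bound([3, 6, -12, 5, 1, 8, 0, 35]))
```

When an algorithm reaches this bound, as reported above for RAG-n with 33 adders, no other multiplier-block algorithm can use fewer adders for the same coefficient set.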
For the two-rate based structure in Fig. 7, the HB filter requires 43 structural adders and 29 registers. The sum of products is realized by computing the corresponding multiplier block with RAG-n and transposing it. For the constant matrix multiplication, 38 adders are required. This number is not reduced by separating the symmetric and anti-symmetric subfilters. A total of six additional registers are required, as well as the five general multipliers and four additional adders.
The results are summarized in Table 5. It is seen that the two-rate based structure still has the lowest complexity for most implementation technologies as five registers will typically be less complex to implement than 26 adders. Furthermore, whereas the use of transposed direct form subfilters for the Farrow filter, as proposed in [50,57], reduces the number of adders related to the multiplication, it is for the two-rate approach still more efficient to use direct form subfilters.

Conclusion
This chapter has reviewed recent two-rate based structures and their design for obtaining efficient wide-band FIR systems. Left-band, right-band, and mid-band systems, as well as single-function and multi-function systems, were covered. Several design examples were given, for differentiators and VFD filters (a special case of multi-function systems), revealing dramatic complexity savings for wide-band specifications. More details can be found in [12][13][14][15][16][17][18][19].

Håkan Johansson and Oscar Gustafsson
Division of Electronics Systems, Department of Electrical Engineering, Linköping University, Sweden