Efficient Multi-User Parallel Greedy Bit-Loading Algorithm with Fairness Control For DMT Systems



Introduction
This chapter addresses multi-user bit loading for discrete multitone (DMT) modulation in digital subscriber line (DSL) systems. The widely deployed asymmetric digital subscriber line (ADSL) provides high-bit-rate data transmission and plain old telephone service (POTS) simultaneously on a single twisted pair. DMT, the core of DSL systems, divides the frequency-selective channel into a large number of narrow subchannels. If the number is large enough, each subchannel has an approximately flat frequency response, although the responses may differ considerably among subchannels. One advantage of DMT is that the power spectral density (PSD) and the number of bits allocated to each subchannel can be chosen according to the subchannel signal-to-noise ratio (SNR) to optimize performance (e.g., maximum data rate or minimum power consumption). This process is called bit loading and is a critical issue in the design of DMT systems.

In the early days of DMT development, bit loading was studied only for the single-user case, in which only one pair of modems (transmitter and receiver) is considered. Compared with traditional telephone service, DSL systems operate at high frequencies, which makes the crosstalk interference among the twisted pairs in the same cable significant. The SNR of a subchannel depends not only on the PSD of its own transmitter but also on the PSDs of all other transmitters in the same cable, which act as disturbers. Bit-loading algorithms therefore need to be extended to the multi-user scenario to obtain globally optimal performance among all users.

The optimal algorithm for discrete multi-user bit loading is a natural extension of the single-user greedy algorithm. A cost matrix is calculated whose elements represent the power increment needed to transmit additional bits on each subchannel. The subchannel with the minimum cost is then found, and additional bits are assigned to it. The process continues until all subchannels are filled. A drawback of multi-user greedy bit loading is its computational complexity: in a single iteration of the algorithm, only the one subchannel of the one user that has the minimum cost is selected to receive additional bits. The objective of this chapter is to propose an efficient greedy bit-loading algorithm for multi-user DMT systems. An improved parallel bit-loading algorithm for multi-user DMT is discussed. The new algorithm is based on multi-user greedy bit loading. In a single [...]

In DMT systems, QAM is used as the modulation method to map digital information into complex numbers. QAM is a two-dimensional modulation method, which means that it has two basis functions: the in-phase function and the quadrature function. Therefore, the channel capacity in QAM DMT is

$$C_n = \log_2\left(1 + \mathrm{SNR}_n\right)$$

where $\mathrm{SNR}_n$ refers to the signal-to-noise ratio of subchannel n, and $C_n$ refers to the capacity of subchannel n. Channel capacity is the theoretical upper limit of the achievable data rate for a channel with a probability of error that tends to zero. In practical analysis, the probability of error can never be zero; instead, we expect an acceptable error probability $P_e$ at some practical data rate. The reduced data rate can be expressed in a revised channel capacity formula by introducing an SNR gap $\Gamma$:

$$b_n = \log_2\left(1 + \frac{\mathrm{SNR}_n}{\Gamma}\right) \qquad (4)$$

When $\Gamma = 1$ (0 dB), $b_n$ becomes the channel capacity. The selection of $\Gamma$ depends on the error probability $P_e$ and the coding scheme. A lower target $P_e$ requires a larger $\Gamma$. A more complex coding scheme that guarantees reliable transmission can reduce $\Gamma$. For the two-dimensional QAM system with a bit error rate (BER) of $10^{-7}$, the gap is computed using the following formula (Chow et al., 1995):

$$\Gamma = 9.8 + \gamma_m - \gamma_c \quad \text{(dB)} \qquad (5)$$

where $\gamma_m$ is the performance margin and $\gamma_c$ is the code gain. If the system is uncoded ($\gamma_c$ = 0 dB) and the performance margin is 0 dB, the gap is 9.8 dB.
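The gap approximation described above can be illustrated with a minimal Python sketch. The function names `snr_gap_db` and `bits_per_subchannel` are hypothetical and chosen only for this illustration; the sketch assumes the 9.8 dB base gap for a BER of 10^-7 stated in the text and takes all inputs in dB.

```python
import math

def snr_gap_db(margin_db=0.0, coding_gain_db=0.0):
    """SNR gap in dB for QAM at BER 1e-7: 9.8 + margin - coding gain."""
    return 9.8 + margin_db - coding_gain_db

def bits_per_subchannel(snr_db, gap_db):
    """Bits per two-dimensional QAM symbol: log2(1 + SNR / gap)."""
    snr = 10 ** (snr_db / 10)   # convert dB to linear power ratio
    gap = 10 ** (gap_db / 10)
    return math.log2(1 + snr / gap)

# With a 0 dB gap the result equals the channel capacity of the subchannel.
capacity = bits_per_subchannel(30.0, 0.0)
# With the uncoded, zero-margin gap the achievable bit number is lower.
practical = bits_per_subchannel(30.0, snr_gap_db())
```

The result is a real number; a practical DMT loader would still have to map it to an integer number of bits.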

Water-filling algorithm
The water-filling algorithm is shown in (Cover & Thomas, 1991) and (Gallager, 1968) to be the optimal solution to the problem of distributing energy among parallel independent Gaussian channels under a common power constraint. Expand $\mathrm{SNR}_n$ in Equation (4) as the ratio of the received signal power $P_n H_n$ to the noise power $N_n$, where $P_n$ is the power allocated to subchannel n and $H_n$ is the channel gain. The number of bits transmitted in a subchannel is then expressed as in Equation (6).

$$b_n = \log_2\left(1 + \frac{P_n H_n}{\Gamma N_n}\right) \qquad (6)$$

Bit loading is an optimization problem that allocates power to subchannels. The target of the optimization is to maximize the aggregate number of bits transmitted on all N subchannels under the constraint that the total power must not exceed the power budget P:

$$\max_{P_n} \sum_{n=1}^{N} \log_2\left(1 + \frac{P_n H_n}{\Gamma N_n}\right) \quad \text{subject to} \quad \sum_{n=1}^{N} P_n \le P \qquad (7)$$

$$L(P_n) = \sum_{n=1}^{N} \log_2\left(1 + \frac{P_n H_n}{\Gamma N_n}\right) + \lambda\left(P - \sum_{n=1}^{N} P_n\right) \qquad (8)$$

The $\lambda$ in Equation (8) is called the Lagrange multiplier. Take the derivative of $L(P_n)$ with respect to $P_n$ and set it equal to 0:

$$\frac{1}{\ln 2}\,\frac{H_n}{\Gamma N_n + P_n H_n} - \lambda = 0 \qquad (9)$$

Solve this equation to get the power allocation $P_n$:

$$P_n = \frac{1}{\lambda \ln 2} - \frac{\Gamma N_n}{H_n} \qquad (10)$$

This equation can be rearranged into the following form:

$$P_n + \frac{\Gamma}{g_n} = C_\lambda, \qquad g_n = \frac{H_n}{N_n}, \qquad C_\lambda = \frac{1}{\lambda \ln 2} \qquad (11)$$

The variable $g_n$ in Equation (11) expresses the signal-to-noise ratio when unit power is transmitted on subchannel n. It is a unified version of the SNR and serves as a measure of subchannel quality. We can see from Equation (11) that $P_n$ has the form of a "water-filling" distribution. That is, the sum of the power transmitted in subchannel n and the inverse unified signal-to-noise ratio (multiplied by $\Gamma$) must equal a constant ($C_\lambda$). The formula for the value of $C_\lambda$ comes from the constraint on the total power budget:

$$C_\lambda = \frac{1}{N}\left(P + \Gamma \sum_{n=1}^{N} \frac{1}{g_n}\right)$$

The water-filling algorithm described in (Starr et al., 1999) and (Tu & Cioffi, 1990) starts by calculating the water level $C_\lambda$ and then calculates the power allocation using

$$P_n = C_\lambda - \frac{\Gamma}{g_n}$$

The subchannels are sorted in descending order with respect to the value of $g_n$. Therefore, if in any step the calculated $P_n$ is negative, which means that the subchannel's terrain is too high (its signal-to-noise ratio is too small) to hold the power, the algorithm stops. The problem with the water-filling algorithm is that, in DMT systems, the bit loading of subchannels must consist of discrete numbers rather than the arbitrary real numbers assumed by this algorithm.
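As an illustration of the procedure above, the following Python sketch computes a continuous water-filling allocation. It is one common formulation rather than the exact procedure of the cited references: it drops the weakest subchannel and recomputes the water level until every remaining subchannel receives non-negative power. The function name `water_filling` and its parameters are assumptions made for this example; `g[n]` is the SNR per unit transmit power on subchannel n and `gap` is the SNR gap in linear scale.

```python
def water_filling(g, total_power, gap=1.0):
    """Continuous water-filling over parallel subchannels.

    g[n] is the SNR obtained with unit transmit power on subchannel n.
    Returns (power per subchannel, water level).
    """
    # Process subchannels from best to worst.
    order = sorted(range(len(g)), key=lambda n: g[n], reverse=True)
    used, level = len(order), 0.0
    while used > 0:
        # Water level if the best `used` subchannels share the budget.
        inv_sum = sum(gap / g[n] for n in order[:used])
        level = (total_power + inv_sum) / used
        # The worst active subchannel must not get negative power.
        if level >= gap / g[order[used - 1]]:
            break
        used -= 1
    power = [0.0] * len(g)
    for n in order[:used]:
        power[n] = level - gap / g[n]
    return power, level

power, level = water_filling([10.0, 1.0, 0.1], total_power=1.0)
```

In this small example the third subchannel is too noisy to hold any power, so the whole budget is shared by the first two.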

On/off algorithm - Chow's algorithm
The algorithm described in (Chow & Cioffi, 1995) exploits the fact that, if the same or nearly the same subchannels are used, the difference in bit-loading results between the proposed "on/off" algorithm and the traditional water-filling algorithm is very small (less than 2% (Leke & Cioffi, 1997)). In Chow's on/off algorithm, the power distribution is flat; that is, the power is the same over all the selected subchannels. On the subchannels that are not selected, the power is simply set to zero, which means that they are turned off. The algorithm starts by sorting the SNRs in descending order, so that the first subchannel processed has the highest SNR. At the beginning, the number of subchannels turned on is zero. During each step, one more subchannel is turned on, which brings the total number of turned-on subchannels to K. The power allocated to each turned-on subchannel is set to

$$P_n = \frac{P_{budget}}{K} \qquad (15)$$

where $P_{budget}$ is the power constraint. If $P_n$ is greater than the power mask at subchannel n, the power mask is used as $P_n$. With the power allocation from Equation (15), the numbers of bits in the K turned-on subchannels are then calculated using Equation (6). If the aggregate number of bits over all used subchannels becomes less than the value in the previous step, the algorithm stops, and the bit allocation obtained in the previous step is used as the bit-loading result. All the remaining subchannels are thus left in the off state.

The flat power allocation in Chow's algorithm satisfies the requirement of static spectrum management on the ADSL power spectral density. The reason why flat power allocation differs so little from the optimal water-filling algorithm is studied in detail in (Yu & Cioffi, 2001). The answer is that the logarithm in Equation (6) is insensitive to the power actually allocated to a subchannel unless the SNR is small. If the subchannels with an SNR below a cut-off value are turned off, the power can simply be allocated evenly to all other subchannels without much loss of accuracy. An even simpler algorithm, which reduces the complexity of finding the cut-off SNR, was proposed in (Yu & Cioffi, 2001). Once the cut-off SNR is found, power allocation consists of assigning constant power to the subchannels whose SNRs are greater than the cut-off value and zero power to the other subchannels.
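The on/off procedure described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the name `on_off_loading` is hypothetical, `g[n]` is assumed to be the SNR per unit transmit power, and the power mask is reduced to a single optional scalar instead of a per-subchannel mask.

```python
import math

def on_off_loading(g, power_budget, gap=1.0, power_mask=None):
    """Flat-power on/off loading: turn on subchannels from best to worst
    until the aggregate bit number starts to decrease."""
    order = sorted(range(len(g)), key=lambda n: g[n], reverse=True)
    best = (0.0, 0, 0.0)  # (total bits, subchannels on, power per subchannel)
    for k in range(1, len(order) + 1):
        p = power_budget / k          # flat power over k subchannels
        if power_mask is not None:
            p = min(p, power_mask)    # simplified scalar power mask
        total = sum(math.log2(1 + p * g[n] / gap) for n in order[:k])
        if total < best[0]:
            break                     # keep the allocation of the previous step
        best = (total, k, p)
    total, k, p = best
    return sorted(order[:k]), p, total

on, p, total = on_off_loading([100.0, 50.0, 0.01], power_budget=1.0)
```

Here the third subchannel stays off because turning it on would dilute the power of the two good subchannels and lower the total bit number.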

Greedy algorithm - Hughes-Hartogs algorithm
In Hughes-Hartogs's patent (Hughes-Hartogs, 1987-1989), an algorithm based on the greedy idea was proposed: every increment of power needed to transmit one additional bit is allocated to the subchannel that can use it most economically. For example, consider only two subchannels, A and B. Subchannel A currently carries $N_A$ bits, and the incremental power required to transmit $N_A + 1$ bits is $\Delta P_A$. For subchannel B, which carries $N_B$ bits, the incremental power is $\Delta P_B$. If $\Delta P_A < \Delta P_B$, subchannel A is selected to transmit the additional bit and receives the incremental power allocation.

The power required for every subchannel to transmit every possible number of bits can be calculated in advance and saved in a matrix P, as shown in Fig. 2. The element $P_{m,n}$ of the matrix represents the power needed to transmit m bits in subchannel n. The values in the first row, which corresponds to zero bits, are all zero. The incremental power required to transmit one additional bit is calculated by subtracting the first row from the second row. In the first iteration, the result $\Delta P$ is therefore the same as the second row. The subchannel n that has the minimum value in $\Delta P$ is selected to receive the additional bit. In the next iteration, the elements of column n in matrix P are shifted upward by one position. The subtraction is again performed on rows one and two, and the process repeats until the power budget is fully consumed.

For modern DSL systems, the number of subchannels is usually much larger than in voice-band modems. The slow convergence of the Hughes-Hartogs algorithm and some other constraints, such as the fixed SNR assumption, make this algorithm impractical in DSL systems. It is still a good starting point for later improved algorithms.
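A minimal Python sketch of the greedy bit-filling idea is given below. It is not the patented procedure itself: instead of storing and shifting the power matrix, it keeps the incremental cost of the next bit of each subchannel in a heap, which selects the same minimum. The incremental power $\Gamma 2^{b} / g_n$ follows from inverting the gap formula for bits as a function of power; the name `greedy_bit_filling`, the parameter `g` (SNR per unit power) and the cap `max_bits=15` are assumptions made for this example.

```python
import heapq

def greedy_bit_filling(g, power_budget, gap=1.0, max_bits=15):
    """Add one bit at a time to the subchannel with the smallest
    incremental power until the next bit no longer fits the budget."""
    bits = [0] * len(g)
    used = 0.0
    # Cost of going from b to b + 1 bits on subchannel n: gap * 2**b / g[n].
    heap = [(gap / g[n], n) for n in range(len(g))]
    heapq.heapify(heap)
    while heap:
        cost, n = heapq.heappop(heap)
        if used + cost > power_budget:
            break  # the cheapest bit is unaffordable, so every other bit is too
        bits[n] += 1
        used += cost
        if bits[n] < max_bits:
            heapq.heappush(heap, (gap * 2 ** bits[n] / g[n], n))
    return bits, used

bits, used = greedy_bit_filling([8.0, 2.0], power_budget=1.0)
```

Each iteration assigns exactly one bit, which is the source of the slow convergence noted in the text when the number of subchannels is large.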

Bit removal greedy algorithm
In contrast to the bit-filling greedy algorithm described in Section 3.3, a bit-removal greedy algorithm was proposed in (Sonalker & Shively, 1998). The bit-removal algorithm first allocates the maximum possible number of bits to every subchannel. The maximum number of bits in a subchannel is determined by either the power mask limit or the upper limit on the allowable bit number, whichever is smaller. Most likely, this bit-loading scheme exceeds the power budget, and the total number of bits is greater than the desired target. The bits are then removed one at a time, each time from the subchannel whose bit removal saves the most power. The removal process continues until the power constraint or the data rate requirement is satisfied. The authors (Sonalker & Shively, 1998) compared the computational load of the bit-removal and bit-filling greedy algorithms over several loops. The comparison showed that, in most cases, the bit-removal algorithm requires far fewer computations.
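The bit-removal idea can be sketched in the same style. This is a simplified illustration under stated assumptions: only the bit cap `max_bits` limits the initial allocation (the power mask is omitted), only the power constraint is used as the stopping rule, and the names `greedy_bit_removal` and `g` (SNR per unit transmit power) are hypothetical.

```python
def greedy_bit_removal(g, power_budget, gap=1.0, max_bits=15):
    """Start from the maximum bit number on every subchannel and remove
    one bit at a time where the removal saves the most power."""
    bits = [max_bits] * len(g)
    # Power needed to carry b bits on subchannel n: gap * (2**b - 1) / g[n].
    used = sum(gap * (2 ** b - 1) / g[n] for n, b in enumerate(bits))
    while used > power_budget and any(bits):
        loaded = [n for n in range(len(g)) if bits[n] > 0]
        # Saving from removing the top bit of subchannel k: gap * 2**(b-1) / g[k].
        n = max(loaded, key=lambda k: gap * 2 ** (bits[k] - 1) / g[k])
        used -= gap * 2 ** (bits[n] - 1) / g[n]
        bits[n] -= 1
    return bits, used

bits, used = greedy_bit_removal([8.0, 2.0], power_budget=1.0, max_bits=4)
```

When the power budget allows most subchannels to stay near their maximum bit number, only a few removals are needed, which is consistent with the lower computational load reported in the text.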

Chow, Cioffi, and Bingham's practical algorithm
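Chow, Cioffi, and Bingham proposed a practical DMT loading algorithm based on rounding the bit numbers and adjusting the performance margin in (Chow et al., 1995). The idea is to apply a rounding operation to the resulting bit-loading value, which is expressed as

$$b(n) = \log_2\left(1 + \frac{\mathrm{SNR}(n)}{\Gamma}\right), \qquad \hat{b}(n) = \mathrm{round}\left[b(n)\right] \qquad (18)$$

where $\Gamma$ is evaluated with the current margin $\gamma$. Initially, the total number of bits, obtained by summing $\hat{b}(n)$ from n = 1 to N, will exceed the target total bit number. In the next step, the algorithm increases the margin by using the formula

$$\gamma \leftarrow \gamma + 10 \log_{10}\left(2^{(B_{total} - B_{target})/N_{used}}\right) \quad \text{(dB)}$$

where $B_{total}$ is the current total number of rounded bits, $B_{target}$ is the target total bit number, and $N_{used}$ is the number of used subchannels. With the updated margin $\gamma$, Equation (18) is calculated and rounded again. The process continues until the total number of bits reaches the target. If the process does not converge within the maximum number of iterations, the algorithm forces convergence by adjusting the bit allocation according to the difference between $b(n)$ and $\hat{b}(n)$.

Finally, the algorithm further adjusts the energy distribution so that the bit error rates (BER) on all used subchannels are equal. The analysis in (Chow et al., 1995) shows that only 10 iterations are enough for ADSL loops, which is much faster than the Hughes-Hartogs algorithm.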

Efficient greedy bit loading with fairness control for multi-user DMT
The twisted pairs inside a cable can be regarded as a multi-user communication channel because of crosstalk coupling. The received signal power of any user depends on the power of its peer transmitter, and the received signal is at the same time impaired by additive white Gaussian noise (AWGN) and by crosstalk from all other transmitters in the same cable. Fig. 3 shows the multi-user channel environment, where $H_{ii}$ represents the main channel transfer function, $H_{ij}$ (i ≠ j) represents the crosstalk transfer function from user i to user j, and $\sigma_i$ represents the power of the AWGN. In the DMT scenario, knowing the total power of each user is not enough; the allocation of each user's power over the frequency spectrum is also of interest.

The power allocation problem is closely related to the bit-loading problem. They are in fact two aspects of the same problem, because the number of bits that can be transmitted on a subchannel is a function of the power allocated to that subchannel. Equation (6) gives the relationship between them. In Section 3, we reviewed several major bit-loading algorithms. However, all of those algorithms apply to a single modem (or single user). In a crosstalk environment, bit-loading algorithms need to be extended to consider the mutual effects between multiple users, so that globally optimal performance among all modems (users) in a cable can be obtained.

In Section 4, we explore the currently available multi-user bit-loading algorithms and then propose an improved, efficient algorithm that allocates bits to multiple users in parallel on subchannels that have the same frequency. This new algorithm reduces the number of iterations dramatically. A new fairness coefficient is also introduced to improve the fairness of the data rates among users.

Notation
The notation used in the later sections of this chapter is defined here. As shown in Figure 4, there are M users in the same cable, in which interference exists between users.

We use the variables i and j (i, j = 1, …, M) to index users. In DMT-based systems, each user has N subchannels, indexed by the variable n (n = 1, …, N). We use the term "subchannel (i, n)" to refer to subchannel n of user i. Further variables are listed below:
$b_i(n)$: Number of bits allocated to subchannel (i, n).
$\Gamma$: SNR gap margin to ensure the performance under unexpected noise. It is a function of the target BER, the modulation scheme, and the coding scheme.
One formulation of the problem is to maximize the total data rate of all users (Equation (20)) under constraints on power and bit limits (see Equation (22)). This is called "rate-adaptive loading" (Starr et al., 1999). In some cases, the goal is not to obtain the maximum data rate but to minimize the power consumption at a fixed data rate. This second problem is called "margin-adaptive loading" and is formulated in Equation (24). The two kinds of problems are equivalent in that algorithms designed for one can be applied similarly to the other. In this chapter, we concentrate on the rate-adaptive problem, that is, on maximizing the total data rate.

As a consequence of maximizing the total data rate over all users, a user with a better channel condition, which means a smaller cost of transmitting additional bits, has more chances to get bits assigned until it meets the target data rate or exceeds the power budget. A user with a worse channel condition is sacrificed in order to gain the maximum total data rate. However, in most real networks, users in the same cable have equal priority. They pay the service provider the same fee for broadband Internet access, and their service quality is supposed to be as equal as possible. Fairness should therefore be considered in the design of a bit-loading algorithm, and multi-user bit loading becomes a multi-objective problem. On the one hand, we want the maximum total data rate (or, equivalently, the minimum power consumption for a given target data rate). On the other hand, we want to minimize the difference in data rate among users. We therefore define the second objective as minimizing the variance of the data rate among users (Equation (25)).

It is clear that the two objectives contradict each other and cannot be achieved at the same time. One direct effect of minimizing the data rate difference is that the user with the best channel condition cannot obtain the highest data rate that it could reach with a single-user algorithm. There must therefore be a tradeoff between the two objectives.
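The tension between the two objectives can be made concrete with a small numerical sketch. The per-user bit numbers below are invented purely for illustration, and the symbol rate of 4000 DMT symbols per second is an assumed ADSL-like value that is not taken from this chapter; `objectives` is a hypothetical helper name.

```python
from statistics import pvariance

SYMBOL_RATE = 4000  # DMT symbols per second; assumed value for illustration

def objectives(bits_per_user):
    """Return (aggregate data rate, variance of data rate among users)
    for a list of per-user bit numbers per DMT symbol."""
    rates = [b * SYMBOL_RATE for b in bits_per_user]
    return sum(rates), pvariance(rates)

# Hypothetical allocation that favors the user with the best channel.
rate_greedy, var_greedy = objectives([1200, 800, 400])
# Hypothetical allocation that equalizes the users at a lower total.
rate_fair, var_fair = objectives([700, 700, 700])
```

The first allocation has the higher aggregate rate and the second has zero variance, so neither is better in both objectives at once.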

Current multi-user bit-loading algorithms
The distributed iterative water-filling algorithm employs the single-user water-filling method to allocate power iteratively, user by user, until all users and subchannels are filled (Yu et al., 2002). The algorithm runs in two nested loops. The outer loop searches for the optimal power constraint of each user by increasing or decreasing that user's power budget; the inner loop is then called to calculate the power allocation under the given power constraint. If the inner loop yields a data rate lower than the target rate, the total power is increased, and vice versa. The inner loop employs the iterative water-filling method to obtain the optimal power allocation for all users. The problem with this algorithm is that it cannot converge if the target rates are not appropriate. The question then becomes how to obtain the set of achievable target rates for each user. At coordination level 1 of dynamic spectrum management (DSM), a central agent with knowledge of all channel and interference transfer functions exists and is able to calculate the achievable target rates. This algorithm is therefore not totally autonomous; some kind of central control is required.

Iterative constant power (ICP) transmission, a slight variation of the iterative water-filling (IW) algorithm, is proposed in (Yu & Cioffi, 2001). Both algorithms have a similar two-stage structure. The difference lies in the inner loop: only a constant value or zero is allowed for the power in ICP, while continuous power values are used in IW. Both algorithms are suboptimal but easy to deploy because no coordination among users is needed.

The optimal algorithm for discrete multi-user bit loading is a natural extension of the single-user greedy algorithm. In the extended greedy algorithm, a cost matrix is calculated. The elements of the matrix represent the power increment needed to transmit additional bits on each subchannel of each user. The subchannel and user with the minimum cost are then found, and additional bits are assigned there. The process continues until all the power has been allocated. This algorithm is illustrated in (Lee et al., 2002).

A drawback of multi-user greedy bit loading is its computational complexity. In a single iteration of the algorithm, only the one subchannel of the one user that has the minimum cost is selected to receive additional bits. In each iteration, the most time-consuming calculation is solving the linear equations that give the power allocated to the subchannels for a specified bit allocation, which is needed to update the cost matrix. The number of subchannels in a DSL system is usually large; in ADSL, for example, there are 223 subchannels for downstream (ANSI Std. T1.417, 2001). If the average number of bits assigned to a subchannel is 10 and there are 50 users in a cable, the total number of iterations required to allocate all bits is above $10^5$ (223 × 10 × 50 = 111,500 when one bit is assigned per iteration).

Formulation of the problem
Before introducing the efficient greedy bit loading with fairness, we first formulate the bit-loading problem for multi-user DMT systems with the objectives of maximizing the aggregate data rate (Equation (20)) and minimizing the data rate variance among users (Equation (25)).

Extending single-user bit loading to the multi-user case, the noise that appears at the receiver of user i is the sum of the AWGN and the crosstalk from all other users in the same cable (Fig. 3),

$$N_i(n) = \sigma_i(n) + \sum_{j \ne i} P_j(n) H_{ji}(n) \qquad (26)$$

where $P_j(n)$ is the power of user j on subchannel n and $H_{ji}(n)$ is the crosstalk gain from user j to user i on that subchannel. Substituting Equation (26) into Equation (6) to replace the variable $N_n$, we get the multi-user bit-loading expression

$$b_i(n) = \log_2\left(1 + \frac{P_i(n) H_{ii}(n)}{\Gamma\left[\sigma_i(n) + \sum_{j \ne i} P_j(n) H_{ji}(n)\right]}\right)$$

The aggregate data rate of user i is the sum of the bits transmitted over all subchannels divided by the symbol period T,

$$R_i = \frac{1}{T} \sum_{n=1}^{N} b_i(n) \qquad (28)$$

The symbol period T in Equation (28) is constant, so maximizing Equation (20) is equivalent to maximizing the aggregate bit number over all subchannels and all users.

The constraint is Equation (22). We use the method of Lagrange multipliers to solve this problem: construct the Lagrangian function L from the objective and the constraint, take the derivative of L with respect to $P_i(n)$, and set it equal to zero to find the optimal point of L.

In the resulting expression, the first term is the contribution of user i itself. The second term is a function of the $P_j(n)$ of all other (M-1) users and shows the effect of crosstalk. Since this term has a high order in the denominator, and because the crosstalk is weak compared with the main signal, we can ignore the second term to make the equation tractable and obtain an approximate expression for $P_i(n)$.

According to (Marler & Arora, 2003), the solution of a multi-objective optimization can be obtained by the "lexicographic method". The method solves the objective functions in order of importance and constructs new constraints for the next objective function from the solution of the previous objective. In this multi-user bit-loading problem, we first get the [...] It is obvious that this optimization is even more complex than the optimization of the first objective, because both the objective function and the constraint function include variables in high-order terms and denominators. A practical implementation of this algorithm is required.

Greedy algorithm for multi-user bit loading
A practical method to obtain the optimal solution of Equation (29) is the extended greedy algorithm. The foundation of the greedy algorithm is to calculate the power cost of transmitting additional bits on each subchannel (i, n). First, we rearrange the per-subchannel bit-loading expression in Equation (29) to remove the logarithm,

$$2^{b_i(n)} - 1 = \frac{P_i(n) H_{ii}(n)}{\Gamma\left[\sigma_i(n) + \sum_{j \ne i} P_j(n) H_{ji}(n)\right]}$$

Then we get, for i = 1, …, M,

$$P_i(n) H_{ii}(n) - \Gamma\left(2^{b_i(n)} - 1\right) \sum_{j \ne i} P_j(n) H_{ji}(n) = \Gamma\left(2^{b_i(n)} - 1\right) \sigma_i(n)$$

The above equations can be expressed in matrix form as

$$AX = B \qquad (45)$$

where A is the M × M matrix with diagonal elements $A_{ii} = H_{ii}(n)$ and off-diagonal elements $A_{ij} = -\Gamma\left(2^{b_i(n)} - 1\right) H_{ji}(n)$, $X = \mathbf{P}(n) = [P_1(n), \ldots, P_M(n)]^T$, and B is the vector with elements $B_i = \Gamma\left(2^{b_i(n)} - 1\right) \sigma_i(n)$. Solving this linear equation system, we obtain the power vector $\mathbf{P}(n)$ required to transmit the bit allocation $\mathbf{b}(n)$ on subchannel n for all M users. One thing worth noting is that the solution vector $\mathbf{P}(n)$ may contain negative elements. This indicates that the corresponding users cannot afford to transmit the number of bits assigned to them on subchannel n.

In DMT, different subchannels are well separated. Crosstalk coupling between different users appears only on subchannels with the same frequency, so we assume that there is no interference between subchannels. Equation (45) can therefore be solved for each subchannel independently. The cost of adding $\Delta b$ bits to subchannel (i, n) is the sum of the power increments of all users on subchannel n,

$$cost_i(n) = \sum_{j=1}^{M} \left[P'_j(n) - P_j(n)\right]$$

where $P'_j(n)$ is the power of user j on subchannel n after the bits are added. This calculation needs to be done for every subchannel and every user. The final cost matrix has M rows and N columns, and the element at position (i, n) represents the cost of transmitting additional bits on subchannel n of user i. The position of the minimum cost determines the subchannel and user to which additional bits are assigned.

The power budget needs to be checked while the costs are updated. Adding $\Delta b$ bits to subchannel n of user i requires user i to use more power on subchannel n and, at the same time, forces all other users to increase their power on subchannel n in order to maintain the SNR needed for their already-assigned bits. Either user i or the other users may then exceed their power budget. If this happens, adding $\Delta b$ bits to subchannel n of user i is not feasible.
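The cost calculation on a single subchannel can be sketched in Python as follows. This is an illustrative reading of the text, not the authors' code. It assumes that each user's SNR is its received power divided by the AWGN power plus the crosstalk power, that `h[j][i]` is the power gain from transmitter j to receiver i on the subchannel, and that a negative solution marks an infeasible allocation; the per-user power budget check is left out. All function names are hypothetical, and a small Gaussian elimination is used so that the block needs no external library.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = aug[r][m] - sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = s / aug[r][r]
    return x

def powers_for_bits(bits, h, noise, gap):
    """Powers of all users on one subchannel for a given bit allocation."""
    m = len(bits)
    a = [[0.0] * m for _ in range(m)]
    rhs = [0.0] * m
    for i in range(m):
        k = gap * (2 ** bits[i] - 1)
        for j in range(m):
            a[i][j] = h[i][i] if i == j else -k * h[j][i]
        rhs[i] = k * noise[i]
    return solve_linear(a, rhs)

def cost_of_adding(bits, user, h, noise, gap, delta_b=1):
    """Total power increment over all users when `user` gets delta_b more
    bits on this subchannel; None if some user would need negative power."""
    before = powers_for_bits(bits, h, noise, gap)
    trial = bits[:]
    trial[user] += delta_b
    after = powers_for_bits(trial, h, noise, gap)
    if any(p < 0 for p in after):
        return None
    return sum(after) - sum(before)

h = [[1.0, 0.1], [0.1, 1.0]]   # two users with weak symmetric crosstalk
p = powers_for_bits([1, 1], h, noise=[1.0, 1.0], gap=1.0)
cost = cost_of_adding([1, 1], 0, h, noise=[1.0, 1.0], gap=1.0)
```

In the example, adding one bit to user 0 costs more than it would without crosstalk, because user 1 must also raise its power to keep its own bit.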

Efficiency improvement to add bits to multiple users in parallel
As discussed in Section 4, the large number of iterations for multi-user DMT makes the greedy bit-loading algorithm hard to deploy. We mitigate this problem by processing multiple users in parallel, so that the number of iterations can be reduced dramatically.

As we know, crosstalk only interferes with users on subchannels that have the same frequency. Subchannels at different frequencies are independent of each other. In other words, we can rewrite Equation (45) by indicating the subchannel index n explicitly as

$$A(n)\,\mathbf{P}(n) = B(n) \qquad (50)$$

If, for example, two subchannels with different indices $n_1$ and $n_2$ get additional bits, two linear systems of the form of Equation (50) have to be solved, no matter whether the subchannels belong to the same user or to different users. This means that adding bits along the dimension of the subchannel index requires as many solutions of Equation (50) as there are subchannels processed. However, if we add bits along the dimension of users with the subchannel index fixed, it is possible to reduce the number of calculations of Equation (50). If, for instance, we assign additional bits to two users $i_1$ and $i_2$ within a specific subchannel n, the resulting power scheme $\mathbf{P}'(n)$ at subchannel n can be calculated by solving Equation (50) only once. The subchannel with the minimum cost is identified by both the subchannel index n and the user index i. In the proposed algorithm, instead of adding bits to only one subchannel (i, n), we therefore look for all users on subchannel n whose cost is very close to the minimum cost ($cost_{min}$). The additional bits are added to subchannel n of all these users. Fig. 5 visualizes the idea.

The term "close" to the minimum cost is defined by a cost elastic coefficient $\varepsilon_{cost}$. On the subchannel n where $cost_{min}$ appears, if any user i satisfies the condition

$$cost_i(n) \le \left(1 + \varepsilon_{cost}\right) cost_{min}$$

we add additional bits to it. The value of $\varepsilon_{cost}$ gives the fraction by which the cost on a given subchannel may exceed the minimum cost. It can have any value greater than zero, depending on the accuracy we want from the algorithm. The effect of this coefficient is analyzed in the simulation sections that follow. The simulation results show that introducing this cost elastic coefficient has nearly no negative impact on the final bit-loading scheme, while the number of iterations is reduced greatly.

We assume that there is no code gain and no noise margin, so the SNR gap $\Gamma$ is chosen as 9.8 dB according to Equation (5). In the downstream of ADSL DMT, the subchannels start at 33 and end at 255 (ANSI Std. T1.417, 2001). Therefore, we consider 255 - 33 + 1 = 223 subchannels in this simulation. The corresponding frequency range is 142 kHz to 1100 kHz with a subchannel spacing of 4.3125 kHz. The power budget for each user is 20.4 dBm (ANSI Std. T1.417, 2001). For simplicity, we chose the incremental unit for bit addition as $\Delta b = 1$ bit.
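The parallel selection rule described in this section, in which all users on the minimum-cost subchannel whose cost is close to the minimum receive bits in the same iteration, can be sketched as follows. The function name `select_parallel_users`, the use of `None` for infeasible entries, and the treatment of a negative coefficient as "parallel loading disabled" are assumptions made for this illustration.

```python
def select_parallel_users(cost, elastic):
    """Pick the minimum-cost subchannel and every user on it whose cost is
    within (1 + elastic) times the minimum cost.

    cost[i][n] is the cost for user i on subchannel n; None marks an
    infeasible entry. Returns (subchannel index, list of user indices).
    """
    best = None  # (cost, user, subchannel) of the global minimum
    for i, row in enumerate(cost):
        for n, c in enumerate(row):
            if c is not None and (best is None or c < best[0]):
                best = (c, i, n)
    if best is None:
        return None, []          # nothing feasible is left
    c_min, i_min, n_min = best
    if elastic < 0:
        return n_min, [i_min]    # traditional mode: one user per iteration
    users = [i for i, row in enumerate(cost)
             if row[n_min] is not None and row[n_min] <= (1 + elastic) * c_min]
    return n_min, users

cost = [[5.0, 1.0, 9.0],
        [4.0, 1.05, 2.0],
        [3.0, 1.5, None]]
n_sel, users = select_parallel_users(cost, elastic=0.1)
```

Because all selected users share the same subchannel, their new powers can be obtained from one solution of the per-subchannel linear system, which is where the saving in iterations comes from.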

Simulation mode
We have two improvements in the proposed algorithm: parallel bit allocation and fairness control. We therefore run the algorithm in all possible combinations of the two. Table 2 shows the four operational modes of the algorithm. In traditional mode, neither of the two new features is applied; this is equivalent to the traditional multi-user DMT bit loading of (Lee et al., 2002). We use this mode as the reference against which the proposed features are compared.

Simulation result analysis
In the proposed algorithm, we introduced two elastic coefficients ($\varepsilon_{cost}$ and $\varepsilon_{fair}$) for parallel bit loading and fairness control, respectively. How the coefficients affect the performance of the algorithm is, however, not obvious. This section analyzes the simulation results to demonstrate that the cost elastic coefficient decreases the computational load significantly and that the fairness elastic coefficient gives us a way to control the data rate fairness among users in the same cable.

The traditional operational mode is used as the reference for comparison. In the following analysis, the value -1 for $\varepsilon_{cost}$ or $\varepsilon_{fair}$ indicates that the algorithm does not use the proposed parallel bit loading or fairness control. The first data points in the following figures, which have a control value of -1, serve as reference points.

Effectiveness of parallel bit loading
The purpose of parallel bit loading is to reduce the computational load of traditional multi-user DMT bit loading. In this section, the algorithm was run with no fairness control so that the effect of parallel bit loading can be compared directly with the traditional operational mode.
From Fig. 8 we see that when there are 50 users in a cable, the number of iterations required to allocate all bits to users and subchannels is 66016 if no parallel bit-loading technique is used (cost = -1). If cost is set to 0.1, the number of iterations drops immediately to only 22% of that value. It falls further to 12% if cost is set greater than 0.6. This shows that parallel bit loading greatly reduces the computational load of multi-user DMT bit loading. Fig. 9 shows a similar effect, comparing 50 users with 15 users. The more users there are in a cable, the greater the reduction in the number of iterations. Furthermore, as long as parallel bit loading is used, the number of iterations is not very sensitive to the value of cost, especially when cost ≥ 0.4. Setting cost to 1 is a good choice.
Reducing the number of iterations is not the only goal of the algorithm. To make the algorithm meaningful, we must guarantee that the final bit-loading result has no loss, or only a small loss, compared with the traditional algorithm. We ran the algorithm to generate bit-loading schemes for 50 users. The number of bits assigned to each user over all subchannels determines the data rate of that user. The average (mean) of the bit numbers over the 50 users specifies the performance of the bit-loading algorithm. Table 3 and Fig. 10 show the mean bit numbers when different cost values were used. It is clear from the figure that the mean bit number is constant for all cost values less than or equal to 1. As the cost elastic coefficient increases beyond 1, the mean bit number decreases. Therefore, selecting cost to be 1 does not result in any loss of accuracy.
Fig. 11 demonstrates that many users get bits assigned to them simultaneously in a single iteration (marked by *). In this case, bits were assigned to 32 users in iteration 3000. In other words, this one iteration of the new efficient algorithm is equivalent to 32 iterations of the traditional algorithm. That is a remarkable improvement.
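As a quick check of the figures quoted above, the short sketch below converts the reported percentages into approximate iteration counts and reduction factors. The baseline of 66016 iterations and the 22% and 12% fractions come from the text; because those percentages are rounded, the derived counts (about 14,500 and 7,900) are estimates rather than measured values. The function name is chosen here for illustration only.

```python
def reduced_iterations(baseline, fraction):
    """Approximate iteration count after parallel bit loading.

    baseline: iterations of the traditional algorithm (cost = -1).
    fraction: reported ratio of new to baseline iterations (rounded in the text).
    """
    return round(baseline * fraction)


if __name__ == "__main__":
    baseline = 66016  # 50 users, no parallel bit loading (from the text)
    for label, fraction in [("cost = 0.1", 0.22), ("cost > 0.6", 0.12)]:
        count = reduced_iterations(baseline, fraction)
        factor = 1.0 / fraction
        print(f"{label}: about {count} iterations, roughly {factor:.1f}x fewer")
```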

Effectiveness of fairness control
By introducing the fairness control coefficient fair, we are able to constrain the variance of the bit numbers over all users. As stated before, the loop lengths of the 50 users have been sorted in ascending order. That is, user 1 has the shortest loop and user 50 has the longest. Therefore, if no fairness control is applied, the total number of bits assigned to each user is in descending order. The dashed curve in Fig. 12 shows this case. When we set fair to 1, the bit number of each user is constrained to have minimum variance. In fact, in Fig. 12 it is a straight line (the solid curve). Two more curves are shown in Fig. 12, representing fair equal to 1.2 and 1.4 respectively.
Fig. 13 shows the mean and the standard deviation of the bit number per user when different values of fair were used. The two curves have similar shapes: when the standard deviation is reduced, the mean is also reduced. The benefit of fairness is paid for by a loss of average data rate. Fig. 14 shows how the fairness control coefficient affects the number of iterations. When fair equals 1, the number of iterations is at its minimum, but so is the number of bits assigned to each user.
Fig. 12 also shows that the bit numbers of long-loop users improve very little when fairness control is applied. The bit numbers of short-loop users therefore have to drop greatly to obtain the small variance. This indicates that although fairness control makes the standard deviation of the bit number (data rate) small, the price is a sacrifice in performance for short-loop users. Therefore, fairness control may not always be desirable.
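The two measures used in this comparison, the mean and the standard deviation of the bit number per user, can be computed as in the minimal sketch below. The bit counts are invented purely for illustration and are not simulation results from this chapter. The use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def fairness_summary(bits_per_user):
    """Return (mean, standard deviation) of the total bits loaded per user.

    A smaller standard deviation means the users receive more similar
    data rates. The population form (pstdev) is an assumption.
    """
    return mean(bits_per_user), pstdev(bits_per_user)


if __name__ == "__main__":
    # Hypothetical bit totals, sorted from shortest to longest loop.
    without_fairness = [900, 700, 500, 300]
    with_fairness = [320, 310, 300, 290]
    for name, bits in [("no fairness", without_fairness),
                       ("with fairness", with_fairness)]:
        m, s = fairness_summary(bits)
        print(f"{name}: mean = {m:.1f}, std = {s:.1f}")
```

In this made-up example the spread shrinks sharply while the mean also falls, which is the trade-off the section describes.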

Combination of parallel bit loading and fairness control
When parallel bit loading and fairness control are applied at the same time, their combined effect needs to be identified. This section analyzes the simulation results and illustrates them in 3-D plots. The algorithm was run over the full combination of cost and fair values. The major performance measures were recorded, such as the number of iterations and the mean and standard deviation of the bit number per user. Tables 3 and 4 are some of the simulation results.
The plot of Table 3, which is not shown here, shows how the mean bit number per user is affected by the cost elastic coefficient and the fairness coefficient. The surface is flat along the cost elastic coefficient direction, which means that the selection of cost has almost no influence on the mean bit number. The only factor that affects the mean bit number is the value of fair. Fig. 16 shows how the cost elastic coefficient and the fairness coefficient affect the standard deviation of the bit number per user. The isolated effects of parallel bit loading and fairness control give us the flexibility to apply them separately or in combination, according to the requirement.

Summary of conclusions
Traditional bit-loading algorithms for DMT systems were designed for a single user. As more and more users turn to DSL services, crosstalk inside the cable makes the traditional single-user bit-loading algorithms unable to reach the optimal solution. Multi-user bit-loading algorithms were designed to obtain the global optimum among all users in a cable. The problem with multi-user bit loading is its computational complexity. Because the number of subchannels for all users is very large, the calculation of power allocation needs to be done many times.
This research work studied several bit-loading algorithms for both single-user and multi-user cases. Then an improved parallel bit-loading algorithm for multi-user DMT was proposed. The new algorithm is based on multi-user greedy bit loading. In a single iteration, bits are allocated to multiple users on subchannels with the same frequency. Thus, the number of iterations needed to allocate bits to all users decreases significantly. The adjustable cost elastic coefficient defines the range of power cost. In each bit-loading iteration, all subchannels of different users whose power cost lies within the range have the chance to get additional bits assigned to them. Furthermore, this research work studied the possibility of improving the fairness among users in the same cable. We introduced a fairness coefficient into the bit-loading iterations so that the variance of the total number of bits over all users can be controlled.
The analysis of the simulation results showed that parallel bit loading is highly effective. If the cost elastic coefficient has the value 1, all subchannels whose power cost is less than twice the minimum cost can get additional bits assigned to them. For the 50-user case with cost = 1, the number of iterations needed to load all bits to all users is reduced to only 12% of the number required without parallel bit loading. Another good characteristic of parallel bit loading is that it has very little effect on the final loading result. In the same 50-user case, when cost does not exceed 1, the loading result is exactly the same as without parallel bit loading. This means that we reduced the computational complexity without losing any accuracy in the result.
The simulations also showed that fairness control in the algorithm can limit the variance of the total bit numbers among all users. The cost of fairness is that loops in better condition (for example, shorter loops) have to reduce their data rate, because loops in worse condition gain little improvement in their data rate.
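The selection step summarized above can be sketched as follows. This is only an illustration under stated assumptions, not the chapter's implementation: the cost matrix is taken as given (one row per user, one column per subchannel), the range is read as "cost below (1 + cost coefficient) times the minimum cost", and only users on the same subchannel as the minimum are considered. The function name and the sample numbers are hypothetical.

```python
import math


def select_parallel_users(cost, cost_coeff):
    """Pick the users that receive bits in one iteration.

    cost: M x N nested list, cost[i][n] = power increment for adding bits
          to user i on subchannel n (math.inf if no more bits fit).
    cost_coeff: cost elastic coefficient; -1 disables parallel loading.
    Returns (subchannel index, list of user indices), or (None, []) when
    no subchannel can take more bits.
    """
    best_i, best_n, c_min = None, None, math.inf
    for i, row in enumerate(cost):
        for n, c in enumerate(row):
            if c < c_min:
                best_i, best_n, c_min = i, n, c
    if best_i is None:
        return None, []
    if cost_coeff < 0:
        # Traditional greedy mode: only the minimum-cost entry is served.
        return best_n, [best_i]
    # Assumed range rule: cost strictly below (1 + coeff) * minimum cost.
    threshold = (1.0 + cost_coeff) * c_min
    users = [i for i in range(len(cost))
             if i == best_i or cost[i][best_n] < threshold]
    return best_n, users


if __name__ == "__main__":
    cost = [[5.0, 1.0, 9.0],
            [4.0, 1.5, 8.0],
            [7.0, 2.0, 3.0],
            [6.0, 2.5, 1.2]]
    print(select_parallel_users(cost, -1))   # traditional mode
    print(select_parallel_users(cost, 1.0))  # costs below twice the minimum
```

With a coefficient of 1 in this toy matrix, two users on the minimum-cost subchannel are served in the same iteration instead of one, which is where the saving in iterations comes from.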

Recommendations for future work
The parallel multi-user bit-loading algorithm proposed in this research work reduced the number of iterations needed to allocate bits and power to all users. However, the computational load is still quite large even with the improved algorithm, and further improvements are required. For example, the idea of the bit-removal algorithm in (Sonalker & Shively, 1998) could be applied to multi-user bit loading. In long loops, many high-frequency subchannels have no chance of getting bits assigned to them. A better way is to turn off these subchannels before the algorithm starts. The computational load could be reduced even more when those "bad" subchannels are eliminated from the bit-loading process.
Multi-user bit loading implies that a central processing unit must exist in the network. The central unit possesses the channel information of all users, such as the channel transfer functions and the crosstalk transfer functions. The bit-loading algorithm runs on the central unit, and the loading scheme is then distributed to all transmitters in the same cable. Therefore, channel and crosstalk estimation need to be added to the multi-user bit-loading system (Cioffi et al., 2001; Zeng et al., 2001).
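One possible way to pre-screen "bad" subchannels is sketched below. It is an assumption-laden illustration rather than a method given in this chapter: it uses the standard gap approximation, b = log2(1 + SNR / gap), evaluates each subchannel in the best case (transmit power at the power mask and no crosstalk), and switches off any subchannel that cannot carry even one increment of bits under those conditions. Since crosstalk can only lower the SNR, such a subchannel would never be loaded anyway. All names, parameters and sample values are hypothetical.

```python
import math


def active_subchannels(gain_sq, noise_var, p_mask, gap=1.0, delta_b=1):
    """Return indices of subchannels worth keeping for one user.

    gain_sq:   |H_ii(n)|^2, direct-channel power gain per subchannel.
    noise_var: background noise variance per subchannel.
    p_mask:    power mask per subchannel (upper bound on transmit power).
    gap:       SNR gap (linear) in the gap approximation; an assumption.
    delta_b:   bit increment used by the loading algorithm.
    """
    keep = []
    for n, (g, s2, pm) in enumerate(zip(gain_sq, noise_var, p_mask)):
        # Best case: full mask power, no crosstalk from other users.
        best_case_bits = math.log2(1.0 + pm * g / (gap * s2))
        if best_case_bits >= delta_b:
            keep.append(n)
    return keep


if __name__ == "__main__":
    # Hypothetical long loop: gain falls steeply with frequency.
    gain_sq = [1e-1, 5e-2, 1e-3, 1e-7]
    noise_var = [1e-6] * 4
    p_mask = [1e-3] * 4
    print(active_subchannels(gain_sq, noise_var, p_mask, gap=10.0))
```

Only the surviving indices would then be entered into the cost matrix, which shrinks the search in every iteration.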
Fig. 1. Water-Filling
margin, and has the initial value of 0. The zero value of margin γ causes b(n) to reach its maximum value. b(n) is then rounded to the nearest integer value $\hat{b}(n)$. In regular conditions, the summation of the rounded values $\hat{b}(n)$, n from 1

: The vector of bit allocations at subchannel n for M users.
$P_i(n)$: Power allocated to subchannel (i, n).
: The vector of power allocation at subchannel n for M users.
$H_{ij}(n)$: Channel gain transfer function from user i to user j at subchannel n. When i = j, $H_{ii}(n)$ is the insertion loss transfer function for user i at subchannel n (ANSI Std. T1.417, 2001). When i ≠ j, $H_{ij}(n)$ is the crosstalk transfer function from user i to user j at subchannel n.
: The variance of white Gaussian noise on subchannel (i, n).
$P_{mask}(n)$: The power mask on subchannel n.
$P_{budget}(i)$: The power budget for user i.
$\Delta b$: The incremental unit of bits added to a subchannel in each iteration. In general, it should be an integer.
: The symbol period of the DMT system. For ADSL, it equals 1/4000 s.
Fig. 4. Multi-User Multi-Channel Configuration
from the first maximization objective in Equation (29). Constraints (21) and (23) can be used as checking criteria during each step of the optimization. Substituting Equation (27) into Equation (29), we get the first objective function as
for i = 1 to M and n = 1 to N, do a calculation of Equation (45) again to obtain the new power allocation $\mathbf{P}'(n)$. Because of the crosstalk coupling, a change in one element of the coefficient matrix A in Equation (45) causes the power of all users on the same subchannel to change. The cost of adding $\Delta b$ bits on subchannel (i, n)

Fig. 10. Mean Bits Number Per User vs. Cost Elastic Coefficient

Fig. 11. Multiple Users Get Bits in a Single Iteration

Fig. 12. Bits Number of Users

Fig. 15. Number of Iterations vs. Cost Elastic Coefficient and Fairness Coefficient
Fig. 15 indicates that the number of iterations has a strong correlation with the cost elastic coefficient. As soon as parallel bit loading is applied (cost has any value other than -1), the number of iterations drops significantly. Compared with parallel bit loading, the fairness coefficient contributes relatively little to reducing the number of iterations.

Table 2. Operational Mode of the Algorithm