Abstract
A unified view of the area of sparse signal processing is presented in tutorial form by bringing together various fields in which the property of sparsity has been successfully exploited. For each of these fields, various algorithms and techniques, which have been developed to leverage sparsity, are described succinctly. The common potential benefits of significant reduction in sampling rate and processing manipulations through sparse signal processing are revealed. The key application domains of sparse signal processing are sampling, coding, spectral estimation, array processing, component analysis, and multipath channel estimation. In terms of the sampling process and reconstruction algorithms, linkages are made with random sampling, compressed sensing, and rate of innovation. The redundancy introduced by channel coding in finite and real Galois fields is then related to oversampling with similar reconstruction algorithms. The error locator polynomial (ELP) and iterative methods are shown to work quite effectively for both sampling and coding applications. The methods of Prony, Pisarenko, and MUltiple SIgnal Classification (MUSIC) are next shown to be targeted at analyzing signals with sparse frequency domain representations. Specifically, the relations of the approach of Prony to an annihilating filter in rate of innovation and ELP in coding are emphasized; the Pisarenko and MUSIC methods are further improvements of the Prony method under noisy environments. The iterative methods developed for sampling and coding applications are shown to be powerful tools in spectral estimation. Such narrowband spectral estimation is then related to multisource location and direction of arrival estimation in array processing. Sparsity in unobservable source signals is also shown to facilitate source separation in sparse component analysis; the algorithms developed in this area such as linear programming and matching pursuit are also widely used in compressed sensing. Finally, the multipath channel estimation problem is shown to have a sparse formulation; algorithms similar to sampling and coding are used to estimate typical multicarrier communication channels.
Introduction
There are many applications in signal processing and communication systems where the discrete signals are sparse in some domain such as time, frequency, or space; i.e., most of the samples are zero, or alternatively, their transforms in another domain (normally called “frequency coefficients”) are sparse (see Figures 1 and 2). There are trivial sparse transformations where the sparsity is preserved in both the “time” and “frequency” domains; the identity transform matrix and its permutations are extreme examples. Wavelet transformations that preserve the local characteristics of a sparse signal can be regarded as “almost” sparse in the “frequency” domain; in general, for sparse signals, the more similar the transformation matrix is to an identity matrix, the sparser the signal is in the transform domain. In any of these scenarios, sampling and processing can be optimized using sparse signal processing. In other words, the sampling rate and the processing manipulations can be significantly reduced; hence, a combination of data compression and processing time reduction can be achieved.^{a}
Figure 1. Sparse discrete time signal with its DFT.
Figure 2. Sparsity is manifested in the frequency domain.
Each field has developed its own tools, algorithms, and reconstruction methods for sparse signal processing. Very few authors have noticed the similarities of these fields. It is the intention of this tutorial to describe these methods in each field succinctly and show that these methods can be used in other areas and applications often with appreciable improvements. Among these fields are 1—Sampling: random sampling of bandlimited signals [1], compressed sensing (CS) [2], and sampling with finite rate of innovation [3]; 2—Coding: Galois [4,5] and realfield error correction codes [6]; 3—Spectral Estimation[710]; 4—Array Processing: Multisource location (MSL) and direction of arrival (DOA) estimation [11,12], sparse array processing [13], and sensor networks [14]; 5—Sparse Component Analysis (SCA): blind source separation [1517] and dictionary representation [1820]; 6—Channel Estimation in Orthogonal Frequency Division Multiplexing (OFDM) [2123]. The sparsity properties of these fields are summarized in Tables 1, 2, and 3.^{b} The details of most of the major applications will be discussed in the next sections but the common traits will be discussed in this introduction.
Table 1. Various topics and applications with sparsity properties: the sparsity, which may be in the time/space or “frequency” domains, consists of unknown samples/coefficients that need to be determined
Table 2. List of acronyms
The columns of Table 1 consist of 0—category, 1—topics, 2—sparsity domain, 3—type of sparsity, 4—information domain, 5—type of sampling in information domain, 6—minimum sampling rate, 7—conventional reconstruction methods, and 8—applications. The first rows (2–7) of column 1 are on sampling techniques. The 8–9th rows are related to channel coding, row 10 is on spectral estimation and rows 11–13 are related to array processing. Rows 14–15 correspond to SCA and finally, row 16 covers multicarrier channel estimation, which is a rather new topic. As shown in column 2 of the table, depending on the topics, sparsity is defined in the time, space, or “frequency” domains. In some applications, the sparsity is defined as the number of polynomial coefficients (which in a way could be regarded as “frequency”), the number of sources (which may depend on location or time sparsity for the signal sources), or the number of “words” (signal bases) in a dictionary. The type of sparsity is shown in column 3; for sampling schemes, it is usually lowpass, bandpass, or multiband [24], while for compressed sensing, and most other applications, it is random. Column 4 represents the information domain, where the order of sparsity, locations, and amplitudes can be determined by proper sampling (column 5) in this domain. Column 7 is on traditional reconstruction methods; however, for each area, any of the reconstruction methods can be used. The other columns are self explanatory and will be discussed in more details in the following sections.
Table 3. Common notations used throughout the article
The rows 2–4 of Table 1 are related to the sampling (uniform or random) of signals that are bandlimited in the Fourier domain. Bandlimitedness is a special case of sparsity where the nonzero coefficients in the frequency domain are consecutive. A better assumption in the frequency domain is to have random sparsity [2527] as shown in row 5 and column 3. A generalization of the sparsity in the frequency domain is sparsity in any transform domain such as Discrete Cosine and Wavelet Transforms (DCT and DWT); this concept is further generalized in CS (row 6) where sampling is taken by a linear combination of time domain samples [2,2830]. Sampling of signals with finite rate of innovation (row 7) is related to piecewise smooth (polynomial based) signals. The positions of discontinuous points are determined by annihilating filters that are equivalent to error locator polynomials in error correction codes and the Prony’s method [10] as discussed in Sections 4 and 5, respectively.
Random errors in a Galois field (row 8) and the additive impulsive noise in realfield error correction codes (row 9) are sparse disturbances that need to be detected and removed. For erasure channels, the impulsive noise can be regarded as the negative of the missing sample value [31]; thus the missing sampling problem, which can also be regarded as a special case of nonuniform sampling, is also a special case of the error correction problem. A subclass of impulsive noise for 2D signals is salt and pepper noise [32]. The information domain, where the sampling process occurs, is called the syndrome which is usually in a transform domain. Spectral estimation (row 10) is the dual of error correction codes, i.e., the sparsity is in the frequency domain. MSL (row 11) and multitarget detection in radars are similar to spectral estimation since targets act as spatial sparse monotones; each target is mapped to a specific spatial frequency regarding its line of sight direction relative to the receiver. The techniques developed for this branch of science is unique; with examples such as MUSIC [7], Prony [8], and Pisarenko [9]. We shall see that the techniques used in realfield error correction codes such as iterative methods (IMAT) can also be used in this area.
The array processing category (rows 11–13) consists of three separate topics. The first one covers MSL in radars, sonars, and DOA. The techniques developed for this field are similar to the spectral estimation methods with emphasis on the minimum description length (MDL) [33]. The second topic in the array processing category is related to the design of sparse arrays where some of the array elements are missing; the remaining nodes form a nonuniform sparse grid. In this case, one of the optimization problems is to find the sparsest array (number, locations, and weights of elements) for a given beampattern. This problem has some resemblance to the missing sampling problem but will not be discussed in this article. The third topic is on sensor networks (row 13). Distributed sampling and recovery of a physical field using an array of sparse sensors is a problem of increasing interest in environmental and seismic monitoring applications of sensor networks [34]. Sensor fields may be bandlimited or nonbandlimited. Since the power consumption is the most restricting issue in sensors, it is vital to use the lowest possible number of sensors (sparse sensor networks) with the minimum processing computation; this topic also will not be discussed in this article.
In SCA, the number of observations is much less than the number of sources (signals). However, if the sources are sparse in the time domain, then the active sources and their amplitudes can be determined; this is equivalent to error correction codes. Sparse dictionary representation (SDR) is another new area where signals are represented by the sparsest number of words (signal bases) in a dictionary of finite number of words; this sparsity may result in a tremendous amount of data compression. When the dictionary is over complete, there are many ways to represent the signal; however, we are interested in the sparsest representation. Normally, for extraction of statistically independent sources, independent component analysis (ICA) is used for a complete set of linear mixtures. In the case of a noncomplete (underdetermined) set of linear mixtures, ICA can work if the sources are also sparse; for this special case, ICA analysis is synonymous with SCA.
Finally, channel estimation is shown in row 16. In mobile communication systems, multipath reflections create a channel that can be modeled by a sparse FIR filter. For proper decoding of the incoming data, the channel characteristics should be estimated before they can be equalized. For this purpose, a training sequence is inserted within the main data, which enables the receiver to obtain the output of the channel by exploiting this training sequence. The channel estimation problem becomes a deconvolution problem under noisy environments. The sparsity criterion of the channel greatly improves the channel estimation; this is where the algorithms for extraction of a sparse signal could be employed [21,22,35].
When sparsity is random, further signal processing is needed. In this case, there are three items that need to be considered. 1—Evaluating the number of sparse coefficients (or samples), 2—finding the positions of sparse coefficients, and 3—determining the values of these coefficients. In some applications, only the first two items are needed; e.g., in spectral estimation. However, in almost all the other cases mentioned in Table 1, all the three items should be determined. Various types of linear programming (LP) and some iterative algorithms, such as the IMAT with adaptive thresholding (IMAT), determine the number, positions, and values of sparse samples at the same time. On the other hand, the minimum description length (MDL) method, used in DOA/MSL and spectral estimation, determines the number of sparse source locations or frequencies. In the subsequent sections, we shall describe, in more detail, each algorithm for various areas and applications based on Table 1.
Finally, it should be mentioned that the signal model for each topic or application may be deterministic or stochastic. For example, in the sampling category for rows 2–4 and 7, the signal model is typically deterministic although stochastic models could also be envisioned [36]. On the other hand, for random sampling and CS (rows 5–6), the signal model is stochastic although deterministic models may also be envisioned [37]. In channel coding and estimation (rows 8–9 and 16), the signal model is normally deterministic. For Spectral and DOA estimation (rows 10–11), stochastic models are assumed, whereas for array beamforming (row 12), deterministic models are used. In sensor networks (row 13), both deterministic and stochastic signal models are employed. Finally, in SCA (rows 14–15), statistical independence of sources may be necessary and thus stochastic models are applied.
Underdetermined system of linear equations
In most of the applications where sparsity constraint plays a significant role, we are dealing with underdetermined system of linear equations; i.e., a sparse vector s_{n×1} is observed through a linear mixing system denoted by A_{m×n}where m<n:
Since m<n, the vector s_{n×1} cannot be uniquely recovered by observing the measurement vector x_{m×1}; however, among the infinite number of solutions to (1), the sparsest solution may be unique. For instance, if no 2k columns of A_{m×n} are linearly dependent, the nullspace of A_{m×n} does not include any 2ksparse vector (at most 2k nonzero elements) and therefore, the measurement vectors (x_{m×n}) of different ksparse vectors are different. Thus, if s_{n×1}is sparse enough (ksparse), the sparsest solution of (1) is unique and coincides with s_{n×1}; i.e., perfect recovery. Unfortunately, there are two obstacles here: (1) the vector x_{m×1} often includes an additive noise term, and (2) finding the sparsest solution of a linear system is an NP problem in general.
Since in the rest of the article, we are frequently dealing with the problem of reconstructing the sparsest solution of (1), we first review some of the important reconstruction methods in this section.
Greedy methods
Mallat and Zhang [38] have developed a general iterative method for approximating sparse decomposition. When the dictionary is orthogonal and the signal x is composed of k≪n atoms, the algorithm recovers the sparse decomposition exactly after n steps. The introduced method which is a greedy algorithm [39], is usually referred to as Matching Pursuit. Since the algorithm is myopic, in some certain cases, wrong atoms are chosen in the first few iterations, and thus the remaining iterations are spent on correcting the first few mistakes. The concepts of this method are the basis of other advanced greedy methods such as OMP [40] and CoSaMP [41]. The algorithms of these greedy methods (MP, OMP, and CoSaMP) are shown in Table 4.
Table 4. Greedy algorithms
Basis pursuit
The mathematical representation of counting the number of sparse components is denoted by ℓ_{0}. However, ℓ_{0}is not a proper norm and is not computationally tractable. The closest convex norm to ℓ_{0} is ℓ_{1}. The ℓ_{1}optimization of an overcomplete dictionary is called Basis Pursuit. However the ℓ_{1}norm is nondifferentiable and we cannot use gradient methods for optimal solutions [42]. On the other hand, the ℓ_{1} solution is stable due to its convexity (the global optimum is the same as the local one) [20].
Formally, the Basis Pursuit can be formulated as:
We now explain how the Basis Pursuit is related to LP. The standard form of LP is
a constrained optimization problem defined in terms of variable
where C^{T}x is the objective function, Ax=b is a set of equality constraints and ∀i: x_{i}≥0is a set of bounds. Table 5 shows this relationship. Thus, the solution of (2) can be obtained by solving the equivalent LP. The Interior Point methods are the main approaches to solve LP.
Gradient projection sparse reconstruction (GPSR)
The GPSR technique [44] is considered as one of the fast variations of the ℓ_{1}minimization method and consists of solving the following minimization problem:
Note that J(s) is almost the Lagrange form of the constraint problem in (2) where the Lagrange
multiplier is defined as
Now by assuming that all the vectors and matrices are real, it is easy to check that the minimizer of the following cost function (F) corresponds to the minimizer of J(s):
where
In GPSR, the latter cost function is iteratively minimized by moving in the opposite direction of the gradient while respecting the condition z≥0. There stepwise explanation of the basic GPSR method is given in Table 6. In this table, (a)_{ + } denotes the value max{a,0} while (a)_{ + } indicates the elementwise action of the same function on the vector A. There is another adaptation of this method known as BarzilaiBorwein (BB) GPSR which is not discussed here.
Table 6. Basic GPSR algorithm
Iterative shrinkagethreshold algorithm (ISTA)
Instead of using the gradient method for solving (4), it is possible to approximate
the cost function. To explain this idea, let s^{(0)} be an estimate of the minimizer of (4) and let
Now if s^{(1)} is the minimizer of
where β is a scalar larger than all squared singular values of Ato ensure (7). By modifying the constant terms and rewriting the above cost function,
one can check that the minimizer of
where
Note that the minimization problem in (9) is separable with respect to the elements
of sand we just need to find the minimizer of the singlevariable cost function
The steps of the ISTA algorithm are explained in Table 7.
Table 7. ISTA algorithm
FOCal underdetermined system solver (FOCUSS)
FOCal underdetermined system solver is a nonparametric algorithm that consists of two parts [46]. It starts by finding a low resolution estimation of the sparse signal, and then pruning this solution to a sparser signal representation through several iterations. The solution at each iteration step is found by taking the pseudoinverse of a modified weighted matrix. The pseudoinverse of the modified weighted matrix is defined by (AW)^{ + }=(AW)^{H}(AW·(AW)^{H})^{−1}. This iterative algorithm is the solution of the following optimization problem:
Description of this algorithm is given in Table 8 and an extended version is discussed in [46].
Table 8. FOCUSS (Basic)
Iterative detection and estimation (IDE)
The idea behind this method is based on a geometrical interpretation of the sparsity. Consider the elements of vector sare i.i.d. random variables. By plotting a sample distribution of vector s, which is obtained by plotting a large number of samples in the Sspace, it is observed that the points tend to concentrate first around the origin, then along the coordinate axes, and finally across the coordinate planes. The algorithm used in IDE is given in Table 9. In this table, s_{i}s are the inactive sources, s_{a}s are the active sources, A_{i} is the column of A corresponding to the inactive s_{i} and A_{a} is the column of A corresponding to the active s_{a}. Notice that IDE has some resemblances to the RDE method discussed in Section 4.1.2, IMAT mentioned in Section 4.1.2, and MIMAT explained in Section 8.1.2.
Table 9. IDE steps
Smoothed ℓ_{0}norm (SL0) method
As discussed earlier, the criterion for sparsity is the ℓ_{0}norm; thus our minimization is
The ℓ_{0}norm has two major drawbacks: the need for a combinatorial search, and its sensitivity to noise. These problems arise from the fact that the ℓ_{0}norm is discontinuous. The idea of SL0 is to approximate the ℓ_{0}norm with functions of the type [47]:
where σ is a parameter which determines the quality of the approximation. Note that we have
For the vector s, we have
Table 10. SL0 steps
Comparison of different techniques
The above techniques have been simulated and the results are depicted in Figure 3. In order to compare the efficiency and computational complexity of these methods,
we use a fixed synthetic mixing matrix and source vectors. The elements of the mixing
matrix are obtained from zero mean independent Gaussian random variables with variance
σ^{2}=1. Sparse sources have been artificially generated using a Bernoulli–Gaussian model:
s_{i}=pN(0,σ_{on}) + (1−p)N(0,σ_{off}). We set σ_{off}=0.01,σ_{on}=1and p=0.1. Then, we compute the noisy mixture vector xfrom x=As + ν, where ν is the noise vector. The elements of the vector νare generated according to independent zero mean Gaussian random variables with variance
Figure 3. Performance of various methods with respect to the standard deviation when n = 1,000, m = 400, and k = 100.
Figure 4. Computational time (complexity) versus the number of sources for m = 0.4n and k = 0.1 n.
Figures 5 and 6 illustrate a comparison of several sparse reconstruction methods for sparse DFT signals and sparse random transformations, respectively. In all the simulations, the block size of the sparse signal is 512 while the number of sparse signal components in the frequency domain is 20. The compression rate is 25% which leads to a selection of 128 time domain observation samples.
Figure 5. Performance comparison of some reconstruction techniques for DFT sparse signals.
Figure 6. Performance comparison of some reconstruction techniques for sparse random trasnformations.
In Figure 5, the greedy algorithms, COSAMP and OMP, demonstrate better performances than ISTA and GPSR, especially at lower input signal SNRs. IMAT shows a better performance than all other algorithms; however its performance in the higher input signal SNRs is almost similar to OMP and COSAMP. In Figure 6, OMP and COSAMP have better performances than the other ones while ISTA, SL0, and GPSR have more or less the same performances. In sparse DFT signals, the complexity of the IMAT algorithm is less than the others while ISTA is the most complex algorithm. Similarly in Figure 6, SL0 has the least complexity.
Sampling: uniform, nonuniform, missing, random, compressed sensing, rate of innovation
Analog signals can be represented by finite rate discrete samples (uniform, nonuniform, or random) if the signal has some sort of redundancies such as bandlimitedness, finite polynomial representation (e.g., periodic signals that are represented by a finite number of trigonometric polynomials), and nonlinear functions of such redundant functions [48,49]. The minimum sampling rate is the Nyquist rate for uniform sampling and its generalizations for nonuniform [1] and multiband signals [50]. When a signal is discrete, the equivalent discrete representation in the “frequency” domain (DFT, DCT, DWT, Discrete Hartley Transform (DHT), Discrete Sine Transform (DST)) may be sparse, which is the discrete version of bandlimited or multiband analog signals where the locations of the bands are unknown.
For discrete signals, if the nonzero coefficients (“frequency” sparsity) are consecutive, depending on the location of the zeros, they are called lowpass, bandpass, or multiband discrete signals; if the locations of the nonzero coefficients do not follow any of these patterns, the “frequency” sparsity is random. The number of discrete time samples needed to represent a frequencysparse signal with known sparsity pattern follows the law of algebra, i.e., the number of time samples should be equal to the number of coefficients in the “frequency” domain; since the two domains are related by a full rank transform matrix, recovery from the time samples is equivalent to solving an invertible k×k system of linear equations where k is the number of sparse coefficients. For bandlimited real signals, the Fourier transform (sparsity domain) consists of similar nonzero patterns in both negative and positive frequencies where only the positive part is counted as the bandwidth; thus, the law of algebra is equivalent to the Nyquist rate, i.e., twice the bandwidth (for discrete signals with DC components it is twice the bandwidth minus one). The dual of frequencysparsity is timesparsity, which can happen in a burst or a random fashion. The number of “frequency” coefficients needed follows the Nyquist criterion. This will be further discussed in Section 4 for sparse additive impulsive noise channels.
Sampling of sparse signals
If the sparsity locations of a signal are known in a transform domain, then the number of samples needed in the time (space) domain should be at least equal to the number of sparse coefficients, i.e., the socalled Nyquist rate. However, depending on the type of sparsity (lowpass, bandpass, or random) and the type of sampling (uniform, periodic nonuniform, or random), the reconstruction may be unstable and the corresponding reconstruction matrix may be illconditioned [51,52]. Thus in many applications discussed in Table 1, the sampling rate in column 6 is higher than the minimum (Nyquist) rate.
When the location of sparsity is not known, by the law of algebra, the number of samples
needed to specify the sparsity is at least twice the number of sparse coefficients.
Again for stability reasons, the actual sampling rate is higher than this minimum
figure [1,50]. To guarantee stability, instead of direct sampling of the signal, a combination
of the samples can be used. Donoho has recently shown that if we take linear combinations
of the samples, the minimum stable sampling rate is of the order
Reconstruction algorithms
There are many reconstruction algorithms that can be used depending on the sparsity pattern, uniform or random sampling, complexity issues, and sensitivity to quantization and additive noise [53,54]. Among these methods are LP, lagrange interpolation [55], time varying method [56], spline interpolation [57], matrix inversion [58], error locator polynomial (ELP) [59], iterative techniques [52,6065], and IMAT [25,31,66,67]. In the following, we will only concentrate on the last three methods as well as the first (LP) that have been proven to be effective and practical.
Iterative methods when the location of sparsity is known
The reconstruction algorithms have to recover the original sparse signal from the information domain and the type of sparsity in the transform domain. We know the samples in the information domain (both position and amplitude) and we know the location of sparsity in the transform domain. An iteration between these two domains (Figure 7 and Table 11) or consecutive Projections Onto Convex Sets (POCS) should yield the original signal [51,61,62,65,6871].
Figure 7. Block diagram of the iterative reconstruction method. The mask is an appropriate filter with coefficients of 1’s and 0’s depending on the type of sparsity in the original signal.
In the case of the usual assumption that the sparsity is in the “frequency” domain and for the uniform sampling case of lowpass signals, one projection (bandlimiting in the frequency domain) suffices. However, if the frequency sparsity is random, the time samples are nonuniform, or the “frequency” domain is defined in a domain other than the DFT, then we need several iterations to have a good replica of the original signal. In general, this iterative method converges if the “Nyquist” rate is satisfied, i.e., the number of samples per block is greater than or equal to the number of coefficients. Figure 8 shows the improvement in dB versus the number of iterations for a random sampling set for a bandpass signal. In this figure, besides the standard iterative method, accelerated iterations such as Chebyshev and conjugate gradient methods are also used (please see [72] for the algorithms).
Figure 8. SNR improvement vs. the no. of iterations for a random sampling set at the Nyquist rate (OSR = 1) for a bandpass signal.
Iterative methods are quite robust against quantization and additive noise. In fact, we can prove that the iterative methods approach the pseudoinverse (least squares) solution for a noisy environment; specially, when the matrix is illconditioned [50].
Iterative method with adaptive threshold (IMAT) for unknown location of sparsity
As expected, when sparsity is assumed to be random, further signal processing is needed. We need to evaluate the number of sparse coefficients (or samples), the position of sparsity, and the values of the coefficients. The above iterative method cannot work since projection (the masking operation in Figure 7) onto the “frequency” domain is not possible without the knowledge of the positions of sparse coefficients. In this scenario, we need to use the knowledge of sparsity in some way. The introduction of an adaptive nonlinear threshold in the iterative method can do the trick and thus the name, IMAT; the block diagram and the pseudocode are depicted in Figure 9 and Table 12, respectively. The algorithms in [23,25,31,73] are variations of this method. Figure 9 shows that by alternate projections between information and sparsity domains (adaptively lowering or raising the threshold levels in the sparsity domain), the sparse coefficients are gradually picked up after several iterations. This method can be considered as a modified version of Matching Pursuit as described in Section 2.1; the results are shown in Figure 10. The sampling rate in the time domain is twice the number of unknown sparse coefficients. This is called the full capacity rate; this figure shows that after approximately 15 iterations, the SNR reaches its peak value. In general, the higher the sampling rate relative to the full capacity, the faster is the convergence rate and the better the SNR value.
Figure 9. The IMAT for detecting the number, location, and values of sparsity.
Figure 10. SNR vs. the no. of iterations for sparse signal recovery using the IMAT (Table 12).
Matrix solutions
When the sparse nonzero locations are known, matrix approaches can be utilized to determine the values of sparse coefficients [58]. Although these methods are rather straightforward, they may not be robust against quantization or additive noise when the matrices are ill conditioned.
There are other approaches such as Spline interpolation [57], nonlinear/time varying methods [58], Lagrange interpolation [55] and error locator polynomial (ELP) [74] that will not be discussed here. However, the ELP approach will be discussed in Section 4.1; variations of this method are called the annihilating filter in sampling with finite rate of innovation (Section 3.3) and Prony’s method in spectral and DOA estimation (Section 5.1). These methods work quite well in the absence of additive noise but they may not be robust in the presence of noise. In the case of additive noise, the extensions of the Prony method (ELP) such as Pisarenko harmonic decomposition (PHD), MUSIC and Estimation of signal parameters via rotational invariance techniques (ESPRIT) will be discussed in Sections 5.2, 5.3, and 6.
Compressed sensing (CS)
The relatively new topic of CS (Compressive) for sparse signals was originally introduced in [29,75] and further extended in [30,76,77]. The idea is to introduce sampling schemes with low number of required samples which uniquely represent the original sparse signal; these methods have lower computational complexities than the traditional techniques that employ oversampling and then apply compression. In other words, compression is achieved exactly at the time of sampling. Unlike the classical sampling theorem [78] based on the Fourier transform, the signals are assumed to be sparse in an arbitrary transform domain. Furthermore, there is no restricting assumption for the locations of nonzero coefficients in the sparsity domain; i.e., the locations should not follow a specific pattern such as lowpass or multiband structure. Clearly, this assumption includes a more general class of signals than the ones previously studied.
Since the concept of sparsity in a transform domain is more convenient to study for discrete signals, most of the research in this field is focused along discrete type signals [79]; however, recent results [80] show that most of the work can be generalized to continuous signals in shiftinvariant subspaces (a subclass of the signals which are represented by Riesz basis).^{c} We first study discrete signals and then briefly discuss the extension to the continuous case.
CS mathematical modeling
Let the vector
where s is an n×1vector which has at most k nonzero elements (ksparse vectors). In practical cases, shas at most k significant elements and the insignificant elements are set to zero which means s is an almost ksparse vector. For example, x can be the pixels of an image and Ψcan be the corresponding IDCT matrix. In this case, most of the DCT coefficients are insignificant and if they are set to zero, the quality of the image will not degrade significantly. In fact, this is the main concept behind some of the lossy compression methods such as JPEG. Since the inverse transform on x yields s, the vector s can be used instead of x, which can be succinctly represented by the locations and values of the nonzero elements of s. Although this method efficiently compresses x, it initially requires all the samples of xto produce s, which undermines the whole purpose of CS.
Now let us assume that instead of samples of x, we take m linear combinations of the samples (called generalized samples). If we represent these linear combinations by the matrix Φ_{m×n} and the resultant vector of samples by y_{m×1}, we have
The question is how the matrix Φand the size m should be chosen to ensure that these samples uniquely represent the original signal x. Obviously, the case of Φ=I_{n×n}where I_{n×n} is an n×nidentity matrix yields a trivial solution (keeping all the samples of x) that does not employ the sparsity condition. We look for Φmatrices with as few rows as possible which can guarantee the invertibility, stability, and robustness of the sampling process for the class of sparse inputs.
To solve this problem, we introduce probabilistic measures; i.e., instead of exact recovery of signals, we focus on the probability that a random sparse signal (according to a given probability density function) fails to be reconstructed using its generalized samples. If the probability δof failure can be made arbitrarily small, then the sampling scheme (the joint pair of Ψ,Φ) is successful in recovering x with probability 1−δ, i.e., with high probability.
Let us assume that Φ^{(m)}represents the submatrix formed by m random (uniform) rows of an orthonormal matrix Φ_{n×n}. It is apparent that if we use
where P_{failure} is the probability that the original signal cannot be recovered from the samples, c is a positive constant, and μ(Ψ,Φ) is the maximum coherence between the columns of Ψand rows of Φdefined by [82]:
where ψ_{a},ϕ_{b} are the a^{th} column and the b^{th} row of the matrices Ψand Φ, respectively. The above result implies that the probability of reconstruction is close to one for
The above derivation implies that the smaller the maximum coherence between the two matrices, and the lower is the number of required samples. Thus, to decrease the number of samples, we should look for matrices Φ with low coherence with Ψ. For this purpose, we use a random Φ. It is shown that the coherence of a random matrix with i.i.d. Gaussian distribution with any unitary Ψ is considerably small [29], which makes it a proper candidate for the sampling matrix. Investigation of the probability distribution has shown that the Gaussian PDF is not the only solution (for example binary Bernouli distribution and other types are considered in [83]) but may be the simplest to analyze.
For the case of random matrix with i.i.d. Gaussian distribution (or more general distributions for which the concentration inequality holds [83]), a stronger inequality compared with (20) is valid; this implies that for the reconstruction with a probability of almost one, the following condition for the number of samples m suffices [2,79]:
Notice that the required number of samples given in (20) is for random sampling of an orthonormal basis while (21) represents the required number of samples with i.i.d. Gaussian distributed sampling matrix. Typically, the number in (21) is less than that of (20).
Reconstruction from compressed measurements
In this section, we consider reconstruction algorithms and the stability robustness issues. We briefly discuss the following three methods: a—geometric, b—combinatorial, and c—information theoretic. The first two methods are standard while the last one is more recent.
Geometric methods
The oldest methods for reconstruction from compressed sampling are geometric, i.e.,
ℓ_{1} minimization techniques for finding a ksparse vector
Good methods for the minimization of an ℓ_{0}norm (sparsity) do not exist. The ones that are known are either computationally prohibitive or are not well behaved when the measurements are corrupted with noise. However, it is shown in [82] and later in [76,87] that minimization of an ℓ_{1}norm results in the same vector sfor many cases:
The interesting part is that the number of required samples to replace ℓ_{0} with ℓ_{1}minimization has the same order of magnitude as the one for the invertibility of the sampling scheme. Hence, s can be derived from (22) using ℓ_{1}minimization. It is worthwhile to mention that replacement of ℓ_{1}norm with ℓ_{2}norm, which is faster to implement, does not necessarily produce reasonable solutions. However, there are greedy methods (Matching Pursuit as discussed in Section 7 on SCA [40,88]) which iteratively approach the best solution and compete with the ℓ_{1}norm optimization (equivalent to Basis Pursuit methods as discussed in Section 7 on SCA).
To show the performance of the BP method, we have reported the famous phase transition diagram from [89] in Figure 11; this figure characterizes the perfect reconstruction region with respect to the parameters k/m and m/n. In fact, the curve represents the points for which the BP method recovers the sparse signal measured through a Gaussian random matrix with probability 50%. The interesting point is that the transition from the highprobability region (below the curve) to the lowprobability one (above the curve) is very sharp and when n→∞ the plotted curve separates the regions for probabilities 0 and 100%. The empirical results show that by deviating from the Gaussian distribution, the curve does not change while it is yet to be proved [89].
Figure 11. The phase transition of the BP method for reconstruction of the sparse vector from
Gaussian random measurement matrices; the probability of perfect reconstruction for
the pairs of
A sufficient condition for these methods to work is that the matrix Φ·Ψ must satisfy the socalled restricted isometric property (RIP) [75,83,90]; which will be discussed in the following section.
Restricted isometric property
It is important to note that the ℓ_{1}minimization algorithm produces almost optimal results for signals that are not ksparse. For example, almost sparse signals (compressible signals) are more likely to occur in applications than exactly ksparse vectors, (e.g., the wavelet transform of an image consists mostly of small coefficients and a few large coefficients). Moreover, even exactly ksparse signals may be corrupted by additive noise. This characteristic of ℓ_{1}minimization algorithms is called stability. Specifically, if we let β_{k}(s) denote the smallest possible error (in the ℓ_{1}norm) that can be achieved by approximating a signal s by a ksparse vector z
In compressed sensing, the degree of stability and robustness of the reconstruction is determined by the characteristics of the sampling matrix Φ. We say that the matrix Φ has RIP of order k, when for all ksparse vectors s, we have [30,76]:
where 0≤δ_{k}<1(isometry constant). The RIP is a sufficient condition that provides us with the maximum and minimum power of the samples with respect to the input power and ensures that none of the ksparse inputs fall in the null space of the sampling matrix. The RIP property essentially states that every k columns of the matrix Φ_{m×n}must be almost orthonormal (these submatrices preserve the norm within the constants 1±δ_{k}). The explicit construction of a matrix with such a property is difficult for any given n,k and m≈klogn; however, the problem has been studied in some cases [37,92]. Moreover, given such a matrix Φ, the evaluation of s (or alternatively x) via the minimization problem involves numerical methods (e.g., linear programming, GPSR, SPGL1, FPC [44,93]) for n variables and m constraints which can be computationally expensive.
However, probabilistic methods can be used to construct m×n matrices satisfying the RIP property for a given n,k and m≈klogn. This can be achieved using Gaussian random matrices. If Φis a sample of a Gaussian random matrix with the number of rows satisfying (20), Φ·Ψis also a sample of a Gaussian random matrix with the same number of rows and thus it satisfies RIP with high probability. Using matrices with the appropriate RIP property in the ℓ_{1}minimization, we guarantee exact recovery of ksparse signals that are stable and robust against additive noise.
Without loss of generality, assume that Ψ is equal to the identity matrix I, and that instead of Φ·s, we measure Φ·s + ν, where ν represents an additive noise vector. Since Φ·s + ν may not belong to the range space of Φover ksparse vectors, the ℓ_{1}minimization of (25) is modified as follows:
where ε^{2}is the maximum noise power. Let us denote the result of the above minimization for
y=Φ·s + ν by
This shows that small perturbations in the measurements cause small perturbations in the output of the ℓ_{1}minimization method (robustness).
Combinatorial
Another standard approach for reconstruction of compressed sampling is combinatorial.
As before, without loss of generality Ψ=I. The sampling matrix Φis found using a bipartite graph which consists of binary entries, i.e., entries that
are either 1 or 0. Binary search methods are then used to find an unknown ksparse vector
Information theoretic
A more recent approach is adaptive and information theoretic [101]. In this method, the signal
Sampling with finite rate of innovation
The classical sampling theorem states that
where B is the bandwidth of x(t)with the Nyquist interval T_{s}=1/2B. These uniform samples can be regarded as the degrees of freedom of the signal; i.e., a lowpass signal with bandwidth B has one degree of freedom in each Nyquist interval T_{s}. Replacing the sinc function with other kernels in (27), we can generalize the sparsity (bandlimitedness) in the Fourier domain to a wider class of signals known as the shift invariant (SI) spaces:
Similarly, the above signals have one degree of freedom in each T_{s} period of time (the coefficients c_{i}). A more general definition for the degree of freedom is introduced in [3] and is named the Rate of Innovation. For a given signal model, if we denote the degree of freedom in the time interval
of [t_{1},t_{2}] by C_{x}(t_{1},t_{2}), the local rate of innovation is defined by
provided that the limit exists; in this case, we say that the signal has finite rate of innovation [3,27,102,103]. As an example, for the lowpass signals with bandwidth B we have ρ=2B, which is the same as the Nyquist rate. In fact by proper choice of the sampling process, we are extracting the innovations of the signal. Now the question that arises is whether the uniform sampling theorems can be generalized to the signals with finite rate of innovation. Answer is positive for a class of nonbandlimited signals including the SI spaces. Consider the following signals:
where
where {t_{i}} is assumed to be an increasing sequence. For this case, since there are k unknown time instants and k unknown coefficients, we have C_{x}(t_{1},t_{k})=2k. We intend to show that the samples generated by proper sampling kernels φ(t)can be used to reconstruct the sparse Dirac functions. In fact, we choose the kernel φ(t) to satisfy the so called StrangFix condition of order 2k:
The above condition for the Fourier domain becomes
where Φ(Ω) denotes the Fourier transform of φ(t), and the superscript (r) represents the r^{th} derivative. It is also shown that such functions are of the form φ(t)=f(t)∗β_{2k}(t), where β_{2k}(t) is the Bspline of order 2k^{th}and f(t)is an arbitrary function with nonzero DC frequency [102]. Therefore, the function β_{2k}(t) is itself among the possible options for the choice of φ(t).
We can show that for the sampling kernels which satisfy the StrangFix condition (32), the innovations of the signal x(t) (31) can be extracted from the samples (y [j]):
Thus,
In other words, we have filtered the discrete samples (y [j]) in order to obtain the values τ_{r}; (35) shows that these values are only a function of the innovation parameters (amplitudes c_{i} and time instants t_{i}). However, the values τ_{r}are nonlinearly related to the time instants and therefore, the innovations cannot be extracted from τ_{r} using linear algebra.^{d}However, these nonlinear equations form a wellknown system which was studied by Prony in the field of spectral estimation (see Section 5.1) and its discrete version is also employed in both real and Galois field versions of ReedSolomon codes (see Section 4.1). This method which is called the annihilating filter is as follows:
The sequence {τ_{r}}can be viewed as the solution of a recursive equation. In fact if we define
In order to find the time instants t_{i}, we find the polynomial H(z) (or the coefficients h_{i}) and we look for its roots. A recursive relation for τ_{r}becomes
By solving the above linear system of equations, we obtain coefficients h_{i} (for a discussion on invertibility of the left side matrix see [102,104]) and consequently, by finding the roots of H(z), the time instants will be revealed. It should be mentioned that the choice of
This reconstruction method can be used for other types of signals satisfying (30) such as the signals represented by piecewise polynomials [102] (for large enough n, the n^{th}derivative of these signals become delta functions). An important issue in nonlinear reconstruction is the noise analysis; for the purpose of denoising and performance under additive noise the reader is encouraged to see [27].
A nice application of sampling theory and the concept of sparsity is error correction codes for real and complex numbers [105]. In the next section, we shall see that similar methods can be employed for decoding block and convolutional codes.
Error correction codes: Galois and real/complex fields
The relation between sampling and channel coding is the result of the fact that oversampling creates redundancy [105]. This redundancy can be used to correct for “sparse” impulsive noise. Normally, the channel encoding is performed in finite Galois fields as opposed to real/complex fields; the reason is the simplicity of logic circuit implementation and insensitivity to the pattern of errors. On the other hand, the real/complex field implementation of error correction codes has stability problems with respect to the pattern of impulsive, quantization and additive noise [52,59,74,106109]. Nevertheless, such implementation has found applications in fault tolerant computer systems [110114] and impulsive noise removal from 1D and 2D signals [31,32]. Similar to finite Galois fields, real/complex field codes can be implemented in both block and convolutional fashions.
A discrete realfield block code is an oversampled signal with n samples such that, in the transform domain (e.g., DFT), a contiguous number of highfrequency components are zero. In general, the zeros do not have to be the highfrequency components or contiguous. However, if they are contiguous, the resultant m equations (from the syndrome information domain) and m unknown erasures form a Vandermonde matrix, which ensures invertibility and consequently erasure recovery. The DFT block codes are thus a special case of ReedSolomon (RS) codes in the field of real/complex numbers [105].
Figure 12 represents convolutional encoders of rate 1/2of finite constraint length [105] and infinite precision per symbol. Figure 12a is a systematic convolutional encoder and resembles an oversampled signal discussed in Section 3 if the FIR filter acts as an ideal interpolating filter. Figure 12b is a nonsystematic encoder used in the simulations to be discussed subsequently. In the case of additive impulsive noise, errors could be detected based on the side information that there are frequency gaps in the original oversampled signal (syndrome). In the following subsections, various algorithms for decoding along with simulation results are given for both block and convolutional codes. Some of these algorithms can be used in other applications such as spectral and channel estimation.
Figure 12. Convolutional encoders. (a) A realfield systematic convolutional encoder of rate
Decoding of block codes—ELP method
Iterative reconstruction for an erasure channel is identical to the missing sampling
problem [115] discussed in Section 3.1.1 and therefore, will not be discussed here. Let us assume
that we have a finite discrete signal x_{orig}[i], where
where h_{k}s are the ELP coefficients as defined in (36) and Appendix Appendix 1, r is a member of the complement of Λ, and the index additions are in mod(n). After finding E [j] values, the spectrum of the recovered oversampled signal X [j]can be found by removing E [j]from the received signal (see (99) in Appendix Appendix 1). Hence the original signal can be recovered by removing the inserted zeros at the syndrome positions of X [j]. The above algorithm, called the ELP algorithm, is capable of correcting any combination of erasures. However, if the erasures are bursty, the above algorithm may become unstable. To combat bursty erasures, we can use the Sorted DFT (SDFT^{f}) [1,59,116,117] instead of the conventional DFT. The simulation results for block codes with erasure and impulsive noise channels are given in the following two subsections.
Simulation results for erasure channels
The simulation results for the ELP decoding implementation for n=32, p=16, and k=16 erasures (a burst of 16 consecutive missing samples from position 1 to 16) are shown in Figure 13; this figure shows we can have perfect reconstruction up to the capacity of the code (up to the finite computer precision which is above 320 dB; this is also true for Figures 14 and 15). By capacity we mean the maximum number of erasures that a code is capable of correcting.
Figure 13. Recovery of a burst of 16 sample losses.
Figure 14. Simulation results of a convolutional decoder, using the iterative method with the generator matrix, after 30 CG iterations (see [72]); SNR versus the relative rate of erasures (w.r.t. full capacity) in an erasure channel.
Figure 15. Simulation results by using the IMAT method for detecting the location and amplitude of the impulsive noise, λ=1.9.
Since consecutive sample losses represent the worst case [59,116], the proposed method works better for random samples. In practice, the error recovery capability of this technique degrades with the increase of the block and/or burst size due to the accumulation of roundoff errors. In order to reduce the roundoff error, instead of the DFT, a transform based on the SDFT, or Sorted DCT (SDCT) can be used [1,59,116]. These types of transformations act as an interleaver to break down the bursty erasures.
Simulation results for random impulsive noise channel
There are several methods to determine the number, locations, and values of the impulsive noise samples, namely Modified BerlekampMassey for real fields [118,119], ELP, IMAT, and constant false alarm rate with recursive detection estimation (CFARRDE). The BerlekampMassey method for real numbers is sensitive to noise and will not be discussed here [118]. The other methods are discussed below.
ELP method [104]
When the number and positions of the impulsive noise samples are not known, h_{t} in (38) is not known for any t; therefore, we assume the maximum possible number of impulsive noise samples per
block, i.e.,
CFARRDE and IMAT methods [31]
The CFARRDE method is similar to the IMAT with the additional inclusion of the CFAR module to estimate the impulsive noise; CFAR is extensively used in radars to detect and remove clutter noise from data. In CFAR, we compare the noisy signal with its neighbors and determine if an impulsive (sparse) noise is present or not (using soft decision [31]).^{g} After removing the impulsive noise in a “soft” fashion, we estimate the signal using the iterative method for an erasure channel as described in Section 3.1.1 for random sampling or using the ELP method. The impulsive noise and signal detection and estimation go through several iterations in a recursive fashion as shown in Figure 16. As the number of recursions increases, the certainty about the detection of impulsive noise locations also increases; thus, the soft decision is designed to act more like the hard decision during the later parts of the iteration steps, which yields the error locations. Meanwhile, further iterations are performed to enhance the quality of the original signal since suppression of the impulsive noise also suppresses the original signal samples at the location of the impulsive noise. The improvement of using CFARRDE over a simple soft decision RDE is shown in Figure 17.
Figure 16. CFARRDE method with the use of adaptive soft thresholding and an iterative method for signal reconstruction.
Figure 17. Comparison of CFARRDE and a simple soft decision RDE for DFT block codes.
Decoding for convolutional codes
The performance of convolutional decoders depends on the coding rate, the number and values of FIR taps for the encoders, and the type of the decoder. Our simulation results are based on the structure given in Figure 12b, and the taps of the encoder are
The input signal is taken from a uniform random distribution of size 50 and the simulations are run 1,000 times and then averaged. The following subsections describe the simulation results for erasure and impulsive noise channels.
Decoding for erasure channels
For the erasure channels, we derive the generator matrix of a convolutional encoder (Figure 12b with taps given in (39)) as shown below [4]
An iterative decoding scheme for this matrix representation is similar to that of Figure 7 except that the operator G consists of the generator matrix, a mask (erasure operation), and the transpose of the generator matrix. If the rate of erasure does not exceed the encoder full capacity, the matrix form of the operator G can be shown to be a nonnegative definite square matrix and therefore its inverse exists [51,60].
Figure 14 shows that the SNR values gradually decrease as the rate of erasure reaches its maximum (capacity).
Decoding for impulsive noise channels
Let us consider xand y as the input and the output streams of the encoder, respectively, related to each other through the generator matrix G as y=Gx.
Denoting the observation vector at the receiver by
Multiplying the resultant by the right pseudoinverse of the H^{T}, we derive
Thus by multiplying the received vector by H(H^{T}H)^{−1}H^{T}(projection matrix into the range space of H), we obtain an approximation of the impulsive noise. In the IMAT method, we apply the operator H(H^{T}H)^{−1}H^{T} in the iteration of Figure 9; the threshold level is reduced exponentially at each iteration step. The block diagram of IMAT in Figure 9 is modified as shown in Figure 18.
Figure 18. The modified diagram of the IMAT method from Figure 9.
For simulation results, we use the generator matrix shown in (40), which can be calculated from [4].
In our simulations, the locations of the impulsive noise samples are generated randomly and their amplitudes have Gaussian distributions with zero mean and variance equal to 1, 2, 5, and 10 times the variance of the encoder output. The results are shown in Figure 15 after 300 iterations. This figure shows that the high variance impulsive noise has a better performance.
Spectral estimation
In this section, we review some of the methods which are used to evaluate the frequency content of data [710]. In the field of signal spectrum estimation, there are several methods which are appropriate for different types of signals. Some methods are more suitable to estimate the spectrum of wideband signals, whereas some others are better for the extraction of narrowband components. Since our focus is on sparse signals, it would be reasonable to assume sparsity in the frequency domain, i.e., we assume the signal to be a combination of several sinusoids plus white noise.
Conventional methods for spectrum analysis are nonparametric methods in the sense
that they do not assume any model (statistical or deterministic) for the data, except
that it is zero or periodic outside the observation interval. For example, the periodogram
where m is the number of observations, T_{s} is the sampling interval (usually assumed as unity), and x_{r}is the signal. Although nonparametric methods are robust with low computational complexity, they suffer from fundamental limitations. The most important limitation is their resolution; too closely spaced harmonics cannot be distinguished if the spacing is smaller than the inverse of the observation period.
To overcome this resolution problem, parametric methods are devised. Assuming a statistical model with some unknown parameters, we can increase resolution by estimating the parameters from the data at the cost of more computational complexity. Theoretically, in parametric methods, we can resolve closely spaced harmonics with limited data length if the SNR goes to infinity.^{h}
In this section, we shall discuss three parametric approaches for spectral estimation: the Pisarenko, the Prony, and the MUSIC algorithms. The first two are mainly used in spectral estimation, while the MUSIC algorithm was first developed for array processing and later has been extended to spectral estimation. It should be noted that the parametric methods unlike the nonparametric approaches require prior knowledge of the model order (the number of tones). This can be decided from the data using the minimum discription length (MDL) method discussed in the next section.
Prony method
The Prony method was originally proposed for modeling the expansion of gases [120]; however, now it is known as a general spectral estimation method. In fact, Prony tried to fit a weighted mixture of k damped complex exponentials to 2k data measurements. The original approach is related to the noiseless measurements; however, it has been extended to produce the least squared solutions for noisy measurements. We focus only on the noiseless case here. The signal is modeled as a weighted mixture of k complex exponentials with complex amplitudes and frequencies:
where x_{r} is the noiseless discrete sparse signal consisting of k exponentials with parameters
where a_{i},θ_{i},f_{i} represent the amplitude, phase, and the frequency (f_{i}is a complex number in general), respectively. Let us define the polynomial H(z) such that its roots represent the complex exponential functions related to the sparse tones (see Section 3.3 on FRI, (38) on ELP and Appendix 1):
By shifting the index of (44) and multiplying by the parameter h_{j} and summing over j we get
where r is indexed in the range k + 1≤ r ≤ 2 k. This formula implies a recursive equation to solve for h_{i}s [8]. After the evaluation of the h_{i}s, the roots of (46) yield the frequency components. Hence, the amplitudes of the exponentials can be evaluated from a set of linear equations given in (44). The basic Prony algorithm is given in Table 13.
Table 13. Basic prony algorithm
The Prony method is sensitive to noise, which was also observed in the ELP and the annihilating filter methods discussed in Sections 3.3 and 4.1. There are extended Prony methods that are better suited for noisy measurements [10].
Pisarenko harmonic decomposition (PHD)
The PHD method is based on the polynomial of the Prony method and utilizes the eigendecomposition of the data covariance matrix [10]. Assume k complex tones are present in the spectrum of the signal. Then, decompose the covariance matrix of k + 1dimensions into a kdimensional signal subspace and a 1dimensional noise subspace that are orthogonal to each other. By including the additive noise, the observations are given by
where y is the observation sample and νis a zeromean noise term that satisfies E{ν_{r}ν_{r + i}}=σ^{2}δ [i]. By replacing x_{r}=y_{r}−ν_{r}in the difference equation (47), we get
which reveals the autoregressive moving average (ARMA) structure (order (k,k)) of the observations y_{r}as a random process. To benefit from the tools in linear algebra, let us define the following vectors:
Now (49) can be written as
Multiplying both sides of (51) by yand taking the expected value, we get E{yy^{H}}h=E{yν^{H}}h. Note that
We thus have an eigenequation
which is the key equation of the Pisarenko method. The eigenequation of (54) states that the elements of the eigenvector of the covariance matrix, corresponding to the smallest eigenvalue (σ^{2}), are the same as the coefficients in the recursive equation of x_{r}(coefficients of the ARMA model in (49)). Therefore, by evaluating the roots of the polynomial represented in (46) with coefficients that are the elements of this vector, we can find the tones in the spectrum.
Although we started by eigendecomposition of R_{yy}, we observed that only one of the eigenvectors is required; the one that corresponds to the smallest eigenvalue. This eigenvector can be found using simple approaches (in contrast to eigendecomposition) such as power method. The PHD method is briefly shown in Table 14.
Table 14. PHD algorithm
A different formulation of the PHD method with linear programming approach (refer to Section 2.2 for description of linear programming) for array processing is studied in [121]. The PHD method is shown to be equivalent to a geometrical projection problem which can be solved using ℓ_{1}norm optimization.
MUSIC
MUltiple SIgnal Classification (MUSIC), is a method originally devised for highresolution source direction estimation in the context of array processing that will be discussed in the next section [122]. The inherent equivalence of array processing and time series analysis paves the way for the employment of this method in spectral estimation. MUSIC can be understood as a generalization and improvement of the Pisarenko method. It is known that in the context of array processing, MUSIC can attain the statistical efficiency^{i} in the limit of asymptotically large number of observations [11].
In the PHD method, we construct an autocorrelation matrix of dimension k + 1 under the assumption that its smallest eigenvalue (σ^{2}) belongs to the noise subspace. Then we use the Hermitian property of the covariance matrix to conclude that the noise eigenvector should be orthogonal to the signal eigenvectors. In MUSIC, we extend this method using a noise subspace of dimension greater than one to improve the performance. We also use some kind of averaging over noise eigenvectors to obtain a more reliable signal estimator.
The data model for the sum of exponentials plus noise can be written in the matrix form as
where the length of data is taken as m>kand the elements of Aare
where ν represents the noise vector. Since the frequencies are different, A is of rank k and the first term in (55) forms a kdimensional signal subspace, while the second term is randomly distributed in both signal and noise subspaces; i.e., unlike the first term, it is not confined to a subspace of lower dimension. The correlation matrix of the observations is given by
where the noise is assumed to be white with variance σ^{2}. If we decompose R into its eigenvectors, k eigenvalues corresponding to the kdimensional subspace of the first term of (57) are essentially greater than the remaining
m − k values, σ^{2}, corresponding to the noise subspace; thus, by sorting the eigenvalues, the noise
and signal subspaces can be determined. Assume ω is an arbitrary frequency and
where v_{i}s are eigenvectors of Rcorresponding to the noise subspace.
The k peaks of P_{MU}(ω)are selected as the frequencies of the sparse signal. The determination of the number of frequencies (model order) in MUSIC is based on the MDL and Akaike information criterion (AIC) methods to be discussed in the next section. The MUSIC algorithm is briefly explained in Table 15.
Table 15. MUSIC algorithm
Figure 19 compares the results (in the order of improved performance) for various spectral line estimation methods. The first upper figure shows the original spectral lines, and the four other figures show the results for Prony, PHD, MUSIC, and IMAT methods. We observe that the Prony method (which is similar to ELP and annihilating filter of Section 3.3 and (38)) does not yield good results due to its sensitivity to noise, while the IMAT method is the best. The application of IMAT to spectral estimation is a clear confirmation of our contention that we can apply tools developed in some areas to other areas for better performance.
Figure 19. A comparison of various spectral estimation methods for a sparse mixture of sinusoids (the top figure) using Prony, Pisarenko, MUSIC, and IMAT methods (in the order of improved performance); input SNR is 5dB and 256 time samples are used.
Sparse array processing
There are three types of array processing: 1—estimation of multisource location (MSL) and Direction of Arrival (DOA), 2—sparse array beamforming and design, and 3—sparse sensor networks. The first topic is related to estimating the directions and/or the locations of multiple targets; this problem is very similar to the problem of spectral estimation dealt with in the previous section; the relations among sparsity, spectral estimation, and array processing were discussed in [123,124]. The second topic is related to the design of sparse arrays with some missing and/or random array sensors. The last topic, depending on the type of sparsity, is either similar to the second topic or related to CS of sparse signal fields in a network. In the following, we will only consider the first kind.
Array processing for MSL and DOA estimation
Among the important fields of active research in array processing are MSL and DOA estimation [122,125,126]. In such schemes, a passive or active array of sensors is used to locate the sources of narrowband signals. Some applications may assume farfield sources (e.g., radar signal processing) where the array is only capable of DOA estimation, while other applications (e.g. biomedical imaging systems) assume nearfield sources where the array is capable of locating the sources of radiation. A closely related field of study is spectral estimation due to similar linear statistical models. The stochastic sparse signals pass through a partially known linear transform (e.g., array response or inverse Fourier transform) and are observed in a noisy environment.
In the array processing context, the common temporal frequency of the source signals is known. Spatial sampling of the signal is used to extract the direction of the signal (spatial frequency). As a farfield approximation, the signal wavefronts are assumed to be planar. Consider a signal arriving with angle φ as in Figure 20. Simultaneous sampling of this wavefront on the array will exhibit a phase change of the signal from sensor to sensor. In this way, discrete samples of a complex exponential are obtained, where its frequency can be translated to the direction of the signal source. The response of a uniform linear array (ULA) to a wavefront impinging on the array from direction φis
where d is the interelement spacing of the array, λis the wavelength, and n is the number of sensors in the array. When multiple sources are present, the observed vector is the sum of the response (sweep) vectors and noise. This resembles the spectral estimation problem with the difference that sampling of the array elements is not limited in time. In fact in array processing, an additional degree of freedom (the number of elements) is present; thus, array processing is more general than spectral estimation.
Figure 20. Uniform linear array with element distance d, element length I, and a wave arriving from direction φ.
Two main fields in array processing are MSL and DOA for estimating the source locations and directions, respectively; for both purposes, the angle of arrival (azimuth and elevation) should be estimated while for MSL an extra parameter of range is also needed. The simplest case is the 1D ULA (azimuthonly) for DOA estimation.
For the general case of k sources with angles
where the vector s[i] represents the multisource signals and ν[i]is the white Gaussian noise vector. Source signals and additive noise are assumed to be zeromean and i.i.d. normal processes with covariance matrices P and σ^{2}I, respectively. With these assumptions, the observation vector y[i]will also follow an ndimensional zeromean normal distribution with the covariance matrix
In the field of DOA estimation, extensive research has been accomplished in (1) source enumeration, and (2) DOA estimation methods. Both of the subjects correspond to the determination of parameters k and φ.
Although some methods are proposed for simultaneous detection and estimation of the model statistical characteristics [127], most of the literature is devoted to twostage approaches; first, the number of active sources is detected and then their directions are estimated by techniques such as estimation of signal parameters via rotational invariance techniques (ESPRIT)^{j}[128132]. Usually, the joint detectionestimation methods outperform the twostage approaches with the cost of higher computational complexity. In the following, we will describe Minimum Description Length (MDL) as a powerful tool to detect the number of active sources.
Minimum description length
One of the most successful methods in array processing for source enumeration is the use of the MDL criterion [133]. This technique is very powerful and outperforms its older versions including AIC [134136]. Hence, we confine our discussion to MDL algorithms.
Preliminaries
Minimum description length is an optimum method of finding the model order and parameters for the most compressed representation of the observed data. For the purpose of statistical modeling, the MAP probability or the suboptimal criterion of ML is used; more precisely, conditioned on the observed data, the maximum probability among the possible options is found (hypotheses testing) [137]. When the model parameters are not known, the MAP and ML criteria result in the most complex approach; consider fitting a finite sequence of data to a polynomial of unknown degree [33]:
where
Due to the existence of additive noise, it is quite rational to look for a polynomial with degree less than m which also takes the complexity order into account. In MDL, the idea of how to consider the complexity order is borrowed from information theory: given a specific statistical distribution, we can find an optimum source coding scheme (e.g., Huffman coding) which attains the lowest average code length for the symbols. Furthermore, if p_{s} is the distribution of the source s and q_{s}is another distribution, we have [138]:
where H(s) is the entropy of the signal. This implies that the minimum average code length
is obtained only for the correct source distribution (model parameters); in other
words, the choice of wrong model parameters (distribution function) leads to larger
code lengths. When a particular model with the set of parameters θis assumed for the data a priori, each time a sequence y is received, the parameters should first be estimated. The optimum estimation method
is usually the ML estimator which results in
The first term is the ML term for data encoding, and the second term is a penalty function that inhibits the number of free parameters of the model to become very large.
Example of using MDL in spectral estimation
An example from spectral estimation can help clarify how the MDL method works (for more information refer to the previous section on spectral estimation). The mathematical formulation of the problem is as follows: If there are k (unknown) sinusoids with various frequencies, amplitudes, and phases (3k unknown parameters) observed in a noisy data vector x (sampled at n distinct time slots), the maximum likelihood function for this observed data with additive Gaussian noise is as follows:
here
To find the number of embedded sinusoids in the noisy observed data, it is initially assumed that k=0 and (65) is calculated, then k is increased and by using the grid search, the maximum value of the likelihood for the assumed k is calculated from (66), and this calculated value is then used to compute (65). This procedure should be followed as long as (65) decreases and consequently aborted when it starts to rise. The k minimizing (65) is the k selected by MDL method and hopefully reveals the true number of the sinusoids in the noisy observed data. It is obvious that the sparsity condition, i.e., k<<n, is necessary for the efficient operation of MDL. In addition to the number of sinusoids, MDL has apparently estimated the frequency, amplitude, and phase of the embedded sinusoids. This should make it clear why such methods are called detection–estimation algorithms.
The very same method can be used to find the number, position, and amplitude of an impulsive noise added to a lowpass signal in additive noise. If the samples of the added impulsive noise are statistically independent from each other, the highpass samples of the discrete fourier transform (DFT) of the noisy observed data with impulsive noise should be taken and the same method applied.
MDL source enumeration
In the source enumeration problem, our model is a multivariate Gaussian random process with zero mean and covariance of the type shown in (62), where the number of active sources is unknown. In some enumeration methods (other than MDL), the exact form of (62) is employed which results in high computational complexity. In the conventional MDL method, it is assumed that the model is a covariance matrix with a spherical subspace^{l} of dimension n−k. Suppose the sample covariance matrix is
and assume the ordered eigenvalues of
where tr(.)stands for the trace operator. The ML estimate of signal eigenvalues in R are
In fact, since we know that R has a spherical subspace of dimension n−k, we correct the observed
Now, we calculate −log(p(xR_{ML})); it is easy to show that
which is independent of k and can be omitted in the minimization of (65). Thus, for the first term of (65) we only need the determinant R_{ML}which is the product of the eigenvalues, and the MDL criterion becomes
where κ is the number of free parameters in the distribution. This expression should be computed
for different values of 0≤k≤n−1 and its minimum point should be
where the first term is the likelihood ratio for the sphericity test of the covariance matrix. This likelihood ratio is a function of arithmetic and geometric means of the noise subspace eigenvalues [141]. Figure 21 is an example of MDL performance in determining the number of sources in array processing. It is evident that in low SNRs, the MDL has a strong tendency to underestimate the number of sources, while as SNR increases, it gives a consistent estimate. Also at high SNRs, underestimation is more probable than overestimation.
Figure 21. An MDL example; the vertical axis is the probability of order detection. And the other two axes are the number of sources and the SNR values. The MDL method estimates the number of active sources (which is 2) correctly when the SNR value is relatively high.
Now we compute the number of independent parameters (κ) in the model. Since the noise subspace is spherical, the choice of eigenvectors
in this subspace can accept any arbitrary orthonormal set; i.e., no information is
revealed when these vectors are known. Thus, the set of parameters is
where the last integer 1 can be omitted since it is independent of k.
The twopart MDL, despite its very low computational complexity, is among the most successful methods for source enumeration in array processing. Nonetheless, this method does not reach the best attainable performance for finite number of measurements [142]. The new version of MDL, called onepart or Refined MDL has improved the performance for the cases of finite measurements which has not been applied to the array processing problem [33].
Sparse sensor networks
Wireless sensor networks typically consist of a large number of sensor nodes, spatially distributed over a region of interest, that observe some physical environment including acoustic, seismic, and thermal fields with applications in a wide range of areas such as health care, geographical monitoring, homeland security, and hazard detection. The way sensor networks are used in practical applications can be divided into two general categories:
(1) There exists a central node known as the fusion center (FC) that retrieves relevant field information from the sensor nodes and communication from the sensor nodes to FC generally takes place over a power and bandwidthconstrained wireless channel.
(2) Such a central node does not exist and the nodes take specific decisions based on the information they obtain and exchange among themselves. Issues such as distributed computing and processing are of high importance in such scenarios.
In general, there are three main tasks that should be implemented efficiently in a wireless sensor network: sensing, communication, and processing. The main challenge in design of practical sensor networks is to find an efficient way of jointly performing these tasks, while using the minimum amount of system resources (computation, power, bandwidth) and satisfying the required system design parameters (such as distortion levels). For example, one such metric is the socalled energydistortion tradeoff which determines how much energy the sensor network consumes in extracting and delivering relevant information up to a given distortion level. Although many theoretical results are already available in the case of pointtopoint links in which separation between source and channel coding can be assumed, the problem of efficiently transmitting or sharing information among a vast number of distributed nodes remains a great challenge. This is due to the fact that welldeveloped theories and tools for distributed signal processing, communications, and information theory in largescale networked systems are still under development. However, recent results on distributed estimation or detection indicate that joint optimization through some form of sourcechannel matching and local node cooperation can result in significant system performance improvement [143147].
How sparsity can be exploited in a sensor network
Sparsity appears in many applications for which sensor networks are deployed, e.g., localization of targets in a large region or estimation of physical phenomena such as temperature fields that are sparse under a suitable transformation. For example, in radar applications, under a farfield assumption, the observation system is linear and can be expressed as a matrix of steering vectors [148,149]. In general, sparsity can arise in a sensor network from two main perspectives:
(1) Sparsity of node distribution in spatial terms
(2) Sparsity of the field to be estimated
Although nodes in a sensor network can be assumed to be regularly deployed in a given environment, such an assumption is not valid in many practical scenarios. Therefore, the nonuniform distribution of nodes can lead to some type of sparsity in spatial domain that can be exploited to reduce the amount of sensing, processing, and/or communication. This issue is subsequently related to extensions of the nonuniform sampling techniques to twodimensional domains through proper interpolation and data recovery when samples are spatially sparse [34,150]. The second scenario that provides a proper basis for exploiting the sparsity concepts arises when the field to be estimated is a sparse multidimensional signal. From this point of view, ideas such as those presented earlier in the context of compressed sensing (Section 3.2) provide the proper framework to address the sparsity in such fields.
Spatial sparsity and interpolation in sensor networks
Although general 2D interpolation techniques are wellknown in various branches of statistics and signal processing, the main issue in a sensor network is exploring proper spatio/temporal interpolation such that communication and processing are also efficiently accomplished. While there is a wide range of interpolation schemes (polynomial, Fourier, and least squares [151]), many of these schemes are not directly applicable for spatial interpolation in sensor networks due to their communication complexity.
Another characteristic of many sensor networks is the nonuniformity of node distribution in the measurement field. Although nonuniformity has been dealt with extensively in contexts such as signal processing, geospatial data processing, and computational geometry [1], the combination of irregular sensor data sampling and intranetwork processing is a main challenge in sensor networks. For example, reference [152] addresses the issue of spatiotemporal nonuniformity in sensor networks and how it impacts performance aspects of a sensor network such as compression efficiency and routing overhead. In order to reduce the impact of nonuniformity, the authors in [152] propose using a combination of spatial data interpolation and temporal signal segmentation. A simple interpolation wavelet transform for irregular sampling which is an extension of the 2D irregular grid transform to 3D spatiotemporal transform grids is also proposed in [153]. Such a multiscale transform extends the approach in [154] and removes the dependence on building a distributed mesh within the network. It should be noted that although wavelet compression allows the network to trade reconstruction quality for communication energy and bandwidth usage, such energy savings are naturally offset by the overhead cost of computing the wavelet coefficients.
Distributed wavelet processing within sensor networks is yet another approach to reduce communication energy and wireless bandwidth usage. Use of such distributed processing makes it possible to trade longhaul transmission of raw data to the FC for less costly local communication and processing among neighboring nodes [153]. In addition, local collaboration among nodes decorrelates measurements and results in a sparser data set.
Compressive sensing in sensor networks
Most natural phenomena in SNs are compressible through representation in a natural basis [86]. Some examples of these applications are imaging in a scattering medium [148], MIMO radar [149], and geoexploration via underground seismic data. In such cases, it is possible to construct a highly compressed version of a given field, in a decentralized fashion. If the correlations between data at different nodes are known apriori, it is possible to use schemes that have very favorable powerdistortionlatency tradeoffs [143,155,156]. In such cases, distributed source coding techniques, such as SlepianWolf coding, can be used to design compression schemes without collaboration between nodes (see [155] and the references therein). Since prior knowledge of such correlations is not available in many applications, collaborative, intranetwork processing and compression are used to determine unknown correlations and dependencies through information exchange between network nodes. In this regard, the concept of compressive wireless sensing has been introduced in [147] for energyefficient estimation at the FC of sensor data, based on ideas from wireless communications [143,145,156158] and compressive sampling theory [29,75,159]. The main objective in such an approach is to combine processing and communications in a single distributed operation [160162].
Methods to obtain the required sparsity in a SN
While transformbased compression is welldeveloped in traditional signal and image processing domains, the understanding of sparse transforms for networked data is not as trivial [163]. There are methods such as associating a graph with a given network, where the vertices of the graph represent the nodes of the network, and edges between vertices represent relationships among data at adjacent nodes. The structure of the connectivity is the key to obtaining effective sparse transformations for networked data [163]. For example, in the case of uniformly distributed nodes, tools such as DFT or DCT can be adopted to exploit the sparsity in the frequency domain. In more general settings, wavelet techniques can be extended to handle the irregular distribution of sampling locations [153]. There are also scenarios in which standard signal transforms may not be directly applicable. For example, network monitoring applications rely on the analysis of communication traffic levels at the network nodes where network topology affects the nature of node relationships in complex ways. Graph wavelets [164] and diffusion wavelets [165] are two classes of transforms that have been proposed to address such complexities. In the former case, the wavelet coefficients are obtained by computing the digital differences of the data at different scales. The coefficients at the first scale are differences between neighboring data points, and those at subsequent spatial scales are computed by first aggregating data in neighborhoods and then computing differences between neighboring aggregations. The resulting graph wavelet coefficients are then defined by aggregated data at different scales and computing differences between the aggregated data [164]. In the latter scheme, diffusion wavelets are based on construction of an orthonormal basis for functions supported on a graph and obtaining a customdesigned basis by analyzing eigenvectors of a diffusion matrix derived from the graph adjacency matrix. The resulting basis vectors are generally localized to neighborhoods of varying size and may also lead to sparse representations of data on a graph [165]. One example of such an approach is where the node data correspond to traffic rates of routers in a computer network.
Implementation of CS in a wireless SN
Two main approaches to implement random projections in a SN are discussed in the literature [163]. In the first approach, the CS projections are simultaneously calculated through superposition of radio waves and communicated using amplitudemodulated coherent transmissions of randomly weighted values directly from the nodes in the network to the FC (Figure 22). This scheme, introduced in [147,157] and further refined in [166], is based on the notion of the socalled matched sourcechannel communication[156,157]. Although the need for complex routing, intranetwork communications, and processing are alleviated, local phase synchronization among nodes is an issue to be addressed properly in this approach.
Figure 22. Computation of CS projections through superposition of radio waves of randomly weighted values directly from the nodes in the network to the FC (from [163]).
In the second approach, the projections can be computed and delivered to every subset of nodes in the network using gossip/consensus techniques, or be delivered to a single point using clustering and aggregation. This approach is typically used for networked data storage and retrieval applications. In this method, computation and distribution of each CS sample is accomplished through two simple steps [163]. In the first step, each of the sensors multiplies its data with the corresponding element of the compressing matrix. Then, in the second step, the resulting local terms are simultaneously aggregated and distributed across the network using randomized gossip [167], which is a simple iterative decentralized algorithm for computing linear functions. Because each node only exchanges information with its immediate neighbors in the network, gossip algorithms are more robust to failures or changes in the network topology and cannot be easily compromised by eliminating a single server or fusion center [168].
Finally, it should be noted that in addition to the encoding process, the overall system performance is significantly affected by the decoding process [44,88,169]; this study and its extensions to sparse SNs remain as challenging tasks.
Sensing capacity
Despite widespread development of SN ideas in recent years, understanding of fundamental performance limits of sensing and communication between sensors is still under development. One of the issues that has recently attracted attention in theoretical analysis of sensor networks is the concept of sensor capacity. The sensing capacity was initially introduced for discrete alphabets in applications such as target detection [170] and later extended in [14,171,172] to the continuous case. The questions in this area are related to the problem of sampling of sparse signals, [29,76,159] and sampling with finite rate of innovation [3,103]. In the context of the CS, sensing capacity provides bounds on the maximum signal dimension or complexity per sensor measurement that can be recovered to a predefined degree of accuracy. Alternatively, it can be interpreted as the minimum number of sensors necessary to monitor a given region to a desired degree of fidelity based on noisy sensor measurements. The inverse of sensing capacity is the compression rate; i.e., the ratio of the number of measurements to the number of signal dimensions which characterizes the minimum rate to which the source can be compressed. As shown in [14], sensing capacity is a function of SNR, the inherent dimensionality of the information space, sensing diversity, and the desired distortion level.
Another issue to be noted with respect to the sensing capacity is the inherent difference between sensor network and CS scenarios in the way in which the SNR is handled [14,172]. In sensor networks composed of many sensors, fixed SNR can be imposed for each individual sensor. Thus, the sensed SNR per location is spread across the field of view leading to a rowwise normalization of the observation matrix. On the other hand, in CS, the vectorvalued observation corresponding to each signal component is normalized by each column. This difference has led to different regimes of compression rate [172]. In SN, in contrast to the CS setting, sensing capacity is generally small and correspondingly the number of sensors required does not scale linearly with the target sparsity. Specifically, the number of measurements is generally proportional to the signal dimension and is weakly dependent on target density sparsity. This issue has raised questions on compressive gains in powerlimited SN applications based on sparsity of the underlying source domain.
Sparse component analysis: BSS and SDR
Introduction
Recovery of the original source signals from their mixtures, without having a priori information about the sources and the way they are mixed, is called blind source separation (BSS). This process is impossible if no assumption about the sources can be made. Such an assumption on the sources may be uncorrelatedness, statistical independence, lack of mutual information, or disjointness in some space [18,19,49].
The signal mixtures are often decomposed into their constituent principal components, independent components, or are separated based on their disjoint characteristics described in a suitable domain. In the latter case, the original sources should be sparse in that domain. Independent component analysis (ICA) is often used for separation of the sources in the former case, whereas SCA is employed for the latter case. These two mathematical tools are described in the following sections followed by some results and illustrations of their applications.
Independent component analysis (ICA)
The main assumption in ICA is the statistical independence of the constituent sources. Based on this assumption, ICA can play a crucial role in the separation and denoising of signals (BSS).
There has been recent research interest in the field of BSS due to its practicality in a wide range of problems. For example, BSS of acoustic signals measured in a room is often referred to as the Cocktail Party problem, which means separation of individual sounds from a number of recordings in an echoic and noisy environment. Figure 23 illustrates the BSS concept, wherein the mixing block represents the multipath propagation model between the original sources and the microphone measurements.
Figure 23. The BSS concept; the unobservable sources s_{1}[i],…,s_{n}[i] are mixed and corrupted by additive zero mean noise to generate the observations
x_{1}[i],…,x_{m}[i]. The target of BSS is to estimate an unmixing system to recover the original sources
in
Generally, BSS algorithms make assumptions about the environment in order to make the problem more tractable. There are typically three assumptions about the mixing medium. The most simple but widely used case is the instantaneous case, where the source signals arrive at the sensors at the same time. This has been considered for separation of biological signals such as the EEG where the signals have narrow bandwidths and the sampling frequency is normally low [173]. The generative model for BSS in this case can be easily formulated as
where s[i], x[i], and ν[i] denote, respectively, the vector of source signals, size n×1, observed signal size m×1, and noise signal size m×1. H is the mixing matrix of size m×n. Generally, the mixing process can be nonlinear (due to inhomogenity of the environment and that the medium can change with respect to the source signal variations; e.g., stronger vibration of a drum as a medium, with louder sound). However, in an instantaneous linear case where the above problems can be avoided or ignored, the separation is performed by means of a separating matrix, W of size n×m, which uses only the information contained in x[i]to reconstruct the original source signals (or the independent components) as
where y[i] is the estimate for the source signal s [i]. The early approaches in instantaneous BSS started from the work by Herault and Jutten [174] in 1986. In their approach, they considered nonGaussian sources with equal number of independent sources and mixtures. They proposed a solution based on a recurrent artificial neural network for separation of the sources.
In the cases where the number of sources is known, any ambiguity caused by false estimation of the number of sources can be avoided. If the number of sources is unknown, a criterion may be established to estimate the number of sources beforehand. In the context of model identification, this is referred to as Model Order Selection and methods such as the final prediction error (FPE), AIC, residual variance (RV), MDL and Hannan and Quinn (HNQ) methods [175] may be considered to solve this problem.
In acoustic applications, however, there are usually time lags between the arrival times of the signals at the sensors. The signals also may arrive through multiple paths. This type of mixing model is called a convolutive model [176]. The convolutive mixing model can also be classified into two subcategories: anechoic and echoic. In both cases, the vector representations of mixing and separating processes are modified as x[i]=H[i]∗s[i] + ν[i] and y[i]=W[i]∗x[i], respectively, where ∗denotes the convolution operation. In an anechoic model, however, the expansion of the mixing process may be given as
where the attenuation, h_{r,j}, and delay δ_{r,j} of source j to sensor r would be determined by the physical position of the source relative to the sensors. Then the unmixing process to estimate the sources will be given as
where the w_{j,r}s are the elements of W. In an echoic mixing environment, it is expected that the signals from the same sources reach the sensors through multiple paths. Therefore, the expansion of the mixing and separating models will be changed to
where L denotes the maximum number of paths for the sources, ν_{r}[i] is the accumulated noise at sensor r, and (.)^{l}refers to the l^{th}path. The unmixing process will be formulated similarly to the anechoic one. For a known number of sources, an accurate result may be expected if the number of paths is known; otherwise, the overall number of observations in an echoic case is infinite.
The aim of BSS using ICA is to estimate an unmixing matrix W such that Y=WX best approximates the independent sources s, where y and x are respectively matrices with columns
Mutual information is a measure of independence and maximizing the nonGaussianity of the source signals is equivalent to minimizing the mutual information between them [177].
In those cases where the number of sources is more than the number of mixtures (underdetermined systems), the above BSS schemes cannot be applied simply because the mixing matrix is not invertible, and generally the original sources cannot be extracted. However, when the signals are sparse, the methods based on disjointness of the sources in some domain may be utilized. Separation of the mixtures of sparse signals is potentially possible in the situation where, at each sample instant, the number of nonzero sources is not more than a fraction of the number of sensors (see Table 1, row and column 6). The mixtures of sparse signals can also be instantaneous or convolutive.
Sparse component analysis (SCA)
While the independence assumption for the sources is widely exploited in the design of BSS algorithms, the possible disjointness of the sources in some domain has not been considered. In SCA, this property is directly employed. Blind source separation by sparse decomposition has been addressed by Zibulevsky and Pearlmutter [178] for both overdetermined/exactlydetermined and underdetermined systems using the maximum a posteriori approach. One way of formulating SCA is by representing the sources using a proper signal dictionary:
where
where ν[i] is an m×1vector. Aand Ccan be determined by optimization of a cost function based on an exponential distribution for c_{i,j}[178]. In places where the sources are sparse and at each time instant, at most one of the sources has significant nonzero value, the columns of the mixing matrix may be calculated individually, which makes the solution to the underdetermined case possible.
The SCA problem can be stated as a clustering problem since the lines in the scatter
plot can be separated based on their directionalities by means of clustering. A number
of works on this method have been reported [18,179,180]. In the work by Li et al. [180], the separation has been performed in two different stages. First, the unknown mixing
matrix is estimated using the kmeans clustering method. Then, the source matrix is estimated using a standard linear
programming algorithm. The line orientation of a data set may be thought of as the
direction of its greatest variance. One way is to perform eigenvector decomposition
on the covariance matrix of the data, the resultant principal eigenvector, i.e., the
eigenvector with the largest eigenvalue, indicates the direction of the data, since
it has the maximum variance. In [179], GAP statistics as a metric which measures the distance between the total variance
and cluster variances, has been used to estimate the number of sources followed by
a similar method to Li’s algorithm explained above. In line with this approach, Bofill
and Zibulevsky [15] developed a potential function method for estimating the mixing matrix followed by
ℓ_{1}norm decomposition for the source estimation. Local maxima of the potential function
correspond to the estimated directions of the basis vectors. After the mixing matrix
is identified, the sources have to be estimated. Even when Ais known, the solution is not unique. So, a solution is found for which the ℓ_{1}norm is minimized. Therefore, for
Geometrically, for a given feasible solution, each source component is a segment of
length s_{j} in the direction of the corresponding a_{j}and, by concatenation, their sum defines a path from the origin to x [i]. Minimizing
Figure 24. Measurement points for data structures consisting of multiple lower dimensional subspaces. (a) the scatter plot and (b) the shortest path from the origin to the data point, x [i], extracted from [15].
There are many cases for which the sources are disjoint in other domains, rather than the timedomain, or when they can be represented as sum of the members of a dictionary which can consist for example of wavelets or wavelet packets. In these cases the SCA can be performed in those domains more efficiently. Such methods often include transformation to timefrequency domain followed by a binary masking [181] or a BSS followed by binary masking [176]. One such approach, called degenerate unmixing estimation technique (DUET) [181], transforms the anechoic convolutive observations into the timefrequency domain using a shorttime Fourier transform and the relative attenuation and delay values between the two observations are calculated from the ratio of corresponding timefrequency points. The regions of significant amplitudes (atoms) are then considered to be the source components in the timefrequency domain. In this method only two mixtures have been considered and as a major limit of this method, only one source has been considered active at each time instant.
For instantaneous separation of sparse sources, the common approach used by most researchers is to attempt to maximize the sparsity of the extracted signals at the output of the separator. The columns of the mixing matrix A assign each observed data point to only one source based on some measure of proximity to those columns [182], i.e., at each instant only one source is considered active. Therefore the mixing system can be presented as:
where in an ideal case, a_{j,r}= 0 for r≠j. Minimization of the ℓ_{1}norm is one of the most logical methods for estimation of the sources as long as
the signals can be considered sparse. ℓ_{1}norm minimization is a piecewise linear operation that partially assigns the energy
of x [i] to the m columns of A around x [i] in
A detailed discussion of signal recovery using ℓ_{1}norm minimization is presented by Takigawa et al. [183] and described below. As mentioned above, it is important to choose a domain that sparsely represents the signals.
On the other hand, in the method developed by Pedersen et al. [176], as applied to stereo signals, the binary masks are estimated after BSS of the mixtures and then applied to the microphone signals. The same technique has been used for convolutive sparse mixtures after the signals are transformed to the frequency domain.
In another approach [184], the effect of outlier noise has been reduced using median filtering then hybrid fast ICA filtering, and ℓ_{1}norm minimization have been used for separation of temporomandibular joint sounds. It has been shown that for such sources, this method outperforms both DUET and Li’s algorithms. The authors of [185] have recently extended the DUET algorithm to separation of more than two sources in an echoic mixing scenario in the timefrequency domain.
In a very recent approach, it has been considered that brain signal sources in the spacetime frequency domain are disjoint. Therefore, clustering the observation points in the spacetimefrequencydomain can be effectively used for separation of brain sources [186].
As it can be seen, generally, BSS exploits independence of the source signals, whereas SCA benefits from the disjointness property of the source signals in some domain. While the BSS algorithms mostly rely on ICA with statistical properties of the signals, SCA uses their geometrical and behavioral properties. Therefore, in SCA, either a clustering approach or a masking procedure can result in estimation of the mixing matrix. Often, an ℓ_{1}norm is used to recover the source signals. Generally, in places where the source signals are sparse, the SCA methods often result in more accurate estimation of the signals with less ambiguities in the estimation.
SCA algorithms
There are three main steps for the solution of an SCA problem as shown in Table 16[187]. The first step of Table 16 shows a linear model for the SCA problem, the second step consists of estimating the mixing matrix A using sparsity information, and finally the third step is to estimate the sparse source representation based on the estimate of A[17].
Table 16. SCA steps
A brief review of major approaches that are suggested for the third step was given in Section 2.
Sparse dictionary representation (SDR) and signal modeling
A signal
Find subspaces
over all possible choices of l subspaces with dimension of V_{i} ≤ k < n. Here d denotes the Euclidian distance in
Figure 25. Objective function. (a)e = d^{2}(f_{1}, V_{2}) + d^{2}(f_{2},V_{1}) + d^{2}(f_{3}, V_{1}) and (b)e = d^{2}(f_{1}, V_{2}) + d^{2}(f_{3}, V_{2}) + d^{2}(f_{2}, V_{1}). Configuration of V_{1},V_{2} in a) creates the partition P_{1} ={f_{1}} and P_{2} = {f_{2},f_{3}} while the configuration in (b) causes the partition P_{1} = {f_{1}, f_{3}} and P_{2} = {f_{2}}.
For the special finite dimensional case of
Table 17. Search algorithm
In some new attempts sparse representation and the compressive sensing concept have been extended to solving multichannel source separation [191194]. In [191,192] separation of sparse sources with different morphologies has been presented by developing a multichannel morphological component analysis approach. In this scheme, the signals are considered as combination of features from different dictionaries. Therefore, different dictionaries are assumed for different sources. In [193] inversion of a random field from pointwise measurements collected by a sensor network is presented. In this article, it is assumed that the field has a sparse representation in a known basis. To illustrate the approach, the inversion of an acoustic field created by the superposition of a discrete number of propagating noisy acoustic sources is considered. The method combines compressed sensing (sparse reconstruction by ℓ_{1}constrained optimization) with distributed average consensus (mixing the pointwise sensor measurements by local communication among the sensors). [194] addresses source separation from a linear mixture under source sparsity and orthogonality of the mixing matrix assumptions. A twostage separation process is proposed. In the first stage recovering a sparsity pattern of the sources is tried by exploiting the orthogonality prior. In the second stage, the support is used to reformulate the recovery task as an optimization problem. Then a solution based on alternating minimization for solving the above problems is suggested.
Multipath channel estimation
In wireless systems, channel estimation is required for the compensation of channel distortions. The transmitted signal reflects off different objects and arrives at the receiver from multiple paths. This phenomenon causes the received signal to be a mixture of reflected and scattered versions of the transmitted signal. The mobility of the transmitter, receiver, and scattering objects results in rapid changes in the channel response, and thus the channel estimation process becomes more complicated. Due to the sparse distribution of scattering objects, a multipath channel is sparse in the time domain as shown in Figure 26. By taking sparsity into consideration, channel estimation can be simplified and/or made more accurate. The sparse time varying multipath channel is modeled as
where k is the number of taps, α_{l}is the l^{th} complex path gain, and τ_{l} is the corresponding path delay. At time t, the transfer function is given by
Figure 26. The impulse response of two typical multipath channels. (a) BrazilD and (b) TU6 channel profiles.
The estimation of the multipath channel impulse response is very much similar to the
determination of analog epochs and amplitudes of discontinuities for finite rate of
innovation as shown in (31). Essentially, if a known train of impulses is transmitted
and the received signal from the multipath channel is filtered and sampled (information
domain as discussed in Section 3.3), the channel impulse response can be estimated
from these samples using an annihilating filter (the Prony or ELP method) [27] defined with the
A special case of multipath channel is an OFDM channel, which is widely used in ADSL,
DAB, DVB, WLAN, WMAN, and WIMAX.^{n}OFDM is a digital multicarrier transmission technique where a single data stream
is transmitted over several subcarrier frequencies to achieve robustness against
multipath channels as well as spectral efficiency [195]. Channel estimation for OFDM is relatively simple; the time instances of channel
impulse response is now quantized and instead of an annihilating filter defined in
the
OFDM channel estimation
For OFDM, the discrete version of the time varying channel of (85) in the frequency domain becomes
where
where T_{f}and n are the symbol length (including cyclic prefix) and number of subcarriers in each
OFDM symbol, respectively. Δf is the subcarrier spacing, and
Two major methods are used in the equalization process [196]: (1) zero forcing and (2) minimun mean squared error (MMSE). In the zero forcing
method, regardless of the noise variance, equalization is obtained by dividing the
received OFDM symbol by the estimated channel frequency response; while in the MMSE
method, the approximation is chosen such that the MSE of the transmitted data vector
Statement of the problem
The goal of the channel estimation process is to obtain the channel impulse response from the noisy values of the channel transfer function in the pilot positions. This is equivalent to solving the following equation for H.
where i_{p}is an index vector denoting the pilot positions in the frequency spectrum,
Sparse OFDM channel estimation
In the following, we present two methods that utilize this sparsity to enhance the channel estimation process.
CSbased channel estimation
The idea of using timedomain sparsity in OFDM channel estimation has been proposed by [203205]. There are two main advantages in including the sparsity constraint of the channel impulse response in the estimation process:
(1) Decrease in the MSE: By applying the sparsity constraint, the energy of the estimated channel impulse response will be concentrated into a few coefficients while in the conventional methods, we usually observe a leakage of the energy to the neighboring coefficients of the nonzero taps. Thus, if the sparsitybased methods succeed in estimating the support of the channel impulse response, the MSE will be improved by prevention of the leakage effect.
(2) Reduction in the overhead: The number of pilot subcarriers is in fact, the number of (noisy) samples that we obtain from the channel frequency response. Since the pilot subcarriers do not convey any data, they are considered as the overhead imposed to enhance the estimation process. The theoretical results in [203] indicate that by means of sparsitybased methods, the perfect estimation can be achieved with an overhead proportional to the number of nonzero channel taps (which is considerably less than that of the current standards).
In the sequel, we present two iterative methods which exploit the inherent sparsity of the channel impulse response to improve the channel estimation task in OFDM systems.
Iterative method with adaptive thresholding (IMAT) for OFDM channel estimation [206]
Here we apply a similar iterative method as in Section 4.2 for the channel estimation
problem in (88). The main goal is to estimate Hfrom
where we used
The nonzero coefficients of Hare found through a set of iterations followed by adaptively decreasing thresholds:
where λ and i are the relaxation parameter and the iteration number, respectively, k is the index of channel impulse response and
Figure 27. Block diagram of the IMAT method.
Modified IMAT (MIMAT) for OFDM channel estimation [23]
In this method, the spectrum of the channel is initially estimated using a simple interpolation method such as linear interpolation between pilot subcarriers. This initial estimate is further improved in a series of iterations between time (sparse) and frequency (information) domains to find the sparsest channel impulse response by using an adaptive thresholding scheme; in each iteration, after finding the locations of the taps (locations with previously estimated amplitudes higher than the threshold), their respective amplitudes are again found using the MMSE criterion. In each iteration, due to thresholding, some of the false taps that are noise samples with amplitudes above the threshold are discarded. Thus, the new iteration starts with a lower number of false taps. Moreover, because of the MMSE estimator, the valid taps approach their actual values in each new iteration. In the last iteration, the actual taps are detected and the MMSE estimator gives their respective values. This method is similar to RDE and IDE methods discussed in Sections 2.6 and 4.1.2. The main advantage of this method is its robustness against sideband zeropadding.^{o}
Table 18 summarizes the steps in the MIMAT algorithm. In the threshold of the MIMAT algorithm, α and βare constants which depend on the number of taps and initial powers of noise and channel impulses. In the first iteration, the threshold is a small number, and with each iteration it is gradually increased. Intuitively, this gradual increase of the threshold with the iteration number, results in a gradual reduction of false taps (taps that are created due to noise). In each iteration, the tap values are obtained from
where t denotes the index of nonzero impulses obtained from the previous step and
Table 18. MIMAT algorithm for OFDM channel estimation
The equation that has to be solved in (93) is usually overdetermined which helps the suppression of the noise in each iteration step. Note that the solution presented in (94) represents a variant of the MMSE solution when the location of discrete impulses are known. If further statistical knowledge is available, this solution can be modified and a better estimation is obtained; however, this makes the approximation process more complex. This algorithm does not need many steps of iterations; the positions of the nonzero impulses are perfectly detected in three or four iterations for most types of channels.
Simulation results and discussions
For OFDM simulations, the DVBH standard was used with the 16QAM constellation in the 2K mode (2^{11} FFT size). The channel profile was the Brazil channel D. Figures 28, 29, 30, and 31 show the symbol error rate (SER) versus the carriertonoise ratio (CNR) after equalizing using different sparse reconstruction methods such as orthogonal matching pursuit (OMP) [88], compressive sampling matching pursuit (CoSaMP) [41], gradient projection for sparse reconstruction (GPSR) [44], IMAT and MIMAT. Also the standard linear interpolation in the frequency domain using the noisy pilot samples is simulated. In these simulations, we have considered the effects of zeropadding and Doppler frequency in the SER of estimation. As can be seen in Figures 28, 29, 30, and 31, the SER obtained from the sparsitybased algorithms reveal almost perfect approximation of the hypothetical ideal channel (where the exact channel frequency response is used for equalization).
Figure 28. SER vs. CNR for the ideal channel, linear interpolation, GPSR, OMP, and the IMAT for the Brazil channel at Fd=0 without zeropadding effect.
Figure 29. SER vs. CNR for the ideal channel, linear interpolation, GPSR, CoSaMP, and the IMAT for the Brazil channel at Fd=50 Hz without zeropadding effect.
Figure 30. SER vs. CNR for the ideal channel, linear interpolation, GPSR, CoSaMP and the MIMAT for the Brazil channel at Fd=0 including zeropadding effect.
Figure 31. SER versus CNR for the ideal channel, linear interpolation, GPSR, OMP, and the MIMAT for the Brazil channel at Fd=50 Hz including zeropadding effect.
Conclusion
A unified view of sparse signal processing has been presented in tutorial form. The sparsity in the key areas of sampling, coding, spectral estimation, array processing, component analysis, and channel estimation has been carefully exploited. Some form of uniform or random sampling has been shown to underpin the associated sparse processing methods used in each of these fields. The reconstruction methods used in each application domain have been introduced and the interconnections among them have been highlighted.
This development has revealed, for example, that the iterative methods developed for random sampling can be applied to realfield block and convolutional channel coding for impulsive noise (saltandpepper noise in the case of images) removal, SCA, and channel estimation for orthogonal frequency division multiplexing systems. These iterative reconstruction methods have been shown to be naturally extendable to spectral estimation and sparse array processing due to their similarity to channel coding in terms of mathematical models with significant improvements. Conversely, the minimum description length method developed for spectral estimation and array processing has potential for application in other areas. The error locator polynomial method developed for channel coding has, moreover, been shown to be a discrete version of the annihilating filter used in sampling with a finite rate of innovation and the Prony method in spectral estimation; the Pisarenko and MUSIC methods are further improvements of the Prony method when additive noise is also considered.
Linkages with emergent areas such as compressive sensing and channel estimation have also been considered. In addition, it has been suggested that the linear programming methods developed for compressive sensing and SCA can be applied to other applications with possible reduction of sampling rate. As such, this tutorial has provided a route for new applications of sparse signal processing to emerge, which can potentially reduce computational complexity and improve performance quality. Other potential applications of sparsity are in the areas of sensor networks and sparse array design.
Endnotes
^{a}Sparse Signal Processing, Panel Session organized and chaired by F. Marvasti and lectured by Profs. E. Candes, R. G. Baraniuk, P. Marziliano, and Dr. A. Cichoki, ICASSP 2008, Las Vegas, May 2008.^{b}A list of acronyms is given in Table 2 at the end of this section.^{c}The sequence of vectors {v_{n}} is called a Riesz basis if there exist scalars 0<A≤B<∞such that for every absolutely summable sequence of scalars {a_{n}}, we have the following inequalities [207]:
^{d}Note that the StrangFix condition can also be used for an exponential polynomial
assuming the delta functions are nonuniformly periodic; in that case τ_{r}in equation (35) is similar toE, the DFT of the impulses, as defined in Appendices Appendix 1 and Appendix 2.^{e}We call the set of indices of consecutive zeros syndrome positions and denote it by
Λ; this set includes the complex conjugate part of the Fourier domain.^{f}The kernel of SDFT is
Appendix 1
ELP decoding for erasure channels [59]
For lost samples, the polynomial locator for the erasure samples is
where
Since the inner summation is the DFT of the missing samples e [i_{m}], we get
where E [.] is the DFT of e [i]. The received samples, d [i], can be thought of as the original oversampled signal, x [i], minus the missing samples e [i_{m}]. The error signal, e [i], is the difference between the corrupted and the original oversampled signal and hence is equal to the values of the missing samples for i = i_{m}and is equal to zero otherwise. In the frequency domain, we have
Since X [j] = 0 for j∈Λ (see the footnote on page 227), then
The remaining values of E [j]can be found from (98), by the following recursion:
where r∉Λ and the index additions are in mod (n).
Appendix 2
ELP decoding for impulsive noise channels [31,104]
For all integer values of r such that r∈Λ and r + k∈Λ, we obtain a system of k equations with k + 1unknowns (h_{t}coefficients). These equations yield a unique solution for the polynomial with the
additional condition that the first nonzero h_{t} is equal to one. After finding the coefficients, we need to determine the roots of
the polynomial in (95). Since the roots of H(z) are of the form
Competing interests
The authors declare that they have no competing interests.
Acknowledgements
We would like to sincerely thank our colleagues for their specific contributions in various sections in this article. Especially, Drs. S. Holm from University of Oslo who contributed a section in sparse array design, M. Nouri Moghadam, the director of the Newton Foundation, H. Saeedi from University of Massachusetts, and K. Mehrany from EE Dept. of Sharif University of Technology who contributed to various sections of the original paper before revision. We are also thankful to M. Valiollahzadeh who edited and contributed to the SCA section. We are especially indebted to Prof. B. Sankur from Bogazici University in Turkey for his careful review and comments. We are also thankful to the students of the Multimedia Lab and members of ACRI at Sharif University of Technology for their invaluable help and simulations. We are specifically indebted to A. Hosseini, A. Rashidinejad, R. Eghbali, A. Kazerouni, V. Montazerhodjat, S. Jafarzadeh, A. Salemi, M. Soltanalian, M. Sharif and H. Firouzi. The work of Akram Aldroubi was supported in part by grant NSFDMS 0807464.
References

F Marvasti, Nonuniform Sampling: Theory and Practice (New York: Springer, 2001)

RG Baraniuk, A lecture on compressive sensing. IEEE Signal Process. Mag 24(4), 118–121 (2007)

M Vetterli, P Marziliano, Blu T, Sampling signals with finite rate of innovation. IEEE Trans. Signal Process 50(6), 1417–1428 (2002)

S Lin, DJ Costello, Error Control Coding (Englewood Cliffs: PrenticeHall, 1983)

T Richardson, R Urbanke, Modern Coding Theory (Cambridge: Cambridge University Press, 2008)

F Marvasti, M Hung, MR Nakhai, The application of walsh transform for forward error correction. in Proc, ed. by . IEEE Int. Conf. Acoust, Speech Signal Proc. ICASSP’99 (USA: Phoenix, AZ, 1999), pp. 2459–2462

SL Marple, Digital Spectral Analysis (Englewood Cliffs: PrenticeHall, 1987)

SM Kay, SL Marple, Spectrum analysis—a modern perspective. Proc. IEEE (Modern Spectrum Analysis II) 69(11)

SM Kay, Modern Spectral Estimation: Theory and Application (Englewood Cliffs: PrenticeHall, 1988)

P Stoica, RL Moses, Introduction to Spectral Analysis (PrenticeHall: Upper Saddle River, 1997)

P Stoica, A Nehorai, maximumlikelihoodandCramerRaobound Music, IEEE Trans. ASSP 37(5), 720–741 (1989)

P Stoica, A Nehorai, Performance study of conditional and unconditional directionofarrival estimation. IEEE Trans. ASSP 38(10), 1783–1795 (1990)

S Holm, A Austeng, K Iranpour, JF Hopperstad, Sparse sampling in array processing. in Nonuniform Sampling: Theory and Practice, ed. by Marvasti F (New York: Springer, 2001), pp. 787–833

S Aeron, M Zhao, V Saligrama, Fundamental tradeoffs between sparsity,sensing diversity,sensing capacity. in Asilomar Conference on Signals, Systems and Computers, ACSSC’06 (Pacific Grove,CA: Oct–Nov, 2006), pp. 295–299

P Bofill, M Zibulevsky, Underdetermined blind source separation using sparse representations. Signal Process Elsevier 81(11), 2353–2362 (2001)

MA Girolami, JG Taylor, SelfOrganising Neural Networks:Independent Component Analysis and Blind Source Separation (London: Springer, 1999)

P Georgiev, F Theis, A Cichocki, Sparse component analysis and blind source separation of underdetermined mixtures. IEEE Trans. Neural Netw 16(4), 992–996 (2005)

M Aharon, M Elad, AM Bruckstein, The ksvd: an algorithm for designing of overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process 54(11), 4311–4322 (2006)

M Aharon, M Elad, AM Bruckstein, On the uniqueness of overcomplete dictionaries, and a practical way to retrieve them. Linear Algebra Appl 416(1), 48–67 (2006)

R Gribonval, M Nielsen, Sparse representations in unions of bases. IEEE Trans. Inf. Theory 49(12), 3320–3325 (2003)

P Fertl, G Matz, Efficient OFDM channel estimation in mobile environments based on irregular sampling. in Proc, ed. by . Asil. Conf. Sig. Sys. and Computers (US: Pacific Grove, May 2007), pp. 1777–1781

O Ureten, N Serinken, Decision directed iterative equalization of OFDM symbols using nonuniform interpolation. in IEEE conf, ed. by . on Vehicular Technol. Conf. (VTC) (Canada: Ottawa, Sep. 2007), pp. 1–5

M Soltanolkotabi, A Amini, F Marvasti, OFDM channel estimation based on adaptive thresholding for sparse signal detection. in Proc, ed. by . EUSIPCO’09 (Scotland: Glasgow, Aug 2009), pp. 1685–1689

JL Brown, Sampling extentions for multiband signals. IEEE Trans. Acoust. Speech Signal Process 33, 312–315 (1985)

OG Guleryuz, Nonlinear approximation based image recovery using adaptive sparse reconstructions and iterated denoising, parts I and II. IEEE Trans. Image Process 15(3), 539–571 (2006)

H Rauhut, On the impossibility of uniform sparse reconstruction using greedy methods. Sampl Theory Signal Image Process 7(2), 197–215 (2008)

T Blu, P Dragotti, M Vetterli, P Marziliano, P Coulot, Sparse sampling of signal innovations: Theory, algorithms, and performance bounds. IEEE Signal Process. Mag 25(2) (2008)

F Marvasti, Guest editor’s comments on special issue on nonuniform sampling. Sampl Theory Signal Image Process 7(2), 109–112 (2008)

D Donoho, Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)

E Candès, Compressive sampling. in Int, ed. by . Congress of Mathematics (Spain: Madrid, 200), pp. 1433–1452

S Zahedpour, S Feizi, A Amini, M Ferdosizadeh, F Marvasti, Impulsive noise cancellation based on soft decision and recursion. IEEE Trans. Instrum. Meas 58(8), 2780–2790 (2009)

S Feizi, S Zahedpour, M Soltanolkotabi, A Amini, F Marvasti, Salt and pepper noise removal for images (Russia: St. Petersburg, June 2008)

PD Grunwald, IJ Myung, MA Pitt, Advances in Minimum Description Length: Theory and Applications (Cambridge: MIT Press, 2005)

A Kumar, P Ishwar, K Ramchandran, On distributed sampling of bandlimited and nonbandlimited sensor fields. Acoustics, Speech, and Signal Processing (USA: Berkeley, CA, Aug 2004), pp. 925–928

P Fertl, G Matz, Multiuser channel estimation in OFDMA uplink systems based on irregular sampling and reduced pilot overhead. in Proc, ed. by . Acoustics, Speech and Sig. Proc., (ICASSP) (US: Honolulu, April 2007), pp. 297–300

F Marvasti, Spectral analysis of random sampling and error free recovery by an iterative method. Trans. IECE Jpn 69(2), 79–82 (1986)

RA DeVore, Deterministic constructions of compressed sensing matrices. J. Complex 23(4–6), 918–925 (2007)

SG Mallat, Z Zhang, Matching pursuits with timefrequency dictionaries. IEEE Trans. Signal Process 41(12), 3397–3415 ((1993))

R Gribonval, P Vandergheynst, On the exponential convergence of matching pursuit in quasiincoherent dictionaries. IEEE Trans Inf. Theory 52(1), 255–261 (2006)

JA Tropp, Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inf. Theory 50(10), 2231–2242 (2004)

D Needell, JA Tropp, Cosamp: iterative signal recovery from incomplete and inaccurate samples. App. Comp. Harmon. Anal 26, 301–321 (2009)

DL Donoho, M Elad, V Temlyakov, Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inf. Theory 52(1), 6–18 ((2006))

MS Bazaraa, JJ Jarvis, HD Sherali, Linear programming and Network Flows (New York: Wiley, 1990)

M Figueiredo, R Nowak, S Wright, Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Signal Process 1(4), 586–597 (2007)

I Daubechies, MD Friese, CD Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl Math 57, 1413–1457 (2004)

IF Gorodnitsky, BD Rao, Sparse signal reconstruction from limited data using FOCUSS, a reweighted minimum norm algorithm. IEEE Trans. Signal Process 45(3), 600–616 (1997)

GH Mohimani, M BabaieZadeh, C Jutten, A fast approach for overcomplete sparse decomposition based on smoothed ℓ0norm. IEEE Trans Signal Process 57(1), 289–301 (2009)

F Marvasti, AK Jain, Zero crossings,bandwidth compression and restoration of nonlinearly distorted bandlimited signals. J. Opt Soc. Am 3(5), 651–654 (1986)

A Aldroubi, C Cabrelli, U Molter, Optimal nonlinear models for sparsity and sampling. J Fourier Anal. Appl. (Special Issue on Compressed Sampling) 14(6), 48–67 (2008)

A Amini, F Marvasti, Convergence analysis of an iterative method for the reconstruction of multiband signals from their uniform and periodic nonuniform samples. STSIP 7(2), 109–112 (2008)

PJSG Ferreira, Iterative and noniterative recovery of missing samples for 1D bandlimited signals. in Nonuniform Sampling: Theory and Practice, ed. by Marvasti F (Boston: Springer, 2001), pp. 235–281

PJSG Ferreira, The stability of a procedure for the recovery of lost samples in bandlimited signals. IEEE Trans. Signal Process 40(3), 195–205 (1994)

F Marvasti, A Unified Approach to Zerocrossings and Nonuniform Sampling of Single and Multidimensional Signals and Systems (Oak Park: Nonuniform Publication, 1987)

Marks RJ (ed.), Nonuniform Sampling (New York: Springer, 1993)

AI Zayed, PL Butzer, Lagrange interpolation and sampling theorems. in Nonuniform Sampling: Theory and Practice, ed. by Marvasti F (New York: Springer, formerly Kluwer Academic/Plenum Publishers, 2001), pp. 123–168

F Marvasti, P Clarkson, M Dokic, U Goenchanart, C Liu, Reconstruction of speech signals with lost samples. IEEE Trans. Signal Process 40(12), 2897–2903 (1992)

M Unser, Sampling50 years after Shannon. Proc. IEEE 88(4), 569–587 (April 2000)

F Marvasti, Random topics in nonuniform sampling. in Nonuniform Sampling: Theory and Practice, ed. by Marvasti F (Springer, formerly Kluwer Academic/Plenum Publishers, 2001), pp. 169–234

F Marvasti, M Hasan, M Eckhart, S Talebi, Efficient algorithms for burst error recovery using FFT and other transform kernels. IEEE Trans. Signal Process 47(4), 1065–1075 (1999)

PJSG Ferreira, Mathematics for multimedia signal processing II: Discrete finite frames and signal reconstruction. Signal Process Multimed IOS press 174, 35–54 (1999)

F Marvasti, M Analoui, M Gamshadzahi, Recovery of signals from nonuniform samples using iterative methods. IEEE Trans. ASSP 39(4), 872–878 (1991)

H Feichtinger, K Grô̈chenig, Theory and practice of irregular sampling. in Wavelets Mathematics and Applications, ed. by Benedetto JJ, Frazier M (Boca Raton: CRC Publications, 1994), pp. 305–363

PJSG Ferreira, Noniterative and fast iterative methods for interpolation and extrapolation. IEEE Trans. Signal Process 42(11), 3278–3282 (1994)

A Aldroubi, K Grôchenig, Nonuniform sampling and reconstruction in shiftinvariant spaces. SIAM Rev 43(4), 585–620 (2001)

A Aldroubi, Nonuniform weighted average sampling exact reconstruction in shiftinvariant and wavelet spaces. Appl. Comput. Harmon. Anal 13(2), 151–161 (2002)

A Papoulis, C Chamzas, Detection of hidden periodicities by adaptive extrapolation. IEEE Trans. Acoust. Speech Signal Process 27(5), 492–500 (1979)

WYXu C Chamzas, An improved version of PapoulisGerchberg algorithm on bandlimited extrapolation. IEEE Trans. Acoust. Speech Signal Process 32(2), 437–440 (1984)

PJSG Ferreira, Interpolation and the discrete PapoulisGerchberg algorithm. IEEE Trans. Signal Process 42(10), 2596–2606 (1994)

K Gröchenig, T Strohmer, Numerical and theoretical aspects of no. in Nonuniform Sampling: Theory and Practice, ed. by Marvasti F (New York: Springer, formerly Kluwer Academic/Plenum Publishers, 2001), pp. 283–324

DC Youla, Generalized image restoration by the method of alternating orthogonal projections. IEEE Trans. Circuits Syst 25(9), 694–702 (1978)

DC Youla, H Webb, Image restoration by the method of convex projections: Part 1theory. IEEE Trans. Med. Imag 1(2), 81–94 (1982)

K Gröchenig, Acceleration of the frame algorithm. IEEE Trans. Signal Process 41(12), 3331–3340 (1993)

A AliAmini, M BabaieZadeh, C Jutten, A New Approach for Sparse Decomposition and Sparse Source Separation (EUSIPCO2006,Florence)

F Marvasti, Applications to error correction codes. in Nonuniform Sampling: Theory and Practice, ed. by Marvasti F (New York: Springer, 2001), pp. 689–738

E Candes, J Romberg, T Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)

E Candes, T Tao, Nearoptimal signal recovery from random projections: universal encoding strategies. IEEE Trans. Inf. Theory 52(12), 5406–5425 (2006)

Y Tsaig, D Donoho, Extensions of compressed sensing. Signal Process 86(3), 549–571 (2006)

AJ Jerri, The Shannon sampling theoremits various extension and applications: a tutorial review. Proc IEEE 65(11), 1565–1596 (1977)

E Candes, M Wakin, An introduction to compressive sampling. IEEE Signal Process. Mag 25(2), 21–30 (2008)

Y Eldar, Compressed sensing of analog signals in shiftinvariant spaces. IEEE Trans. Signal Process 57(8), 2986–2997 (2009)

E Candes, J Romberg, Sparsity and incoherence in compressive sampling. Inverse Probl 23, 969–985 (2007)

D Donoho, X Hou, Uncertainty principle and ideal atomic decomposition. IEEE Trans.Inf. Theory 47(7), 2845–2862 (2001)

RG Baraniuk, M Davenport, R DeVore, M Wakin, A simple proof of the restricted isometry property for random matrices. Constr Approx 28(3), 253–263 (2008)

AC Gilbert, MJ Strauss, JA Tropp, R Vershynin, Algorithmic linear dimension reduction in the ℓ1 norm for sparse vectors. in Proc, ed. by . Allerton conf. Comm., Control and Comp (USA: IL, 2006)

DL Donoho, For most large underdetermined systems of linear equations the minimal ℓ1norm solution is also the sparsest solution 59, 797–829

JA Tropp, Recovery of short linear combinations via ℓ1 minimization. IEEE Trans. Inf. Theory 90(4), 1568–1570 (2005)

E Candes, T Tao, Decoding by linear programming. IEEE Trans. Inf. Theory 51, 4203–4215 (2005)

J Tropp, A Gilbert, Signal recovery from partial information via orthogonal matching pursuit. IEEE Trans. Inf. Theory 53(12), 4655–4666 (2007)

D Donoho, J Tanner, Precise undersampling theorems. Proc IEEE 98(6), 913–924 (2010)

E Candes, J Romberg, Quantitative robust uncertainty principles and optimally sparse decompositions. Found. Comput. Math 6(2), 227–254 (2006)

E Candes, J Romberg, T Tao, Stable signal recovery from incomplete and inaccurate measurements. Commun Pure Appl. Math 59(8), 1207–1223 (2006)

V Saligrama, Deterministic designs with deterministic gaurantees:Toepliz compressed sensing matrices, sequence design and system identification. arXiv 0806, 4958 (2008)

Ev Berg, MP Friedlander, G Hennenfent, F Herrmann, R Saab, Ö Yılmaz, Sparco: a testing framework for sparse reconstruction, Dept. Comp. Sci. Univ. Br. Columbia, Vancouver. Tech. Rep TR200720 (October 2007)

R Berinde, AC Gilbert, P Indyk, H Karloff, MJ Strauss, Combining geometry and combinatorics:a unified approach to sparse signal recovery. in Proc, ed. by . Allerton conf. Comm., Control and Comp (US: IL, 2008) (pp, 2008), . 798–805

MF Duarte, MB Wakin, RG Baraniuk, Fast reconstruction of piecewise smooth signals from random projections. in Proc, ed. by . SPARS05 (France: Rennes, Nov 2005), pp. 1064–1070

AC Gilbert, Y Kotidis, S Muthukrishnan, MJ Strauss, Onepass wavelet decompositions of data streams. IEEE Trans. Knowl. Data Eng 15(3), 541–554 (2003)

AC Gilbert, MJ Strauss, JA Tropp, R Vershynin, One sketch for all: fast algorithms for compressed sensing. ACM STOC (San Diego: CA, June 2007), pp. 237–246

S Sarvotham, D Baron, RG Baraniuk, Compressed sensing reconstruction via belief propagation. Technical Report ECE0601 (ECE Dept., Rice University, July 2006) (http://dsp, July 2006), . rice.edu/sites/dsp.rice.edu/files/cs/csbpTR07.142006.pdf webcite

S Sarvotham, D Baron, RG Baraniuk, Sudocodesfast measurement and reconstruction of sparse signals. IEEE ISIT (USA: Seattle, 2006), pp. 2804–2808

W Xu, B Hassibi, Efficient compressive sensing with deterministic guarantees using expander graphs. in IEEE Inf, ed. by . Theory Workshop, ITW’07 (Japan: Tokyo, Sep. 2007), pp. 414–419

A Aldroubi, H Wang, K Zaringhalam, Sequential compressed sampling via Huffman codes

PL Dragotti, M Vetterli, T Blu, Sampling moments and reconstructing signals of finite rate of innovation:Shannon meets strangfix. IEEE Trans. Signal Process 55(5), 1741–1757 (2007)

I Maravic, M Vetterli, Sampling and reconstruction of signals with finite rate of innovation in the presence of noise. IEEE Trans. Signal Process 53(8), 2788–2805 (2005)

P Azmi, F Marvasti, Robust decoding of DFTbased errorcontrol codes for impulsive and additive white gaussian noise channels. IEEE Proc. Commun 152(3), 265–271 (2005)

F Marvasti, M Nafie, Sampling theorem: a unified outlook on information theory, block and convolutional codes. Spec. Issue Info. Theory Appl., IEICE Trans. Fundam. Electron. Commun. Comput. Sci. Sec. E 76(9), 1383–1391 (1993)

J Wolf, Redundancy, the discrete Fourier transform, and impulse noise cancellation. IEEE Trans. Commun 31(3), 458–461 (1983)

RE Blahut, Transform techniques for error control codes. IBM J. Res. Dev 23(3), 299–315 (1979)

T Jr Marshall, Coding of realnumber sequences for error correction: a digital signal processing problem. IEEE J. Sel. Areas Commun 2(2), 381–392 (2002)

C Berrou, A Glavieux, P Thitimajshima, Near Shannon limit error correcting coding and decoding:Turbo codes. in Proc, ed. by . Int. Conf. Comm. (ICC) (Switzerland: Geneva, 1993), pp. 1064–1070

CN Hadjicostis, GC Verghese, Coding approaches to fault tolerance in linear dynamic systems. IEEE Trans. Inf. Theory 51(1), 210–228 (2005)

CJ Anfinson, FT Luk, A linear algebraic model of algorithmbased fault tolerance. IEEE Trans. Comput 37(12), 1599–1604 (1988)

VSS Nair, JA Abraham, Realnumber codes for faulttolerant matrix operations on processor arrays. IEEE Trans. Comput 39(4), 426–435 (1990)

ALN Reddy, P Banerjee, Algorithmbased fault detection for signal processing applications. IEEE Trans. Comput 39(10), 1304–1308 (1990)

JMN Vieira, PJSG Ferreira, Interpolation, spectrum analysis, errorcontrol coding, and fault tolerant computing. in Proc, ed. by . of ICASSP’97 (Germany: Munich, Apr 1997), pp. 1831–1834

M Nafie, F Marvasti, Implementation of recovery of speech with missing samples on a DSP chip. Electron Lett 30(1), 12–13 (1994)

A Momenai, S Talebi, Improving the stability of DFT error recovery codes by using sparse oversampling patterns. Elsevier 87(6), 1448–1461 (2007)

F Marvasti, Error concealment of speech, image and video signals. U.S Patents 6,601, 206 (July 2003)

C Wong, F Marvasti, W Chambers, Implementation of recovery of speech with impulsive noise on a DSP chip. Electron Lett 31(17), 1412–1413 (1995)

D Mandelbaum, On decoding of ReedSolomon codes. IEEE Trans. Inf. Theory 17(6), 707–712 (1971)

de B G R Prony, Essai éxperimental et analytique: sur les lois de la dilatabilité de fluides élastique et sur celles de la force expansive de la vapeur de l’alkool, á différentes températures. J. l’École. Polytech 1, 24–76 (1795)

JJ Fuchs, Extension of the Pisarenko method to sparse linear arrays. IEEE Trans. Signal Process 45(10), 2413–2421 (1997)

R Schmidt, Multiple emitter location and signal parameter estimation. IEEE Trans. Antennas Propag 34(3), 276–280 (1986)

JJ Fuchs, On the application of the global matched filter to DOA estimation with uniform circular array. IEEE Trans. Signal Process 49(4), 702–709 (2001)

JJ Fuchs, Linear programming in spectral estimation. Application to array processing. in Proc, ed. by . IEEE Int. Conf. Acoustics Speech Signal Proc., ICASSP’96, vol. 6, (USA, 7–10: Atlanta, GA, May 1996), pp. 3161–3164

H Krim, M Viberg, Two decades of array signal processing research: the parametric approach. IEEE Signal Process. Mag 13(4), 67–94 (1996)

BDV Veen, KM Buckley, Beamforming: a versatile approach to spatial filtering. IEEE ASSP Mag 5(2), 4–24 (1988)

S Valaee, P Kabal, An information theoretic approach to source enumeration in array signal processing. IEEE Trans. Signal Process 52(5), 1171–1178 (2004)

R Roy, T Kailath, Espritestimation of signal parameters via rotational invariance techniques. IEEE Trans. ASSP 37(7), 984–995 (1989)

M Wax, T Kailath, Detection of signals by information theoretic criteria. IEEE Trans. ASSP 33(2), 387–392 (1985)

I Ziskind, M Wax, Maximum likelihood localization of multiple sources by alternating projection. IEEE Trans. ASSP 36(10), 1553–1560 (1988)

M Viberg, B Ottersten, Sensor array processing based on subspace fitting. IEEE Trans. Signal Process 39(5), 1110–1121 (1991)

S Shahbazpanahi, S Valaee, AB Gershman, A covariance fitting approach to parametric localization of multiple incoherently distributed sources. IEEE Trans. Signal Process 52(3), 592–600 (2004)

J Rissanen, Modeling by shortest data description. Automatica 14, 465–471 (1978)

H Akaike, A new look on the statistical model identification. IEEE Trans. Autom. Control 19(6), 716–723 (1974)

M Kaveh, H Wang, H Hung, On the theoretical performance of a class of estimators of the number of narrowband sources. IEEE Trans. ASSP 35(9), 1350–1352 (1987)

QT Zhang, KM Wong, PC Yip, JP Reilly, Statistical analysis of the performance of information theoretic criteria in the detection of the number of signals in array processing. IEEE Trans. ASSP 37(10), 1557–1567 (1989)

HV Poor, An Introduction to Signal Detection Estimation (New York: Springer, 1994)

TM Cover, JA Thomas, Elements of Information Theory (Hoboken: Wiley, 2006)

J Rissanen, A universal prior for integers and estimation by minimum description length. Annals Stat 11(2), 416–431 (1983)

B Nadler, A Kontorvich, Model selection for sinusoids in noise: statistical analysis and a new penalty term. IEEE Trans. Signal Process 59(4), 1333–1345 (2011)

TW Anderson, T Wilbur, An Introduction to Multivariate Statistical Analysis (New York: Wiley, 1958)

F Haddadi, MRM Mohammadi, MM Nayebi, MR Aref, Statistical performance analysis of detection of signals by information theoretic criteria. IEEE Trans. Signal Process 58(1), 452–457 (2010)

M Gastpar, M Vetterli, Sourcechannel communication in sensor networks. Lecture Notes in Computer Science (New York: Springer, 2003), pp. 162–177

AM Sayeed, A statistical signal modeling framework for wireless sensor networks, in Proc. 2nd Int. Workshop on Info. Proc. in Sensor Networks, IPSN’03, UW Tech, ed. by . Rep. ECE104 (WI: Univ. Wisconsin, Madison, Feb 2004), pp. 162–177

K Liu, AM Sayeed, Optimal distributed detection strategies for wireless sensor networks. in Proc, ed. by . 42nd Annual Allerton Conf. on Comm., Control and Comp. (IL: Monticello, Oct 2004), pp. 703–707

A D’Costa, V Ramachandran, A Sayeed, Distributed classification of gaussian spacetime sources in wireless sensor networks. IEEE J. Sel. Areas Commun 22(6), 1026–1036 (2004)

WU Bajwa, J Haupt, AM Sayeed, R Nowak, Compressive wireless sensing. in Proc, ed. by . Int. Symposium on Info. Proc. in Sensor Networks, IPSN’06 (TN: Nashville, Apr 2006), pp. 134–142

R Rangarajan, R Raich, A Hero, Sequential design of experiments for a rayleigh inverse scattering problem (France: Bordeaux, July 2005)

Y Yang, RS Blum, Radar waveform design using minimum meansquare error and mutual information. in Fourth IEEE Workshop Sens, ed. by . Array Multichannel Proc. vol. 12 (USA: Waltham, MA, 2006), pp. 234–238

A Kumar, P Ishwar, K Ramchandran, On distributed sampling of smooth nonbandlimited fields. in Int, ed. by . Simp. On Info. Proc. In Sensor Netwroks ISPN2004 (CA: Berkeley, April 2004), pp. 89–98

E Meijering, A chronology of interpolation: from ancient astronomy to modern signal and image processing. Proc. IEE 90(3), 319–342 (March 2002)

D Ganesan, S Ratnasamy, H Wang, D Estrin, Coping with irregular spatiotemporal sampling in sensor networks. ACM SIGCOMM Comput. Commun Rev 34(1), 125–130 (2004)

R Wagner, R Baraniuk, S Du, D Johnson, A Cohen, An architecture for distributed wavelet analysis and processing in sensor networks. in Proc, ed. by . of Int. Workshop Info. Proc. in Sensor Networks, IPSN’06 (TN: Nashville, Apr 2006), pp. 243–250

R Wagner, H Choi, R Baraniuk, V Delouille, Distributed wavelet transform for irregular sensor network grids, in Proc, ed. by . IEEE Stat. Signal Proc. Workshop (SSP) (Bordeaux, July 2005), pp. 1196–1201

SS Pradhan, J Kusuma, K Ramchandran, Distributed compression in a dense microsensor network. IEEE Signal Process. Mag 19(2), 51–60 (2002)

M Gastpar, M Vetterli, Power, spatiotemporal bandwidth distortion in large sensor networks. IEEE J Sel. Areas Commun 23(4), 745–754 (2005)

WU Bajwa, AM Sayeed, R Nowak, Matched sourcechannel communication for field estimation in wireless sensor networks (Los Angeles: CA, April 2005)

R Mudumbai, J Hespanha, U Madhow, G Barriac, Scalable feedback control for distributed beamforming in sensor networks. in Proc, ed. by . of the Int. Symposium on Info. Theory, ISIT’05 (SA: Adelaide, Sept 2005), pp. 137–141

J Haupt, R Nowak, Signal reconstruction from noisy random projections. IEEE Trans. Inf. Theory 52(9), 4036–4068 (2006)

D Baron, MB Wakin, MF Duarte, S Sarvotham, RG Baraniuk, Distributed compressed sensing, submitted for publication, preprint (http://www, November 2005), . ece.rice.edu/drorb/pdf/DCS112005.pdf webcite

W Wang, M Garofalakis, K Ramchandran, Distributed sparse random projections for refinable approximation. in Proc, ed. by . IPSN’07 (MA: Cambridge, April 2007), pp. 331–339

MF Duarte, MB Wakin, D Baron, RG Baraniuk, Universal distributed sensing via random projections (TN: Nashville, April 2006)

J Haupt, WU Bajwa, M Rabbat, R Nowak, Compressed sensing for networked data. IEEE Signal Process. Mag 2, 92–101 (2008)

M Crovella, E Kolaczyk, Graph wavelets for spatial traffic analysis. in INFOCOM 2003, ed. by . TwentySecond Annual Joint Conf. of the IEEE Computer and Communications Societies. IEEE (MA: Boston University, March 2003), pp. 1848–1857

R Coifman, M Maggioni, Diffusion wavelets. Appl. Comput. Harmon Anal 21(1), 53–94 (2006)

WU Bajwa, J Haupt, AM Sayeed, R Nowak, Joint sourcechannel communication for distributed estimation in sensor networks. IEEE Trans. Inf. Theory 53(10), 3629–3653 (2007)

S Boyd, A Ghosh, B Prabhakar, D Shah, Randomized gossip algorithms. IEEE Trans. Inf. Theory IEEE/ACM Trans. Netw 52(6), 2508–2530 (2006)

M Rabbat, J Haupt, A Singh, R Nowak, Decentralized compression and predistribution via randomized gossiping. in Proc, ed. by . IPSN’06 (TN: Nashville, April 2006), pp. 51–59

SJ Kim, K Koh, M Lustig, S Boyd, D Gorinevsky, A method for largescale ℓ1regularized least squares problems with applications in signal processing and statistics. IEEE J. Sel. Top. Signal Process 1(4), 606–617 (2007)

Y Rachlin, R Negi, P Khosla, Sensing capacity for discrete sensor network applications. in Proc, ed. by . of the Fourth Int. Symposium on Info. Proc. in Sensor Networks (PA: Pittsburgh, April 2005), pp. 126–132

S Aeron, M Zhao, V Saligrama, Information theoretic bounds to sensing capacity of sensor networks under fixed SNR. in IEEE Info, ed. by . Theory Workshop, ITW’07 (CA: Lake Tahoe, 2007), pp. 84–89

S Aeron, M Zhao, V Saligrama, Information theoretic bounds for compressed sensing. IEEE Trans. Inf. Theory 52(10), 5111–5130 (2010)

S Sanei, J Chambers, EEG Signal Processing (Hoboken: Wiley, 2007)

J Herault, C Jutten, Space or time adaptive signal processing by neural network models. in Proc, ed. by . of American Institute of Physics (AIP) Conf.: Neural Networks for Computing (US: Snowbird,Utah, 1986), pp. 206–211

AM Aibinu, AR Najeeb, MJE Salami, AA Shafie, Optimal model order selection for transient error autoregressive moving average (TERA) MRI reconstruction method. Proc. World Acad. Sci. Eng. Technol 32, 191–195 (2008)

MS Pedersen, J Larsen, U Kjems, LC Parra, A Survey of Convolutive Blind Source Separation Methods, Springer Handbook of Speech Processing (Berlin: Springer, 2007)

P Comon, Independent component analysis: a new concept. Signal Process 36, 287–314 (1994)

M Zibulevsky, BA Pearlmutter, Blind source separation by sparse decomposition in a signal dictionary. Neural Comp 13(4), 863–882 (2001)

Y Luo, JA Chambers, S Lambotharan, I Proudler, Exploitation of source nonstationarity in underdetermined blind source separation with advanced clustering techniques. IEEE Trans. Signal Process 54(6), 2198–2212 (2006)

Y Li, S Amari, A Cichocki, DWC Ho, S Xie, Underdetermined blind source separation based on sparse representation. IEEE Trans. Signal Process 54(2), 423–437 (2006)

A Jourjine, S Rickard, O Yilmaz, Blind separation of disjoint orthogonal signals: demixing n sources from 2 mixtures. in Proc, ed. by . of IEEE Conf. on Acoustic, Speech, and Signal Proc. ICASSP’2000 (Turkey: Istanbul, 2000), pp. 2985–2988

L Vielva, D Erdogmus, C Pantaleon, I Santamaria, J Pereda, JC Principe, Underdetermined blind source separation in a timevarying environment. in Proc, ed. by . of IEEE Conf. On Acoustic, Speech, and Signal Proc. ICASSP’02 (USA: Orlando, FL, 13), pp. 3049–3052

I Takigawa, M Kudo, A Nakamura, J Toyama, On the minimum ℓ1norm signal recovery in underdetermined source separation. in Proc, ed. by . of 5th Int. Conf. on Independent Component Analysis (Spain: Granada, 2004), pp. 22–24

CC Took, S Sanei, J Chambers, A filtering approach to underdetermined BSS with application to temporomandibular disorders. in Proc IEEE ICASSP’06 (France: IEEE, May 2006), pp. 1124–1127

T Melia, S Rickard, Underdetermined blind source separation in echoic environment using DESPRIT, EURASIP. J. Adv. Signal Process 2007(Article No 86484), 19 (2007)

K Nazarpour, S Sanei, L Shoker, JA Chambers, Parallel spacetimefrequency decomposition of EEG signals for brain computer interfacing. in Proc, ed. by . of EUSIPCO 2006 (Italy: Florence), pp. 2006–2006

R Gribonval, S Lesage, A survey of sparse component analysis for blind source separation: principles, perspectives, and new challenges. Proc. of ESANN, 323–330 (April 2006)

SS Chen, DL Donoho, MA Saunders, Atomic decomposition by basis pursuit. SIAM J. Sci. Comput 20, 33–61 (1998)

E Candes, J Romberg, ℓ1magic: Recovery of sparse signals (http://www), . acm.caltech.edu/l1magic/.Availableonline webcite

JJ Fuchs, On sparse representations in arbitrary redundant bases. IEEE Trans. Inf. Theory 50(6), 1341–1344 (2004)

J Bobin, Y Moudden, JL Starck, M Elad, Morphological diversity and source separation. IEEE Signal Process. Lett 13(7), 409–502 (2006)

J Bobin, JL Starck, JM Fadili, Y Moudden, DL Donoho, Morphological component analysis: an adaptive thresholding strategy. IEEE Trans. Image Process 16(11), 2675–2685 (2007)

A Schmidt, JMF Moura, Field inversion by consensus and compressed sensing. in Proc, ed. by . of IEEE Int. Conf. on Acoustics, Speech and Signal Proc., ICASSP’09 (Taiwan: IEEE,Taipei, 2009), pp. 2417–2420

M Mishali, YC Eldar, Sparce source separation from orthogonal mixtures. in Proc, ed. by . of IEEE Int. Conf. on Acoustics, Speech and Signal Proc., ICASSP’09 IEEE (Taiwan: Taipei, 2009), pp. 3145–3148

H Schulze, C Lueders, Theory and Applications of OFDM and CDMA:Wideband Wireless Communications (NJ: Wiley, 2005)

M Ozdemir, H Arslan, Channel estimation for wireless ofdm systems. IEEE Commun. Surv. Tutor 9(2), 18–48 (2007)

JJV de Beek, O Edfors, M Sandell, SK Wilson, P Borjesson, On channel estimation in OFDM systems. in Proc, ed. by . 45th IEEE Vehicular Technology Conf., Chicago (IL: Chicago, July 1995), pp. 815–819

M Morelli, U Mengali, A comparison of pilotaided channel estimation methods for ofdm systems. IEEE Trans. Signal Process 49(12), 3065–3073 (2001)

O Edfors, M Sandell, JJV de Beek, SK Wilson, OFDM channel estimation by singular value decomposition. IEEE Trans. Commun 46(7), 931–939 (1998)

P Strobach, Lowrank adaptive filters. IEEE Trans. Signal Process 44(2), 2932–2947 (1996)

S Coleri, M Ergen, A Puri, A Bahai, Channel estimation techniques based on pilot arrangement in OFDM systems. IEEE Trans. Broadcast 48(3), 223–229 (2002)

SG Kang, YM Ha, EK Joo, A comparative investigation on channel estimation algorithms for OFDM in mobile communications. IEEE Trans. Broadcast 49(2), 142–149 (2003)

G Tauböck, F Hlawatsch, A compressed sensing technique for OFDM channel estimation in mobile environments. in exploiting channel sparsity for reducing pilots, in Proc, ed. by . ICASSP’08 (March: Las Vegas, 2008), pp. 2885–2888

MR Raghavendra, K Giridhar, Improving channel estimation in OFDM systems for sparse multipath channels. IEEE Signal Process. Lett 12(1), 52–55 (2005)

T Kang, RA Iltis, Matching pursuits channel estimation for an underwater acoustic OFDM modem. in in IEEE Int, ed. by . Conf. on Acoustics, Speech and Signal Proc., ICASSP’08 (March: Las Vegas, 2008), pp. 5296–5299

P Pakrooh, A Amini, F Marvasti, Ofdm pilot allocation for sparse channel estimation

O Christensen, in An introduction to frames and Riesz basis, ser, ed. by . Applied and Numerical Harmonic Analysis (Birkhauser: Basel, 2003)