Abstract
Recently, sparse representation based methods have proven successful in solving image restoration problems. These methods impose a sparsity prior on the underlying signal in terms of some dictionary and aim for optimal performance in terms of mean-squared error, a metric that has been widely criticized in the literature as a poor predictor of visual quality. In this work, we make one of the first attempts to employ the structural similarity (SSIM) index, a more accurate perceptual image measure, by incorporating it into the framework of sparse signal representation and approximation. Specifically, the proposed optimization problem solves for coefficients with minimum ℓ_{0} norm and maximum SSIM index value. Furthermore, a gradient descent algorithm is developed to achieve the SSIM-optimal compromise in combining the input and sparse dictionary reconstructed images. We demonstrate the performance of the proposed method using image denoising and super-resolution as examples. Our experimental results show that the proposed SSIM-based sparse representation algorithm achieves better SSIM performance and better visual quality than the corresponding least-squares-based method.
1 Introduction
In many signal processing problems, mean squared error (MSE) has been the preferred optimization criterion because of its ease of use and popularity, irrespective of the nature of the signals involved. The story is no different for image restoration tasks: algorithms are developed and optimized to generate the output image that has minimum MSE with respect to the target image [1-6]. However, MSE is not the best choice when it comes to image quality assessment (IQA) and signal approximation tasks [7]. To achieve better visual performance, it is desirable to replace the optimization criterion with one that predicts visual quality more accurately. SSIM has been quite successful in achieving superior IQA performance [8]. Figure 1 demonstrates the difference between the performance of SSIM and absolute error (the basis for MSE, PSNR, etc.). Figure 1c shows the quality map of the image in Figure 1b with reference to Figure 1a, obtained by calculating the absolute pixel-by-pixel error, which forms the basis of the MSE calculation for quality evaluation. Figure 1d shows the corresponding SSIM quality map, which is used to calculate the SSIM index of the whole image. It is evident from the maps that SSIM does a better job of predicting perceived image quality. Specifically, the absolute error map is uniform over space, but the texture regions in the noisy image appear much less noisy than the smooth regions. Clearly, the SSIM map is more consistent with such observations.
Figure 1. Comparison of SSIM and MSE for "Barbara" image altered with additive white Gaussian noise. (a) Original image; (b) noisy image; (c) absolute error map (brighter indicates better quality/smaller absolute difference); (d) SSIM index map (brighter indicates better quality/larger SSIM value).
The SSIM index and its extensions have found a wide variety of applications, ranging from image/video coding, e.g., an H.264 video coding standard implementation [9], image classification [10], and restoration and fusion [11], to watermarking, denoising and biometrics (see [7] for a complete list of references). In most existing works, however, SSIM has been used for quality evaluation and algorithm comparison purposes only. SSIM possesses a number of desirable mathematical properties, making it easier to employ in optimization tasks than other state-of-the-art perceptual IQA measures [12]. Nevertheless, much less has been done on using SSIM as an optimization criterion in the design and optimization of image processing algorithms and systems [13-19].
Image restoration problems are of particular interest to image processing researchers, not only for their practical value, but also because they provide an excellent test bed for image modeling, representation and estimation theories. When addressing general image restoration problems with a Bayesian approach, an image prior model is required. Traditionally, the problem of determining suitable image priors has been based on close observation of natural images. This leads to simplifying assumptions such as spatial smoothness, low/max-entropy or sparsity in some basis set. Recently, a new approach has been developed for learning the prior based on sparse representations. A dictionary is learned either from the corrupted image or from a high-quality set of images, with the assumption that it can sparsely represent any natural image. Thus, this learned dictionary encapsulates the prior information about the set of natural images. Such methods have proven to be quite successful in image restoration tasks such as image denoising [3] and image super-resolution [5,20]. More specifically, an image is divided into overlapping blocks with the help of a sliding window, and each block is then sparsely coded with the help of the dictionary. The dictionary, ideally, models the prior of natural images and is therefore free from all kinds of distortions. As a result, the reconstructed blocks, obtained by linear combination of the atoms of the dictionary, are distortion free. Finally, the blocks are put back into their places and combined in light of a global constraint for which a minimum MSE solution is reached. The accumulation of many blocks at each pixel location might affect the sharpness of the image. Therefore, the distorted image must be considered as well in order to reach the best compromise between sharpness and admissible distortions.
Since MSE is employed as the optimization criterion, the resulting output image might not have the best perceptual quality. This motivated us to replace MSE with SSIM in the framework. The solution of this novel optimization problem is not trivial because SSIM is non-convex in nature. There are two key problems that have to be resolved before effective SSIM-based optimization can be performed. First, how to optimally decompose an image as a linear combination of basis functions in the maximal SSIM sense, as opposed to the minimal MSE sense. Second, how to estimate the best compromise between the distorted and sparse dictionary reconstructed images for maximal SSIM. In this article, we provide solutions to these problems and use image denoising and image super-resolution as applications to demonstrate the proposed framework for image restoration problems.
We formulate the problem in Section 2.1 and provide our solutions to the issues discussed above in Sections 2.2 and 2.3. Section 3.1 describes our approach to image denoising. The proposed method for image super-resolution is described in Section 3.2, and we conclude in Section 4.
2 The proposed method
In this section we incorporate SSIM as our quality measure, particularly for sparse representation. Contrary to what one might expect, it is shown that a sparse representation in the minimal ℓ_{2} norm sense can be easily converted to one in the maximal SSIM sense. We also use a gradient descent approach to solve a global optimization problem in the maximal SSIM sense. Our framework can be applied to a wide class of problems dealing with sparse representation to improve visual quality.
2.1 Image restoration from sparsity
The classic formulation of the image restoration problem is as follows:

$$ y = \Phi x + n \tag{1} $$

where x ∈ ℝ^{n}, y ∈ ℝ^{m}, n ∈ ℝ^{m}, and Φ ∈ ℝ^{m × n}. Here we assume x and y are vectorized versions, by column stacking, of the 2D original and distorted images, respectively. n is the noise term, which is mostly assumed to be zero-mean, additive, and independent Gaussian. Generally m < n and thus the problem is ill-posed. To solve the problem, the assertion of a prior on the original image is necessary. The early approaches used least squares (LS) [21] and Tikhonov regularization [22] as priors. Later, the minimal total variation (TV) solution [23] and sparse priors [3] were used successfully on this problem. Our focus in the current work is to improve, in terms of visual quality, algorithms that assert a sparsity prior on the solution in terms of a dictionary domain.
The sparsity prior has been used successfully to solve different inverse problems in image processing [3,5,24,25]. If the desired signal, x, is sparse enough, then it has been shown that the solution to (1) is the one with maximum sparsity, which is unique (within some ϵ-ball around x) [26,27]. It can be found by solving a linear programming problem or by orthogonal matching pursuit (OMP). Not all natural signals are sparse, but a wide range of natural signals can be represented sparsely in terms of a dictionary, and this makes it possible to use the sparsity prior on a wide range of inverse problems. One major problem is that image signals are high dimensional and thus solving (1) directly is computationally expensive. To tackle this problem we assume local sparsity on image patches. Here, it is assumed that all the image patches have a sparse representation in terms of a dictionary. This dictionary can be trained over some patches [28].
Central to the process of image restoration using local sparse and redundant representations is the solution of the following optimization problems [3,5]:

$$ \hat{\alpha}_{ij} = \arg\min_{\alpha} \; \mu_{ij}\|\alpha\|_0 + \|\Psi\alpha - R_{ij}X\|_2^2 \tag{2} $$

$$ \hat{X} = \arg\min_{X} \; \|X - W\|_2^2 + \lambda\|DHX - Y\|_2^2 \tag{3} $$

$$ \hat{\alpha}_{ij} = \arg\min_{\alpha} \; \|\alpha\|_0 \quad \text{subject to} \quad \|\Psi\alpha - R_{ij}X\|_2^2 \le T \tag{4} $$

where Y is the observed distorted image, X is the unknown output restored image, R_{ij} is a matrix that extracts the (i, j) block from the image, Ψ ∈ ℝ^{n × k} is the dictionary with k > n, α_{ij} is the sparse vector of coefficients corresponding to the (i, j) block of the image, X̂ is the estimated image, λ is the regularization parameter, and W is the image obtained by averaging the blocks reconstructed from the sparse coefficient vectors calculated by solving the optimization problem in (2). This is a local sparsity-based method that divides the whole image into blocks and represents each block sparsely using a trained dictionary. Among other advantages, one major advantage of such a method is the ease of training a small dictionary as compared to one large global dictionary. This is achieved with the help of (2), which is equivalent to (4). As to the coefficients μ_{ij}, those must be location dependent, so as to comply with a set of constraints of the form ‖Ψα_{ij} − R_{ij}X‖_2^2 ≤ T. Solving this using orthogonal matching pursuit [29] is easy, gathering one atom at a time and stopping when the error goes below T. This way, the choice of μ_{ij} is handled implicitly. Equation (3) applies a global constraint on the reconstructed image and uses the local patches and the noisy image as input in order to construct an output that complies with local sparsity and also lies within the proximity of the distorted image, where that proximity is defined by the amount and type of distortion.
In (3), we have assumed that the distortion operator Φ in (1) may be represented by the product DH, where H is a blurring filter and D is the downsampling operator. Here we have assumed that each non-overlapping patch of the image can be represented sparsely in the domain of Ψ. Under this prior, (2) describes the sparse coding of local image patches with a bounded prior, hence building a local model from sparse representations. This enables us to restore individual patches by solving (2) for each patch. In doing so, we face the problem of blockiness at the patch boundaries when denoised non-overlapping patches are placed back in the image. To remove these artifacts from the denoised image, overlapping patches are extracted from the noisy image and combined with the help of (3). The solution of (3) demands proximity between the noisy image, Y, and the output image, X, thus enforcing the global reconstruction constraint. The optimal solution suggests taking the average of the overlapping patches [3], thus eliminating the problem of blockiness in the denoised image.
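The patch extraction and averaging step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `extract_patches` and `average_patches` are hypothetical, the patches are left unmodified (no sparse coding is applied), and a sliding step of one pixel is assumed so that every pixel is covered.

```python
import numpy as np

def extract_patches(img, p, step=1):
    """Return overlapping p x p patches (one per row) and their top-left corners."""
    H, W = img.shape
    coords = [(i, j) for i in range(0, H - p + 1, step)
                     for j in range(0, W - p + 1, step)]
    patches = np.stack([img[i:i + p, j:j + p].ravel() for i, j in coords])
    return patches, coords

def average_patches(patches, coords, shape, p):
    """Put patches back in place and average wherever they overlap."""
    acc = np.zeros(shape)   # running sum of patch values per pixel
    cnt = np.zeros(shape)   # number of patches covering each pixel
    for v, (i, j) in zip(patches, coords):
        acc[i:i + p, j:j + p] += v.reshape(p, p)
        cnt[i:i + p, j:j + p] += 1
    return acc / cnt

img = np.arange(36, dtype=float).reshape(6, 6)
patches, coords = extract_patches(img, p=3)
# In a restoration pipeline each row of `patches` would first be replaced by
# its sparse approximation; here the patches are averaged back unchanged.
W_img = average_patches(patches, coords, img.shape, p=3)
```

Because the patches are not altered in this sketch, the averaged image reproduces the input; with sparse-coded patches the same routine would produce the averaged image that the text calls W.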
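As stated earlier, we propose a modified restoration method that incorporates SSIM into the procedure defined by (2) and (3). It is defined as follows:

$$ \hat{\alpha}_{ij} = \arg\min_{\alpha} \; \mu_{ij}\|\alpha\|_0 + \big(1 - S(\Psi\alpha, R_{ij}X)\big) \tag{5} $$

$$ \hat{X} = \arg\max_{X} \; S(W, X) + \lambda S(DHX, Y) \tag{6} $$

where S(·,·) denotes the SSIM measure. The expression for the SSIM index is

$$ S(a, y) = \frac{2\mu_a\mu_y + C_1}{\mu_a^2 + \mu_y^2 + C_1} \cdot \frac{2\sigma_{ay} + C_2}{\sigma_a^2 + \sigma_y^2 + C_2} \tag{7} $$

with μ_{a} and μ_{y} the means of a and y, respectively, σ_{a}^{2} and σ_{y}^{2} the sample variances of a and y, respectively, and σ_{ay} the covariance between a and y. The constants C_{1} and C_{2} are stabilizing constants and account for the saturation effect of the human visual system (HVS).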
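As a concrete illustration of the SSIM index between two blocks, the sketch below evaluates the standard two-term form from [8] (a luminance term times a contrast/structure term). The function name `ssim_patch` is hypothetical, and the constants C_1 = (0.01 L)^2 and C_2 = (0.03 L)^2 with dynamic range L = 255 are the commonly used defaults, assumed here rather than taken from this article.

```python
import numpy as np

def ssim_patch(a, y, L=255.0, K1=0.01, K2=0.03):
    """SSIM between two blocks using sample statistics (n - 1 normalization)."""
    a = np.asarray(a, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2   # assumed default stabilizers
    mu_a, mu_y = a.mean(), y.mean()
    var_a, var_y = a.var(ddof=1), y.var(ddof=1)
    cov_ay = np.sum((a - mu_a) * (y - mu_y)) / (a.size - 1)
    luminance = (2 * mu_a * mu_y + C1) / (mu_a ** 2 + mu_y ** 2 + C1)
    structure = (2 * cov_ay + C2) / (var_a + var_y + C2)
    return luminance * structure

rng = np.random.default_rng(0)
block = rng.uniform(0, 255, size=(8, 8))
noisy = block + rng.normal(0, 25, size=block.shape)
s_identical = ssim_patch(block, block)
s_noisy = ssim_patch(block, noisy)
```

Identical blocks give a value of 1, and any distortion lowers the value, which is why 1 − S can act as a distance-like quantity in the sparse coding problem.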
Equation (5) aims to provide the best approximation of a local patch in the SSIM sense with the minimum possible number of atoms. The process is performed locally for each block in the image, and the blocks are then combined by simple averaging to construct W. Equation (6) applies a global constraint and outputs the image that is the best compromise between the noisy image, Y, and W in the SSIM sense. This step is vital because it has been observed that the image W lacks sharpness in the structures present in the image. Due to the masking effect of the HVS, the same level of noise does not distort different visual content equally. Therefore, the noisy image is used to borrow content from those of its regions that are not severely corrupted by noise. SSIM is well suited for such a task, as compared to MSE, because it accounts for the masking effect of the HVS and allows us to capture improved structural details with the help of the noisy image. Note the use of 1 − S(·,·) in (5). This is motivated by the fact that 1 − S(·,·) is a squared variance-normalized distance [30]. Solutions to the optimization problems in (5) and (6) are given in Sections 2.2 and 2.3, respectively.
2.2 SSIM-optimal local model from sparse representation
This section discusses the solution to the optimization problem in (5). Equation (2) can be solved approximately using OMP [29] by including one atom at a time and stopping when the error goes below T_{mse} = (Cσ)^{2}, where C is the noise gain and σ is the standard deviation of the noise. We solve the optimization problem in (5) based on the same philosophy. We gather one atom at a time and stop when S(Ψα, x_{ij}) goes above T_{ssim}, a threshold defined in terms of SSIM. In order to obtain T_{ssim}, we need to consider the relationship between MSE and SSIM. For mean-reduced a and y, the expression of SSIM reduces to the following equation
Subtracting both sides of (8) from 1 yields
(12)
Equation (12) can be rearranged to arrive at the following result
With the help of the equation above, we can calculate the value of T_{ssim }as follows
where C_{2} is the constant originally used in the SSIM index expression [8], and the statistics of the approximation are calculated from the current approximation of the block, given by a := Ψα.
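The link between the two thresholds can be sketched as follows. This is a reconstruction under stated assumptions rather than the authors' exact derivation: a and y are taken to be mean-removed blocks of n pixels (so the luminance term of SSIM equals 1), the sample statistics are normalized by n − 1, and T_{mse} is read as a bound on the squared error with that same normalization.

```latex
\begin{align*}
S(a,y) &= \frac{2\sigma_{ay} + C_2}{\sigma_a^2 + \sigma_y^2 + C_2}, \\[4pt]
1 - S(a,y) &= \frac{\sigma_a^2 + \sigma_y^2 - 2\sigma_{ay}}{\sigma_a^2 + \sigma_y^2 + C_2}
            = \frac{\tfrac{1}{n-1}\,\|a - y\|_2^2}{\sigma_a^2 + \sigma_y^2 + C_2}, \\[4pt]
\tfrac{1}{n-1}\,\|a - y\|_2^2 \le T_{mse}
  \;&\Longleftrightarrow\;
  S(a,y) \ge 1 - \frac{T_{mse}}{\sigma_a^2 + \sigma_y^2 + C_2}.
\end{align*}
```

Under this reading, the right-hand side of the last line plays the role of T_{ssim}. It differs from block to block because σ_a^2 and σ_y^2 do, so a single MSE threshold maps to a content-dependent SSIM threshold.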
It has already been shown that the main difference between SSIM and MSE is the divisive normalization [30,31]. This normalization is conceptually consistent with the light adaptation (also called luminance masking) and contrast masking effect of HVS. It has been recognized as an efficient perceptually and statistically nonlinear image representation model [32,33]. It is shown to be a useful framework that accounts for the masking effect in human visual system, which refers to the reduction of the visibility of an image component in the presence of large neighboring components [34,35]. It has also been found to be powerful in modeling the neuronal responses in the visual cortex [36,37]. Divisive normalization has been successfully applied in IQA [38,39], image coding [40], video coding [31] and image denoising [41].
Equation (14) suggests that the threshold is chosen adaptively for each patch. The set of coefficients α = (α_{1}, α_{2}, α_{3}, ..., α_{k}) should be calculated such that we get the best approximation a in terms of SSIM. We search for the stationary points of the partial derivatives of S with respect to α. The solution to this problem for an orthogonal set of basis functions is discussed in [30]. Here we aim to solve the more general case of linearly independent atoms. The ℓ_{2}-based optimal coefficients, c = (c_{1}, ..., c_{k}), can be calculated by solving the following system of equations

$$ \sum_{j=1}^{k} c_j \langle \psi_i, \psi_j \rangle = \langle y, \psi_i \rangle, \qquad 1 \le i \le k \tag{15} $$

We denote the inner product of a signal ψ with the constant signal (1/n, 1/n, ..., 1/n) of length n by ⟨ψ⟩ := ⟨ψ, 1/n⟩, where ⟨·, ·⟩ represents the inner product.
First, we write the mean, the variance and the covariance of a in terms of α with n the size of the current block:
where < · > represents the sample mean. The partial derivatives are given as follows
The structural similarity can be written as
From logarithmic differentiation of (7) combined with (19)-(21), we have
After subtracting the corresponding DC values from all the blocks in the image, we are interested only in the particular case where the atoms are made of oscillatory functions, i.e., when 〈ψ_{i}〉 = 0 for 1 ≤ i ≤ k, thus reducing (23) to
We equate (24) to zero in order to find the stationary points. The result is the following linear system of equations
where
where β is an unknown constant dependent on the statistics of the unknown image block a. Comparing α with the optimal coefficients in the ℓ_{2} sense, denoted by c and given by (15), results in the following solution:
which implies that the optimal SSIM-based solution is just a scaling of the optimal ℓ_{2}-based solution. The last step is to find β. It is important to note that the value of β varies over the image and is therefore content dependent. Also, the scaling factor, β, may lead to the selection of a different set of atoms from the dictionary, as compared to the case β = 1, which are better suited to providing a closer and sparser approximation of the patch in the SSIM sense. Substituting (27) into the expression (26) for β via (16), (17) and (18) and then isolating β gives the following quadratic equation
where
Solving for β and picking a positive value for maximal SSIM gives us
Now we have all the tools required for an OMP algorithm that performs the sparse coding stage in the optimal SSIM sense. The modified OMP algorithm is given in Algorithm 1. There are two main differences between the OMP algorithm [29] and the one proposed in this work. First, the stopping criterion is based on SSIM. Unlike MSE, SSIM is adaptive according to the reference image. In particular, if the distortion is consistent with the underlying reference, e.g., contrast enhancement, the distortion is non-structural and is much less objectionable than structural distortions. Defining the stopping criterion according to SSIM essentially means that we are modifying the set of accepted points (image patches) around the noisy image patch that can be represented as a linear combination of dictionary atoms. This way, in the space of image patches, we omit image patches in the direction of structural distortion and include those that are in the same direction as the original image patch in the set of acceptable image patches. Therefore, we can expect to see more structures in the image constructed using sparsity as a prior. Second, we calculate the SSIM-optimal coefficients from the optimal coefficients in the ℓ_{2} sense using the derivation in Section 2.2; the former are a scalar multiple of the latter.
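The pursuit described above can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: the atoms and the patch are taken to be mean-removed (so the luminance term of SSIM equals 1), statistics are normalized by n − 1, atoms are selected by largest correlation with the least-squares residual as in standard OMP, and the names `ssim_zero_mean` and `ssim_omp` are hypothetical. Under these assumptions, setting the derivative of S with respect to each coefficient to zero gives α = βc, and substituting back yields the quadratic Bβ^2 + C_2β − (σ_y^2 + C_2) = 0 with B = (1/(n − 1)) Σ_i c_i⟨y, ψ_i⟩, whose positive root is used below. This expression is our own working and may differ in form from the equations in the article.

```python
import numpy as np

def ssim_zero_mean(a, y, C2):
    """SSIM of two mean-removed vectors (luminance term equals 1)."""
    n = y.size
    return (2 * (a @ y) / (n - 1) + C2) / ((a @ a + y @ y) / (n - 1) + C2)

def ssim_omp(Psi, y, T_ssim, C2, max_atoms=None):
    """Greedy pursuit that stops once SSIM(Psi @ alpha, y) reaches T_ssim."""
    n, k = Psi.shape
    max_atoms = n if max_atoms is None else max_atoms
    support, alpha = [], np.zeros(k)
    residual = y.copy()
    var_y = (y @ y) / (n - 1)
    s_opt = ssim_zero_mean(np.zeros(n), y, C2)
    while s_opt < T_ssim and len(support) < max_atoms:
        corr = np.abs(Psi.T @ residual)
        if support:
            corr[support] = 0.0              # never reselect an atom
        support.append(int(np.argmax(corr)))
        sub = Psi[:, support]
        c, *_ = np.linalg.lstsq(sub, y, rcond=None)   # least-squares coefficients
        fit = sub @ c
        residual = y - fit
        B = (fit @ y) / (n - 1)              # equals sum_i c_i <y, psi_i> / (n - 1)
        beta = 1.0
        if B > 1e-12:                        # positive root of the quadratic in beta
            beta = (-C2 + np.sqrt(C2 ** 2 + 4 * B * (var_y + C2))) / (2 * B)
        alpha = np.zeros(k)
        alpha[support] = beta * c            # SSIM-optimal rescaling of c
        s_opt = ssim_zero_mean(Psi @ alpha, y, C2)
    return alpha, s_opt, support

# Synthetic demonstration with a random zero-mean, unit-norm dictionary.
rng = np.random.default_rng(0)
Psi = rng.standard_normal((64, 128))
Psi -= Psi.mean(axis=0)
Psi /= np.linalg.norm(Psi, axis=0)
y = Psi[:, [3, 40, 77]] @ np.array([5.0, -3.0, 2.0]) + 0.05 * rng.standard_normal(64)
y -= y.mean()
C2 = 0.03 ** 2
alpha, s_opt, support = ssim_omp(Psi, y, T_ssim=0.95, C2=C2)

# SSIM of the plain least-squares fit on the same support, for comparison.
c_ls, *_ = np.linalg.lstsq(Psi[:, support], y, rcond=None)
s_ls = ssim_zero_mean(Psi[:, support] @ c_ls, y, C2)
```

On any fixed support the rescaled coefficients cannot score a lower SSIM than the least-squares coefficients, since β maximizes SSIM along the direction of the least-squares fit.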
2.3 SSIM-based global reconstruction
The solution to the optimization problem defined in (6) is the image that is the best compromise, in the maximal SSIM sense, between the distorted image and the one obtained using sparse representation. Assuming a known dictionary, the only other thing the optimization problem in (6) requires is the coefficients α_{ij}, which can be obtained by solving the optimization problem in (5). SSIM is a local quality measure: when it is applied using a sliding window, it provides a quality map that reflects the variation of local quality over the whole image. The global SSIM is computed by pooling (averaging) the local SSIM map. The global SSIM for an image, Y, with respect to the reference image, X, is given by the following equation

$$ S(X, Y) = \frac{1}{N_l} \sum_{ij} S(x_{ij}, y_{ij}) \tag{32} $$
where x_{ij} = R_{ij}X and y_{ij} = R_{ij}Y, and R_{ij} is an N_{w} × N matrix that extracts the (i, j) block from the image. The expression for the local SSIM, S(x_{ij}, y_{ij}), is given by (7). N_{l} is the total number of local windows and can be calculated as

$$ N_l = \frac{1}{N_w}\,\mathrm{tr}\Big(\sum_{ij} R_{ij}^{T} R_{ij}\Big) \tag{33} $$
where tr(·) denotes the trace of a matrix.
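The pooling step can be illustrated with a short sketch that slides a square window over both images, computes the local SSIM in each window, and averages the resulting map. Several details are assumptions made for illustration: a uniform (unweighted) window, a step of one pixel, the default constants C_1 = (0.01 L)^2 and C_2 = (0.03 L)^2 with L = 255, and the hypothetical function names `local_ssim` and `mean_ssim`.

```python
import numpy as np

def local_ssim(a, y, C1, C2):
    """SSIM of two equally sized blocks using sample statistics."""
    a, y = a.ravel(), y.ravel()
    mu_a, mu_y = a.mean(), y.mean()
    cov = np.sum((a - mu_a) * (y - mu_y)) / (a.size - 1)
    lum = (2 * mu_a * mu_y + C1) / (mu_a ** 2 + mu_y ** 2 + C1)
    con = (2 * cov + C2) / (a.var(ddof=1) + y.var(ddof=1) + C2)
    return lum * con

def mean_ssim(X, Y, p=8, L=255.0):
    """Average the local SSIM over all p x p windows; also return the window count."""
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    H, W = X.shape
    vals = [local_ssim(X[i:i + p, j:j + p], Y[i:i + p, j:j + p], C1, C2)
            for i in range(H - p + 1) for j in range(W - p + 1)]
    return float(np.mean(vals)), len(vals)

rng = np.random.default_rng(0)
X = rng.uniform(0, 255, size=(16, 16))
Y = X + rng.normal(0, 20, size=X.shape)
s_same, n_windows = mean_ssim(X, X)
s_noisy, _ = mean_ssim(X, Y)
```

For a 16 × 16 image and 8 × 8 windows with unit step, there are 9 × 9 window positions, which is the count N_{l} used as the normalizer.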
We use a gradientdescent approach to solve the optimization problem given by (6). The update equation is given by
where
where N_{w} is the number of pixels in the local image patch, and μ_{x}, σ_{x}^{2}, and σ_{xy} represent the sample mean of x, the sample variance of x, and the sample covariance of x and y, respectively. Equation (34) suggests that the gradients of the local patches are averaged in order to obtain the global SSIM gradient, and thus the direction and distance of the kth update of X̂. More details regarding the computation of the SSIM gradient can be found in [42]. In our experiments, we found that this gradient-based approach is well behaved and that it takes only a few iterations for X̂ to converge to a stationary point. We initialize X̂ with the minimal MSE solution and then follow the iterative procedure above to solve (6).
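The iterative idea can be illustrated on a single small block. The sketch below is not the update rule of the article: it assumes the objective takes the form S(W, X) + λS(X, Y) with an identity degradation operator, replaces the analytic SSIM gradient with a central finite-difference approximation, and adds a simple step-halving rule so that the objective never decreases. The names `ssim`, `numeric_grad` and `ssim_ascent`, the pixel range [0, 1] and the constants are all assumptions. The starting point is the minimizer of ‖X − W‖^2 + λ‖X − Y‖^2, i.e. (W + λY)/(1 + λ).

```python
import numpy as np

def ssim(a, y, C1=1e-4, C2=9e-4):
    """SSIM of two vectors with values in [0, 1] (assumed constants)."""
    mu_a, mu_y = a.mean(), y.mean()
    cov = np.sum((a - mu_a) * (y - mu_y)) / (a.size - 1)
    lum = (2 * mu_a * mu_y + C1) / (mu_a ** 2 + mu_y ** 2 + C1)
    con = (2 * cov + C2) / (a.var(ddof=1) + y.var(ddof=1) + C2)
    return lum * con

def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference gradient (stand-in for the analytic gradient)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def ssim_ascent(w, y, lam, x0, step=0.05, iters=40):
    """Gradient ascent on S(w, x) + lam * S(x, y) with step halving on failure."""
    f = lambda x: ssim(w, x) + lam * ssim(x, y)
    x, fx = x0.copy(), f(x0)
    for _ in range(iters):
        cand = x + step * numeric_grad(f, x)
        fc = f(cand)
        if fc > fx:
            x, fx = cand, fc      # accept only improving steps
        else:
            step *= 0.5           # otherwise shrink the step
    return x, fx

rng = np.random.default_rng(1)
clean = rng.random(16)
y = clean + 0.1 * rng.standard_normal(16)     # noisy block
w = 0.5 * clean + 0.5 * clean.mean()          # over-smoothed block, stands in for W
lam = 0.5
x0 = (w + lam * y) / (1 + lam)                # minimal-MSE compromise
f0 = ssim(w, x0) + lam * ssim(x0, y)
x_hat, f_hat = ssim_ascent(w, y, lam, x0)
```

For whole images the finite-difference gradient would be far too slow; it is used here only to keep the example short and self-contained.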
3 Applications
The framework we proposed provides a general approach that can be used for different applications. To show the effectiveness of our method we present two applications: image denoising and super-resolution.
3.1 Image denoising
We use the SSIM-based sparse representation framework developed in Sections 2.2 and 2.3 to perform image denoising. The noise-contaminated image is obtained using the following equation

$$ Y = X + N \tag{36} $$

where Y is the observed distorted image, X is the noise-free image and N is additive Gaussian noise. Our goal is to remove the noise from the distorted image. Here we train a dictionary, Ψ, in whose domain the original image can be represented sparsely. We use the KSVD method [28] to train the dictionary. In this method the dictionary is trained directly over the noisy image, and denoising is done in parallel. We initialize the dictionary with the discrete cosine transform (DCT) dictionary and run a fixed number of iterations, J. In each iteration we update the image and then the dictionary. First, based on the current dictionary, sparse coding is done for each patch, and then KSVD is used to update the dictionary (the interested reader can refer to [28] for details of the dictionary update). Finally, after repeating this procedure J times, we execute a global reconstruction stage, following the gradient descent procedure. The proposed image denoising algorithm is summarized in Algorithm 2.
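One common way to build the initial overcomplete DCT dictionary is shown below as a sketch. The construction (sampled cosines, mean removal for all but the first atom, normalization, then a Kronecker product to obtain 2D atoms) is a widely used recipe and is assumed here; the article does not spell it out. The function name `overcomplete_dct` is hypothetical, and the sizes are chosen to give 256 atoms for 8 × 8 patches, matching the dictionary size reported in the experiments of this section.

```python
import numpy as np

def overcomplete_dct(patch=8, atoms_1d=16):
    """Separable overcomplete DCT dictionary of shape (patch^2, atoms_1d^2)."""
    D1 = np.zeros((patch, atoms_1d))
    for k in range(atoms_1d):
        v = np.cos(np.arange(patch) * k * np.pi / atoms_1d)
        if k > 0:
            v = v - v.mean()          # all atoms except the DC atom are zero-mean
        D1[:, k] = v / np.linalg.norm(v)
    return np.kron(D1, D1)            # 2D atoms as products of 1D atoms

Psi = overcomplete_dct()
atom_norms = np.linalg.norm(Psi, axis=0)
```

Each column of `Psi` is a vectorized 8 × 8 atom with unit norm, so the dictionary can be passed directly to a pursuit routine that expects normalized atoms.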
The proposed image denoising scheme is tested on various images with different amounts of noise. In all the experiments, the dictionary used was of size 64 × 256, designed to handle patches of 8 × 8 pixels. The value of the noise gain, C, is selected to be 1.15, and λ = 30/σ [3]. Table 1 shows the results for the images Barbara, Lena, Peppers and House. It also compares the KSVD method [3] with the proposed denoising method. It can be observed that the proposed denoising method achieves better performance in terms of SSIM, which is expected to imply better perceptual quality of the denoised image. Figures 2 and 3 show the denoised images obtained using KSVD [3] and the proposed method, along with the corresponding SSIM maps. It can be observed that the SSIM-based method performs better especially in the texture regions, which confirms that the proposed denoising scheme preserves structures better and therefore has better perceptual image quality.
Table 1. SSIM and PSNR comparisons of image denoising results
Figure 2. Visual comparison of denoising results. (a) Original image; (b) noisy image; (c) SSIM-map of noisy image; (d) KSVD-MSE; (e) SSIM-map of KSVD-MSE; (f) KSVD-SSIM; (g) SSIM-map of KSVD-SSIM.
Figure 3. Visual comparison of denoising results. (a) Original image; (b) noisy image; (c) SSIM-map of noisy image; (d) KSVD-MSE; (e) SSIM-map of KSVD-MSE; (f) KSVD-SSIM; (g) SSIM-map of KSVD-SSIM.
3.2 Image super-resolution
In this section we demonstrate the performance of the SSIM-based sparse representation when used for image super-resolution. In this problem, a low resolution image, Y, is given and a high resolution version of the image, X, is required as output. We assume that the low resolution image is produced from the high resolution image based on the following equation:

$$ Y = DHX \tag{37} $$

where H represents a blurring matrix, and D is a downsampling matrix. We use a local sparsity model as a prior to regularize this problem, which has infinitely many solutions that satisfy (37). Our approach is motivated by recent results in sparse signal representation, which suggest that the linear relationships among high-resolution signals can be accurately recovered from their low-dimensional projections. Here, we work with two coupled dictionaries, Ψ_{h} for high-resolution patches, and Ψ_{l} for low-resolution ones. The sparse representation of a low-resolution patch in terms of Ψ_{l} is used directly to recover the corresponding high-resolution patch from Ψ_{h} [20]. Given these two dictionaries, each corresponding pair of patches from the low resolution image, y, and the high resolution image, x, can be represented sparsely with the same coefficient vector, α:

$$ y = \Psi_l \alpha \tag{38} $$

$$ x = \Psi_h \alpha \tag{39} $$
The patch at each location of the low-resolution image that needs to be scaled up is extracted and sparsely coded with the help of the SSIM-optimal Algorithm 1. Once the sparse coefficients, α, are obtained, the high resolution patches, x, are computed using (39) and are finally merged by averaging in the overlap area to create the resulting image. The proposed image super-resolution algorithm is summarized in Algorithm 3.
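The coupled-dictionary step can be sketched as follows: a low-resolution patch is sparsely coded over Ψ_{l}, and the same coefficient vector is applied to Ψ_{h}. Several simplifications are assumptions made for illustration: a plain OMP with a fixed number of atoms stands in for the SSIM-optimal pursuit, the two dictionaries are random placeholders rather than jointly trained ones, and the names `omp` and `upscale_patch` are hypothetical. The sizes 25 × 1024 and 100 × 1024 follow the dictionary sizes reported in this section.

```python
import numpy as np

def omp(Psi, y, n_atoms):
    """Plain orthogonal matching pursuit with a fixed number of atoms."""
    support, residual = [], y.copy()
    coef = np.zeros(0)
    for _ in range(n_atoms):
        corr = np.abs(Psi.T @ residual)
        if support:
            corr[support] = 0.0                  # do not pick an atom twice
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(Psi[:, support], y, rcond=None)
        residual = y - Psi[:, support] @ coef
    alpha = np.zeros(Psi.shape[1])
    alpha[support] = coef
    return alpha

def upscale_patch(Psi_l, Psi_h, y_low, n_atoms=3):
    """Code the low-resolution patch over Psi_l, then synthesize with Psi_h."""
    alpha = omp(Psi_l, y_low, n_atoms)
    return Psi_h @ alpha, alpha

rng = np.random.default_rng(0)
Psi_l = rng.standard_normal((25, 1024))          # placeholder low-resolution dictionary
Psi_l /= np.linalg.norm(Psi_l, axis=0)
Psi_h = rng.standard_normal((100, 1024))         # placeholder high-resolution dictionary
y_low = rng.standard_normal(25)                  # vectorized 5 x 5 patch
x_high, alpha = upscale_patch(Psi_l, Psi_h, y_low)
```

With trained, coupled dictionaries the vector `x_high` would be reshaped to a 10 × 10 patch and merged with its neighbors by averaging over the overlap.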
The proposed image super-resolution scheme is tested on various images. To be consistent with [20], patches of 5 × 5 pixels were used on the low resolution image. Each patch is converted to a vector of length 25. The dictionaries are trained using KSVD [3], with sizes of 25 × 1024 and 100 × 1024 for the low and the high resolution dictionaries, respectively. Sixty-six natural images are used for dictionary training; the same images are used in [43] for a similar purpose. To remove artifacts at the patch edges we set an overlap of one pixel during patch extraction from the image. A fixed number of atoms (3) is used in [20] in the sparse coding stage. In contrast, SSIM-OMP determines the number of atoms adaptively from patch to patch, based on the patch's importance according to the SSIM measure. In order to calculate the threshold, T_{ssim}, defined in (14), T_{mse} is calculated using the MSE-based sparse coding stage in [20]. After calculating the sparse representation for all the low resolution patches, we use them to reconstruct the patches, and then the difference from the original patch is calculated. We set T_{mse} to the average of these differences. The performance comparison with the state-of-the-art method is given in Table 2. It can be observed that the proposed algorithm consistently outperforms the other methods in terms of SSIM. It is also interesting to observe PSNR improvements in some cases, though PSNR is not the optimization goal of the proposed approach. The improvements are not always consistent (for example, PSNR drops in some cases in Table 1, while SSIM always improves). There are complicated reasons behind these results. It should be noted that the so-called "MSE-optimal" algorithms include many suboptimal and heuristic steps and thus have the potential to be improved even in the MSE sense. Our methods differ from the "MSE-optimal" methods in multiple stages. Although the differences are made to improve SSIM, they may have a positive impact on MSE as well.
For example, when using the learned dictionary to reconstruct an image patch, if SSIM replaces MSE in selecting the atoms from the dictionary, then essentially the set of accepted atoms in the dictionary has been changed. In particular, since SSIM is variance normalized, the acceptable reconstructed patches near the noisy patch may be structurally similar but significantly different in variance. This may lead to different selections of atoms from the dictionary, which, when appropriately scaled to approximate the noisy patch, may give a better reconstruction result. Although the visual and SSIM improvements are only moderate, these are promising results for an initial attempt at incorporating a perceptually more meaningful measure into the optimization problem of the KSVD-based super-resolution method. Figures 4 and 5 compare the reconstructed images obtained using [5] and the proposed method for the Raccoon and the Girl images, respectively. It can be seen that the proposed scheme preserves many local structures better and therefore has better perceptual image quality. The visual quality improvement is also reflected in the corresponding SSIM maps, which provide useful guidance on how local image quality is improved over space. It can be observed from the SSIM maps that the areas which are relatively more structured benefit more from the proposed algorithm, as the quality measure used is better than MSE at capturing the similarity of structures.
Table 2. SSIM and PSNR comparisons of image super-resolution results
Figure 4. Visual comparison of super-resolution results. (a) Original image; (b) low resolution image; (c) Yang's method; (d) SSIM-map of Yang's method; (e) proposed method; (f) SSIM-map of proposed method.
Figure 5. Visual comparison of super-resolution results. (a) Original image; (b) low resolution image; (c) Yang's method; (d) SSIM-map of Yang's method; (e) proposed method; (f) SSIM-map of proposed method.
4 Conclusions
In this article, we attempt to combine perceptual image fidelity measurement with optimal sparse signal representation in the context of image denoising and image super-resolution to improve two state-of-the-art algorithms in these areas. We proposed an algorithm to solve for the optimal coefficients of a sparse and redundant dictionary in the maximal SSIM sense. We also developed a gradient descent approach to achieve the best compromise between the distorted image and the image reconstructed using sparse representation. Our simulations demonstrate promising results and also indicate the potential of SSIM to replace the ubiquitous PSNR/MSE as the optimization criterion in image processing applications. It must be taken into account that this is only an early attempt along a new but promising direction. The main contribution of the current work lies mostly in the general framework and theoretical development. Significant improvement in visual quality can be expected from improving the dictionary learning process based on SSIM, as the dictionary encapsulates the prior knowledge about the image to be restored. An SSIM-optimal dictionary will capture the structures contained in the image better, and the restoration task will result in a sharper output image. Further improvement is also expected in the future when some of the advanced mathematical properties of SSIM and normalized metrics [12] are incorporated into the optimization framework.
Competing interests
The authors declare that they have no competing interests.
Algorithm 1: SSIM-inspired OMP
Initialize: D = {} set of selected atoms, S_{opt} = 0, r = y
while S_{opt} < T_{ssim}
• Add the next best atom in the ℓ_{2} sense to D
• Find the optimal ℓ_{2}-based coefficient(s) using (15)
• Find the optimal SSIM-based coefficient(s) using (27) and (31)
• Update the residual r
• Find the SSIM-based approximation a
• Calculate S_{opt} = S(a, y)
end
Algorithm 2: SSIM-inspired image denoising
1. Initialize: X = Y, Ψ = overcomplete DCT dictionary
2. Repeat J times
• Sparse coding stage: use SSIM-optimal OMP to compute the representation vectors α_{ij} for each patch
• Dictionary update stage: use KSVD [28] to calculate the updated dictionary and coefficients. Calculate SSIM-optimal coefficients using (27) and (31)
3. Global reconstruction: use the gradient descent algorithm to optimize (6), where the SSIM gradient is given by (35).
Algorithm 3: SSIMinspired image super resolution
1. Dictionary Training Phase: trained high and low resolution dictionaries Ψ_{l}, Ψ_{h}, [20]
2. Reconstruction Phase
• Sparse coding stage: use SSIM-optimal OMP to compute the representation vectors α_{ij} for all the patches of the low-resolution image
• High-resolution patch reconstruction: reconstruct high-resolution patches by Ψ_{h}α_{ij}
3. Global Reconstruction: merge high-resolution patches by averaging over the overlapped region to create the high-resolution image.
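The merging in step 3 of Algorithm 3 amounts to accumulating every patch at its position and dividing by the number of patches that cover each pixel. A minimal Python/NumPy sketch follows; the function name merge_patches and the representation of patch positions as (row, column) top-left corners are assumptions for illustration.

```python
import numpy as np


def merge_patches(patches, positions, image_shape):
    """Average overlapping patches into a single image.

    patches     : iterable of 2-D arrays
    positions   : iterable of (row, col) top-left corners, one per patch
    image_shape : (height, width) of the output image
    Pixels covered by no patch are left at zero.
    """
    acc = np.zeros(image_shape, dtype=float)     # sum of patch values
    weight = np.zeros(image_shape, dtype=float)  # coverage count
    for patch, (r, c) in zip(patches, positions):
        h, w = patch.shape
        acc[r:r + h, c:c + w] += patch
        weight[r:r + h, c:c + w] += 1.0
    out = np.zeros(image_shape, dtype=float)
    covered = weight > 0
    out[covered] = acc[covered] / weight[covered]
    return out
```

For example, two 2 × 2 patches with constant values 1 and 3 that overlap in one column give the value 2 in the overlapped column.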
Acknowledgements
This work was supported in part by the Natural Sciences and Engineering Research Council of Canada and in part by the Ontario Early Researcher Award program, which are gratefully acknowledged.
References

K Dabov, A Foi, V Katkovnik, K Egiazarian, Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans Image Process 16, 2080–2095 (2007)

A Buades, B Coll, JM Morel, A review of image denoising algorithms, with a new one. Multiscale Model Simul 4(2), 490–530 (2005)

M Elad, M Aharon, Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans Image Process 15(12), 3736–3745 (2006)

H Hou, H Andrews, Cubic splines for image interpolation and digital filtering. IEEE Trans Signal Process 26, 508–517 (1978)

J Yang, J Wright, T Huang, Y Ma, Image super-resolution via sparse representation. IEEE Trans Image Process 19(11), 2861–2873 (2010)

J Yang, J Wright, TS Huang, Y Ma, Image super-resolution as sparse representation of raw image patches. Proc IEEE Comput Vis Pattern Recognit, 1–8 (2008)

Z Wang, AC Bovik, Mean squared error: love it or leave it? A new look at signal fidelity measures. IEEE Signal Process Mag 26, 98–117 (2009)

Z Wang, AC Bovik, HR Sheikh, EP Simoncelli, Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4), 600–612 (2004)

Joint Video Team (JVT) Reference Software [Online], [http://iphome.hhi.de/suehring/tml/download/old_jm]

Y Gao, A Rehman, Z Wang, CW-SSIM based image classification. IEEE International Conference on Image Processing (ICIP) (Brussels, Belgium, 2011), pp. 1249–1252

G Piella, H Heijmans, A new quality metric for image fusion. IEEE International Conference on Image Processing (ICIP) (Barcelona, Spain, 2003) 3, pp. 173–176

D Brunet, ER Vrscay, Z Wang, On the Mathematical Properties of the Structural Similarity Index (Preprint) (University of Waterloo, Waterloo, 2011), [http://www.math.uwaterloo.ca/~dbrunet/]

SS Channappayya, AC Bovik, C Caramanis, R Heath, Design of linear equalizers optimized for the structural similarity index. IEEE Trans Image Process 17(6), 857–872 (2008)

Z Wang, Q Li, X Shang, Perceptual image coding based on a maximum of minimal structural similarity criterion. IEEE Int Conf Image Process 2, II-121–II-124 (2007)

A Rehman, Z Wang, SSIM-based non-local means image denoising. IEEE International Conference on Image Processing (ICIP) (Brussels, Belgium, 2011), pp. 1–4

S Wang, A Rehman, Z Wang, S Ma, W Gao, Rate-SSIM optimization for video coding. IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP 11) (Prague, Czech Republic, 2011), pp. 833–836

T Ou, Y Huang, H Chen, A perceptual-based approach to bit allocation for H.264 encoder. SPIE Visual Communications and Image Processing, 77441B (2010)

Z Mai, C Yang, K Kuang, L Po, A novel motion estimation method based on structural similarity for H.264 inter prediction. IEEE Int Conf Acoust Speech Signal Process (Toulouse, 2006) 2, pp. 913–916

C Yang, H Wang, L Po, Improved inter prediction based on structural similarity in H.264. IEEE Int Conf Signal Process Commun (Dubai, 2007) 2, pp. 340–343

R Zeyde, M Elad, M Protter, On single image scale-up using sparse-representations. Curves & Surfaces (Avignon, France, 2010), pp. 711–730

A Savitzky, MJE Golay, Smoothing and differentiation of data by simplified least squares procedures. Anal Chem 36, 1627–1639 (1964)

AN Tikhonov, VY Arsenin, Solutions of Ill-Posed Problems (V. H. Winston, Washington DC, 1977)

LI Rudin, S Osher, E Fatemi, Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)

M Protter, M Elad, Image sequence denoising via sparse and redundant representations. IEEE Trans Image Process 18, 27–35 (2009)

J Mairal, G Sapiro, M Elad, Learning multiscale sparse representations for image and video restoration. Multiscale Model Simul 7, 214–241 (2008)

EJ Candès, J Romberg, T Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans Inf Theory 52(2), 489–509 (2006)

DL Donoho, Compressed sensing. IEEE Trans Inf Theory 52(4), 1289–1306 (2006)

M Aharon, M Elad, A Bruckstein, K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans Signal Process 54(11), 4311–4322 (2006)

Y Pati, R Rezaiifar, P Krishnaprasad, Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. Twenty Seventh Asilomar Conference on Signals, Systems and Computers (Pacific Grove, CA, 1993) 1, pp. 40–44

D Brunet, ER Vrscay, Z Wang, Structural similarity-based approximation of signals and images using orthogonal bases. in Proc Int Conf on Image Analysis and Recognition, vol. 6111 of LNCS, ed. by M Kamel, A Campilho (Springer, Heidelberg, 2010), pp. 11–22

S Wang, A Rehman, Z Wang, S Ma, W Gao, SSIM-inspired divisive normalization for perceptual video coding. IEEE International Conference on Image Processing (ICIP) (Brussels, Belgium, 2011), pp. 1657–1660

MJ Wainwright, EP Simoncelli, Scale mixtures of Gaussians and the statistics of natural images. Adv Neural Inf Process Syst 12, 855–861 (2000)

S Lyu, EP Simoncelli, Statistically and perceptually motivated nonlinear image representation. Proc SPIE Conf Human Vision Electron Imaging XII (San Jose, CA, 2007) 6492, pp. 649207-1–649207-15

J Foley, Human luminance pattern mechanisms: masking experiments require a new model. J Opt Soc Am 11, 1710–1719 (1994)

AB Watson, JA Solomon, Model of visual contrast gain control and pattern masking. J Opt Soc Am 14, 2379–2391 (1997)

DJ Heeger, Normalization of cell responses in cat striate cortex. Vis Neural Sci 9, 181–198 (1992)

EP Simoncelli, DJ Heeger, A model of neuronal responses in visual area MT. Vis Res 38, 743–761 (1998)

Q Li, Z Wang, Reduced-reference image quality assessment using divisive normalization-based image representation. IEEE J Sel Top Signal Process 3, 202–211 (2009)

A Rehman, Z Wang, Reduced-reference SSIM estimation. International Conference on Image Processing (Hong Kong, China, 2010), pp. 289–292

J Malo, I Epifanio, R Navarro, EP Simoncelli, Nonlinear image representation for efficient perceptual coding. IEEE Trans Image Process 15, 68–80 (2006)

J Portilla, V Strela, MJ Wainwright, EP Simoncelli, Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Trans Image Process 12, 1338–1351 (2003)

Z Wang, EP Simoncelli, Maximum differentiation (MAD) competition: a methodology for comparing computational models of perceptual quantities. J Vis 8(12), 1–13 (2008)

J Yang, Z Wang, Z Lin, T Huang, Coupled dictionary training for image super-resolution. [http://www.ifp.illinois.edu/~jyang29/]