Abstract
A practical problem addressed recently in computational photography is that of producing a good picture of a poorly lit scene. The consensus approach to this problem is to capture two images and merge them: firing the flash produces one (typically high signal-to-noise ratio (SNR)) image, and turning the flash off produces a second (typically low SNR) image. In this article, we present a novel approach for merging two such images. Our method generalizes the guided filter approach of He et al., significantly improving its performance. In particular, we analyze the spectral behavior of the guided filter kernel using a matrix formulation, and introduce a novel iterative application of the guided filter. These iterations consist of two parts: a nonlinear anisotropic diffusion of the noisier image, and a nonlinear reaction-diffusion (residual) iteration of the less noisy one. The results of these two processes are combined in an unsupervised manner. We demonstrate that the proposed approach outperforms state-of-the-art methods for both flash/no-flash denoising and deblurring.
1 Introduction
Recently, several techniques [1-5] to enhance the quality of flash/no-flash image pairs have been proposed. While the flash image is better exposed, the lighting is not soft, and generally results in specularities and an unnatural appearance. Meanwhile, the no-flash image tends to have a relatively low signal-to-noise ratio (SNR) while containing the natural ambient lighting of the scene. The key idea of flash/no-flash photography is to create a new image that is closest to the look of the real scene by combining the details of the flash image with the ambient illumination of the no-flash image. Eisemann and Durand [3] used bilateral filtering [6] to give the flash image the ambient tones of the no-flash image. On the other hand, Petschnigg et al. [2] focused on reducing noise in the no-flash image and transferring details from the flash image to the no-flash image by applying joint (or cross) bilateral filtering [3]. Agrawal et al. [4] removed flash artifacts, but did not test their method on no-flash images containing severe noise. As opposed to the visible flash used in [2-4], Krishnan and Fergus [7] recently used both near-infrared and near-ultraviolet illumination for low-light image enhancement. Their so-called "dark flash" provides high-frequency detail in a less intrusive way than a visible flash does, even though it yields incomplete color information. All these methods ignored motion blur by either depending on a tripod setting or choosing a sufficiently fast shutter speed. In practice, however, images captured under low-light conditions with a handheld camera often suffer from motion blur caused by camera shake.
More recently, Zhuo et al. [5] proposed a flash deblurring method that recovers a sharp image by combining a blurry image with a corresponding flash image. They integrated a so-called flash gradient into a maximum-a-posteriori framework and solved the resulting optimization problem by alternating between blur kernel estimation and sharp image reconstruction. This method outperformed many state-of-the-art single-image deblurring [8-10] and color transfer methods [11]. However, its final output looks somewhat blurry because the model only handles spatially invariant motion blur.
Others have used multiple pictures of a scene taken at different exposures to generate high dynamic range images. This is called multi-exposure image fusion [12], which shares some similarity with our problem in that it seeks a new image of better quality than any of the input images. However, flash/no-flash photography is generally more difficult because only a pair of images is available. Enhancing a low SNR no-flash image exhibiting spatially variant motion blur, with the help of only a single flash image, remains a challenging open problem.
2 Overview of the proposed approach
We address the problem of generating a high quality image from two captured images: a flash image (Z) and a no-flash image (Y; Figure 1). We treat these two images, Z and Y, as random variables. The task at hand is to generate a new image (X) that contains the ambient lighting of the no-flash image (Y) and preserves the details of the flash image (Z). As in [2], the new image X can be decomposed into two layers: a base layer and a detail layer;
Figure 1. Flash/no-flash pairs. The no-flash image can be noisy or blurry.
Here, Y might be noisy or blurry (possibly both), and
where G(·) is the guided filtering (LMMSE) operator,
In fact,
where
Figure 2. System overview. Overview of our algorithm for flash/no-flash enhancement.
A preliminary version [13] of this article appeared in the IEEE International Conference on Computer Vision (ICCV '11) workshops. This article differs from [13] in the following respects:
(1) We have provided a significantly expanded statistical derivation and description of the guided filter and its properties in Section 3 and the Appendix.
(2) Figures 3 and 4 are provided to support the key idea of iterative guided filtering.
Figure 3. Iteration improves accuracy. Accuracy of
Figure 4. Effect of iteration. As opposed to
(3) We provide many more experimental results for both flash/no-flash denoising and deblurring in Section 5.
(4) We describe the key ideas of diffusion and residual iteration and their novel relevance to iterative guided filtering in the Appendix.
(5) We prove the convergence of the proposed iterative estimator in the Appendix.
(6) As supplemental material, we share our project website^{c}, where flash/no-flash relighting examples are also presented.
In Section 3, we outline the guided filter and study its statistical properties. We describe how we actually estimate the linear model coefficients a, b, c, d and α, β, and we provide an interpretation of the proposed iterative framework in matrix form in Section 4. In Section 5, we demonstrate the performance of the system with some experimental results, and finally we conclude the article in Section 6.
3 The guided filter and its properties
In general, space-variant, nonparametric filters such as the bilateral filter [6], nonlocal means filter [14], and locally adaptive regression kernels filter [15] are estimated from the given corrupted input image in order to perform denoising. The guided filter can be distinguished from these in the sense that its kernel weights are computed from a (second) guide image which is presumably cleaner. In other words, the idea is to apply filter kernels W_{ij} computed from the guide (e.g., flash) image Z to the noisier (e.g., no-flash) image Y. Specifically, the filter output sample at pixel i is a weighted average of the input samples,

X̂_{i} = Σ_{j} W_{ij}(Z) Y_{j}.
Note that the filter kernel W_{ij} is a function of the guide image Z, but is independent of Y. The guided filter kernel^{e} can be explicitly written as

W_{ij}(Z) = (1/ω^{2}) Σ_{k:(i,j)∈ω_{k}} [1 + ((Z_{i} − μ_{k})(Z_{j} − μ_{k}))/(σ_{k}^{2} + ε)],

where ω is the total number of pixels (= p^{2}) in each window ω_{k}, ε is a global smoothing parameter, and μ_{k} and σ_{k}^{2} are the mean and variance of Z in ω_{k}.
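To make the kernel concrete, the following is a minimal numerical sketch (our own illustration, not the authors' code) that builds the explicit guided filter kernel matrix for a small 1-D guide signal; the window size p, the smoothing parameter eps, and the test signal are arbitrary choices. In the interior, where each sample belongs to exactly p windows, the rows of W sum to one, and W is symmetric by construction.

```python
import numpy as np

def guided_kernel_matrix(z, p=3, eps=0.1):
    """Explicit guided filter kernel matrix W(z) for a 1-D guide signal z.

    Each window omega_k covers p consecutive samples; the (i, j) entry
    accumulates (1 + (z_i - mu_k)(z_j - mu_k) / (var_k + eps)) / p**2
    over every window omega_k that contains both i and j.
    """
    N = len(z)
    W = np.zeros((N, N))
    for k in range(N - p + 1):              # omega_k = {k, ..., k + p - 1}
        idx = np.arange(k, k + p)
        d = z[idx] - z[idx].mean()          # guide samples centered on mu_k
        v = z[idx].var()                    # sigma_k^2
        W[np.ix_(idx, idx)] += (1.0 + np.outer(d, d) / (v + eps)) / p**2
    return W

z = np.array([0., 0., 0., 1., 1., 1., 1., 0., 0., 0.])
W = guided_kernel_matrix(z)
# W is symmetric; interior rows (samples lying in exactly p windows) sum to 1
```

Note that the row-sum property follows because the centered guide samples in each window sum to zero, so the data-dependent term contributes nothing to the row sum.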
Figure 5. Examples of guided filter kernel weights in four different patches. The kernel weights well represent underlying structures.
Next, we study some fundamental properties of the guided filter kernel in matrix form.
We adopt a convenient vector form of Equation 5 as follows:

x̂ = W y,

where y is a column vector containing the pixels of Y, and W = [W_{ij}] is the matrix of kernel weights,

W = W(z),

where z is a vector containing the pixels of Z; that is, W is a function of z only. The filter output can thus be analyzed as the product of the matrix of weights W with the given input image vector y.
The matrix W is symmetric, as shown in Equation 8, and the sum of each row of W is equal to one (W1_{N} = 1_{N}) by definition. However, as seen in Equation 6, the definition of the weights does not imply that the elements of the matrix W are positive in general. While this is not necessarily a problem in practice, we find it useful for our purposes to approximate this kernel with a proper admissible kernel [17]. That is, for the purposes of analysis, we approximate W as a positive-valued, symmetric positive definite matrix with rows summing to one, as similarly done in [18]. For details, we refer the reader to the Appendix.
With this technical approximation in place, all eigenvalues λ_{i }(i = 1, ..., N) are real, and the largest eigenvalue of W is exactly one (λ_{1 }= 1), with corresponding eigenvector
Figure 6. Examples of W in one patch of size 25 × 25. All eigenvalues of the matrix W are nonnegative; thus, the guided filter kernel matrix is positive definite. The largest eigenvalue of W is one, and the rank of W asymptotically becomes one under repeated application. This figure is better viewed in color.
Thus, u_{1} summarizes the asymptotic effect of applying the filter W many times. Figure 7 shows what a typical u_{1} looks like.
Figure 7. Examples of the first left eigenvector u_{1} in three patches. The vector was reshaped into an image for illustration purposes.
Figure 8 shows examples of the (center) row vector (w^{T}) from W's powers in one patch of size 25 × 25. The vector was reshaped into an image for illustration purposes. We can see that powers of W provide even better structure by generating larger (and more sophisticated) kernels. This insight reveals that applying W multiple times can improve the guided filtering performance, which leads us to the iterative use of the guided filter. This approach will produce the evolving coefficients α_{n}, β_{n }introduced in (4). In the following section, we describe how we actually compute these coefficients based on Bayesian mean square error (MSE) predictions.
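The spectral claims above can be checked numerically. The sketch below is our own illustration, using an arbitrary random surrogate for W: a strictly positive symmetric matrix scaled toward doubly stochastic by symmetric Sinkhorn-type iterations, in the spirit of the Appendix. It confirms that the largest eigenvalue is (numerically) one and that high powers of the matrix collapse toward a rank-one operator.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 25
A = rng.random((N, N)) + 1.0        # strictly positive entries
A = (A + A.T) / 2                   # symmetric

# Symmetric Sinkhorn-type scaling toward a doubly stochastic matrix:
# repeatedly divide each entry by the geometric mean of its row/column sums.
for _ in range(200):
    d = A.sum(axis=1)
    A /= np.sqrt(np.outer(d, d))

evals = np.linalg.eigvalsh(A)       # real, since A is symmetric
# evals[-1] is (numerically) 1, and all other eigenvalues are < 1 in magnitude

Wn = np.linalg.matrix_power(A, 100)
# high powers approach the rank-one projector onto the leading eigenvector,
# which for a symmetric doubly stochastic matrix is the constant vector
```

For the symmetrized surrogate, the limiting rank-one matrix is simply (1/N) 1 1^{T}; the actual guided filter kernel before symmetrization has a structured leading eigenvector, as Figure 7 illustrates.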
Figure 8. The guided filter kernel matrix. The guided filter kernel matrix W captures the underlying data structure, but powers of W provide even better structure by generating larger (and more sophisticated) kernel shapes. w is the (center) row vector of W, reshaped into an image for illustration purposes.
4 Iterative application of local LMMSE predictors
The coefficients^{g }a_{k}, b_{k}, c_{k}, d_{k }in (3) are chosen so that "on average" the estimated value
where ε_{1 }and ε_{2 }are small constants that prevent
where we compute
Note that the use of different ω_{k} results in different predictions of these coefficients. Hence, one must compute an aggregate estimate of these coefficients coming from all windows that contain the pixel of interest. As an illustration, consider a case where we predict
Figure 9. Explanation of LMMSE.
As such, the resulting prediction of
The idea of using these averaged coefficients
These local linear models work well when the window size p is small and the underlying data have a simple pattern. However, the linear models are too simple to deal effectively with more complicated structures, and thus there is a need to use larger window sizes. As we alluded to earlier, estimating these linear coefficients in an iterative fashion can deal well with more complex behavior of the image content. More specifically, by initializing
where n is the iteration number and τ_{n} > 0 is set to be a monotonically decaying function^{k} of n such that
Recall that Equation 14 can also be written in matrix form as done in Section 3:
where W and W_{d }are guided filter kernel matrices composed of the guided filter kernels W and W_{d }respectively.^{l }Explicitly writing the iterations, we observe
where P_{n} is a polynomial function of W. The block diagram in Figure 2 can be redrawn in terms of the matrix formulation, as shown in Figure 10. The first term
Figure 10. Diagram of the proposed iterative approach in matrix form. Note that the iteration can be divided into two parts: a diffusion process and a residual iteration process.
5 Experimental results
In this section, we apply the proposed approach to flash/noflash image pairs for denoising and deblurring. We convert images Z and Y from RGB color space to CIE Lab, and perform iterative guided filtering separately in each resulting channel. The final result is converted back to RGB space for display. We used the implementation of the guided filter [1] from the author's website.^{o }All figures in this section are best viewed in color.^{p}
5.1 Flash/noflash denoising
5.1.1 Visible flash [2]
We show experimental results on a couple of flash/no-flash image pairs where the no-flash images suffer from noise^{q}. We compare our results with the method based on the joint bilateral filter [2] in Figures 11, 12, and 13. Our proposed method effectively denoised the no-flash image while transferring the fine detail of the flash image and maintaining the ambient lighting of the no-flash image. We point out that the proposed iterative application of guided filtering, in terms of diffusion and residual iteration, yielded much better results than a single application of either joint bilateral filtering [2] or the guided filter [1].
Figure 11. Flash/no-flash denoising example compared to the state-of-the-art method [2]. The number of iterations n for this example is 10.
Figure 12. Flash/no-flash denoising example compared to the state-of-the-art method [2]. The number of iterations n for this example is 10.
Figure 13. Flash/no-flash denoising example compared to the state-of-the-art method [2]. The number of iterations n for this example is 2.
5.1.2 Dark flash [7]
In this section, we use the dark flash method proposed in [7]. Let us call the dark flash image Z. Dark flash may introduce shadows and specularities in images, which affect the results of both the denoising and the detail transfer. We detect those regions using the same methods proposed in [2]. Shadows are detected by finding the regions where Z − Y is small, and specularities are found by detecting saturated pixels in Z. After combining the shadow and specularity masks, we blur the result with a Gaussian filter to feather the boundaries. Using the resulting mask, the output
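The masking steps above can be sketched as follows. This is a minimal illustration of our own: the function name, the threshold values, and the naive separable Gaussian blur are hypothetical stand-ins, not code from [2] or [7].

```python
import numpy as np

def artifact_mask(Z, Y, shadow_thresh=0.05, sat_thresh=0.95, sigma=2.0):
    """Build a feathered mask of flash shadows and specularities.

    Shadows: regions where the flash adds little light (Z - Y is small).
    Specularities: saturated pixels in the flash image Z.
    The combined binary mask is Gaussian-blurred to feather its boundaries.
    Z and Y are float images in [0, 1].
    """
    shadows = (Z - Y) < shadow_thresh
    speculars = Z > sat_thresh
    mask = (shadows | speculars).astype(float)

    # Separable Gaussian blur (feathering), applied along rows then columns.
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    for axis in (0, 1):
        mask = np.apply_along_axis(np.convolve, axis, mask, g, "same")
    return np.clip(mask, 0.0, 1.0)
```

The resulting soft mask can then be used to blend the filtered output away from shadow and specular regions.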
Figure 14. Dark flash/no-flash (low noise) denoising example compared to the state-of-the-art method [7]. The number of iterations n for this example is 10.
Figure 15. Dark flash/no-flash (mid noise) denoising example compared to the state-of-the-art method [7]. The number of iterations n for this example is 10.
Figure 16. Dark flash/no-flash (high noise) denoising example compared to the state-of-the-art method [7]. The number of iterations n for this example is 10.
Figure 17. Dark flash/no-flash (low noise) denoising example compared to the state-of-the-art method [7]. The number of iterations n for this example is 10.
Figure 18. Dark flash/no-flash (mid noise) denoising example compared to the state-of-the-art method [7]. The number of iterations n for this example is 10.
Figure 19. Dark flash/no-flash (high noise) denoising example compared to the state-of-the-art method [7]. The number of iterations n for this example is 10.
5.2 Flash/noflash deblurring
Motion blur due to camera shake is an annoying yet common problem in low-light photography. Our proposed method can also be applied to flash/no-flash deblurring^{r}. Here, we show experimental results on a couple of flash/no-flash image pairs where the no-flash images suffer from mild noise and strong motion blur. We compare our method with that of Zhuo et al. [5]. As shown in Figures 20, 21, 22, 23, and 24, our method outperforms the method of [5], obtaining much finer details and better color contrast even though our method does not estimate a blur kernel at all. The results of Zhuo et al. [5] tend to be somewhat blurry and to distort the ambient lighting of the real scene. We point out that we use only a single blurred image in Figure 24, while Zhuo et al. [5] used two blurred images and one flash image.
Figure 20. Flash/no-flash deblurring example compared to the state-of-the-art method [5]. The number of iterations n for this example is 20.
Figure 21. Flash/no-flash deblurring example compared to the state-of-the-art method [5]. The number of iterations n for this example is 20.
Figure 22. Flash/no-flash deblurring example compared to the state-of-the-art method [5]. The number of iterations n for this example is 20.
Figure 23. Flash/no-flash deblurring example compared to the state-of-the-art method [5]. The number of iterations n for this example is 5.
Figure 24. Flash/no-flash deblurring example compared to the state-of-the-art method [5]. The number of iterations n for this example is 20.
6 Summary and future work
The guided filter has proved to be more effective than the joint bilateral filter in several applications. Yet we have shown that it can be improved significantly further. We analyzed the spectral behavior of the guided filter kernel using a matrix formulation, and improved its performance by applying the filter iteratively. Iterations of the proposed method consist of a combination of diffusion and residual iteration. We demonstrated that the proposed approach yields outputs that preserve not only the fine details of the flash image but also the ambient lighting of the no-flash image. The proposed method outperforms state-of-the-art methods for flash/no-flash image denoising and deblurring. It would be interesting to see whether the performance of other nonparametric filter kernels, such as bilateral filters and locally adaptive regression kernels [15], can be further improved in our iterative framework. It is also worthwhile to explore several other applications where the proposed approach could be employed, such as joint upsampling [22], image matting [23], mesh smoothing [24,25], and specular highlight removal [26].
Appendix
Positive definite and symmetric row-stochastic approximation of W
In this section, we describe how we approximate W with a symmetric, positive definite, and row-stochastic matrix. First, as mentioned earlier, the matrix W can contain negative values, as shown in Figure 3. We employ the Taylor series approximation (exp(t) ≈ 1 + t) to ensure that W has positive elements and is positive definite. To be more concrete, consider a simple example; namely, a local patch of size 5 × 5 as follows:
In this case, we can have up to nine windows ω_{k} of size ω = 3 × 3. Let
Next, we convert the matrix W (now composed of strictly positive elements) to a doubly stochastic, symmetric, positive definite matrix, again as done in [18]. The algorithm we use to effect this approximation is due to Sinkhorn [27,28], who proved that, given a matrix with strictly positive elements, there exist diagonal matrices R = diag(r) and C = diag(c) such that
is doubly stochastic. That is,
Furthermore, the vectors r and c are unique up to a scalar (i.e., αr and c/α). Sinkhorn's algorithm for obtaining r and c involves repeated normalization of the rows and columns so that they sum to one (see Algorithm 1 for details), and is provably convergent and optimal in the cross-entropy sense [29].
Algorithm 1 Algorithm for scaling a matrix A to a nearby doubly stochastic matrix
Given a matrix A, let (N, N) be size(A) and initialize r = ones(N, 1);
for k = 1 : iter;
c = 1./(A^{T }r);
r = 1./(A c);
end
C = diag(c); R = diag(r);
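Algorithm 1 transcribes directly into Python. The sketch below is our own rendering of the pseudocode above; the iteration count is an arbitrary choice.

```python
import numpy as np

def sinkhorn_scale(A, iters=500):
    """Scale a strictly positive matrix A to a nearby doubly stochastic
    matrix R @ A @ C, with R = diag(r) and C = diag(c) (Sinkhorn scaling).
    """
    N = A.shape[0]
    r = np.ones(N)
    for _ in range(iters):
        c = 1.0 / (A.T @ r)     # rescale so columns of R A C sum to one
        r = 1.0 / (A @ c)       # rescale so rows of R A C sum to one
    return np.diag(r) @ A @ np.diag(c)

A = np.random.default_rng(0).random((6, 6)) + 0.1   # strictly positive
S = sinkhorn_scale(A)
# rows and columns of S each sum (numerically) to one
```

After the final row update, the row sums are exactly one by construction, and the column sums converge to one as the iteration proceeds.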
Diffusion and residual iteration
Diffusion
Here, we describe how multiple direct applications of the filter given by W are in effect equivalent to a nonlinear anisotropic diffusion process [20,30]. We define
From the iteration (19) we have
which we can rewrite as
where
Residual iteration
An alternative to repeated applications of the filter W is to consider the residual signals, defined as the difference between the estimated signal and the measured signal. This results in a variation of the diffusion estimator which uses the residuals as an additional forcing term. The net result is a type of reaction-diffusion process [31]. In statistics, the use of residuals to improve estimates has a rather long history, dating back at least to the study of Tukey [21], who termed the idea "twicing". More recently, the idea has been suggested in the applied mathematics community under the rubric of Bregman iterations [32], and in the machine learning and statistics literature as L_{2}-boosting [33].
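As a concrete illustration of twicing (our own sketch, with an arbitrary row-stochastic smoothing matrix standing in for W): one filtering pass plus a filtered residual correction, x̂ = Wy + W(y − Wy), is algebraically identical to applying the polynomial filter 2W − W², which suppresses less signal content than W alone.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 32
# An arbitrary row-stochastic smoothing matrix standing in for W
A = rng.random((N, N)) + 0.5
W = A / A.sum(axis=1, keepdims=True)

y = rng.random(N)                    # noisy measurement
x1 = W @ y                           # one filtering pass (diffusion)
x2 = x1 + W @ (y - x1)               # twicing: add back the filtered residual
# x2 equals the polynomial filter (2W - W^2) applied to y
```

The same algebra extends to further residual steps, each adding a higher-order polynomial term in W.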
Formally, the residuals are defined as the difference between the estimated signal
and the measured signal:
Explicitly writing the iterations, we observe:
where F_{n }is a polynomial function of W of order n + 1. The first iterate
Convergence of the proposed iterative estimator
Recall the iterations:
where P_{n }is a polynomial function of W. The first (direct diffusion) term in the iteration is clearly stable and convergent as described in (9). Hence, we need to show the stability of the second part. To do this, we note that the spectrum of P_{n }can be written, and bounded in terms of the eigenvalues λ_{i }of W as follows:
where the inequality follows from the knowledge that 0 ≤ λ_{N} ≤ ... ≤ λ_{3} ≤ λ_{2} < λ_{1} = 1. Furthermore, in Section 4 we defined τ_{n} to be a monotonically decreasing sequence such that
End notes
^{a}More detail is provided in Section 4. ^{b}The guided filter [1] reduces noise while preserving edges, as the bilateral filter [6] does. However, the guided filter outperforms the bilateral filter by avoiding the gradient reversal artifacts that may appear in such applications as detail enhancement, high dynamic range (HDR) compression, and flash/no-flash denoising. ^{c}http://users.soe.ucsc.edu/~milanfar/research/rokaf/.html/IGF/. ^{d}
^{m }This is generally defined as the difference between the estimated signal
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
HS carried out the design of iterative guided filtering and drafted the manuscript. PM participated in the design of iterative guided filtering and performed the statistical analysis. All authors read and approved the final manuscript.
Acknowledgements
We thank Dilip Krishnan for sharing the post-processing code [7] for the dark-flash examples. This study was supported by AFOSR Grant FA 9550-07-01-0365 and NSF Grant CCF-1016018. This study was done while the first author was at the University of California.
References

1. K He, J Sun, X Tang, Guided image filtering. Proceedings of the European Conference on Computer Vision (ECCV) (2010)
2. G Petschnigg, M Agrawala, H Hoppe, R Szeliski, M Cohen, K Toyama, Digital photography with flash and no-flash image pairs. ACM Trans Graph 21(3), 664–672 (2004)
3. E Eisemann, F Durand, Flash photography enhancement via intrinsic relighting. ACM Trans Graph 21(3), 673–678 (2004)
4. A Agrawal, R Raskar, S Nayar, Y Li, Removing photography artifacts using gradient projection and flash-exposure sampling. ACM Trans Graph 24, 828–835 (2005)
5. S Zhuo, D Guo, T Sim, Robust flash deblurring. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010)
6. C Tomasi, R Manduchi, Bilateral filtering for gray and color images. Proceedings of the 1998 IEEE International Conference on Computer Vision, Bombay, India, 836–846 (1998)
7. D Krishnan, R Fergus, Dark flash photography. ACM Trans Graph 28(4), 594–611 (2009)
8. Q Shan, J Jia, MS Brown, Globally optimized linear windowed tone-mapping. IEEE Trans Vis Comput Graph 16(4), 663–675 (2010)
9. R Fergus, B Singh, A Hertzmann, ST Roweis, WT Freeman, Removing camera shake from a single image. ACM Trans Graph (SIGGRAPH) 25, 787–794 (2006)
10. L Yuan, J Sun, L Quan, HY Shum, Progressive inter-scale and intra-scale non-blind image deconvolution. ACM Trans Graph 27(3), 1–10 (2008)
11. YW Tai, J Jia, CK Tang, Local color transfer via probabilistic segmentation by expectation-maximization. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2005)
12. W Hasinoff, Variable-aperture photography. PhD Thesis, Department of Computer Science, University of Toronto (2008)
13. H Seo, P Milanfar, Computational photography using a pair of flash/no-flash images by iterative guided filtering. IEEE International Conference on Computer Vision (ICCV) Workshops (2011)
14. A Buades, B Coll, JM Morel, A review of image denoising algorithms, with a new one. Multiscale Model Simul (SIAM) 4(2), 490–530 (2005)
15. H Takeda, S Farsiu, P Milanfar, Kernel regression for image processing and reconstruction. IEEE Trans Image Process 16(2), 349–366 (2007)
16. A Buades, B Coll, JM Morel, Nonlocal image and movie denoising. Int J Comput Vis 76(2), 123–139 (2008)
17. T Hofmann, B Scholkopf, AJ Smola, Kernel methods in machine learning. Ann Stat 36(3), 1171–1220 (2008)
18. P Milanfar, A tour of modern image processing. IEEE Signal Process Mag (2011) [http://users.soe.ucsc.edu/~milanfar/publications/journal/ModernTour_FinalSubmission.pdf]
19. M Protter, M Elad, H Takeda, P Milanfar, Generalizing the nonlocal-means to super-resolution reconstruction. IEEE Trans Image Process 18, 36–51 (2009)
20. P Perona, J Malik, Scale-space and edge detection using anisotropic diffusion. IEEE Trans Pattern Anal Mach Intell 12(9), 629–639 (1990)
21. JW Tukey, Exploratory Data Analysis (Addison-Wesley, Reading, MA, 1977)
22. J Kopf, MF Cohen, D Lischinski, M Uyttendaele, Joint bilateral upsampling. ACM Trans Graph 26(3), 96 (2007)
23. K He, J Sun, X Tang, Fast matting using large kernel matting Laplacian matrices. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010)
24. S Fleishman, I Drori, D Cohen-Or, Bilateral mesh denoising. ACM Trans Graph 22(3), 950–953 (2003)
25. T Jones, F Durand, M Desbrun, Non-iterative feature preserving mesh smoothing. ACM Trans Graph 22(3), 943–949 (2003)
26. Q Yang, S Wang, N Ahuja, Real-time specular highlight removal using bilateral filtering. ECCV (2010)
27. PA Knight, The Sinkhorn-Knopp algorithm: convergence and applications. SIAM J Matrix Anal Appl 30, 261–275 (2008)
28. R Sinkhorn, A relationship between arbitrary positive matrices and doubly stochastic matrices. Ann Math Stat 35(2), 876–879 (1964)
29. J Darroch, D Ratcliff, Generalized iterative scaling for log-linear models. Ann Math Stat 43, 1470–1480 (1972)
30. A Singer, Y Shkolnisky, B Nadler, Diffusion interpretation of nonlocal neighborhood filters for signal denoising. SIAM J Imaging Sci 2, 118–139 (2009)
31. G Cottet, L Germain, Image processing through reaction combined with nonlinear diffusion. Math Comput 61, 659–673 (1993)
32. S Osher, M Burger, D Goldfarb, J Xu, W Yin, An iterative regularization method for total variation-based image restoration. Multiscale Model Simul 4(2), 460–489 (2005)
33. P Buhlmann, B Yu, Boosting with the L_{2} loss: regression and classification. J Am Stat Assoc 98(462), 324–339 (2003)