Wavelet-SRNet: A Wavelet-Based CNN for Multi-Scale Face ...

Wavelet-SRNet: A Wavelet-Based CNN for Multi-Scale Face Super Resolution

Huaibo Huang1,2,3, Ran He1,2,3, Zhenan Sun1,2,3 and Tieniu Tan1,2,3
1 School of Engineering Science, University of Chinese Academy of Sciences
2 Center for Research on Intelligent Perception and Computing, CASIA
3 National Laboratory of Pattern Recognition

Abstract

Modern face super-resolution methods resort to convolutional neural networks (CNN) to infer high-resolution (HR) face images. When dealing with very low resolution (LR) images, the performance of these CNN-based methods greatly degrades. Meanwhile, these methods tend to produce over-smoothed outputs and miss some textural details. To address these challenges, this paper presents a wavelet-based CNN approach that can ultra-resolve a very low resolution face image of 16 × 16 or smaller pixel size to its larger version at multiple scaling factors (2×, 4×, 8× and even 16×) in a unified framework.

Different from conventional CNN methods that directly infer HR images, our approach first learns to predict the series of HR wavelet coefficients corresponding to the LR input before reconstructing HR images from them. To capture both global topology information and local texture details of human faces, we present a flexible and extensible convolutional neural network with three types of loss: wavelet prediction loss, texture loss and full-image loss. Extensive experiments demonstrate that the proposed approach achieves more appealing results both quantitatively and qualitatively than state-of-the-art super-resolution methods.

Introduction

Face super-resolution (SR), also known as face hallucination, refers to reconstructing high resolution (HR) face images from their corresponding low resolution (LR) inputs.

It is significant for most face-related applications, e.g., recognition, where captured faces are of low resolution and lack essential facial details. It is a special case of single image super resolution, and many methods have been proposed to address it. It is a widely known underdetermined inverse problem, i.e., there are various corresponding high-resolution answers to explain a given low-resolution input.

Figure 1. Illustration of wavelet decomposition and our wavelet-based SR. Top row: (a) the original 128 × 128 high-resolution face image and its (b) 1-level, (c) 2-level, (d) 3-level full wavelet packet decomposition images. Middle row: (h) the 16 × 16 low-resolution face image and its (g) 2×, (f) 4×, (e) 8× upscaled versions inferred by our network. Bottom row: similar to the middle row, except that the low-resolution input (l) is 8 × 8.

Most current single image super-resolution methods [2, 6, 14, 15, 23] depend on a pixel-wise mean squared error (MSE) loss in image space to push the outputs pixel-wise closer to the ground-truth HR images during training. However, such approaches tend to produce blurry and over-smoothed outputs, lacking some textural details. Besides, they seem to work well only on limited upscaling factors (2× or 4×) and degrade greatly when ultra-resolving a very small input (like 16 × 16 or smaller). Several recent efforts [5, 33, 35] have been developed to deal with this issue based on convolutional neural networks. Dahl et al. [5] use PixelCNN [27] to synthesize realistic details. Yu et al. [33] investigate GAN [8] to create perceptually realistic results. The authors of [35] combine dense correspondence field estimation with face super-resolution.
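The over-smoothing effect of a pixel-wise MSE loss can be illustrated with a toy example. The sketch below is not taken from the paper: the pair-averaging degradation model, the two candidate rows and all function names are assumptions chosen for illustration. It builds two high-resolution rows that map to the same low-resolution row and shows that their pixel-wise average, which contains no texture at all, has a lower expected MSE than either textured candidate.

```python
def downsample(row):
    """Average neighbouring pixel pairs (a toy 2x degradation model, assumed)."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]


def mse(pred, target):
    """Pixel-wise mean squared error between two rows of equal length."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def expected_mse(pred, candidates):
    """Average MSE of one prediction over equally likely ground truths."""
    return sum(mse(pred, c) for c in candidates) / len(candidates)


# Two hypothetical textured HR rows that share the same LR observation.
candidates = [[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]]
assert downsample(candidates[0]) == downsample(candidates[1])

# The prediction that minimises the expected MSE is the pixel-wise mean.
mean_pred = [sum(px) / len(px) for px in zip(*candidates)]

print(mean_pred)                                # [0.5, 0.5, 0.5, 0.5]
print(expected_mse(mean_pred, candidates))      # 0.25
print(expected_mse(candidates[0], candidates))  # 0.5
```

The flat prediction wins under MSE even though neither plausible answer looks like it, which is one way to read the blurry outputs described above.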

However, the application of these methods to super-resolution in image space faces many problems, such as computational complexity [5], instability in training [33] and poor robustness to pose and occlusion variations [35]. Therefore, due to various problems yet to be solved, image SR remains an open and challenging problem.

Wavelet transform (WT) has been shown to be an efficient and highly intuitive tool to represent and store multi-resolution images [18]. It can depict the contextual and textural information of an image at different levels, which motivates us to introduce WT to a CNN-based super-resolution system. As illustrated in Figure 1, the approximation coefficients (i.e., the top-left patches in (b-d)) of different-level wavelet packet decomposition [4] compress the face's global topology information at different levels; the detail coefficients (i.e., the remaining patches in (b-d)) reveal the face's structure and texture information.
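As a rough illustration of the full wavelet packet decomposition described above, the sketch below applies one orthonormal Haar analysis step recursively to every subband. The choice of the Haar wavelet, the function names `haar_level` and `wavelet_packet`, and the subband ordering are assumptions made for this example; the text above does not specify them.

```python
def haar_level(img):
    """One level of 2-D orthonormal Haar analysis on a list-of-lists image.

    Returns four half-size subbands: approximation, left-right difference,
    top-bottom difference and diagonal difference.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    bands = [[[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r, s = i // 2, j // 2
            bands[0][r][s] = (a + b + c + d) / 2  # approximation
            bands[1][r][s] = (a - b + c - d) / 2  # left-right difference
            bands[2][r][s] = (a + b - c - d) / 2  # top-bottom difference
            bands[3][r][s] = (a - b - c + d) / 2  # diagonal difference
    return bands


def wavelet_packet(img, levels):
    """Full wavelet packet decomposition: every subband is split again."""
    bands = [img]
    for _ in range(levels):
        bands = [sub for band in bands for sub in haar_level(band)]
    return bands


# A constant 128 x 128 image stands in for a face image here.
image = [[1.0] * 128 for _ in range(128)]
subbands = wavelet_packet(image, levels=3)
print(len(subbands), len(subbands[0]), len(subbands[0][0]))  # 64 16 16
```

With a 128 × 128 input and three levels, the decomposition yields 64 subbands of 16 × 16 each. The first subband is the approximation band, and the remaining ones carry detail at different orientations and scales.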

We assume that a high-quality HR image with abundant textural details and global topology information can be reconstructed from an LR image as long as the corresponding wavelet coefficients are accurately predicted. Hence, the task of inferring a high-resolution face is transformed into predicting a series of wavelet coefficients. Emphasis on the prediction of high-frequency wavelet coefficients helps recover texture details, while constraints on the reconstruction of low-frequency wavelet coefficients enforce consistency of global topology information. The combination of the two aspects makes the final HR results more realistic.

To take full advantage of wavelet transform, we present a wavelet-based convolutional neural network for face super-resolution, which consists of three subnetworks: embedding, wavelet prediction and reconstruction networks.

The embedding net takes the low-resolution face as an input and represents it as a set of feature maps. The wavelet prediction net is a series of parallel individual subnetworks, each of which aims to learn a certain wavelet coefficient using the embedded features. The number of these subnetworks is flexible and easy to adjust on demand, which makes the magnification factor flexible as well. The reconstruction network is used to recover the expected HR image from the inferred wavelet coefficients, acting as a learned matrix. These three subnetworks are coordinated with three types of loss: wavelet prediction loss, texture loss and full-image loss. The wavelet prediction loss and texture loss correspond to the wavelet prediction subnetwork, imposing constraints in the wavelet domain.
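To make the link between the number of prediction subnetworks and the magnification factor concrete, the sketch below counts the coefficient maps needed for a power-of-two scale and rebuilds an image from them with a fixed inverse Haar transform. It is only an analytic stand-in for the learned reconstruction network, not the network itself. The Haar wavelet, the subband ordering (approximation, left-right, top-bottom, diagonal) and the names `num_wavelet_maps`, `inverse_haar_level` and `reconstruct` are assumptions for this example.

```python
def num_wavelet_maps(scale):
    """Coefficient maps in a full wavelet packet for a power-of-two scale."""
    levels = scale.bit_length() - 1
    if scale < 1 or (1 << levels) != scale:
        raise ValueError("scale must be a positive power of two")
    return 4 ** levels


def inverse_haar_level(a, h, v, d):
    """Merge four equally sized Haar subbands into one double-size image."""
    rows, cols = len(a), len(a[0])
    out = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for r in range(rows):
        for s in range(cols):
            A, H, V, D = a[r][s], h[r][s], v[r][s], d[r][s]
            out[2 * r][2 * s] = (A + H + V + D) / 2          # top-left
            out[2 * r][2 * s + 1] = (A - H + V - D) / 2      # top-right
            out[2 * r + 1][2 * s] = (A + H - V - D) / 2      # bottom-left
            out[2 * r + 1][2 * s + 1] = (A - H - V + D) / 2  # bottom-right
    return out


def reconstruct(bands):
    """Invert a full wavelet packet given 4**n subbands in nested order."""
    bands = list(bands)
    while len(bands) > 1:
        if len(bands) % 4:
            raise ValueError("number of subbands must be a power of four")
        bands = [inverse_haar_level(*bands[k:k + 4])
                 for k in range(0, len(bands), 4)]
    return bands[0]


print(num_wavelet_maps(2), num_wavelet_maps(8))  # 4 64

# Sixteen 1 x 1 coefficient maps (a 4x magnification of a single pixel):
# only the approximation coefficient is non-zero.
coeffs = [[[4.0]]] + [[[0.0]] for _ in range(15)]
print(reconstruct(coeffs))
```

Under these assumptions, each coefficient map has the same spatial size as the LR input, so changing the magnification only changes how many maps are predicted (4 for 2×, 16 for 4×, 64 for 8×), not the size of any individual map.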

The full-image loss is used after the reconstruction subnetwork to add a traditional MSE constraint in image space. Besides, as each wavelet coefficient shares the same size as the low-resolution input, we use a network configuration that keeps every feature map the same size as the input, which reduces the difficulty of training. As our network is fully convolutional and trained with simply-aligned faces, it can be applied to different input resolutions with various magnifications, regardless of pose and occlusion variations. Experimental results corroborate our assumption and demonstrate that our method can capture both global topology information and local textural details of human faces well.

The contributions of our work can be summarized as follows:
1) A novel wavelet-based approach is proposed for the CNN-based face SR problem. To the best of our knowledge, this is the first attempt to transform single image SR into a wavelet coefficient prediction task in a deep learning framework, although much wavelet-based research exists.
2) A flexible and extensible fully convolutional neural network is presented to make the best use of wavelet transform. It can be applied to faces of different input resolutions with multiple upscaling factors.
3) We qualitatively and quantitatively explore multi-scale face super-resolution, especially on very low input resolutions. Experimental results show that our proposed approach outperforms state-of-the-art face SR methods.

Related work

In general, image super-resolution methods can be divided into three types: interpolation-based, statistics-based [26, 31, 32] and learning-based methods [3, 9, 24].

In the early years, the former two types attracted the most attention for their computational efficiency. However, they are always limited to small upscaling factors. Learning-based methods employ large quantities of LR/HR image pair data to infer missing high-frequency information and promise to break the limitations of big magnification. Recently, deep learning based methods [2, 6, 14, 15, 23] have been introduced into the SR problem due to their powerful ability to learn knowledge from large databases. Most of these convolutional methods use MSE loss to learn the mapping function of LR/HR image pairs, which leads to over-smooth outputs when the input resolution is very low and the magnification is large.

As for face super-resolution, there have been about three ways to alleviate this problem.
