EFANet: Exchangeable Feature Alignment Network for Arbitrary Style Transfer

Thirty-Fourth AAAI Conference on Artificial Intelligence (Proceedings of AAAI 2020)

Zhijie Wu1    Chunjin Song   Yang Zhou1*    Minglun Gong2    Hui Huang1*

1Shenzhen University    2University of Guelph


Style transfer has been an important topic in both computer vision and graphics. Since the seminal work of Gatys et al. first demonstrated the power of stylization through optimization in the deep feature space, quite a few approaches have achieved real-time arbitrary style transfer with straightforward statistics matching techniques. In this work, our key observation is that considering only the features of the input style image for global deep feature statistics matching or local patch swapping may not always ensure a satisfactory style transfer; see, e.g., Figure 1. Instead, we propose a novel transfer framework, EFANet, that aims to jointly analyze and better align exchangeable features extracted from the content and style image pair. In this way, the style features from the style image seek the best compatibility with the content information in the content image, leading to more structured stylization results. In addition, a new whitening loss is developed to purify the computed content features and fuse them better with styles in feature space. Qualitative and quantitative experiments demonstrate the advantages of our approach.

Figure 3: (a) Architecture overview. The input image pair, Ic and Is, goes through the pre-trained VGG encoder to extract feature maps {fci} and {fsi}, i ∈ {1, ..., L}, L = 4. Then, starting from fcL and fsL, different EFANet modules are applied to progressively fuse styles into the corresponding decoded features for the final stylized images. (b) The architecture of the EFANet module. Given ^fcs and fs as inputs, we compute two Gram matrices as the raw styles and then represent them as two lists of feature vectors {jcs} and {js}. Each corresponding style vector pair (jcs and js) is fed into the newly proposed Feature Exchange Block, and a common feature vector jcom is extracted via joint analysis. We concatenate jcom with jcs and js respectively to learn two exchangeable style features, ~jcs and ~js. ~jcs is used for content feature purification, and the purified content feature is further fused with ~js, outputting fcs. Finally, fcs is either propagated for finer-scale information or decoded into the stylized image Isc.
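The raw-style step in (b) can be illustrated with a minimal sketch. It assumes that a feature map is flattened to C channels of H*W activations, that the Gram matrix is normalized by the number of spatial positions, and that each row of the Gram matrix serves as one style vector. The caption states none of these details, and the function names `gram_matrix` and `style_vectors` are hypothetical.

```python
def gram_matrix(features):
    """Compute a C x C Gram matrix from C flattened channels.

    `features` is a list of C rows, each holding H*W activations.
    Dividing by the number of positions is an assumption; the
    caption does not say how the Gram matrix is normalized.
    """
    n = len(features[0])
    if any(len(row) != n for row in features):
        raise ValueError("all channels must have the same length")
    return [
        [sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
        for fi in features
    ]


def style_vectors(gram):
    """Split a Gram matrix into a list of row vectors.

    Treating each row as one style vector is an assumption made
    for illustration only.
    """
    return [list(row) for row in gram]


# Toy feature map: 2 channels, 4 spatial positions.
feat = [[1.0, 2.0, 3.0, 4.0],
        [0.0, 1.0, 0.0, 1.0]]
gram = gram_matrix(feat)
vectors = style_vectors(gram)
print(vectors)
```

The same routine would be applied to both ^fcs and fs, producing the two lists that the Feature Exchange Block consumes pair by pair.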

Figure 5: Comparison with results from different methods. Note that the proposed model generates images with better visual quality while the results of other baselines have various artifacts; see text for detailed discussions.

Figure 6: Balance between content and style. At the testing stage, the degree of stylization can be controlled using the parameter α.
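One common way to realize such a control, shown here only as a sketch, is linear interpolation in feature space between the content feature and the stylized feature before decoding. The caption does not give the formula used by EFANet, so the interpolation rule and the function name `blend_features` are assumptions.

```python
def blend_features(content_feat, stylized_feat, alpha):
    """Interpolate between content and stylized features.

    alpha = 0 keeps the content feature, alpha = 1 keeps the fully
    stylized feature. This linear rule is an assumption; the
    caption only says that alpha controls the degree of stylization.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(content_feat) != len(stylized_feat):
        raise ValueError("feature lengths must match")
    return [
        (1.0 - alpha) * c + alpha * s
        for c, s in zip(content_feat, stylized_feat)
    ]


# Toy flattened features.
fc = [0.0, 2.0]
fcs = [4.0, 6.0]
half = blend_features(fc, fcs, 0.5)
print(half)
```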

Figure 7: Application for spatial control. Left: content image. Middle: style images with masks to indicate target regions. Right: synthesized result.
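A mask-based combination of this kind can be sketched as follows. The sketch assumes binary masks that partition the spatial positions and one stylized feature per style; how EFANet applies the masks internally is not described in the caption, and `combine_by_masks` is a hypothetical helper.

```python
def combine_by_masks(stylized, masks):
    """Pick, per spatial position, the stylized value whose mask is 1.

    `stylized` holds one flattened feature per style and `masks`
    holds one binary mask per style. The masks are assumed to
    partition the positions (exactly one mask is 1 at each position).
    """
    n = len(stylized[0])
    for p in range(n):
        if sum(m[p] for m in masks) != 1:
            raise ValueError("masks must cover each position exactly once")
    return [
        sum(m[p] * s[p] for m, s in zip(masks, stylized))
        for p in range(n)
    ]


# Two styles over four positions: style A on the left, style B on the right.
stylized = [[1.0, 1.0, 1.0, 1.0],
            [9.0, 9.0, 9.0, 9.0]]
masks = [[1, 1, 0, 0],
         [0, 0, 1, 1]]
combined = combine_by_masks(stylized, masks)
print(combined)
```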

Figure 8: Ablation study on the multi-scale strategy. By fusing content and style at multiple scales, we can enrich the local and global style patterns of the stylized images.

Figure 9: Ablation study on whitening loss. With the proposed loss, clearer content contours and better style pattern consistency are achieved.


This work was supported in part by NSFC (61861130365, 61761146002), GD Higher Education Innovation Key Program (2018KZDXM058), GD Science and Technology Program (2015A030312015), Shenzhen Innovation Program (KQJSCX20170727101233642), LHTD (20170003), and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen University.


title = {EFANet: Exchangeable Feature Alignment Network for Arbitrary Style Transfer},
author = {Zhijie Wu and Chunjin Song and Yang Zhou and Minglun Gong and Hui Huang},
journal = {Proceedings of AAAI, Spotlight},
pages = {12305--12312},
year = {2020},
