SCN: Switchable Context Network for Semantic Segmentation of RGB-D Images

IEEE Transactions on Cybernetics, 2019

Di Lin1        Ruimao Zhang2        Yuanfeng Ji1        Ping Li3        Hui Huang1*

1Shenzhen University        2Chinese University of Hong Kong        3Macau University




Fig. 1. Correlation observed between depth and object co-existence: near regions (e.g., those highlighted in blue rectangles) typically have relatively simple object co-existence, whereas different objects are likely to coexist densely in far regions (e.g., those highlighted in red rectangles).

Fig. 2. The overview of our switchable context network (SCN). Given an RGB image, we produce the convolutional feature maps layer by layer in a resolution-descending order. Our SCN first produces the local structural feature maps, which are used to compute the context representations via top-down switchable information propagation. The context representations are combined with the convolutional features to form the intermediate feature maps, which are used for the final semantic segmentation.
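
To make this data flow concrete, the sketch below traces the pipeline of Fig. 2 in PyTorch. It is a minimal sketch, not the released SCN: the backbone is a toy two-layer stack, and the two propagation stages (detailed in Fig. 3) are abstracted into placeholder convolutions; the depth-guided switching and super-pixel handling are illustrated separately below.

import torch
import torch.nn as nn

class SCNSketch(nn.Module):
    """Minimal sketch of the Fig. 2 data flow. Layer shapes and module
    choices are placeholders, not the released SCN architecture."""

    def __init__(self, num_classes=40, channels=64):
        super().__init__()
        # Resolution-descending convolutional backbone (stand-in).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Stand-ins for the two propagation stages of Fig. 3; the real SCN
        # uses super-pixels and depth-guided switching here.
        self.local_prop = nn.Conv2d(channels, channels, 3, padding=1)
        self.switch_prop = nn.Conv2d(channels, channels, 3, padding=1)
        # Per-pixel classifier over the fused (intermediate) feature maps.
        self.classifier = nn.Conv2d(2 * channels, num_classes, 1)

    def forward(self, rgb):
        conv = self.backbone(rgb)                  # convolutional feature maps
        local = self.local_prop(conv)              # local structural feature maps
        context = self.switch_prop(local)          # context representations
        fused = torch.cat([conv, context], dim=1)  # intermediate feature maps
        return self.classifier(fused)              # segmentation logits

logits = SCNSketch()(torch.randn(1, 3, 64, 64))    # -> (1, 40, 16, 16)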


Abstract

Context representations have been widely used to benefit semantic image segmentation. The emergence of depth data provides additional information to construct more discriminative context representations. Depth data preserves the geometric relationships of objects in a scene, which are generally hard to infer from RGB images alone. While deep convolutional neural networks (CNNs) have been successful at semantic segmentation, how to optimize CNN training to exploit the informative context provided by depth data, and thereby enhance segmentation accuracy, remains an open problem.
In this paper, we present a novel switchable context network (SCN) to facilitate semantic segmentation of RGB-D images. Depth data is used to identify objects existing across multiple image regions. The network analyzes the information in these regions to identify their different characteristics, which are then handled selectively by switching between network branches. Combined with the content extracted from the inherent image structure, this enables us to generate effective context representations that are aware of both image structures and object relationships, leading to more coherent learning of the semantic segmentation network. We demonstrate that our SCN outperforms state-of-the-art methods on two public datasets.
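
As an illustration of the switching idea, the following sketch routes each super-pixel through one of two toy branches according to its average depth, with near super-pixels (cf. the blue regions of Fig. 1) taking the expansion branch and far super-pixels (red) the compression branch, as in Fig. 3(b). The threshold depth_thresh and the branch functions are assumptions made for illustration; in SCN the branches are learned sub-networks.

import numpy as np

def switch_branches(features, depth, superpixels, depth_thresh=2.0):
    """Route each super-pixel through one of two branches by its average depth.

    features:    (H, W, C) feature map
    depth:       (H, W) depth map, e.g., in meters
    superpixels: (H, W) integer super-pixel labels
    """
    # Toy stand-ins for the learned branches of SCN.
    def expansion(x):       # handles near (blue) super-pixels
        return 2.0 * x

    def compression(x):     # handles far (red) super-pixels
        return 0.5 * x

    out = np.zeros_like(features)
    for sp in np.unique(superpixels):
        mask = superpixels == sp
        avg_depth = depth[mask].mean()       # per-super-pixel average depth
        branch = expansion if avg_depth < depth_thresh else compression
        out[mask] = branch(features[mask])   # switch the network branch
    return out

Looping over super-pixels mirrors the per-region switching; an actual implementation would batch these operations on the GPU.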


Fig. 3. The construction of our context representation undergoes two information propagations: (a) local structural information propagation. In this stage, each region (a color node of the regular grid in the intermediate feature maps) receives information from the regions located in the same super-pixel. The resulting regions (the enlarged nodes of the regular grid), which carry richer information, constitute the local structural feature maps. (b) Top-down switchable information propagation. We compute the average depth value for each super-pixel. In the last column, the super-pixels highlighted in red and blue contain the regions whose information is output by the compression and expansion architectures, respectively. Each region (a color node of the regular grid in the local structural feature maps) receives information from the regions located in the adjacent super-pixels, forming a region (the highlighted red node in the context representations) with accurate contextual information. For illustration, the context representations are shown at the same size as the local structural feature maps; in fact, the context representations have a higher resolution than the local structural feature maps.
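
Stage (a) can be illustrated by a simple super-pixel aggregation in which every grid position receives the information pooled over all positions of its super-pixel. The mean pooling below is an assumption chosen for brevity; SCN's propagation is learned, so this sketches only the information flow:

import numpy as np

def local_structural_propagation(features, superpixels):
    """Each position receives information aggregated over its super-pixel.

    features:    (H, W, C) convolutional feature map
    superpixels: (H, W) integer super-pixel labels
    Returns the local structural feature map of the same shape.
    """
    out = np.empty_like(features)
    for sp in np.unique(superpixels):
        mask = superpixels == sp
        # Mean aggregation stands in for the learned propagation in SCN.
        out[mask] = features[mask].mean(axis=0)
    return out

# Toy usage: a 4x4 feature map with 3 channels and two super-pixels.
feats = np.random.rand(4, 4, 3).astype(np.float32)
sp = np.zeros((4, 4), dtype=np.int64)
sp[:, 2:] = 1
local_feats = local_structural_propagation(feats, sp)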


Fig. 5. A sample comparison between the state-of-the-art model [35] and our SCN. The images are selected from the NYUDv2 [38] dataset.



Fig. 6. A sample comparison between the state-of-the-art model [35] and our SCN. The images are selected from the SUN-RGBD [21] dataset.


Acknowledgement
We thank the reviewers and editors for their valuable comments. This work was supported in part by NSFC (61702338, 61522213, 61761146002, 61861130365), the National 973 Program (2015CB352501), the Guangdong Science and Technology Program (2015A030312015), the Shenzhen Innovation Program (KQJSCX20170727101233642), and the Macau Science and Technology Development Fund (0027/2018/A1).


Bibtex
@article{SCN19,
title = {SCN: Switchable Context Network for Semantic Segmentation of RGB-D Images},
author = {Di Lin and Ruimao Zhang and Yuanfeng Ji and Ping Li and Hui Huang},
journal = {IEEE Transactions on Cybernetics},
volume = {},
number = {},
pages = {},
year = {2019},
}

Downloads (faster for people in China)

Downloads (faster for people in other places)