Multi-Scale Context Intertwining for Semantic Segmentation

European Conference on Computer Vision

Di Lin1       Yuanfeng Ji1       Dani Lischinski2     Daniel Cohen-Or1,3     Hui Huang1*

1Shenzhen University      2The Hebrew University of Jerusalem      3Tel Aviv University

Fig. 1: Alternative approaches for encoding multi-scale context information into segmentation features for per-pixel prediction. The spatial pyramid pooling (SPP) network (a) and the encoder-decoder (ED) network (b) propagate information across the hierarchy in a one-directional fashion. In contrast, our multi-scale context intertwining architecture (c) exchanges information between adjacent scales in a bidirectional fashion, and hierarchically combines the resulting feature maps. Figure 2 provides a more detailed illustration of the multi-stage recurrent context intertwining process.


Accurate semantic image segmentation requires the joint consideration of local appearance, semantic information, and global scene context. In today's age of pre-trained deep networks and their powerful convolutional features, state-of-the-art semantic segmentation approaches differ mostly in how they choose to combine these different kinds of information. In this work, we propose a novel scheme for aggregating features from different scales, which we refer to as Multi-Scale Context Intertwining (MSCI). In contrast to previous approaches, which typically propagate information between scales in a one-directional manner, we merge pairs of feature maps in a bidirectional and recurrent fashion, via connections between two LSTM chains. By training the parameters of the LSTM units on the segmentation task, the above approach learns how to extract powerful and effective features for pixel-level semantic segmentation, which are then combined hierarchically. Furthermore, rather than using fixed information propagation routes, we subdivide images into super-pixels, and use the spatial relationship between them in order to perform image-adapted context aggregation. Our extensive evaluation on public benchmarks indicates that all of the aforementioned components of our approach increase the effectiveness of information propagation throughout the network, and significantly improve its eventual segmentation accuracy.

Fig. 2: Multi-scale context intertwining between two successive feature maps in the deep hierarchy. The green arrows propagate the context information from the lower-resolution feature map to the higher-resolution one. Conversely, the blue arrows forward information from the higher-resolution feature map to augment the lower-resolution one. The orange circle in each stage indicates the hidden features output by LSTMs, including the cell states and gates.
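
The multi-stage exchange described in this caption can be illustrated with a toy sketch. It is not the MSCI implementation: the learned LSTM update is replaced here by a fixed weighted sum, the feature maps are one-dimensional lists, and the names `upsample`, `downsample`, `intertwine`, `stages` and `weight` are hypothetical. It only shows the data flow, in which each stage sends context both from the lower-resolution map to the higher-resolution one and back.

```python
def upsample(low):
    """Nearest-neighbour upsampling by a factor of 2 (1-D for brevity)."""
    return [v for v in low for _ in range(2)]


def downsample(high):
    """Average pooling by a factor of 2; expects an even-length list."""
    return [(high[i] + high[i + 1]) / 2.0 for i in range(0, len(high), 2)]


def intertwine(high, low, stages=3, weight=0.5):
    """Exchange context between two scales for a number of stages.

    Assumption: a fixed weighted sum stands in for the learned LSTM
    update, so this only mirrors the direction of information flow.
    """
    for _ in range(stages):
        # Both messages are computed from the previous stage's features.
        to_high = upsample(low)      # lower resolution -> higher resolution
        to_low = downsample(high)    # higher resolution -> lower resolution
        high = [h + weight * c for h, c in zip(high, to_high)]
        low = [l + weight * c for l, c in zip(low, to_low)]
    return high, low
```

For example, `intertwine([1.0, 1.0, 3.0, 3.0], [2.0, 4.0], stages=1)` adds half of the upsampled low-resolution map to the high-resolution one, and half of the pooled high-resolution map to the low-resolution one.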

Fig. 3: Bidirectional context aggregation. The features are partitioned into different regions defined by super-pixels. We aggregate the neurons residing in the same region, and pass the information of the adjacent regions along the bidirectional connection (a) from a low-resolution feature to a high-resolution feature; and (b) from a high-resolution feature to a low-resolution feature.
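
A minimal sketch of region-based aggregation follows, under stated assumptions. The caption does not give the exact formula, so the choices here are illustrative: features within a super-pixel are averaged, regions count as adjacent when they share a 4-connected pixel boundary, and a region's context is its own mean plus the means of its neighbours. All function names are hypothetical, and the super-pixel label map is assumed to be given.

```python
from collections import defaultdict


def region_means(features, labels):
    """Average the feature vectors of all pixels sharing a super-pixel label."""
    sums, counts = {}, defaultdict(int)
    for feat_row, label_row in zip(features, labels):
        for feat, lab in zip(feat_row, label_row):
            if lab not in sums:
                sums[lab] = [0.0] * len(feat)
            sums[lab] = [s + f for s, f in zip(sums[lab], feat)]
            counts[lab] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def region_adjacency(labels):
    """Assumption: regions are adjacent if they contain 4-connected pixels."""
    adj = defaultdict(set)
    h, w = len(labels), len(labels[0])
    for y in range(h):
        for x in range(w):
            a = labels[y][x]
            # Checking the pixel below and the pixel to the right
            # covers every 4-connected pair exactly once.
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < h and nx < w and labels[ny][nx] != a:
                    b = labels[ny][nx]
                    adj[a].add(b)
                    adj[b].add(a)
    return adj


def aggregate_context(features, labels):
    """Give every pixel its region's mean plus the means of adjacent regions."""
    means = region_means(features, labels)
    adj = region_adjacency(labels)
    context = {}
    for lab, mean in means.items():
        vec = list(mean)
        for nb in adj[lab]:
            vec = [v + m for v, m in zip(vec, means[nb])]
        context[lab] = vec
    # Broadcast each region's context vector back to its pixels.
    return [[context[lab] for lab in row] for row in labels]
```

In a bidirectional setting, a map produced this way at one resolution would be resized and combined with the feature map at the other resolution. That step is omitted here.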

Fig. 4: The segmentation results of the ASPP model [20], Encoder-Decoder with ASPP model [25] and our MSCI. The images are taken from the PASCAL VOC 2012 validation set.

Fig. 5: The segmentation results of the ASPP model [20], Encoder-Decoder with ASPP model [25] and our MSCI. The images are scenes taken from the PASCAL-Context validation set.

Fig. 6: MSCI segmentation results. The images are taken from the NYUDv2 validation set (left) and the SUN-RGBD validation set (right).

We thank the anonymous reviewers for their constructive comments. This work was supported in part by NSFC (61702338, 61522213, 61761146002, 61861130365), 973 Program (2015CB352501), Guangdong Science and Technology Program (2015A030312015), Shenzhen Innovation Program (KQJSCX20170727101233642, JCYJ20151015151249564), and ISF-NSFC Joint Research Program (2472/17).

  title={Multi-Scale Context Intertwining for Semantic Segmentation},
  author={Di Lin and Yuanfeng Ji and Dani Lischinski and Daniel Cohen-Or and Hui Huang},



