ZigZagNet: Fusing Top-Down and Bottom-Up Context for Object Segmentation

Conference on Computer Vision and Pattern Recognition (Proceedings of CVPR 2019)

Di Lin1         Dingguo Shen1         Siting Shen1         Yuanfeng Ji1         Dani Lischinski2        Daniel Cohen-Or1,3         Hui Huang1*

1Shenzhen University               2Hebrew University of Jerusalem             3Tel Aviv University

Figure 1. Different approaches for propagating multi-scale context information. The top-down networks (a) and (b) use the deeper layers to augment the shallower ones. The feature maps produced by the top-down network are further passed to the bottom-up network (c). In contrast to (a)-(c), ZigZagNet (d) exchanges feature maps between the top-down and bottom-up networks to achieve a richer encoding of multi-scale context. The orange blocks on the left represent the feature maps of backbone networks. The blue and green blocks represent the feature maps produced at different stages. For conceptual illustration, we omit some overlapping pathways and only show a subset of the dense pathways between feature maps in (d). Figure 2 depicts the ZigZagNet architecture in more detail.

Multi-scale context information has proven to be essential for object segmentation tasks. Recent works construct the multi-scale context by aggregating convolutional feature maps extracted by different levels of a deep neural network. This is typically done by propagating and fusing features in a one-directional, top-down and bottom-up, manner. In this work, we introduce ZigZagNet, which aggregates a richer multi-context feature map by using not only dense top-down and bottom-up propagation, but also by introducing pathways crossing between different levels of the top-down and the bottom-up hierarchies, in a zig-zag fashion. Furthermore, the context information is exchanged and aggregated over multiple stages, where the fused feature maps from one stage are fed into the next one, yielding a more comprehensive context for improved segmentation performance. Our extensive evaluation on the public benchmarks demonstrates that ZigZagNet surpasses the state-of-the-art accuracy for both semantic segmentation and instance segmentation tasks.

Figure 2. Top-down and bottom-up context propagation in ZigZagNet. The gray arrows of the top-down network (a) and bottom-up network (b) represent the dense pathways between different levels of feature maps. The red arrows iteratively exchange the context information between the top-down and bottom-up networks, which generate all levels of feature maps over multiple iterations. Here, we show only two different levels of feature maps to simplify the illustration. The blue and green blocks represent feature maps computed in two successive iterations.
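The propagation pattern described in the Figure 2 caption can be illustrated with a toy sketch. This is not the released ZigZagNet code: the names `resize`, `fuse`, `top_down`, `bottom_up`, and `zigzag` are hypothetical, feature maps are reduced to square single-channel lists, the learned fusion layers are replaced by an element-wise mean, and the crossing pathways are simplified to a hand-off of each top-down map to the bottom-up pass. It only shows how dense pathways and repeated exchange let every level see context from every other level.

```python
from typing import List

FeatureMap = List[List[float]]  # square, single-channel map (illustrative only)


def resize(fm: FeatureMap, size: int) -> FeatureMap:
    """Nearest-neighbour resize of a square map to size x size."""
    src = len(fm)
    return [[fm[r * src // size][c * src // size] for c in range(size)]
            for r in range(size)]


def fuse(maps: List[FeatureMap]) -> FeatureMap:
    """Element-wise mean; an assumed stand-in for a learned fusion layer."""
    size = len(maps[0])
    return [[sum(m[r][c] for m in maps) / len(maps) for c in range(size)]
            for r in range(size)]


def top_down(inputs: List[FeatureMap]) -> List[FeatureMap]:
    """Dense top-down pass: each level is fused with all deeper outputs."""
    outputs: List[FeatureMap] = []  # collected deepest level first
    for fm in reversed(inputs):
        size = len(fm)
        outputs.append(fuse([fm] + [resize(o, size) for o in outputs]))
    return outputs[::-1]  # back to shallow-to-deep order


def bottom_up(inputs: List[FeatureMap]) -> List[FeatureMap]:
    """Dense bottom-up pass: each level is fused with all shallower outputs."""
    outputs: List[FeatureMap] = []  # collected shallowest level first
    for fm in inputs:
        size = len(fm)
        outputs.append(fuse([fm] + [resize(o, size) for o in outputs]))
    return outputs


def zigzag(backbone: List[FeatureMap], iterations: int = 2) -> List[FeatureMap]:
    """Alternate the two passes; each iteration's output feeds the next one."""
    maps = backbone
    for _ in range(iterations):
        maps = bottom_up(top_down(maps))
    return maps


if __name__ == "__main__":
    # Three backbone levels, ordered shallow (4x4) to deep (1x1).
    backbone = [
        [[1.0] * 4 for _ in range(4)],
        [[3.0] * 2 for _ in range(2)],
        [[5.0]],
    ]
    fused = zigzag(backbone, iterations=2)
    print([len(m) for m in fused])  # spatial sizes are preserved: [4, 2, 1]
```

Levels are ordered from shallow (high resolution) to deep (low resolution). A trained network would use convolutions and learned weights in place of `fuse`, so the numbers produced here carry no meaning beyond showing the data flow.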

Figure 4. Six semantic segmentation results produced by our method. The first three rows are from the PASCAL VOC 2012 validation set, and the last three rows are from the PASCAL Context validation set.
Figure 5. Several instance segmentation results produced by our method. The images are taken from the COCO validation set.

Data & Code

Note that the DATA and CODE are free for Research and Education Use ONLY. 

Please cite our paper (add the bibtex below) if you use any part of our ALGORITHM, CODE, DATA or RESULTS in any publication.


We thank the anonymous reviewers for their constructive comments. This work was supported in part by the 973 Program (2015CB352501), NSFC (61702338, 61761146002, 61861130365), the Guangdong Science and Technology Program (2015A030312015), the Shenzhen Innovation Program (KQJSCX20170727101233642), LHTD (20170003), the ISF-NSFC Joint Program (2472/17), and the National Engineering Laboratory for Big Data System Computing Technology.

title = {ZigZagNet: Fusing Top-Down and Bottom-Up Context for Object Segmentation},
author = {Di Lin and Dingguo Shen and Siting Shen and Yuanfeng Ji and Dani Lischinski and Daniel Cohen-Or and Hui Huang},
journal = {Conference on Computer Vision and Pattern Recognition (Proceedings of CVPR 2019)},

pages = {7490--7499}, 

year = {2019},

Downloads (faster for people in China)

Downloads (faster for people in other places)