Semantic Object Reconstruction via Casual Handheld Scanning

ACM Transactions on Graphics (Proceedings of SIGGRAPH ASIA 2018)

Ruizhen Hu1   Cheng Wen1   Oliver Van Kaick2   Luanmin Chen1   Di Lin1   Daniel Cohen-Or1,3   Hui Huang1∗

1Shenzhen University   2Carleton University   3Tel Aviv University

Fig. 1. A semantic reconstruction of an object obtained with our method (top), using a semantic labeling of frames (one example shown in the bottom-right) computed for RGB and depth input images (bottom-left and middle).


We introduce a learning-based method to reconstruct objects acquired in a casual handheld scanning setting with a depth camera. Our method is based on two core components: first, a deep network that provides a semantic segmentation and labeling of the frames of an input RGBD sequence; second, an alignment and reconstruction method that employs the semantic labeling to reconstruct the acquired object from the frames. We demonstrate that the use of a semantic labeling improves the reconstructions of the objects when compared to methods that use only the depth information of the frames.

Moreover, since training a deep network requires a large amount of labeled data, a key contribution of our work is an active self-learning framework that simplifies the creation of the training data. Specifically, we iteratively predict the labeling of frames with the neural network, reconstruct the object from the labeled frames, and evaluate the confidence of the labeling, to incrementally train the neural network while requiring only a small amount of user-provided annotations. We show that this method enables the creation of data for training a neural network with high accuracy, while requiring little manual effort.

Fig. 2. Overview of our active self-learning method for object reconstruction. We learn how to segment and label a sequence of RGBD frames (a), to improve the quality of object reconstruction (e). Specifically, we employ an active self-learning approach to create the necessary data for the learning while involving minimal user effort. The active learning asks for user input on strategically selected frames (green arrow) and then invokes a self-learning component on the annotated frames. The self-learning is an automatic learning approach consisting of cycles of prediction, reconstruction, and confidence estimation for creating additional training data from the remaining frames in the sequence (black + blue arrows). Please refer to Section 3 for details on these steps.
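The control flow described in this caption can be sketched roughly as follows. This is only an illustrative outline, not the implementation from the paper: the function name `active_self_learning`, the callbacks (`predict`, `reconstruct`, `confidence`, `train`, `ask_user`), the fixed `threshold`, and the rule of asking the user about the least confident frame are all assumptions made for this example.

```python
def active_self_learning(frames, predict, reconstruct, confidence,
                         train, ask_user, threshold=0.9):
    """Hypothetical sketch of an active self-learning loop.

    All callbacks are placeholders supplied by the caller:
      predict(frame)            -> labeling predicted by the current network
      reconstruct(labelings)    -> object model built from labeled frames
      confidence(i, lab, model) -> score in [0, 1] for frame i's labeling
      train(labeled)            -> (re)train the network on labeled frames
      ask_user(frame)           -> manual annotation of one frame
    """
    labeled = {}   # frame index -> labeling accepted as training data
    scores = {}    # last confidence estimate for each unlabeled frame
    while len(labeled) < len(frames):
        remaining = [i for i in range(len(frames)) if i not in labeled]
        # Active learning step (assumed rule): annotate the frame whose
        # predicted labeling is currently least trusted.
        pick = min(remaining, key=lambda i: scores.get(i, 0.0))
        labeled[pick] = ask_user(frames[pick])
        train(labeled)
        # Self-learning: cycles of prediction, reconstruction, and
        # confidence estimation until no new frame is accepted.
        while True:
            remaining = [i for i in range(len(frames)) if i not in labeled]
            if not remaining:
                break
            predictions = {i: predict(frames[i]) for i in remaining}
            model = reconstruct({**labeled, **predictions})
            scores = {i: confidence(i, predictions[i], model)
                      for i in remaining}
            confident = [i for i in remaining if scores[i] >= threshold]
            if not confident:
                break
            for i in confident:
                labeled[i] = predictions[i]
            train(labeled)
    return labeled
```

Each pass of the outer loop costs one user annotation, so the amount of manual effort in this sketch depends on how many frames the self-learning cycles can accept on their own.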

Fig. 7. Selected segmentations and labelings of frames obtained with our deep network. Each example shows the RGB and depth inputs, and the prediction. Note the semantic correctness and low noise level of the results.

Fig. 9. Comparison between the label prediction provided by the neural network and the labels obtained after fusion and back-projection (in the red boxes). Note the improvement in the quality of segments after fusion.
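To give an intuition for why fusion followed by back-projection can clean up per-frame predictions, here is a minimal sketch. It assumes a simple majority vote per voxel and assumes that the pixel-to-voxel correspondences are already known from the alignment; the function `fuse_and_backproject` and its inputs are hypothetical, and the fusion rule used in the paper may differ.

```python
from collections import Counter, defaultdict

def fuse_and_backproject(frame_labels, frame_voxels):
    """Hypothetical majority-vote label fusion.

    frame_labels[f][p] is the label predicted for pixel p of frame f.
    frame_voxels[f][p] is the id of the voxel that pixel p maps to
    (assumed to be given by the frame alignment).
    Returns per-frame labels after fusion and back-projection.
    """
    votes = defaultdict(Counter)
    # Fusion: accumulate the labels that all frames assign to each voxel.
    for labels, voxels in zip(frame_labels, frame_voxels):
        for label, voxel in zip(labels, voxels):
            votes[voxel][label] += 1
    fused = {voxel: counts.most_common(1)[0][0]
             for voxel, counts in votes.items()}
    # Back-projection: every pixel receives the fused label of its voxel.
    return [[fused[voxel] for voxel in voxels] for voxels in frame_voxels]
```

With this rule, a label that is wrong in a single frame is overruled whenever the other frames observing the same voxel agree with each other.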

Fig. 11. Reconstruction results obtained with the method of Nießner et al. [2013], which does not consider semantic information (left of each example), compared to the results of our method that incorporates semantic information (right, in the red boxes). Note how our reconstructions are smoother, and have less missing data and fewer misalignments.

Data & Code

Note that the DATA and CODE are free for Research and Education Use ONLY. 

Please cite our paper (using the BibTeX entry below) if you use any part of our ALGORITHM, CODE, DATA or RESULTS in any publication.

We thank the anonymous reviewers for their valuable comments. This work was supported in part by NSFC (61602311, 61522213, 61761146002, 61702338), 973 Program (2015CB352501), GD Science and Technology Program (2015A030312015), Shenzhen Innovation Program (JCYJ20170302153208613, KQJSCX20170727101233642), ISF-NSFC Joint Research (2472/17), and NSERC Canada (2015-05407).


title = {Semantic Object Reconstruction via Casual Handheld Scanning},
author = {Ruizhen Hu and Cheng Wen and Oliver Van Kaick and Luanmin Chen and Di Lin and Daniel Cohen-Or and Hui Huang},
journal = {ACM Transactions on Graphics (Proc. SIGGRAPH ASIA)},
volume = {37},
number = {6},
pages = {219:1--219:12},  
year = {2018},

Downloads (faster for people in China)

Downloads (faster for people in other places)