Predictive and Generative Neural Networks for Object Functionality

ACM Transactions on Graphics (Proceedings of SIGGRAPH 2018)

Ruizhen Hu¹          Zihao Yan¹          Jingwen Zhang¹          Oliver van Kaick²          Ariel Shamir³          Hao Zhang⁴          Hui Huang¹*

¹Shenzhen University          ²Carleton University          ³The Interdisciplinary Center          ⁴Simon Fraser University

Fig. 1. Given an object in isolation (left of each example), our generative network synthesizes scenes that demonstrate the functionality of the object in terms of interactions with surrounding objects (middle). Note the different types of functionalities appearing in the scenes generated by the network, involving interactions such as support, containment, and grasping. The scene is refined by replacing voxelized objects with higher-resolution models (right).

Humans can predict the functionality of an object even without any surroundings, since their knowledge and experience would allow them to "hallucinate" the interaction or usage scenarios involving the object. We develop predictive and generative deep convolutional neural networks to replicate this feat. Specifically, our work focuses on functionalities of man-made 3D objects characterized by human-object or object-object interactions. Our networks are trained on a database of scene contexts, called interaction contexts, each consisting of a central object and one or more surrounding objects, that represent object functionalities. Given a 3D object in isolation, our functional similarity network (fSIM-NET), a variation of the triplet network, is trained to predict the functionality of the object by inferring functionality-revealing interaction contexts involving the object. fSIM-NET is complemented by a generative network (iGEN-NET) and a segmentation network (iSEG-NET). iGEN-NET takes a single voxelized 3D object and synthesizes a voxelized surround, i.e., the interaction context which visually demonstrates the object’s functionalities. iSEG-NET separates the interacting objects into different groups according to their interaction types.
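
As a rough illustration of the triplet idea that fSIM-NET builds on, the sketch below computes a generic triplet margin loss over embedding vectors: an object embedding (anchor) is pulled toward the embedding of a scene with matching functionality (positive) and pushed away from a scene with a different functionality (negative). This is the textbook formulation, not the loss used in the paper, which describes fSIM-NET only as a variation of the triplet network. The function name, the squared Euclidean distance, and the margin value are assumptions made for this example.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss (illustrative; not the paper's exact loss).

    anchor, positive, negative: arrays of shape (batch, dim) holding embeddings.
    margin: hypothetical hyperparameter chosen for this sketch.
    """
    anchor = np.asarray(anchor, dtype=np.float64)
    positive = np.asarray(positive, dtype=np.float64)
    negative = np.asarray(negative, dtype=np.float64)
    # Squared Euclidean distance from the anchor to each scene embedding.
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    # Penalize triplets where the positive is not closer by at least `margin`.
    return float(np.maximum(0.0, d_pos - d_neg + margin).mean())
```

The loss is zero once every matching scene is closer to the object than every non-matching scene by at least the margin, which is what lets nearest-neighbor retrieval in the embedding space return functionally similar scenes.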

Fig. 2. Our complete pipeline for understanding object functionality: Given an object in isolation (a), we first transform it into a voxel representation (b). Then, we retrieve scenes with functionality most similar to the object (c), using our functional similarity network. The scenes provide richer functionality information in the form of interactions of central objects with surrounding objects. Based on the label of the retrieved scene, we synthesize an interaction context (d) for the given object using our generative network. We partition the interaction context into individual objects (e) using our segmentation network, to enable further processing and analysis of the scene, such as replacing voxels with higher-resolution models (f).
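
Step (b) of the pipeline converts the input object into a voxel representation. A minimal sketch of one way to do this is shown below, assuming the object is available as a set of 3D surface points. The function name, the default grid resolution, and the normalization to the longest bounding-box side are assumptions for illustration and are not taken from the released code.

```python
import numpy as np

def voxelize_points(points, resolution=64):
    """Convert an (N, 3) point set into a boolean occupancy grid.

    resolution: hypothetical grid size for this sketch.
    """
    points = np.asarray(points, dtype=np.float64)
    lo = points.min(axis=0)
    # Scale uniformly by the longest bounding-box side to keep the aspect ratio.
    extent = (points.max(axis=0) - lo).max()
    if extent == 0.0:
        extent = 1.0  # degenerate input: all points coincide
    idx = np.floor((points - lo) / extent * (resolution - 1)).astype(int)
    idx = np.clip(idx, 0, resolution - 1)
    grid = np.zeros((resolution, resolution, resolution), dtype=bool)
    # Mark every voxel that contains at least one point as occupied.
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid
```

For a mesh input, points would first have to be sampled densely enough on the surface that neighboring voxels do not leave gaps; that sampling step is outside the scope of this sketch.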

Fig. 3. The architecture of our functional similarity network – fSIM-NET. The layers shown on the top row implement the Eobj subnetwork, while the layers on the second and third rows implement the Escn subnetworks. We show over each volume the number of units of the same type that appear in the layer, while the dimensions of the data processed by each layer are written under the volume.

Fig. 5. The architecture of our interaction context generation network – iGEN-NET. Given an input object x (top-left) and functionality label c (bottom-left), the network generates an output scene X and places x into this scene (right), based on transformation parameters s and t.

Fig. 6. The architecture of our segmentation network – iSEG-NET. Given an input scene X (top-left) and functionality label c (bottom-left), the network segments the interacting objects in the scene into different groups. The central object, extracted from the input encoding, can then be recombined with the output segmented scene (right).


We thank the anonymous reviewers for their valuable comments. This work was supported in part by NSFC (61602311, 61522213, 61761146002, 6171101466), 973 Program (2015CB352501), Guangdong Science and Technology Program (2015A030312015), Shenzhen Innovation Program (JCYJ20170302153208613, JCYJ20151015151249564) and NSERC Canada (611370, 611649, 2015-05407).

Source Code & Data

Note that the DATA and CODE are free for Research and Education Use ONLY. 

Please cite our paper (add the bibtex below) if you use any part of our ALGORITHM, CODE, DATA or RESULTS in any publication.



title = {Predictive and Generative Neural Networks for Object Functionality},
author = {Ruizhen Hu and Zihao Yan and Jingwen Zhang and Oliver van Kaick and Ariel Shamir and Hao Zhang and Hui Huang},
journal = {ACM Transactions on Graphics (Proc. SIGGRAPH)},
volume = {37},
number = {4},
pages = {151:1--151:14},  
year = {2018},

Downloads (faster for people in China)

Downloads (faster for people in other places)