ShapeFormer: Transformer-based Shape Completion via Sparse Representation

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022)

Xingguang Yan1     Liqiang Lin1     Niloy J. Mitra2,3     Dani Lischinski4     Daniel Cohen-Or1,5     Hui Huang1*

1Shenzhen University     2University College London     3Adobe Research     4Hebrew University of Jerusalem     5Tel Aviv University


We present ShapeFormer, a transformer-based network that produces a distribution of object completions, conditioned on incomplete, and possibly noisy, point clouds. The resultant distribution can then be sampled to generate likely completions, each exhibiting plausible shape details while being faithful to the input. To facilitate the use of transformers for 3D, we introduce a compact 3D representation, vector quantized deep implicit function (VQDIF), that utilizes spatial sparsity to represent a close approximation of a 3D shape by a short sequence of discrete variables. Experiments demonstrate that ShapeFormer outperforms prior art for shape completion from ambiguous partial inputs in terms of both completion quality and diversity. We also show that our approach effectively handles a variety of shape types, incomplete patterns, and real-world scans.
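The core of VQDIF is replacing each continuous local feature with the index of its nearest entry in a learned dictionary, turning the shape into a short sequence of discrete variables. A minimal numpy sketch of that nearest-neighbor quantization step follows; the dictionary `D` and the features `z` here are toy stand-ins for illustration, not the learned ones from the paper.

```python
import numpy as np

def quantize_features(features, dictionary):
    """Discretization step of a vector-quantized representation: replace
    each continuous feature with the index of its nearest dictionary entry.

    features:   (K, C) array of per-location features
    dictionary: (V, C) codebook; learned in the paper, fixed here
    Returns (K,) integer indices and the (K, C) quantized features.
    """
    # Pairwise squared distances between features and dictionary entries.
    d2 = ((features[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=-1)
    indices = d2.argmin(axis=1)
    return indices, dictionary[indices]

# Toy example: a 4-entry codebook in 2-D and three features that sit
# close to entries 2, 0, and 3 respectively.
D = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = D[[2, 0, 3]] + 0.01
idx, zq = quantize_features(z, D)
print(idx)  # → [2 0 3]
```

Because every feature snaps to a codebook entry, the shape can be transmitted as a short sequence of small integers instead of dense floating-point grids, which is what makes transformer modeling tractable.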

Figure 2. Overview of our shape completion approach. Given a partial point cloud P, possibly from a depth image, as input, our VQDIF encoder first converts it to a sparse feature sequence and then quantizes the features, replacing each with the index of its nearest neighbor ej in a learned dictionary D, forming a sequence of discrete 2-tuples consisting of the coordinate (pink) and the quantized feature index (blue). We refer to this partial sequence as SP (drawn with dashed lines). ShapeFormer then takes SP as input and models the conditional distribution p(SC|SP). Autoregressive sampling yields a probable complete sequence SC. Finally, the VQDIF decoder converts the sequence SC to a deep implicit function, from which the surface reconstruction M can be extracted. To show the faithfulness of our reconstructions, we superimpose the input point cloud on them. Please see the supplementary material for more architectural details.
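The autoregressive sampling step in the pipeline above can be sketched as a simple loop: start from the observed partial sequence and repeatedly draw the next token from the model's predicted distribution. In this sketch, `next_token_logits` is any stand-in function in place of the actual ShapeFormer transformer, and the end-token convention is an assumption for illustration.

```python
import numpy as np

END_TOKEN = 0  # hypothetical end-of-sequence marker, not from the paper

def sample_completion(partial_seq, next_token_logits, max_len=8, seed=0):
    """Autoregressively extend a partial token sequence S_P into a
    complete sequence S_C by sampling from p(next token | prefix).

    next_token_logits: callable prefix -> logits over the vocabulary.
    In the paper this is the ShapeFormer transformer; here any
    stand-in function works.
    """
    rng = np.random.default_rng(seed)
    seq = list(partial_seq)
    while len(seq) < max_len:
        logits = np.asarray(next_token_logits(seq), dtype=float)
        probs = np.exp(logits - logits.max())  # softmax, numerically stable
        probs /= probs.sum()
        token = int(rng.choice(len(probs), p=probs))
        if token == END_TOKEN:
            break
        seq.append(token)
    return seq

# Stand-in "model" over a 5-token vocabulary: puts nearly all mass on
# token (last + 1) % 5, so the prefix 1, 2 continues with 3, 4 and then
# reaches token 0, the end marker.
demo = lambda seq: [50.0 if t == (seq[-1] + 1) % 5 else 0.0 for t in range(5)]
print(sample_completion([1, 2], demo))
```

Because sampling is stochastic, rerunning the loop (with a different seed, and a less peaked model than this demo) yields different yet individually plausible completions, which is the source of the multi-modal outputs shown in Figure 5.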

Figure 4. Visual comparison with prior shape completion methods on the ShapeNet dataset. Our method can better handle ambiguous scans and produce completions that are more faithful on both observed and unseen regions. More examples are in the supplementary material.

Figure 5. Visual comparison for multi-modal shape completion of Table, Chair, and Lamp categories on PartNet. We can produce diverse completions that better align with the input.

Data & Code

Note that the DATA and CODE are free for Research and Education Use ONLY. 

Please cite our paper (using the BibTeX entry below) if you use any part of our ALGORITHM, CODE, DATA or RESULTS in any publication.



We thank the reviewers for their comments. We thank Ziyu Wan, Xuelin Chen and Jiahui Lyu for discussions. This work was supported in parts by NSFC (62161146005, U21B2023, U2001206), GD Talent Program (2019JC05X328), GD Science and Technology Program (2020A0505100064), DEGP Key Project (2018KZDXM058, 2020SFKC059), Shenzhen Science and Technology Program (RCJC20200714114435012, JCYJ20210324120213036), Royal Society (NAF-R1-180099), ISF (3441/21, 2492/20) and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ).



@inproceedings{yan2022shapeformer,
  title={ShapeFormer: Transformer-based Shape Completion via Sparse Representation},
  author={Xingguang Yan and Liqiang Lin and Niloy J. Mitra and Dani Lischinski and Daniel Cohen-Or and Hui Huang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}



Downloads (faster for people in China)

Downloads (faster for people in other places)