VirtualCity3D: A Large-Scale Urban Scene Dataset and Simulator

Yilin Liu    Hui Huang*

Shenzhen University


The ability to perceive the environment in different ways is essential to robotic research, and it involves analyzing various kinds of data sources (e.g., depth maps, visual images, LiDAR data). Related datasets in the 2D/2.5D image domain have been proposed [1, 2]; however, a comprehensive understanding of the 3D environment requires 3D data (point clouds, textured polygon meshes), which is still absent in the community. We present a large-scale urban dataset and simulator, referred to as VirtualCity3D, built on Unreal Engine 4 [3] and AirSim [4], which consists of six city scenes at different scales. The ground-truth textured 3D models provided in VirtualCity3D allow users to derive whatever data they need: instance segmentation maps, depth maps at arbitrary resolution, 3D point clouds/meshes in both visible and occluded regions, etc. With the help of AirSim [4], users can also simulate a robot (car/drone) to test their algorithms in the proposed city environments.


Table 1: Statistics of our 3D urban dataset. Note that in addition to buildings, the scenes also contain many other objects that need to be considered, such as trees, flower beds, and street lights, which greatly increases the challenge of height mapping and autonomous navigation tasks.

Table 2: Statistics of the data sources of different datasets.
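For the height-mapping task noted above, the dataset's ground-truth point clouds can be rasterized into a 2D height map. Below is a minimal sketch, assuming a simple max-height-per-cell scheme; the function name, grid resolution, and ground level are illustrative choices, not part of the dataset's tooling.

```python
import numpy as np

def height_map(points, cell_size=1.0, ground_z=0.0):
    """Rasterize an (N, 3) point cloud into a 2D height map.

    Each grid cell stores the maximum z among the points falling into it;
    empty cells keep the (assumed) ground level `ground_z`.
    """
    # Assign each point to an integer grid cell by its x, y coordinates.
    xy = np.floor(points[:, :2] / cell_size).astype(int)
    xy -= xy.min(axis=0)            # shift indices to start at (0, 0)
    nx, ny = xy.max(axis=0) + 1
    hmap = np.full((nx, ny), ground_z)
    # np.maximum.at correctly handles multiple points in the same cell.
    np.maximum.at(hmap, (xy[:, 0], xy[:, 1]), points[:, 2])
    return hmap

# Three points: two share cell (0, 0), one falls in cell (1, 0).
pts = np.array([[0.2, 0.3, 1.0], [0.8, 0.1, 4.0], [1.5, 0.5, 2.0]])
print(height_map(pts, cell_size=1.0))
```

Objects such as trees and street lights then appear as local maxima in the map, which is what makes them hard to distinguish from buildings in this representation.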


The released zip file contains the Unreal project of the city scenes described above. Users can use either pure Unreal Engine or the AirSim client (in either C++ or Python) to capture their desired data. The ground-truth textured meshes and their corresponding poses are also provided in the Unreal project.
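Depth maps captured in the scenes (e.g., via AirSim's `simGetImages`) can be back-projected into camera-frame point clouds for comparison against the ground-truth geometry. Below is a minimal sketch, assuming a pinhole camera with the principal point at the image center; the function name and the 90° field of view are illustrative assumptions and should be adjusted to match the actual capture settings.

```python
import numpy as np

def depth_to_point_cloud(depth, fov_deg=90.0):
    """Back-project a metric depth map (H x W, in meters) into an
    (H*W, 3) array of camera-frame 3D points."""
    h, w = depth.shape
    # Focal length in pixels from the assumed horizontal field of view.
    fx = fy = (w / 2.0) / np.tan(np.deg2rad(fov_deg) / 2.0)
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0

    # Pixel grids: u varies along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Example: a flat wall 5 m in front of a 640x480 camera.
pts = depth_to_point_cloud(np.full((480, 640), 5.0))
print(pts.shape)  # (307200, 3)
```

Since the dataset also provides the ground-truth mesh and camera poses, such back-projected clouds can be transformed into world coordinates and checked directly against the mesh.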

Required packages: 

- Unreal Engine 4 (4.24 is recommended) 

- (Optional) Airsim



Note that the DATA and CODE are free for Research and Education Use ONLY. 

Please cite our papers (BibTeX below) if you use any part of our ALGORITHM, CODE, DATA or RESULTS in any publication.

Download via FTP    Download via HTTP (10 GB, so the download may take a while)


This work was supported in parts by NSFC (U2001206), GD Outstanding Talent Program (2019JC05X328), GD Science and Technology Program (2020A0505100064, 2015A030312015), Shenzhen Science and Technology Program (RCJC20200714114435012), and Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen University).



@misc{VirtualCity3D,
  title={VirtualCity3D: A Large-Scale Urban Scene Dataset and Simulator},
  author={Yilin Liu and Hui Huang},
}




@article{VGF-Net,
  title={VGF-Net: Visual-Geometric Fusion Learning for Simultaneous Drone Navigation and Height Mapping},
  author={Yilin Liu and Ke Xie and Hui Huang},
  journal={Graphical Models},
}




[1] T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in Proc. Euro. Conf. on Computer Vision, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds., 2014, pp. 740–755.

[2] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? the KITTI vision benchmark suite,” in Proc. IEEE Conf. on Computer Vision & Pattern Recognition, 2012, pp. 3354–3361.

[3] Epic Games, “Unreal Engine.” [Online]. Available:

[4] S. Shah, D. Dey, C. Lovett, and A. Kapoor, “AirSim: High-fidelity visual and physical simulation for autonomous vehicles,” in Field and Service Robotics, 2017.

[5] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The cityscapes dataset for semantic urban scene understanding,” in Proc. IEEE Conf. on Computer Vision & Pattern Recognition, 2016, pp. 3213–3223.

[6] X. Song, P. Wang, D. Zhou, R. Zhu, C. Guan, Y. Dai, H. Su, H. Li, and R. Yang, “ApolloCar3D: A large 3D car instance understanding benchmark for autonomous driving,” in Proc. IEEE Conf. on Computer Vision & Pattern Recognition, 2019, pp. 5447–5457.

Downloads (faster for people in China)

Downloads (faster for people in other places)