Visual Computing Research Center
Shenzhen University, China
We present the UrbanBIS benchmark for large-scale 3D urban understanding, supporting practical urban-level semantic and building-level instance segmentation. UrbanBIS comprises six real urban scenes, with 2.5 billion points, covering a vast area of 10.78 square kilometers and 3,370 buildings, captured by 113,346 views of aerial photogrammetry. In particular, UrbanBIS provides not only semantic-level annotations on a rich set of urban objects, including buildings, vehicles, vegetation, roads, and bridges, but also instance-level annotations on the buildings. Further, UrbanBIS is the first 3D dataset that introduces fine-grained building sub-categories, considering a wide variety of shapes for different building types. We also propose B-Seg, a building instance segmentation method to establish UrbanBIS. B-Seg adopts an end-to-end framework with a simple yet effective strategy for handling large-scale point clouds. Compared with mainstream methods, B-Seg achieves better accuracy with faster inference speed on UrbanBIS. In addition to the carefully annotated point clouds, UrbanBIS provides high-resolution aerial-acquisition photos and high-quality large-scale 3D reconstruction models, which should facilitate a wide range of studies such as multi-view stereo, urban LOD generation, aerial path planning, autonomous navigation, and road network extraction, thus serving as an important platform for many intelligent city applications.
Download
UrbanBIS is publicly accessible for non-commercial uses only. Permission is granted to use the data only if you agree:
- The dataset is provided "AS IS". Despite our best efforts to assure accuracy, we disclaim all liability for any mistakes or omissions;
- All works that utilize this dataset including any partial use must cite our paper provided below;
- You refrain from disseminating this dataset or any altered variations;
- You are not permitted to utilize this dataset or any derivative work for any commercial endeavors;
- We reserve all rights that are not explicitly granted to you.
We place great emphasis on ensuring the privacy and confidentiality of all data involved. Our practices align with the highest standards set by relevant laws and regulations. We have implemented robust measures to mitigate privacy concerns effectively. In the rare instance that you identify any privacy issues pertaining to your information within our dataset, please reach out to us promptly. We assure you that we will immediately remove the affected data upon receiving your request, prioritizing your privacy and confidentiality.
Each line of the txt file represents one point and contains the following fields:
- X Y Z R G B Semantic_label Instance_label Fine-grained_building_category;
- Semantic_label = {'Terrain': 0, 'Vegetation': 1, 'Water': 2, 'Bridge': 3, 'Vehicle': 4, 'Boat': 5, 'Building': 6};
- Fine-grained_building_category = {'Commercial': 0, 'Residential': 1, 'Office': 2, 'Cultural': 3, 'Transportation': 4, 'Municipal': 5, 'Temporary': 6, 'Unclassified': 7}.
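The record layout above can be read with a few lines of Python. This is a minimal sketch, not an official loader: it assumes the nine fields are separated by whitespace and that the three label fields parse as integers, neither of which is stated on this page. The names `Point`, `parse_point`, and `semantic_histogram` are hypothetical, and the sample lines are made-up values for illustration only (including the instance and category values used for the non-building point).

```python
from collections import Counter
from typing import Iterable, NamedTuple

# Label maps copied from the format description, inverted to id -> name.
SEMANTIC_LABELS = {0: "Terrain", 1: "Vegetation", 2: "Water", 3: "Bridge",
                   4: "Vehicle", 5: "Boat", 6: "Building"}
BUILDING_CATEGORIES = {0: "Commercial", 1: "Residential", 2: "Office",
                       3: "Cultural", 4: "Transportation", 5: "Municipal",
                       6: "Temporary", 7: "Unclassified"}


class Point(NamedTuple):
    x: float
    y: float
    z: float
    r: int
    g: int
    b: int
    semantic: int
    instance: int
    building_category: int


def parse_point(line: str) -> Point:
    """Parse one 'X Y Z R G B Semantic Instance Category' record.

    Assumption: fields are whitespace-separated. Integer fields go through
    float() first so that values written as '6.0' are still accepted.
    """
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    x, y, z = (float(v) for v in fields[:3])
    r, g, b, sem, inst, cat = (int(float(v)) for v in fields[3:])
    return Point(x, y, z, r, g, b, sem, inst, cat)


def semantic_histogram(lines: Iterable[str]) -> Counter:
    """Count points per semantic class name, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip():
            label = parse_point(line).semantic
            counts[SEMANTIC_LABELS.get(label, "Unknown")] += 1
    return counts


# Made-up sample records, not taken from the dataset.
sample = [
    "10.5 20.25 3.0 120 130 140 6 17 1",
    "11.0 21.00 3.5 118 128 139 6 17 1",
    "40.0 55.50 1.2 30 90 40 1 -1 7",
]

if __name__ == "__main__":
    first = parse_point(sample[0])
    print(SEMANTIC_LABELS[first.semantic],
          BUILDING_CATEGORIES[first.building_category])
    print(semantic_histogram(sample))
```

Because the scene files are tens of gigabytes, passing an open file object to `semantic_histogram` streams the records one line at a time instead of loading the whole scene into memory.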

Dataset statistics per scene (point counts; M = million)

Category              Qingdao    Wuhu       Longhua    Yuehai     Lihu       Yingrenshi
Labeled Point Cloud   26.5 GB    27.8 GB    29.1 GB    17.5 GB    11.5 GB    0.92 GB
Total #               594.06 M   625.08 M   653.90 M   393.37 M   255.12 M   22.22 M
Building              269.59 M   285.28 M   256.39 M   117.98 M   65.18 M    14.97 M
Ground                114.22 M   133.32 M   158.62 M   69.60 M    80.54 M    4.39 M
Water                 11.46 M    20.95 M    0.26 M     3.86 M     2.46 M     0
Boat                  4.20 M     409        852        0          2,490      0
Vegetation            179.50 M   175.69 M   225.50 M   197.83 M   104.09 M   1.66 M
Vehicle               15.05 M    8.24 M     11.35 M    1.16 M     2.08 M     0.85 M
Bridge                37,074     1.61 M     1.77 M     2.93 M     0.78 M     0.35 M

Per scene: Scene Point Cloud, Semantic Segmentation, Fine-grained Building Category, Building Instance Segmentation, Images, Textured Meshes.
B-SEG
overview
results
Building instance segmentation results (↑ higher is better, ↓ lower is better; T(s) is time in seconds)

Method          Qingdao                      Wuhu                         Longhua                      Campus
                ↑AP    ↑AP50  ↑AP25  ↓T(s)   ↑AP    ↑AP50  ↑AP25  ↓T(s)   ↑AP    ↑AP50  ↑AP25  ↓T(s)   ↑AP    ↑AP50  ↑AP25  ↓T(s)
PointGroup [1]  0.364  0.512  0.578  9.80    0.502  0.662  0.748  5.90    0.318  0.443  0.556  5.73    0.117  0.235  0.455  3.65
HAIS [2]        0.320  0.465  0.506  7.11    0.383  0.616  0.711  3.62    0.159  0.249  0.350  3.17    0.002  0.012  0.146  3.26
SoftGroup [3]   0.383  0.446  0.487  6.55    0.536  0.649  0.721  3.61    0.151  0.199  0.300  3.06    0.253  0.364  0.439  2.16
DyCo3D [4]      0.285  0.376  0.498  5.20    0.470  0.620  0.732  3.04    0.020  0.045  0.196  1.77    0.029  0.063  0.180  1.67
DKNet [5]       0.383  0.434  0.474  2.15    0.474  0.575  0.650  1.20    0.077  0.154  0.253  1.78    0.044  0.109  0.251  0.88
B-Seg (Ours)    0.453  0.550  0.672  1.19    0.549  0.674  0.767  0.99    0.402  0.513  0.618  1.16    0.261  0.386  0.535  0.74
dataset

Training set: 5.92 GB
Test set: 2.49 GB
APPLICATION

3D RECONSTRUCTION

POINT CLOUD
RECONSTRUCTED MESH

LOD RECONSTRUCTION

Kinetic Shape Reconstruction

ORIGINAL MESH
KSR RESULT [6]

VIRTUAL SCENE DESIGN

DATASET PREVIEW