TAP-Net: Transport-and-Pack using Reinforcement Learning

ACM Transactions on Graphics (Proceedings of SIGGRAPH ASIA 2020)

Ruizhen Hu1   Juzhan Xu1   Bin Chen1   Minglun Gong2   Hao Zhang3   Hui Huang1*

1Shenzhen University   2University of Guelph  3Simon Fraser University

Fig. 1. Given an initial spatial configuration of boxes (a), our neural network, TAP-Net, iteratively transports and packs (b) the boxes compactly into a target container (c). TAP-Net is trained to output a packing order and box orientations (90° rotations are allowed; see the orange box with the SIGGRAPH Logo) through reinforcement learning, under obstruction and accessibility  constraints, and with rewards designated to facilitate compact packing. 


We introduce the transport-and-pack (TAP) problem, a frequently encountered instance of real-world packing, and develop a neural optimization solution based on reinforcement learning. Given an initial spatial configuration of boxes, we seek an efficient method to iteratively transport and pack the boxes compactly into a target container. Due to obstruction and accessibility constraints, our problem has to add a new search dimension, i.e., finding an optimal transport sequence, to the already immense search space for packing alone. Using a learning-based approach, a trained network can learn and encode solution patterns to guide the solution of new problem instances instead of executing an expensive online search. In our work, we represent the transport constraints using a precedence graph and train a neural network, coined TAP-Net, using reinforcement learning to reward efficient and stable packing. The network is built on an encoder-decoder architecture, where the encoder employs convolution layers to encode the box geometry and precedence graph and the decoder is a recurrent neural network (RNN) which inputs the current encoder output, as well as the current box packing state of the target container, and outputs the next box to pack, as well as its orientation. We train our network on randomly generated initial box configurations, without supervision, via policy gradients to learn optimal TAP policies to maximize packing efficiency and stability. We demonstrate the performance of TAP-Net on a variety of examples, evaluating the network through ablation studies and comparisons to baselines and alternative network designs. We also show that our network generalizes well to larger problem instances, when trained on small-sized inputs.

Fig. 2. Overview of our TAP framework, illustrated using a 2D example. Given an initial layout of boxes (a), we first analyze the transport constraints (i.e., obstructions and accessibility) which would dictate packing orders. For example, TB means “Top Blocking” of box D by box E, so D can only be packed after E; see more details in Figure 4 and Section 3. We represent these constraints in a precedence graph (b). The precedence graph and the initial state of the target container are fed into the network TAP-Net to determine which box should be packed first (c). Updated precedence graph and packing state of the target container are iteratively fed to the network to determine subsequent packing order until all boxes are packed (d).

Fig. 3. The architecture of TAP-Net. Two modules, an encoder and a decoder, are trained to collectively compute a probability map over all valid states under each step using an attention mechanism. The state with maximum probability (s1 in this case) is selected as output for the current step, and used to upated all dynamic information and item mask.

Fig. 4. Top and side blocking illustrated in 2D. (a) TB: top blocking of box D by E, shown as black solid directed edge in precedence graph (left). (b) LAB: left access block of box B by E. (c) RAB: right access block of box B by C.

Fig. 6. The encoding of box states. Each box is encoded under two states (rows) for original (solid) and rotated (stripe pattern) orientations. Besides static width and height information, each state also records the dynamic dependency information using binary codes. Each bit in the binary code represents one of the objects. For example, Object A is blocked by objects D and E from the top and hence the values at the corresponding bits in TB are set to ‘1’.

Fig. 11. Rolling-based packing for handling large instance set. When TAP-Net is trained for 4 objects and there are 6 objects to be packed, we first select 4 objects into the initial set Ω, based on which TAP-Net chose the first object to be packed (C in this case). After C is packed and removed, another object (F in this case) is added to Ω. The process repeats until all objects are packed.

Fig. 19. Packing results under different initial container widths. The packing qualities are almost identical when width is larger than or equal to 6.

Fig. 23. Example results on 3D cases: two from the RAND set (top) and two from the PPSG set (bottom).

Data & Code

Note that the DATA and CODE are free for Research and Education Use ONLY. 

Please cite our paper (add the bibtex below) if you use any part of our ALGORITHM, CODE, DATA or RESULTS in any publication.



We thank the anonymous reviewers for their valuable comments. This work was supported in part by NSFC (61872250, 61861130365, 61761146002), GD Talent Program (2019JC05X328), GD Science and Technology Program (2020A0505100064, 2018KZDXM058), LHTD (20170003), NSERC Canada (611370, 293127), gift funds from Adobe, a Google Faculty Research Award, National Engineering Laboratory for Big Data System Computing Technology, and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ).



title = {TAP-Net: Transport-and-Pack using Reinforcement Learning},

author = {Ruizhen Hu and Juzhan Xu and Bin Chen and Minglun Gong and Hao Zhang and Hui Huang},

journal = {ACM Transactions on Graphics (Proceedings of SIGGRAPH ASIA 2020)},

volume = {39},

number = {6},

pages = {232:1--232:15},

year = {2020},


Downloads(faster for people in China)

Downloads(faster for people in other places)