Data Overview

Repetitive Dataset

The Repetitive Dataset features consecutively repeated boxes observed in real logistics operations, where identical items are produced and packed in continuous sequences, similar to assembly-line manufacturing. It is designed to evaluate whether an algorithm can recognize repetitive patterns and maintain stable placement over time. Such repetition is common in industrial production and supply workflows.

This dataset contains 16,767 time-ordered records, each including item dimensions, volume, gross weight, and a pick-completion timestamp. It characterizes a repetitive packing scenario that emphasizes minimizing cumulative errors and keeping stacks stable during continuous online packing. Testing algorithms on this dataset shows whether they can sustain performance over long, repetitive sequences and cope with long-term stability challenges.

Length  Width  Height  Vol/CS     Gross weight  Case_pick_finish_time
35      20.3   20.8    29.5568    6.324         2022/06/07, 00:24:37
33.7    19.1   17      10.94239   5.874         2022/06/07, 00:24:47
35.3    26.7   24.1    22.714491  7.983         2022/06/07, 00:24:59
35      20.3   20.8    29.5568    6.327         2022/06/07, 00:25:12
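As a rough illustration of how such time-ordered records might be inspected, the sketch below parses the four sample rows shown above, measures the time gap between consecutive picks, and counts the lengths of runs of identical boxes. The tuple layout and the helper names (parse_time, pick_intervals, run_lengths) are illustrative assumptions and are not part of the dataset tooling.

```python
from datetime import datetime
from itertools import groupby

# Sample rows copied from the table above: (length, width, height, finish time).
ROWS = [
    (35.0, 20.3, 20.8, "2022/06/07, 00:24:37"),
    (33.7, 19.1, 17.0, "2022/06/07, 00:24:47"),
    (35.3, 26.7, 24.1, "2022/06/07, 00:24:59"),
    (35.0, 20.3, 20.8, "2022/06/07, 00:25:12"),
]


def parse_time(text):
    """Parse the Case_pick_finish_time format used in the sample rows."""
    return datetime.strptime(text, "%Y/%m/%d, %H:%M:%S")


def pick_intervals(rows):
    """Seconds elapsed between each pair of consecutive records."""
    times = [parse_time(row[3]) for row in rows]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


def run_lengths(rows):
    """Lengths of runs of consecutive records with identical dimensions."""
    return [len(list(group))
            for _, group in groupby(rows, key=lambda row: row[:3])]


print(pick_intervals(ROWS))
print(run_lengths(ROWS))
```

Long runs in the full dataset would indicate the assembly-line style repetition described above; the four sample rows alone contain no consecutive repeats.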

Diverse Dataset

The Diverse Dataset features boxes that vary widely in size and is intended to evaluate an algorithm’s adaptability to highly heterogeneous item distributions. Such heterogeneity is common in logistics order data, from which the proportion of each item category can be derived to reflect realistic demand distributions. The dataset comprises 6,849 office supply items, each with dimensions, an item name, and a proportion metric. It tests an algorithm’s ability to handle varied items while optimizing both space utilization and stacking stability.

Testing algorithms on this dataset helps verify whether they can handle practical constraints, such as different priorities for high-occupancy items or restrictions on packing certain items together. The rich occupancy and category information allows researchers to assess how well algorithms adapt to heterogeneous distributions and maintain performance in realistic logistics scenarios.

Index  Length  Width  Height  Proportion
0      24.5    35.5   39      0.71
1      32.5    42.5   33.5    0.54
2      27.5    35.5   20      0.48
3      32      54     30.5    0.46
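One way the proportion column could drive test-sequence generation is sketched below. It assumes, without confirmation from the dataset documentation, that Proportion can be treated as a relative sampling weight; the function name sample_sequence and the tuple layout are hypothetical.

```python
import random

# Sample rows copied from the table above: (length, width, height, proportion).
ITEMS = [
    (24.5, 35.5, 39.0, 0.71),
    (32.5, 42.5, 33.5, 0.54),
    (27.5, 35.5, 20.0, 0.48),
    (32.0, 54.0, 30.5, 0.46),
]


def sample_sequence(items, n, seed=0):
    """Draw n boxes, treating Proportion as a relative weight (an assumption)."""
    rng = random.Random(seed)  # seeded so that a test sequence is reproducible
    weights = [item[3] for item in items]
    picks = rng.choices(items, weights=weights, k=n)
    return [item[:3] for item in picks]


sequence = sample_sequence(ITEMS, n=10)
print(sequence)
```

random.choices normalizes the weights internally, so the proportions do not need to sum to one.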

Long Board Dataset

The Long Board Dataset features boxes with one dimension significantly longer than the others, aiming to evaluate an algorithm’s ability to handle geometrically challenging shapes. Such items are common in manufacturing and furniture logistics, where their geometry complicates spatial arrangement and stability. This dataset represents a long-board packing task requiring precise control of positioning and balance, providing a rigorous test of algorithm robustness under physically constrained conditions.

The dataset contains long-board items of various sizes, each record including length, width, and height information. Testing algorithms on this dataset helps assess their capability to optimize space utilization and maintain stacking stability when dealing with elongated materials, which are particularly difficult to load efficiently in practice.

Index  Length  Width  Height
0      62      35.5   13
1      273     26     6
2      85      61     12
3      151     62     8
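To make "one dimension significantly longer than the others" concrete, the sketch below computes a simple elongation ratio for the sample rows above. Both the ratio definition (longest edge over second-longest edge) and the threshold of 2.0 are illustrative assumptions; the dataset itself does not define such a measure.

```python
# Sample rows copied from the table above: (length, width, height).
BOARDS = [
    (62.0, 35.5, 13.0),
    (273.0, 26.0, 6.0),
    (85.0, 61.0, 12.0),
    (151.0, 62.0, 8.0),
]


def elongation(dims):
    """Ratio of the longest edge to the second-longest edge."""
    longest, second, _ = sorted(dims, reverse=True)
    return longest / second


def is_elongated(dims, threshold=2.0):
    """Flag a box whose longest edge dominates; the threshold is hypothetical."""
    return elongation(dims) >= threshold


for dims in BOARDS:
    print(dims, round(elongation(dims), 2), is_elongated(dims))
```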

Simulation Environment

Simulation Settings Overview

Figure: Overview of three test settings

We define three test settings to evaluate algorithm adaptability under increasing levels of physical realism.

These three settings form a tiered evaluation framework that allows us to analyze how different packing strategies perform under increasingly realistic constraints, from ideal geometric logic to full physical execution with robotics. This design helps reveal the robustness, adaptability, and real-world viability of each algorithm.

Our simulation environment is built with the PyBullet physics engine. The robotic arm used in Setting 3 is configured based on a 6-DOF UR5 model. During the simulation, the robot is required to perform the pick-and-place action from the loading area to the target position using inverse kinematics (IK) and motion planning.

We consider both geometric and physical constraints. Geometrically, boxes must remain within the container and must not overlap. Physically, we simulate gravity, friction, and collision response, and we ensure that the robotic arm avoids self-collision and environmental obstacles.
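The geometric constraints can be illustrated with a minimal axis-aligned check. This is a sketch under stated assumptions, not the environment's own implementation: the Box type, the function names, and the convention that the container is anchored at the origin are all hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    # Minimum corner (x, y, z) and edge lengths (dx, dy, dz) of an axis-aligned box.
    x: float
    y: float
    z: float
    dx: float
    dy: float
    dz: float


def inside(box, container):
    """True if the box lies fully within a container of size (X, Y, Z) at the origin."""
    corner = (box.x, box.y, box.z)
    size = (box.dx, box.dy, box.dz)
    return all(c >= 0 and c + s <= limit
               for c, s, limit in zip(corner, size, container))


def overlaps(a, b):
    """True if two boxes share interior volume; touching faces do not count."""
    return (a.x < b.x + b.dx and b.x < a.x + a.dx and
            a.y < b.y + b.dy and b.y < a.y + a.dy and
            a.z < b.z + b.dz and b.z < a.z + a.dz)


def is_feasible(new_box, placed, container):
    """A placement is geometrically feasible if it is contained and overlap-free."""
    return inside(new_box, container) and not any(
        overlaps(new_box, other) for other in placed)
```

For example, with a hypothetical 100 x 100 x 100 container, a box placed flush against an already placed box passes the check, while one that intrudes into it does not.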

The system is implemented in Python and supports both direct geometric simulation and full physics-based interaction. A unified interface encapsulates the simulation engine, robotic control, and evaluation metrics. Users can launch simulations with different settings via a single entry point, allowing convenient algorithm testing and scenario switching.

To enhance the reusability and scalability of our bin packing simulation environment, we have encapsulated the entire system into a Python package named packsim, now available on PyPI. Users can install it using the following command:

pip install packsim

After installation, the simulation process can be invoked in Python as follows:

from packsim import simulate

result = simulate({
    'setting': 1,
    'data': 'flat_long',
    'method': 'PCT',
    'test_data_config': 0,
    'gui': 1,
    'config': '/rl/pypi/default.yaml',
    'action_path': '/rl/pypi/action.json',
    'planning_time_path': '/rl/pypi/planning_time.json',
    'save_path': '/rl/pypi'
})

Figure: Illustration of the packsim simulation workflow

By calling the simulate function and passing a dictionary of parameters, users can execute a complete simulation process. The input parameters include the simulation setting, dataset name, method name, test configuration ID, visualization option, path to the default configuration file, action sequence file, planning time log, and result save path. Upon completion, the system returns evaluation results including packing efficiency, decision trajectories, and execution time for subsequent algorithm benchmarking and performance analysis.

Before running the simulation, users need to generate action.json and planning_time.json using the Submit module and configure the default.yaml file accordingly, in particular the data path used for testing. Sample files are available for download.

This module supports fast switching of algorithms and dataset configurations, making it a versatile tool for testing and comparing online bin packing algorithms. It offers high compatibility and strong visualization capabilities.

Metric

Summary of metrics used in online bin packing. The first column indicates the type of the metric. * indicates the two metrics that we propose: Collapsed Placement and Dangerous Operation.

Type                  Metric                Weight  Description
Compactness           Space Utilization     0.35    Ratio of placed item volume to the container volume.
Compactness           Occupancy             0.15    Ratio of placed item volume to occupied space.
Inference Efficiency  Decision Time         0.08    Average algorithmic decision time per placement.
Stability             Local Stability       0.07    Average distance between target and actual positions.
Stability             Static Stability      0.15    Rewards based on the linear and angular velocities of the box.
Execution Efficiency  Trajectory Length     0.08    Average end-effector trajectory length.
Execution Efficiency  Collapsed Placement*  0.07    Fraction of placements that collapse.
Execution Efficiency  Dangerous Operation*  0.05    Fraction of unsafe robotic actions.
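The two compactness ratios can be sketched as follows. Space utilization follows the table's description directly. For occupancy, the table does not say how "occupied space" is measured, so this sketch assumes it is the axis-aligned bounding box around all placed items; that choice, the function names, and the input layouts are hypothetical.

```python
def space_utilization(item_dims, container_dims):
    """Placed item volume divided by the container volume."""
    placed = sum(l * w * h for l, w, h in item_dims)
    cl, cw, ch = container_dims
    return placed / (cl * cw * ch)


def occupancy(placements):
    """Placed item volume divided by the bounding-box volume of all placed items.

    placements: list of (x, y, z, l, w, h), where (x, y, z) is the minimum corner.
    Treating the bounding box as the "occupied space" is an assumption.
    """
    placed = sum(l * w * h for _, _, _, l, w, h in placements)
    mins = [min(p[i] for p in placements) for i in range(3)]
    maxs = [max(p[i] + p[i + 3] for p in placements) for i in range(3)]
    bbox = 1.0
    for lo, hi in zip(mins, maxs):
        bbox *= hi - lo
    return placed / bbox
```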

These metrics form the basis of the scoring system described in the Benchmark page and are used across the Math Pack, Physics Pack, and Execution Pack with different combinations.
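As a hedged illustration of how the weights in the table might be combined, the sketch below computes a weighted average over whichever metrics a given setting uses. The actual scoring rule is defined on the Benchmark page; here we assume that each metric has already been normalized to [0, 1] with higher values being better, and that weights are renormalized over the metrics present. The dictionary keys and the function name weighted_score are hypothetical.

```python
# Weights copied from the metric table above; the snake_case keys are illustrative.
WEIGHTS = {
    "space_utilization": 0.35,
    "occupancy": 0.15,
    "decision_time": 0.08,
    "local_stability": 0.07,
    "static_stability": 0.15,
    "trajectory_length": 0.08,
    "collapsed_placement": 0.07,
    "dangerous_operation": 0.05,
}


def weighted_score(normalized, weights=WEIGHTS):
    """Weighted average over the metrics present in `normalized`.

    Each value is assumed to be pre-scaled to [0, 1], higher meaning better.
    Weights are renormalized over the supplied metrics (an assumption).
    """
    used = {name: weights[name] for name in normalized}
    total = sum(used.values())
    return sum(normalized[name] * weight for name, weight in used.items()) / total


# Example with only the two compactness metrics and made-up normalized values.
print(weighted_score({"space_utilization": 0.8, "occupancy": 0.6}))
```

The eight weights in the table sum to 1, so when all metrics are supplied the renormalization has no effect.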