RoboBPP

Repetitive Dataset

The Repetitive Dataset features consecutively repeated boxes observed in real logistics operations, where identical items are produced and packed in continuous sequences, similar to assembly-line manufacturing. It is designed to evaluate whether an algorithm can recognize repetitive patterns and maintain stable placement over time. Such repetition is common in industrial production and supply workflows.

This dataset contains 16,767 time-ordered records, each including item dimensions, volume, and timestamp information. It characterizes a repetitive packing scenario that emphasizes minimizing cumulative errors and ensuring stable stacking during continuous online packing. Testing algorithms on this dataset helps identify whether they can maintain performance in sequential, repetitive operations and handle long-term stability challenges.

Length	Width	Height	Vol/CS	Gross weight	Case_pick_finish_time
35	20.3	20.8	29.5568	6.324	2022/06/07, 00:24:37
33.7	19.1	17	10.94239	5.874	2022/06/07, 00:24:47
35.3	26.7	24.1	22.714491	7.983	2022/06/07, 00:24:59
35	20.3	20.8	29.5568	6.327	2022/06/07, 00:25:12
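One simple way to see how repetitive a sequence is: group the records by their (length, width, height) triple and count how often each triple recurs. The sketch below is a minimal illustration, assuming the records are available as tab-separated text with the header shown above. The helper name `dimension_counts` is hypothetical and not part of the dataset tooling.

```python
import csv
import io
from collections import Counter

# The four sample rows shown above, as tab-separated text.
SAMPLE = (
    "Length\tWidth\tHeight\tVol/CS\tGross weight\tCase_pick_finish_time\n"
    "35\t20.3\t20.8\t29.5568\t6.324\t2022/06/07, 00:24:37\n"
    "33.7\t19.1\t17\t10.94239\t5.874\t2022/06/07, 00:24:47\n"
    "35.3\t26.7\t24.1\t22.714491\t7.983\t2022/06/07, 00:24:59\n"
    "35\t20.3\t20.8\t29.5568\t6.327\t2022/06/07, 00:25:12\n"
)

def dimension_counts(text):
    """Count how many times each (length, width, height) triple occurs."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return Counter(
        (float(row["Length"]), float(row["Width"]), float(row["Height"]))
        for row in reader
    )

counts = dimension_counts(SAMPLE)
# Keep only the box shapes that appear more than once.
repeated = {dims: n for dims, n in counts.items() if n > 1}
print(repeated)
```

On the sample rows, the first and fourth records share the same dimensions, so that shape is reported as repeated.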

Diverse Dataset

The Diverse Dataset features boxes spanning a wide range of sizes and is intended to evaluate an algorithm’s adaptability to heterogeneous item distributions. Such heterogeneity is common in logistics order data, from which the proportion of each item category can be derived to reflect realistic demand. The dataset comprises 6,849 office supply items, each with dimensions, an item name, and a proportion value. It tests an algorithm’s ability to handle varied items while optimizing both space utilization and stacking stability.

Testing algorithms on this dataset helps verify whether they can handle practical constraints, such as different priorities for high-occupancy items or restrictions on packing certain items together. The rich occupancy and category information allows researchers to assess how well algorithms adapt to heterogeneous distributions and maintain performance in realistic logistics scenarios.

	Length	Width	Height	Proportion
0	24.5	35.5	39	0.71
1	32.5	42.5	33.5	0.54
2	27.5	35.5	20	0.48
3	32	54	30.5	0.46
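A test sequence that reflects the demand distribution could be drawn by sampling items in proportion to the Proportion column. The sketch below is one possible approach, not the benchmark's own sampler: it assumes Proportion can be treated as a relative sampling weight, and the function name `sample_sequence` is hypothetical.

```python
import random

# Sample rows from the table above: (length, width, height, proportion).
ITEMS = [
    (24.5, 35.5, 39.0, 0.71),
    (32.5, 42.5, 33.5, 0.54),
    (27.5, 35.5, 20.0, 0.48),
    (32.0, 54.0, 30.5, 0.46),
]

def sample_sequence(items, n, seed=0):
    """Draw n boxes, using Proportion as a relative weight (an assumption)."""
    rng = random.Random(seed)  # seeded so the sequence is reproducible
    weights = [item[3] for item in items]
    picks = rng.choices(items, weights=weights, k=n)
    return [item[:3] for item in picks]  # keep only the dimensions

sequence = sample_sequence(ITEMS, n=5)
print(sequence)
```

`random.choices` normalizes the weights internally, so the proportions do not need to sum to 1.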

Long Board Dataset

The Long Board Dataset features boxes with one dimension significantly longer than the others, aiming to evaluate an algorithm’s ability to handle geometrically challenging shapes. Such items are common in manufacturing and furniture logistics, where their geometry complicates spatial arrangement and stability. This dataset represents a long-board–packing task requiring precise control of positioning and balance, providing a rigorous test of algorithm robustness under physically constrained conditions.

The dataset contains long-board items of various sizes, each record including length, width, and height information. Testing algorithms on this dataset helps assess their capability to optimize space utilization and maintain stacking stability when dealing with elongated materials, which are particularly difficult to load efficiently in practice.

	Length	Width	Height
0	62	35.5	13
1	273	26	6
2	85	61	12
3	151	62	8
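To get a rough sense of how elongated each item is, one can compare its longest side with its second-longest side. This ratio is only an illustrative descriptor chosen here, not a criterion defined by the dataset, and the function name `elongation` is hypothetical.

```python
# Sample rows from the table above: (length, width, height).
BOARDS = [
    (62, 35.5, 13),
    (273, 26, 6),
    (85, 61, 12),
    (151, 62, 8),
]

def elongation(dims):
    """Ratio of the longest side to the second-longest side."""
    longest, second, _ = sorted(dims, reverse=True)
    return longest / second

for dims in BOARDS:
    print(dims, round(elongation(dims), 2))
```

Items with a ratio near 1 are flat boards rather than long ones; a large ratio indicates a beam-like shape.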

Simulation Environment

Figure: Overview of three test settings

We define three test settings to evaluate algorithm adaptability under increasing levels of physical realism:

Math Pack: This setting performs purely geometric placement without physics or robot operation, allowing evaluation to focus solely on the algorithm’s spatial reasoning ability. By removing physical factors such as gravity, friction, and motion uncertainty, it reflects the algorithm’s idealized noise-free upper bound of packing performance.
Physics Pack: In this setting, physical effects such as gravity and collisions are enabled, but no robot is involved. It evaluates whether the placement strategy can withstand real-world physical constraints such as stacking stability, balance, and collision dynamics. This highlights the robustness of the algorithm under realistic conditions, without introducing additional complexities from motion planning or robot execution.
Execution Pack: This is the most realistic evaluation setting, integrating physics simulation and robotic execution, including motion planning and control. Performance in this setting depends on kinematic reachability, collision-free trajectory planning, and stability during execution. This end-to-end evaluation examines the full workflow—from placement planning to executable robot actions—and reflects readiness for real-world deployment.

These three settings form a tiered evaluation framework that allows us to analyze how different packing strategies perform under increasingly realistic constraints—from ideal geometric logic to full physical execution with robotics. This design helps reveal the robustness, adaptability, and real-world viability of each algorithm.
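The tiered structure can be summarized as a small lookup from setting to enabled features. This is a sketch for illustration only: the numbering 1 to 3 is assumed to follow the order of the list above, and the names `Setting`, `Features`, and `features_for` are hypothetical rather than part of any released API.

```python
from dataclasses import dataclass
from enum import IntEnum

class Setting(IntEnum):
    MATH_PACK = 1       # geometric placement only
    PHYSICS_PACK = 2    # geometry plus physics simulation
    EXECUTION_PACK = 3  # geometry, physics, and robot execution

@dataclass(frozen=True)
class Features:
    physics: bool  # gravity and collisions are simulated
    robot: bool    # motion planning and control are included

FEATURES = {
    Setting.MATH_PACK: Features(physics=False, robot=False),
    Setting.PHYSICS_PACK: Features(physics=True, robot=False),
    Setting.EXECUTION_PACK: Features(physics=True, robot=True),
}

def features_for(setting_id):
    """Look up which components a given setting ID enables."""
    return FEATURES[Setting(setting_id)]

print(features_for(3))
```

Each tier adds one component to the previous one, which makes it possible to attribute a drop in performance to physics or to robot execution specifically.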

Our simulation environment is built with the PyBullet physics engine. The robotic arm used in the third setting (Execution Pack) is configured based on a 6-DOF UR5 model. During the simulation, the robot performs each pick-and-place action from the loading area to the target position using inverse kinematics (IK) and motion planning.

We consider both geometric and physical constraints: geometrically, boxes must remain within the container without overlapping; physically, we enforce gravity, friction, and collision response, and we ensure that the robotic arm avoids self-collision and environmental obstacles.
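The geometric constraint (inside the container, no overlap) can be checked with axis-aligned box tests. The following is a minimal sketch under the assumption that boxes are axis-aligned and described by a minimum corner plus a size; the `Box` type, the function names, and the container size are hypothetical and chosen for illustration.

```python
from typing import NamedTuple

class Box(NamedTuple):
    """Axis-aligned box: minimum corner (x, y, z) and size (dx, dy, dz)."""
    x: float
    y: float
    z: float
    dx: float
    dy: float
    dz: float

def inside(box, container):
    """True if the box lies entirely within a container of size (L, W, H)."""
    L, W, H = container
    return (box.x >= 0 and box.y >= 0 and box.z >= 0
            and box.x + box.dx <= L
            and box.y + box.dy <= W
            and box.z + box.dz <= H)

def overlaps(a, b):
    """True if the interiors of a and b intersect (touching faces is allowed)."""
    return (a.x < b.x + b.dx and b.x < a.x + a.dx
            and a.y < b.y + b.dy and b.y < a.y + a.dy
            and a.z < b.z + b.dz and b.z < a.z + a.dz)

def is_feasible(new_box, placed, container):
    """Geometric feasibility: within bounds and disjoint from placed boxes."""
    return inside(new_box, container) and not any(
        overlaps(new_box, p) for p in placed
    )

container = (100.0, 100.0, 100.0)  # hypothetical container size
placed = [Box(0, 0, 0, 35, 20.3, 20.8)]
print(is_feasible(Box(35, 0, 0, 35, 20.3, 20.8), placed, container))  # adjacent
print(is_feasible(Box(10, 0, 0, 35, 20.3, 20.8), placed, container))  # overlapping
```

The strict inequalities in `overlaps` let boxes share a face, which is what tight packing requires. The physical constraints are not covered by this check and are left to the physics engine.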

The system is implemented in Python and supports both direct geometric simulation and full physics-based interaction. A unified interface encapsulates the simulation engine, robotic control, and evaluation metrics. Users can launch simulations with different settings via a single entry point, allowing convenient algorithm testing and scenario switching.

To enhance the reusability and scalability of our bin packing simulation environment, we have encapsulated the entire system into a Python package named packsim, now available on PyPI. Users can install it using the following command:

pip install packsim

After installation, the simulation process can be invoked in Python as follows:

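from packsim import simulate

result = simulate({
    'setting': 1,                 # simulation setting
    'data': 'flat_long',          # dataset name
    'method': 'PCT',              # packing method
    'test_data_config': 0,        # test configuration ID
    'gui': 1,                     # visualization option
    'config': '/rl/pypi/default.yaml',                         # default configuration file
    'action_path': '/rl/pypi/action.json',                     # action sequence file
    'planning_time_path': '/rl/pypi/planning_time.json',       # planning time log
    'save_path': '/rl/pypi'                                    # where results are saved
})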

Figure: Illustration of the packsim simulation workflow

By calling the simulate function and passing a dictionary of parameters, users can execute a complete simulation process. The input parameters include the simulation setting, dataset name, packing method, test configuration ID, visualization option, path to the default configuration file, action sequence file, planning time log, and result save path. Upon completion, the system returns evaluation results including packing efficiency, decision trajectories, and execution time for subsequent algorithm benchmarking and performance analysis.

Before running the simulation, users need to generate action.json and planning_time.json using the Submit module, and configure the default.yaml file accordingly—especially the data path for testing.
Sample files are available for download.
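Because the simulation depends on several input files, it can help to assemble the parameter dictionary in one place and confirm that the files exist before starting a run. The sketch below does not call packsim. The dictionary keys and file names are taken from the example call above, while the helpers `build_params` and `missing_inputs` are hypothetical conveniences, not part of the package.

```python
import os

def build_params(base_dir, setting, data, method, test_data_config=0, gui=1):
    """Assemble a parameter dictionary with the keys used in the example call."""
    return {
        "setting": setting,
        "data": data,
        "method": method,
        "test_data_config": test_data_config,
        "gui": gui,
        "config": os.path.join(base_dir, "default.yaml"),
        "action_path": os.path.join(base_dir, "action.json"),
        "planning_time_path": os.path.join(base_dir, "planning_time.json"),
        "save_path": base_dir,
    }

def missing_inputs(params):
    """Return the input files named in params that do not exist on disk."""
    keys = ("config", "action_path", "planning_time_path")
    return [params[k] for k in keys if not os.path.isfile(params[k])]

params = build_params("/rl/pypi", setting=1, data="flat_long", method="PCT")
print(params["action_path"])
print(missing_inputs(params))  # lists whichever input files are absent
```

If `missing_inputs` returns an empty list, the dictionary can be passed to `simulate` as in the example above.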

This module supports fast switching of algorithms and dataset configurations, making it a versatile tool for testing and comparing online bin packing algorithms. It offers high compatibility and strong visualization capabilities.

Metric

Summary of metrics used in online bin packing. The first column gives the metric type. An asterisk (*) marks the two metrics that we propose: Collapsed Placement and Dangerous Operation.

Type	Metric	Weight	Description
Compactness	Space Utilization	0.35	Ratio of placed item volume to the container volume.
Compactness	Occupancy	0.15	Ratio of placed item volume to occupied space.
Inference Efficiency	Decision Time	0.08	Average algorithmic decision time per placement.
Stability	Local Stability	0.07	Average distance between target and actual positions.
Stability	Static Stability	0.15	Rewards based on the linear and angular velocities of the box.
Execution Efficiency	Trajectory Length	0.08	Average end-effector trajectory length.
	Collapsed Placement*	0.07	Fraction of placements that collapse.
	Dangerous Operation*	0.05	Fraction of unsafe robotic actions.

These metrics form the basis of the scoring system described in the Benchmark page and are used across the Math Pack, Physics Pack, and Execution Pack with different combinations.
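As an illustration of how per-metric weights can be combined, the sketch below computes a weighted average over whichever metrics a setting reports. It is not the scoring procedure from the Benchmark page: it assumes every metric has already been normalized to [0, 1] with higher meaning better, and the renormalization over a subset, the metric subset used for the example, and the example values are all assumptions made here.

```python
# Weights copied from the metric table above; they sum to 1.0.
WEIGHTS = {
    "space_utilization": 0.35,
    "occupancy": 0.15,
    "decision_time": 0.08,
    "local_stability": 0.07,
    "static_stability": 0.15,
    "trajectory_length": 0.08,
    "collapsed_placement": 0.07,
    "dangerous_operation": 0.05,
}

def weighted_score(normalized, weights=WEIGHTS):
    """Weighted average over the metrics present in `normalized`.

    Assumes each value is already scaled to [0, 1] with higher = better.
    Weights are renormalized over the supplied metrics, so a setting that
    reports only a subset still yields a score in [0, 1].
    """
    used = {name: weights[name] for name in normalized}
    total = sum(used.values())
    return sum(used[name] * normalized[name] for name in used) / total

# Illustrative values only; not measured results.
example = {"space_utilization": 0.8, "occupancy": 0.9, "decision_time": 1.0}
print(round(weighted_score(example), 4))
```

Renormalizing keeps scores comparable in scale across settings that use different metric combinations, though scores from different combinations still measure different things.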