To provide a unified evaluation across diverse metrics, RoboBPP adopts a normalized scoring system. All metrics are converted to a common scale using min–max normalization, ensuring that higher scores consistently represent better performance. After normalization, the final score of an algorithm is computed as a weighted sum of all normalized metrics, where the weights reflect the relative importance of each metric. Detailed definitions of all metrics are given in the Metric section of the Documentation.
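The normalization step can be sketched as follows. This is an illustrative implementation, not RoboBPP's actual code; the function name and the cost-metric inversion are assumptions.

```python
# Hypothetical sketch of min-max normalization as described above.
def min_max_normalize(values, higher_is_better=True):
    """Map raw metric values across algorithms to [0, 1], higher = better."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all algorithms tie on this metric
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    # For cost-like metrics (e.g. planning time), invert so higher is better.
    return scaled if higher_is_better else [1.0 - s for s in scaled]

# Example: planning times in seconds (lower is better).
print(min_max_normalize([0.8, 1.2, 2.0], higher_is_better=False))
# roughly [1.0, 0.667, 0.0]
```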
The Execution Pack includes eight metrics, with the weight vector:
w_exec = (0.35, 0.15, 0.08, 0.07, 0.15, 0.08, 0.07, 0.05).
This design emphasizes space utilization while accounting for stability, safety, and computational efficiency.
The Physics Pack evaluates five metrics, excluding trajectory and safety-related measures.
The corresponding normalized weight vector is:
w_phys = (0.43, 0.19, 0.10, 0.09, 0.19).
The Math Pack focuses on three geometry-based metrics, with the weight vector:
w_math = (0.60, 0.26, 0.14).
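Putting the weight vectors together, the final score is simply a dot product of an algorithm's normalized metrics with the weights of the chosen pack. The sketch below uses the weight vectors defined above; the normalized metric values in the example are illustrative, not taken from the benchmark.

```python
# Weight vectors from the text; each sums to 1, so a perfect algorithm scores 1.0.
W_EXEC = (0.35, 0.15, 0.08, 0.07, 0.15, 0.08, 0.07, 0.05)
W_PHYS = (0.43, 0.19, 0.10, 0.09, 0.19)
W_MATH = (0.60, 0.26, 0.14)

def pack_score(normalized_metrics, weights):
    """Weighted sum of normalized metrics (all assumed in [0, 1])."""
    assert len(normalized_metrics) == len(weights)
    return sum(m * w for m, w in zip(normalized_metrics, weights))

# Sanity check: all three weight vectors are normalized.
for w in (W_EXEC, W_PHYS, W_MATH):
    assert abs(sum(w) - 1.0) < 1e-9

# Example with made-up normalized Math Pack metrics.
print(round(pack_score([0.9, 0.8, 0.7], W_MATH), 3))  # 0.846
```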
We aggregated the experimental results using this scoring system to compute an overall score for each algorithm, which allows us to rank all methods across test settings and datasets. The top four results in each column are ranked first through fourth.
| Algorithm | Math Pack (Repetitive) | Math Pack (Diverse) | Math Pack (Long Board) | Physics Pack (Repetitive) | Physics Pack (Diverse) | Physics Pack (Long Board) | Execution Pack (Repetitive) | Execution Pack (Diverse) | Execution Pack (Long Board) |
|---|---|---|---|---|---|---|---|---|---|
| PCT | 0.908 | 0.939 | 0.971 | 0.810 | 0.714 | 0.839 | 0.765 | 0.672 | 0.785 |
| TAP-Net++ | 0.797 | 0.387 | 0.815 | 0.891 | 0.502 | 0.725 | 0.781 | 0.603 | 0.756 |
| AR2L | 0.914 | 0.878 | 0.824 | 0.740 | 0.611 | 0.694 | 0.708 | 0.737 | 0.688 |
| PackE | 0.397 | 0.440 | 0.207 | 0.865 | 0.623 | 0.209 | 0.617 | 0.469 | 0.300 |
| CDRL | 0.468 | 0.638 | 0.655 | 0.491 | 0.453 | 0.612 | 0.422 | 0.728 | 0.637 |
| DBL | 0.854 | 0.763 | 0.879 | 0.850 | 0.476 | 0.816 | 0.807 | 0.713 | 0.777 |
| LSAH | 0.863 | 0.794 | 0.679 | 0.861 | 0.477 | 0.510 | 0.701 | 0.523 | 0.487 |
| HM | 0.823 | 0.627 | 0.609 | 0.709 | 0.554 | 0.683 | 0.414 | 0.623 | 0.654 |
| SDFPack | 0.659 | 0.579 | 0.087 | 0.389 | 0.345 | 0.193 | 0.232 | 0.228 | 0.229 |
| OnlineBPH | 0.527 | 0.534 | 0.583 | 0.517 | 0.478 | 0.560 | 0.464 | 0.618 | 0.574 |
| MACS | 0.221 | 0.135 | 0.009 | 0.298 | 0.183 | 0.236 | 0.238 | 0.364 | 0.293 |
| BR | 0.781 | 0.677 | 0.396 | 0.542 | 0.330 | 0.291 | 0.410 | 0.360 | 0.295 |
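The top-4 selection used in the table can be reproduced per column. The sketch below applies it to the Math Pack / Repetitive column, with scores copied directly from the table.

```python
# Math Pack (Repetitive) scores, copied from the table above.
scores = {
    "PCT": 0.908, "TAP-Net++": 0.797, "AR2L": 0.914, "PackE": 0.397,
    "CDRL": 0.468, "DBL": 0.854, "LSAH": 0.863, "HM": 0.823,
    "SDFPack": 0.659, "OnlineBPH": 0.527, "MACS": 0.221, "BR": 0.781,
}

# Sort descending by score and keep the four best entries.
top4 = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:4]
print(top4)
# [('AR2L', 0.914), ('PCT', 0.908), ('LSAH', 0.863), ('DBL', 0.854)]
```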
The Math Pack setting performs purely geometric placement without any physics simulation or robot execution. By removing gravity, friction, and motion uncertainty, it isolates the algorithm's spatial reasoning ability and reflects an idealized, noise-free upper bound of packing performance.
In the Physics Pack setting, gravity, collisions, and other physical effects are enabled, but robot motion is not involved. It evaluates whether the algorithm's placement strategy remains stable under realistic physical constraints such as stacking balance and collision dynamics, highlighting robustness without introducing motion-planning complexity.
The Execution Pack, the highest-fidelity setting, integrates both physical simulation and robotic execution, including motion planning and trajectory control. Algorithm performance depends on kinematic reachability, collision-free path planning, and execution stability. This end-to-end evaluation reflects how well the algorithm's placements can be realistically carried out by a robot in industrial environments.