| ## Public Benchmark Progress |
|
|
| Date: 2026-04-01 UTC |
|
|
| ### Confirmed Real Public Benchmark Result |
|
|
| - Public occlusion proxy: `ManiSkill PickClutterYCB-v1` |
| - Strongest adapter-specific result so far: |
|   - summary: `/workspace/workspace/reports/maniskill_pickclutter_smoke_v5_eval_tuned_softerpref/public_benchmark_package_summary.json` |
|   - `trunk_only_ft = 0.04` |
|   - `adapter_noop = 0.04` |
|   - `adapter_active_ft = 0.62` |
|   - `delta_active_vs_trunk = +0.58` |
|   - 95% CI on the delta: `[0.44, 0.72]` |
|   - `intervention_rate = 1.0` |
|   - `non_base_selection_rate = 1.0` |
| - Interpretation: |
|   - this is a real adapter-specific sign of life on a public occlusion benchmark |
|   - the gain is not coming from a stronger shared trunk, because `adapter_noop` stays flat at the `trunk_only_ft` level (`0.04`) |
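|
| As a sanity check on the numbers above, the delta and a rough interval can be recomputed from the two reported rates. This is a minimal sketch only: the report does not state the episode count or how the 95% CI was obtained, so `N_EPISODES` is a hypothetical placeholder and the Wald (normal-approximation) interval is an assumed stand-in, not necessarily the method behind `[0.44, 0.72]`. It also assumes the values are per-episode success rates, which the summary does not say explicitly. |
|

```python
import math


def delta_with_wald_ci(p_active, p_base, n_active, n_base, z=1.96):
    """Difference of two proportions with a normal-approximation interval."""
    delta = p_active - p_base
    se = math.sqrt(
        p_active * (1 - p_active) / n_active + p_base * (1 - p_base) / n_base
    )
    return delta, (delta - z * se, delta + z * se)


# Rates copied from the summary above.
TRUNK_ONLY_FT = 0.04
ADAPTER_NOOP = 0.04
ADAPTER_ACTIVE_FT = 0.62

# Hypothetical: the report does not state how many episodes were evaluated.
N_EPISODES = 50

delta, (ci_low, ci_high) = delta_with_wald_ci(
    ADAPTER_ACTIVE_FT, TRUNK_ONLY_FT, N_EPISODES, N_EPISODES
)
# A zero gap here is what supports "the gain is not from a stronger trunk".
noop_delta = ADAPTER_NOOP - TRUNK_ONLY_FT

print(f"delta_active_vs_trunk = {delta:+.2f}")
print(f"adapter_noop vs trunk = {noop_delta:+.2f}")
print(f"Wald 95% interval     = [{ci_low:.2f}, {ci_high:.2f}]")
```

|
| The interval printed here depends entirely on the assumed `N_EPISODES`; the delta and the flat no-op comparison do not. |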
|
|
| ### BEHAVIOR Bag Proxy Investigation |
|
|
| Target public task family: |
| - official BEHAVIOR grocery-store bag/container retrieval proxy |
| - primary candidate: `paying_for_purchases` |
| - stricter but currently unusable candidate: `buy_basic_garden_tools` |
|
|
| Environment used: |
| - BEHAVIOR assets: `/workspace/workspace/BEHAVIOR-1K` |
| - venv used for probes: `/workspace/envs/behavior` |
|
|
| Findings: |
| - `buy_basic_garden_tools` is blocked by official scene-task geometry: |
|   - repeated failure on `ontop ['rake.n.03_1', 'grocery_shelf.n.01_1']` |
|   - even with whitelist attempts, the sampler never found a valid shelf placement |
| - `paying_for_purchases` is much healthier: |
|   - `grocery_store_convenience`, `grocery_store_cafe`, and `grocery_store_asian` all load |
|   - object scope binds the real task objects: |
|     - `shopping_basket.n.01_1` |
|     - `money.n.01_1` |
|     - `checkout.n.03_1` |
|     - `floor.n.01_1` |
| - Root sampler bug: |
|   - official online sampling fails on the floor / agent chain |
|   - without patching, the blocking warning is: |
|     - `Room type [grocery_store] ... floor.n.01_1: , checkout.n.03_1: grocery_store_0` |
|   - after removing the agent-on-floor condition from the sampler pipeline, the next blocker is: |
|     - `ontop ['shopping_basket.n.01_1', 'floor.n.01_1'] False` |
| - Critical state-probe result: |
|   - even when object bindings exist, the sampled movable objects remain parked at their far-away import positions |
|   - observed example on `grocery_store_asian`: |
|     - basket position near `[120, 120, -80]` |
|     - money position near `[115, 115, -85]` |
|     - apples position near `[110, 110, -90]` and `[105, 105, -95]` |
|     - `money inside basket = False` |
|     - `apple1 inside basket = False` |
|     - `apple2 inside basket = False` |
| - Conclusion: |
|   - as of 2026-04-01, the BEHAVIOR bag proxy is not yet a usable fair evaluation track in this workspace |
|   - the public task objects bind, but the online sampler does not materialize a valid initial scene for training or evaluation |
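|
| The state-probe check above can be turned into a quick guard before any train/eval run. A minimal sketch, assuming object positions have already been read out of the simulator as plain `(x, y, z)` tuples and that a valid scene sits near the origin; the helper names, the object labels, and the distance threshold are hypothetical and not part of the BEHAVIOR or OmniGibson API. The positions are the approximate values observed on `grocery_store_asian`. |
|

```python
import math

# Hypothetical threshold (scene units): anything farther than this from the
# scene origin is treated as never having been placed by the sampler.
PARKED_DISTANCE_THRESHOLD = 50.0


def is_parked(position, threshold=PARKED_DISTANCE_THRESHOLD):
    """Return True if the position is implausibly far from the scene origin."""
    return math.dist(position, (0.0, 0.0, 0.0)) > threshold


def find_parked_objects(positions, threshold=PARKED_DISTANCE_THRESHOLD):
    """Return the sorted names of objects still sitting at far-away positions."""
    return sorted(
        name for name, pos in positions.items() if is_parked(pos, threshold)
    )


# Approximate positions from the probe; which apple maps to which position
# is an illustrative assumption.
observed = {
    "basket": (120.0, 120.0, -80.0),
    "money": (115.0, 115.0, -85.0),
    "apple1": (110.0, 110.0, -90.0),
    "apple2": (105.0, 105.0, -95.0),
}

parked = find_parked_objects(observed)
if parked:
    print("scene not materialized; parked objects:", parked)
else:
    print("all movable objects are within the scene bounds")
```

|
| A check like this only detects the symptom (objects left at import positions); it does not validate the `inside` / `ontop` predicates, which would still need the simulator's own state queries. |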
|
|
| ### Garment / Cloth Proxy Status |
|
|
| - GarmentLab repo cloned: |
|   - `/workspace/workspace/GarmentLab` |
| - Immediate constraint: |
|   - the repo expects Isaac Sim 4.0.0 plus external Google Drive assets |
| - Current status: |
|   - code inspected only |
|   - no runnable public cloth benchmark execution completed yet in this workspace |
|
|
| ### Next Public Proxy Candidates |
|
|
| Given the BEHAVIOR blocker, the next-lightest public candidates already available locally are: |
|
|
| - `OpenCabinetDrawer-v1` |
|   - public ManiSkill task |
|   - good container reveal / access proxy |
| - `PutEggplantInBasketScene-v1` |
|   - public ManiSkill bridge-dataset task |
|   - public basket / container interaction proxy |
| - `PutSpoonOnTableClothInScene-v1` |
|   - public ManiSkill bridge-dataset cloth interaction proxy |
|
|
| ### Immediate Recommendation |
|
|
| - Keep the confirmed `PickClutterYCB-v1` result as the anchor public success case. |
| - Do not spend more time on BEHAVIOR online sampling until either: |
|   - a cached valid scene instance is created, or |
|   - the sampler is patched deeply enough to place container objects correctly instead of leaving them at far-away import positions. |
| - Pivot the next train/eval smoke to a lighter public ManiSkill proxy before returning to BEHAVIOR. |
|
|