Clarify y_ready caveat and oven task structure
State that y_ready should not be treated as a decisive metric for the current oven benchmark and that the oven phase handoff is highly structured, so this task is mainly a smoke test / base-finetune comparison rather than strong evidence of general reveal-and-retrieve reasoning.
README.md
CHANGED
@@ -77,7 +77,9 @@ The main new fix in `iter24` is the assisted-door contact scoring inside `p_pre`
 
 The current repo state should therefore be treated as the repaired benchmark snapshot with geometry-aware door assistance, not the final metric design.
 
-Brief caveat: the current `y_ready` label still gates on low oven-door angular speed after extraction feasibility persists. In this task, the retriever arm can legitimately nudge the door while already committing to retrieval, so `y_ready` can still switch later than the true reveal-to-retrieve boundary.
+Brief caveat: the current `y_ready` label still gates on low oven-door angular speed after extraction feasibility persists. In this task, the retriever arm can legitimately nudge the door while already committing to retrieval, so `y_ready` can still switch later than the true reveal-to-retrieve boundary. For the current oven benchmark, `y_ready` should therefore not be treated as a decisive validation metric or a trusted phase-switch target.
+
+The oven task also has a highly structured reveal-to-retrieve handoff in the expert demos: both arms reposition, the revealer opens and clears the door, then the retriever commits. Because that phase pattern is so standardized, good results on this task are most useful as a task-specific smoke test or a "does the adaptor beat a base finetune here?" check, not as strong evidence of general reveal-and-retrieve reasoning.
 
 ## What Is In This Upload
 
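Illustrative note on the `y_ready` caveat: the sketch below is a toy model of the gating described in the README text, not the repo's actual labeling code. It assumes that "extraction feasibility persists" means feasibility holds for a fixed number of consecutive steps and that "low oven-door angular speed" means an absolute speed below a threshold. The function names, `persist_steps`, `speed_thresh`, and the trace values are all hypothetical. It shows how a door that is still moving (for example, because the retriever nudges it) could delay the label relative to a feasibility-only proxy for the reveal-to-retrieve boundary.

```python
from typing import Optional, Sequence


def first_feasible_persist_step(
    feasible: Sequence[bool], persist_steps: int = 3
) -> Optional[int]:
    """First step at which feasibility has held for `persist_steps` consecutive steps.

    Hypothetical proxy for the reveal-to-retrieve boundary (an assumption).
    """
    run = 0
    for t, ok in enumerate(feasible):
        run = run + 1 if ok else 0
        if run >= persist_steps:
            return t
    return None


def first_ready_step(
    feasible: Sequence[bool],
    door_speed: Sequence[float],
    persist_steps: int = 3,
    speed_thresh: float = 0.05,
) -> Optional[int]:
    """First step where feasibility has persisted AND |door speed| is below threshold.

    Toy stand-in for a `y_ready`-style gate; thresholds are made up.
    """
    run = 0
    for t, (ok, w) in enumerate(zip(feasible, door_speed)):
        run = run + 1 if ok else 0
        if run >= persist_steps and abs(w) < speed_thresh:
            return t
    return None


# Made-up trace: extraction becomes feasible at step 2, but the door keeps
# moving for several more steps before its speed drops under the threshold.
feasible = [False, False, True, True, True, True, True, True]
door_speed = [0.40, 0.30, 0.20, 0.15, 0.12, 0.10, 0.02, 0.01]

boundary = first_feasible_persist_step(feasible)
ready = first_ready_step(feasible, door_speed)
lag = ready - boundary  # steps by which the speed gate delays the label
print(boundary, ready, lag)
```

In this toy trace the speed gate fires two steps after the feasibility-only proxy, which is the kind of late switch the caveat warns about.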