Update README.md
README.md
CHANGED
|
@@ -337,6 +337,20 @@ if num_funcs >= 4:
|
|
| 337 |
|
| 338 |
---
|
| 339 |
| 340 |
## Roadmap
|
| 341 |
| Version | Goal |
|
| 342 |
|---------|------|
|
|
|
|
| 337 |
|
| 338 |
---
|
| 339 |
|
| 340 |
+
## Router Stability (Important)
|
| 341 |
+
|
| 342 |
+
Dynamic soft‑routing is powerful but sensitive to training hyperparameters. The training methodology is under active refinement to ensure healthy branch growth without premature specialization.
|
| 343 |
+
|
| 344 |
+
- Instability: routing entropy can drop early in training, leading to branch collapse. The risk of collapse is highly sensitive to temperature (τ) scheduling, the entropy auxiliary weight (λ), and any warmup forcing.
|
| 345 |
+
- Current safeguards: high initial τ, extended freeze period, entropy‑max regularization, and selective forcing during early steps.
|
| 346 |
+
- Expectations: training curves may show transient oscillations in branch usage while the router and branches co‑adapt.
|
| 347 |
+
- What to monitor: `entropy_norm ≥ 0.75` during the first 3–5k steps, and no branch persistently below 15% usage.
|
| 348 |
+
- Intervention playbook: increase `router_aux_weight`, extend `router_tau_freeze_steps`, temporarily raise `router_tau_start`, or apply targeted forcing to the weakest branch.
|
| 349 |
+
- Fine‑tuning note: if using the standard HF Trainer, consider `router_aux_weight=0` (or use `scripts/train_veronica.py`, which handles entropy‑max correctly).
|
| 350 |
+
- Status: ongoing refinement. Default τ/λ schedules may evolve; core API will remain stable.
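
The monitoring thresholds in the list above can be checked with a small helper. The following is a minimal sketch, not the project's own implementation: it assumes `entropy_norm` means the Shannon entropy of the mean routing distribution divided by `log(num_branches)`, and the names `entropy_norm` (as a function) and `router_health` are hypothetical helpers introduced here for illustration.

```python
import math
from typing import Dict, Sequence, Union


def entropy_norm(mean_probs: Sequence[float]) -> float:
    """Normalized Shannon entropy in [0, 1] (assumed definition, see lead-in)."""
    n = len(mean_probs)
    if n < 2:
        raise ValueError("need at least two branches")
    total = sum(mean_probs)
    shares = [p / total for p in mean_probs]  # renormalize defensively
    h = -sum(p * math.log(p) for p in shares if p > 0)
    return h / math.log(n)


def router_health(
    mean_probs: Sequence[float],
    min_entropy_norm: float = 0.75,  # threshold quoted in the README
    min_share: float = 0.15,         # threshold quoted in the README
) -> Dict[str, Union[float, int, bool]]:
    """Summarize one snapshot of per-branch mean routing probabilities."""
    total = sum(mean_probs)
    shares = [p / total for p in mean_probs]
    weakest = min(range(len(shares)), key=shares.__getitem__)
    en = entropy_norm(shares)
    return {
        "entropy_norm": en,
        "weakest_branch": weakest,
        "weakest_share": shares[weakest],
        "healthy": en >= min_entropy_norm and shares[weakest] >= min_share,
    }


# Example: entropy looks fine, but branch 3 is below the 15% floor.
report = router_health([0.4, 0.3, 0.2, 0.1])
print(report["healthy"], report["weakest_branch"])
```

This checks a single snapshot only. Because the guidance concerns branches that stay *persistently* below 15%, one reasonable use is to call it on a running average of routing probabilities and to apply targeted forcing to `weakest_branch` only after several consecutive unhealthy reports.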
|
| 351 |
+
|
| 352 |
+
---
|
| 353 |
+
|
| 354 |
## Roadmap
|
| 355 |
| Version | Goal |
|
| 356 |
|---------|------|
|