MhaWay committed on
Commit cc29f95 · verified · 1 Parent(s): c418073

Update README.md

Files changed (1)
  1. README.md +14 -0
README.md CHANGED
@@ -337,6 +337,20 @@ if num_funcs >= 4:
 
  ---
 
+ ## Router Stability (Important)
+
+ Dynamic soft‑routing is powerful but sensitive to its training schedule. The training methodology is still being refined so that all branches develop healthily without premature specialization.
+
+ - Instability: routing entropy can drop early, leading to branch collapse. This failure mode is highly sensitive to the temperature (τ) schedule, the entropy auxiliary weight (λ), and any warmup forcing.
+ - Current safeguards: a high initial τ, an extended τ freeze period, entropy‑max regularization, and selective forcing during early steps.
+ - Expectations: training curves may show transient oscillations in branch usage while the router and branches co‑adapt.
+ - What to monitor: keep `entropy_norm ≥ 0.75` during the first 3–5k steps, and make sure no branch stays persistently below 15% usage.
+ - Intervention playbook: increase `router_aux_weight`, extend `router_tau_freeze_steps`, temporarily raise `router_tau_start`, or apply targeted forcing to the weakest branch.
+ - Fine‑tuning note: if you fine‑tune with the standard Hugging Face `Trainer`, consider setting `router_aux_weight=0`, or use `scripts/train_veronica.py`, which handles the entropy‑max term correctly.
+ - Status: under ongoing refinement. The default τ/λ schedules may change; the core API will remain stable.
+
+ ---
+
  ## Roadmap
  | Version | Goal |
  |---------|------|
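
The "What to monitor" thresholds added in this commit (`entropy_norm ≥ 0.75`, no branch below 15%) can be checked with a small helper. The following is a minimal sketch, not code from the repository: the function names `normalized_entropy` and `router_health` are hypothetical, and it assumes `entropy_norm` means the Shannon entropy of the batch-averaged branch usage divided by `log(num_branches)`. The README may define it differently (for example, as a mean of per-token entropies).

```python
import math
from typing import Dict, List, Sequence


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a distribution, scaled to [0, 1] by log(num_branches).

    Assumption: this matches what the README calls `entropy_norm`.
    """
    n = len(probs)
    if n < 2:
        raise ValueError("need at least two branches")
    total = sum(probs)
    if total <= 0:
        raise ValueError("usage values must sum to a positive number")
    entropy = 0.0
    for p in probs:
        q = p / total  # renormalize in case the inputs do not sum to 1
        if q > 0:
            entropy -= q * math.log(q)
    return entropy / math.log(n)


def router_health(
    mean_usage: Sequence[float],
    min_entropy_norm: float = 0.75,  # threshold quoted in the README
    min_branch_share: float = 0.15,  # threshold quoted in the README
) -> Dict[str, object]:
    """Check one snapshot of average branch usage against both thresholds."""
    total = sum(mean_usage)
    shares: List[float] = [u / total for u in mean_usage]
    entropy_norm = normalized_entropy(shares)
    weak = [i for i, s in enumerate(shares) if s < min_branch_share]
    return {
        "entropy_norm": entropy_norm,
        "weak_branches": weak,
        "healthy": entropy_norm >= min_entropy_norm and not weak,
    }


if __name__ == "__main__":
    # Example: one branch dominates, so the snapshot is flagged as unhealthy.
    print(router_health([0.90, 0.05, 0.03, 0.02]))
```

This checks a single snapshot. Since the README speaks of branches being "persistently" under 15%, a training loop would presumably call it every few hundred steps and intervene only when the same branch is flagged repeatedly.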