# 08 - Hour-by-hour Timeline

48-hour sprint. Hard deadline: **Apr 26, 5:00 PM IST**.

## Day 1 - Saturday, Apr 25
### 7:00–9:00 AM - Reporting & Opening (mandatory attendance)
- 7:00 AM check-in at Scaler Electronic City campus
- Breakfast (until 9 AM)
- Opening ceremony: listen for Cursor credits, HF credits ($30?), final theme clarifications
### 9:00–10:00 AM - Setup
- Both team members:
  - Verify Colab account works and a T4 is reachable
  - Verify HF account and redeem on-site credits
  - **Run `hf auth login` + `hf jobs hardware`** to confirm `t4-small` access works (per opening-deck p.78-80)
  - Verify GitHub access
- Anurag: create empty `clarify-rl` HF Space (CPU basic, public, Docker)
- Kanan: create empty `clarify-rl` GitHub repo
### 10:00 AM–12:00 PM - Phase 1: Scaffold - DONE
- All scaffold files in `clarify-rl/`:
  - `openenv.yaml`, `pyproject.toml`, `Dockerfile`, `__init__.py`, `models.py`, `client.py`, `server/__init__.py`
- Done. Move directly to Phase 2.
### 12:00–2:30 PM - Phase 2: Core environment (Anurag-led) - DONE
- `server/scenarios.py`: procedural scenario generation
- `server/user_simulator.py`: rule-based Q→A
- `server/clarify_environment.py`: MCPEnvironment subclass with 3 tools + `step_async`
- `server/app.py`: FastAPI app via `create_app(...)`
- Local smoke test: `uvicorn server.app:app` + all 3 smoke scripts pass
- Eat lunch while building (ladoos and biryani provided)
### 2:30–4:00 PM - Phase 3: Rubrics + per-step grader (Anurag-led) - DONE
- `server/rubrics.py`: 5 Rubric subclasses, composed via Sequential + Gate + WeightedSum
- `server/grader.py`: per-step shaping reward + plan parser
- Wire rubric into `ClarifyEnvironment.__init__`
- Verified: oracle policy → ~0.89, random → low, blank plan → 0
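
The Gate + WeightedSum composition and the "blank plan scores 0" check can be illustrated with a minimal, self-contained sketch. The classes and scoring functions below are hypothetical stand-ins written for this note, not the OpenEnv rubric classes or the contents of `server/rubrics.py`, and Sequential is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical stand-ins for illustration; NOT the OpenEnv rubric API.
Score = Callable[[str], float]  # maps a plan string to a score in [0, 1]


@dataclass
class WeightedSum:
    """Weighted average of component scores (weights are normalized)."""
    parts: Sequence[tuple[float, Score]]

    def __call__(self, plan: str) -> float:
        total = sum(weight for weight, _ in self.parts)
        return sum(weight * fn(plan) for weight, fn in self.parts) / total


@dataclass
class Gate:
    """Return the inner score only if the check passes, else 0."""
    check: Score
    inner: Score
    threshold: float = 0.5

    def __call__(self, plan: str) -> float:
        return self.inner(plan) if self.check(plan) >= self.threshold else 0.0


# Toy components (assumed, for demonstration only).
def non_blank(plan: str) -> float:
    return 1.0 if plan.strip() else 0.0


def has_steps(plan: str) -> float:
    return 1.0 if "1." in plan else 0.0


def brevity(plan: str) -> float:
    return 1.0 if len(plan) <= 200 else 0.5


rubric = Gate(check=non_blank,
              inner=WeightedSum([(0.7, has_steps), (0.3, brevity)]))

for plan in ["", "ask about budget", "1. ask about budget"]:
    print(repr(plan), round(rubric(plan), 2))
```

Gating on a cheap validity check before the weighted sum is one way to guarantee that a blank plan cannot collect partial credit from components such as brevity.
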
### 4:00–5:30 PM - Phase 4: Deploy + baseline eval - NEXT
- Push to HF Space, verify public access incognito
- `inference.py`: baseline eval script (Qwen2.5-1.5B-Instruct via HF Inference API)
- Run baseline on 100 held-out scenarios → save `outputs/baseline.json`
- Generate `scenarios/eval_held_out.json` (frozen seeds 10000-10099)
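
A frozen held-out set is reproducible only if each scenario depends on its seed and nothing else. The sketch below shows that pattern for seeds 10000-10099. `make_scenario`, its task names, and its fields are placeholders invented here; the real generator lives in `server/scenarios.py`.

```python
import json
import random


def make_scenario(seed: int) -> dict:
    """Hypothetical scenario generator; fields are illustrative only."""
    rng = random.Random(seed)  # local RNG, independent of global random state
    return {
        "seed": seed,
        "task": rng.choice(["booking", "scheduling", "shopping"]),
        "hidden_fields": rng.randint(1, 4),
    }


def build_held_out(start: int = 10000, count: int = 100) -> list[dict]:
    """One scenario per seed in [start, start + count)."""
    return [make_scenario(seed) for seed in range(start, start + count)]


held_out = build_held_out()
payload = json.dumps(held_out, indent=2, sort_keys=True)
# To freeze the set on disk (path taken from the plan above):
# Path("scenarios/eval_held_out.json").write_text(payload)
print(len(held_out), held_out[0]["seed"], held_out[-1]["seed"])
```

Committing the generated JSON, rather than regenerating it at eval time, keeps the baseline and trained runs on identical scenarios even if the generator changes later.
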
### 5:30–7:00 PM - Phase 5: Training notebook (Kanan-led, in parallel)
While Anurag builds the env, Kanan **forks `openenv_wordle_grpo.ipynb`** (TRL official, opening-deck p.73) and adapts it.
- `training/train_grpo.ipynb`: fork of `openenv_wordle_grpo.ipynb`; swap env client to `ClarifyClient`, swap reward fn to ours
- Verify on Colab T4 first; switch to HF Jobs `t4-small` once smoke run passes
- Mock rollout with dummy env until real env is ready
### 7:00–9:00 PM - Phase 6: First real training run
- Dinner overlap (gulab jamun expected)
- Run 100 GRPO steps (Colab T4 first, HF Jobs `t4-small` once stable)
- Inspect reward curve: should trend up
- **Eyeball 4 random rollouts** (sample-inspection cadence per `06-training-plan.md`). If reward is up but generations are degenerate, that is reward hacking: stop and fix
- If flat: debug (likely rubric or rollout parsing)
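
The two checks above (is the reward trending up, and do sampled rollouts look sane) can be sketched as two small helpers. Both functions and the window size are assumptions made for illustration; they are not part of TRL or of the training notebook.

```python
import random
from statistics import mean


def reward_trend(rewards: list[float], window: int = 20) -> float:
    """Mean of the last `window` rewards minus mean of the first `window`.

    Clearly positive suggests learning; near zero suggests a flat curve.
    """
    if len(rewards) < 2 * window:
        raise ValueError("need at least 2 * window reward values")
    return mean(rewards[-window:]) - mean(rewards[:window])


def sample_rollouts(rollouts: list, k: int = 4, seed=None) -> list:
    """Pick k rollouts uniformly at random for manual inspection."""
    rng = random.Random(seed)
    return rng.sample(rollouts, k=min(k, len(rollouts)))


# Synthetic example: rewards rise from 0.1 to 0.5 over 100 steps.
rewards = [0.1 + 0.4 * step / 99 for step in range(100)]
print(f"trend: {reward_trend(rewards):+.3f}")
print("inspect:", sample_rollouts(list(range(100)), k=4, seed=0))
```

A rising trend alone does not rule out reward hacking, which is why the manual read of sampled rollouts stays in the loop.
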
### 9:00–11:00 PM - Phase 7: Iterate
- If first run looked good: continue with 300 more steps
- If reward hacking observed: tighten rubric, restart
- Generate first preliminary plots
### 11:00 PM–1:00 AM - Phase 8: Overnight prep
- Start the LONG training run (600 steps)
- Sleep in shifts: one person monitors checkpoints while the other sleeps
- Begin `README.md` draft (use `07-deployment.md` as template)
## Day 2 - Sunday, Apr 26
### 1:00–7:00 AM - Sleep shifts + monitoring
- Long run completes (~90 min on T4). If Colab kills the session, resume from checkpoint
- Both should get at least 4–5 hours of rest
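
Resuming after a killed session starts with finding the newest checkpoint. The helper below is a sketch that assumes checkpoints are saved as `checkpoint-<step>` directories (the Hugging Face `Trainer` convention) under an output directory that survives the session; `latest_checkpoint` is a name made up for this note.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(output_dir) -> Optional[Path]:
    """Return the `checkpoint-<step>` directory with the highest step."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    pattern = re.compile(r"checkpoint-(\d+)$")
    best, best_step = None, -1
    for child in root.iterdir():
        match = pattern.match(child.name)
        # Compare steps numerically so checkpoint-1000 beats checkpoint-50.
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best, best_step = child, int(match.group(1))
    return best


print(latest_checkpoint("outputs"))  # None if no checkpoints exist yet
```

The returned path would typically be passed as `trainer.train(resume_from_checkpoint=str(path))`. On Colab this only helps if the output directory is on persistent storage such as a mounted Drive, since the local disk is lost with the session.
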
### 7:00–9:00 AM - Phase 9: Final eval + plots
- Run trained model on held-out 100 scenarios
- Save `outputs/trained.json`
- Generate ALL 5 plots: `reward_curve`, `loss_curve`, `per_task_bars`, `per_component`, `eval_dist`
- Commit `plots/*.png` to repo
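
The headline metric comes from comparing `outputs/baseline.json` with `outputs/trained.json` on the same held-out seeds. The sketch below assumes each file holds a list of records with `seed` and `reward` keys; that schema is a guess for illustration and should be adapted to whatever `inference.py` actually writes.

```python
from statistics import mean


def summarize(baseline: list[dict], trained: list[dict]) -> dict:
    """Compare mean reward on the seeds present in both result sets."""
    b_by_seed = {r["seed"]: r["reward"] for r in baseline}
    t_by_seed = {r["seed"]: r["reward"] for r in trained}
    shared = sorted(b_by_seed.keys() & t_by_seed.keys())
    if not shared:
        raise ValueError("no overlapping seeds between the two runs")
    b_mean = mean(b_by_seed[s] for s in shared)
    t_mean = mean(t_by_seed[s] for s in shared)
    wins = sum(t_by_seed[s] > b_by_seed[s] for s in shared)
    return {
        "n": len(shared),
        "baseline_mean": b_mean,
        "trained_mean": t_mean,
        "delta": t_mean - b_mean,
        "win_rate": wins / len(shared),  # share of seeds where trained > baseline
    }


# Tiny synthetic example (not real results).
baseline = [{"seed": 10000, "reward": 0.25}, {"seed": 10001, "reward": 0.5}]
trained = [{"seed": 10000, "reward": 0.75}, {"seed": 10001, "reward": 0.5}]
print(summarize(baseline, trained))
```

Pairing by seed guards against silently comparing different scenario sets if one run was interrupted partway.
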
### 9:00–11:00 AM - Phase 10: Demo content (Kanan-led)
- Record demo screen captures (baseline vs trained traces)
- Edit 2-min video (OBS → ffmpeg → upload to YouTube as unlisted)
- OR write `blog.md` HF blog post (parallel option)
- Ideally do both: videos stand out in pitches
### 11:00 AM–1:00 PM - Phase 11: Polish + README
- Finalize root `README.md` with:
  - Pitch + headline metric
  - Embedded plots with captions
  - Result table
  - Architecture diagram (mermaid or ASCII)
  - All 4 deliverable links
- Polish `blog.md` if writing one
- Final commit
### 1:00–3:00 PM - Phase 12: Final validation sweep
- Run through `docs/07-deployment.md` validation checklist (every line)
- Test HF Space from incognito
- Re-run Colab notebook end-to-end on fresh runtime
- Verify all README links open correctly
- Fix any breakage
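
The README link check can be partly automated. The sketch below extracts `http(s)` URLs from Markdown links and images, then offers an optional reachability probe. Both helper names are made up here, and the probe is only a rough signal: some hosts reject `HEAD` requests, and it does not replace the incognito check for access-controlled pages.

```python
import re
import urllib.error
import urllib.request

# Matches [text](url) and ![alt](url) with an http(s) target.
LINK_RE = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")


def extract_links(markdown: str) -> list[str]:
    """Return unique http(s) link targets in order of first appearance."""
    seen: list[str] = []
    for url in LINK_RE.findall(markdown):
        if url not in seen:
            seen.append(url)
    return seen


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Best-effort probe: True if a HEAD request returns a non-error status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, TimeoutError):
        return False


sample = "[Space](https://example.com/space) and ![plot](https://example.com/p.png)"
print(extract_links(sample))
```

A typical use would be `[u for u in extract_links(Path("README.md").read_text()) if not is_reachable(u)]` to list links that need a manual look.
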
### 3:00–4:30 PM - Phase 13: Submit
- Fill Google Form with all 4 URLs
- Double-check each URL works from incognito
- Confirm submission
### 4:30–5:00 PM - BUFFER (do not skip)
- Re-verify submission record
- Take screenshots of submission confirmation
- Final coffee. Breathe.
### 5:00 PM - DEADLINE 🚨
No more changes accepted under any circumstances.
### 5:00 PM onward - Closing & networking
## Team Split - Bhole Chature

| Phase | Anurag | Kanan |
|-------|--------|-------|
| 1 | scaffold review | scaffold review |
| 2 | core env (`clarify_environment.py`) | `scenarios.py` + `user_simulator.py` |
| 3 | `rubrics.py` + `grader.py` | unit tests for env + rubric |
| 4 | HF Space deploy | `inference.py` + baseline eval |
| 5 | bug-fix env on issues | `train_grpo.ipynb` (Colab) |
| 6 | monitor first training run | inspect outputs, sample logs |
| 7 | iterate on env/rubric | iterate on training config |
| 8 | overnight checkpoint mgmt | sleep shift A |
| 9 | sleep shift B | final eval + plots |
| 10 | README + repo polish | demo video + `blog.md` |
| 11 | architecture writeup | submission link audit |
| 12 | final validation sweep | final validation sweep |
| 13 | hits submit | screenshot + confirm |

Both pair on critical risk items: training debug, validation sweep.

## Anti-bus-factor Rule
**Both team members must know how to:**
- Push to HF Space + GitHub
- Restart the Colab runtime
- Run the env locally
- Run the eval script

If one is stuck or AFK at submission time, the other must be able to ship alone.

## Energy Management
- Snack/coffee every 2 hours
- 5-min walks each hour
- No alcohol/Red Bull spirals
- Hydration > caffeine