08 - Hour-by-hour Timeline
48-hour sprint. Hard deadline: Apr 26, 5:00 PM IST.
Day 1 - Saturday, Apr 25
7:00–9:00 AM - Reporting & Opening (mandatory attendance)
- 7:00 AM check-in at Scaler Electronic City campus
- Breakfast (until 9 AM)
- Opening ceremony: listen for Cursor credits, HF credits ($30?), final theme clarifications
9:00–10:00 AM - Setup
- Both team members:
  - Verify Colab account works, T4 reachable
  - Verify HF account, get on-site credits redeemed
  - Run `hf auth login` + `hf jobs hardware` to confirm `t4-small` access works (per opening-deck p.78-80)
  - Verify GitHub access
- Anurag: create empty `clarify-rl` HF Space (CPU basic, public, Docker)
- Kanan: create empty `clarify-rl` GitHub repo
10:00 AM–12:00 PM - Phase 1: Scaffold (DONE)
- All scaffold files in `clarify-rl/`:
  - `openenv.yaml`, `pyproject.toml`, `Dockerfile`, `__init__.py`, `models.py`, `client.py`, `server/__init__.py`
- Done. Move directly to Phase 2.
12:00–2:30 PM - Phase 2: Core environment (Anurag-led) (DONE)
- `server/scenarios.py`: procedural scenario generation
- `server/user_simulator.py`: rule-based Q→A
- `server/clarify_environment.py`: MCPEnvironment subclass with 3 tools + `step_async`
- `server/app.py`: FastAPI app via `create_app(...)`
- Local smoke test: `uvicorn server.app:app` + all 3 smoke scripts pass
- Lunch can be eaten while building (ladoos, biryani provided)
2:30–4:00 PM - Phase 3: Rubrics + per-step grader (Anurag-led) (DONE)
- `server/rubrics.py`: 5 Rubric subclasses, composed via Sequential + Gate + WeightedSum
- `server/grader.py`: per-step shaping reward + plan parser
- Wire rubric into `ClarifyEnvironment.__init__`
- Verified: oracle policy → ~0.89, random → low, blank plan → 0
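The Gate + WeightedSum idea can be pictured with a small plain-Python sketch. It does not use the project's Rubric classes; the function name, component scorers, weights, and the gating rule (an empty plan zeroes the score) are illustrative assumptions, chosen only to be consistent with "blank plan → 0".

```python
from typing import Callable, Dict

# Hypothetical component scorer: maps a plan string to a score in [0, 1].
Scorer = Callable[[str], float]


def gated_weighted_sum(
    plan: str,
    gate: Callable[[str], bool],
    components: Dict[str, Scorer],
    weights: Dict[str, float],
) -> float:
    """Return 0.0 if the gate fails, else the weight-normalized sum of component scores."""
    if not gate(plan):
        return 0.0
    total_weight = sum(weights[name] for name in components)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(weights[name] * components[name](plan) for name in components)
    return score / total_weight


# Illustrative gate and components (assumptions, not the real rubric).
def has_plan(plan: str) -> bool:
    return bool(plan.strip())


components = {
    "mentions_budget": lambda p: 1.0 if "budget" in p.lower() else 0.0,
    "is_concise": lambda p: 1.0 if len(p.split()) <= 50 else 0.5,
}
weights = {"mentions_budget": 0.7, "is_concise": 0.3}

blank_score = gated_weighted_sum("", has_plan, components, weights)
good_score = gated_weighted_sum("Book a venue within budget.", has_plan, components, weights)
```

Putting the gate before the weighted sum means a policy cannot collect partial credit from the softer components while skipping the plan entirely.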
4:00–5:30 PM - Phase 4: Deploy + baseline eval (NEXT)
- Push to HF Space, verify public access incognito
- `inference.py`: baseline eval script (Qwen2.5-1.5B-Instruct via HF Inference API)
- Run baseline on 100 held-out scenarios → save `outputs/baseline.json`
- Generate `scenarios/eval_held_out.json` (frozen seeds 10000-10099)
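Freezing the held-out seeds could look like the sketch below. The manifest layout (a `split` name plus a `seeds` list) and both helper names are assumptions for illustration; the real `eval_held_out.json` may store full scenarios rather than seeds.

```python
import json
from pathlib import Path

# Frozen seed range from the plan: 10000-10099 inclusive (100 scenarios).
EVAL_SEEDS = list(range(10000, 10100))


def write_eval_manifest(path: Path) -> dict:
    """Write the frozen seed list to disk and return the manifest that was written."""
    manifest = {"split": "eval_held_out", "seeds": EVAL_SEEDS}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(manifest, indent=2))
    return manifest


def assert_no_train_overlap(train_seeds, eval_seeds=EVAL_SEEDS) -> None:
    """Fail loudly if any training seed leaks into the held-out set."""
    leaked = sorted(set(train_seeds) & set(eval_seeds))
    if leaked:
        raise ValueError(f"training seeds overlap held-out eval seeds: {leaked[:5]}")
```

Calling `assert_no_train_overlap(...)` at the top of the training notebook is a cheap guard against evaluating on scenarios the policy has already seen.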
5:30–7:00 PM - Phase 5: Training notebook (Kanan-led, in parallel)
While Anurag builds the env, Kanan forks `openenv_wordle_grpo.ipynb` (TRL official, opening-deck p.73) and adapts it.
- `training/train_grpo.ipynb`: fork of `openenv_wordle_grpo.ipynb` → swap env client to `ClarifyClient`, swap reward fn to ours
- Verify on Colab T4 first; switch to HF Jobs `t4-small` once smoke run passes
- Mock rollout with dummy env until real env is ready
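A dummy env for mock rollouts can be very small. The sketch below is a stand-in only: the class name, the `reset`/`step` shape, and the terminal-reward behaviour are assumptions and do not mirror the `ClarifyClient` or OpenEnv interfaces.

```python
import random
from dataclasses import dataclass


@dataclass
class DummyStep:
    observation: str
    reward: float
    done: bool


class DummyClarifyEnv:
    """Stand-in environment: fixed-length episodes with seeded pseudo-random rewards."""

    def __init__(self, max_turns: int = 3, seed: int = 0) -> None:
        self.max_turns = max_turns
        self._rng = random.Random(seed)
        self._turn = 0

    def reset(self) -> str:
        self._turn = 0
        return "User request: (placeholder task description)"

    def step(self, action: str) -> DummyStep:
        self._turn += 1
        done = self._turn >= self.max_turns
        # Reward only at episode end, mimicking a terminal rubric score in [0, 1).
        reward = self._rng.random() if done else 0.0
        return DummyStep(observation=f"echo: {action}", reward=reward, done=done)


def run_episode(env: DummyClarifyEnv) -> float:
    """Roll out one episode with a trivial fixed policy and return the total reward."""
    env.reset()
    total, done = 0.0, False
    while not done:
        step = env.step("What is your deadline?")
        total += step.reward
        done = step.done
    return total
```

This is enough to exercise the rollout loop and reward plumbing in the notebook; swapping in the real client later should only touch the env construction.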
7:00–9:00 PM - Phase 6: First real training run
- Dinner overlap (gulab jamun expected)
- Run 100 GRPO steps (Colab T4 first, HF Jobs `t4-small` once stable)
- Inspect reward curve: should trend up
- Eyeball 4 random rollouts (sample-inspection cadence per `06-training-plan.md`): reward up but degenerate generations = reward hacking, stop & fix
- If flat: debug (likely rubric or rollout parsing)
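The checks above can be partly scripted. This is a rough sketch, not the procedure from `06-training-plan.md`: the window size, the seeded sampling, and the repeated-token heuristic for "degenerate" output are all assumptions, and a human should still read the sampled rollouts.

```python
import random
from statistics import mean
from typing import List, Sequence


def reward_trend(rewards: Sequence[float], window: int = 20) -> float:
    """Mean of the last `window` rewards minus the mean of the first `window`."""
    if len(rewards) < 2 * window:
        raise ValueError("need at least 2 * window reward values")
    return mean(rewards[-window:]) - mean(rewards[:window])


def sample_rollouts(rollouts: Sequence[str], k: int = 4, seed: int = 0) -> List[str]:
    """Pick up to k rollouts at random, seeded so both teammates see the same ones."""
    rng = random.Random(seed)
    return rng.sample(list(rollouts), min(k, len(rollouts)))


def looks_degenerate(text: str, min_unique_ratio: float = 0.3) -> bool:
    """Crude heuristic: flag empty text or text dominated by repeated tokens."""
    tokens = text.split()
    if not tokens:
        return True
    return len(set(tokens)) / len(tokens) < min_unique_ratio
```

A positive `reward_trend` together with several sampled rollouts flagged by `looks_degenerate` is the pattern to treat as likely reward hacking; a trend near zero points to the "If flat" branch.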
9:00–11:00 PM - Phase 7: Iterate
- If first run looked good: continue with 300 more steps
- If reward hacking observed: tighten rubric, restart
- Generate first preliminary plots
11:00 PM–1:00 AM - Phase 8: Overnight prep
- Start the LONG training run (600 steps)
- One person monitors checkpoints, the other sleeps in shifts
- Begin README.md draft (use 07-deployment.md as template)
Day 2 - Sunday, Apr 26
1:00–7:00 AM - Sleep shifts + monitoring
- Long run completes (~90 min on T4); if Colab kills the session, resume from checkpoint
- Both should get at least 4-5 hours of rest
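Resuming after a killed session mostly comes down to finding the newest checkpoint. The helper below is a sketch that assumes checkpoints are saved as `checkpoint-<step>` subdirectories (the default naming for Hugging Face Trainer-based trainers); the function name is hypothetical and the output directory is whatever the notebook configures.

```python
import re
from pathlib import Path
from typing import Optional

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: Path) -> Optional[Path]:
    """Return the checkpoint-<step> subdirectory with the highest step, or None."""
    if not output_dir.is_dir():
        return None
    best_step, best_path = -1, None
    for child in output_dir.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        # Compare steps numerically so checkpoint-600 beats checkpoint-50.
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path
```

If the trainer follows the usual `Trainer.train` interface, the returned path can be passed as `resume_from_checkpoint`. On Colab the output directory should live on mounted storage, since local disk is lost when the session dies.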
7:00–9:00 AM - Phase 9: Final eval + plots
- Run trained model on held-out 100 scenarios
- Save `outputs/trained.json`
- Generate ALL 5 plots: reward_curve, loss_curve, per_task_bars, per_component, eval_dist
- Commit `plots/*.png` to repo
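Before plotting, a quick numeric comparison of the two eval outputs gives the headline metric. The sketch assumes each output file is a JSON list of records with a numeric `reward` field; that schema and the helper names are guesses for illustration, not the actual format written by `inference.py`.

```python
import json
from pathlib import Path
from statistics import mean
from typing import Dict, List


def load_rewards(path: Path) -> List[float]:
    """Load per-scenario rewards; assumes a JSON list of {"reward": number} records."""
    records = json.loads(path.read_text())
    return [float(record["reward"]) for record in records]


def summarize(baseline: List[float], trained: List[float]) -> Dict[str, float]:
    """Headline numbers for the result table: mean reward per model and the delta."""
    base_mean, trained_mean = mean(baseline), mean(trained)
    return {
        "baseline_mean": base_mean,
        "trained_mean": trained_mean,
        "delta": trained_mean - base_mean,
    }
```

Typical use would be `summarize(load_rewards(Path("outputs/baseline.json")), load_rewards(Path("outputs/trained.json")))`, with the same 100 held-out scenarios behind both files so the delta is a like-for-like comparison.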
9:00–11:00 AM - Phase 10: Demo content (Kanan-led)
- Record demo screen captures (baseline vs trained traces)
- Edit 2-min video (OBS → ffmpeg → upload YouTube unlisted)
- OR write `blog.md` HF blog post (parallel option)
- Ideally both: videos pop in pitches
11:00 AM–1:00 PM - Phase 11: Polish + README
- Finalize root `README.md` with:
  - Pitch + headline metric
  - Embedded plots with captions
  - Result table
  - Architecture diagram (mermaid or ASCII)
  - All 4 deliverable links
- Polish blog.md if writing one
- Final commit
1:00–3:00 PM - Phase 12: Final validation sweep
- Run through `docs/07-deployment.md` validation checklist (every line)
- Test HF Space from incognito
- Re-run Colab notebook end-to-end on fresh runtime
- Verify all README links open correctly
- Fix any breakage
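The README link check can be semi-automated. This sketch pulls inline Markdown links out of the README text and issues an unauthenticated GET for each; the helper names are hypothetical, and it only approximates the incognito test (it does not render pages or catch a link that loads but points at the wrong thing).

```python
import re
import urllib.request
from typing import List

# Matches inline links like [text](https://...) and captures the URL.
_LINK_RE = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")


def extract_links(markdown: str) -> List[str]:
    """Return unique http(s) targets of inline markdown links, in order of appearance."""
    seen, links = set(), []
    for url in _LINK_RE.findall(markdown):
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """True if an unauthenticated GET ends in a 2xx/3xx status."""
    request = urllib.request.Request(url, headers={"User-Agent": "link-check"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # URLError/HTTPError are OSError subclasses; treat any failure as unreachable.
        return False
```

Running `[u for u in extract_links(readme_text) if not is_reachable(u)]` gives a list of links to re-check by hand before submitting.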
3:00–4:30 PM - Phase 13: Submit
- Fill Google Form with all 4 URLs
- Double-check each URL works from incognito
- Confirm submission
4:30–5:00 PM - BUFFER (do not skip)
- Re-verify submission record
- Take screenshots of submission confirmation
- Final coffee. Breathe.
5:00 PM - DEADLINE
No more changes accepted under any circumstances.
5:00 PM onward - Closing & networking
Team Split - Bhole Chature
| Phase | Anurag | Kanan |
|---|---|---|
| 1 | scaffold review | scaffold review |
| 2 | core env (clarify_environment.py) | scenarios.py + user_simulator.py |
| 3 | rubrics.py + grader.py | unit tests for env + rubric |
| 4 | HF Space deploy | inference.py + baseline eval |
| 5 | bug-fix env on issues | train_grpo.ipynb (Colab) |
| 6 | monitor first training run | inspect outputs, sample logs |
| 7 | iterate on env/rubric | iterate on training config |
| 8 | overnight checkpoint mgmt | sleep shift A |
| 9 | sleep shift B | final eval + plots |
| 10 | README + repo polish | demo video + blog.md |
| 11 | architecture writeup | submission link audit |
| 12 | final validation sweep | final validation sweep |
| 13 | hits submit | screenshot + confirm |
Both pair on critical risk items: training debug, validation sweep.
Anti-bus-factor Rule
Both team members must know how to:
- Push to HF Space + GitHub
- Restart the Colab runtime
- Run the env locally
- Run the eval script
If one is stuck/AFK at submission time, the other should be able to ship.
Energy Management
- Snack/coffee every 2 hours
- 5-min walks each hour
- No alcohol/Red Bull spirals
- Hydration > caffeine