Spaces:

agarwalanu3103
/

clarify-rl

Sleeping

App Files Files Community

clarify-rl / docs /08-timeline.md

Anurag Agarwal

ClarifyRL: initial HF Space deploy

2414d31 about 1 month ago

preview code

raw

history blame contribute delete

6.41 kB

	# 08 — Hour-by-hour Timeline

	48-hour sprint. Hard deadline: Apr 26, 5:00 PM IST.

	## Day 1 — Saturday, Apr 25

	### 7:00–9:00 AM — Reporting & Opening (mandatory attendance)

	- 7:00 AM check-in at Scaler Electronic City campus
	- Breakfast (until 9 AM)
	- Opening ceremony — listen for: Cursor credits, HF credits ($30?), final theme clarifications

	### 9:00–10:00 AM — Setup

	- Both team members:
	- Verify Colab account works, T4 reachable
	- Verify HF account, get on-site credits redeemed
	- Run `hf auth login` + `hf jobs hardware` to confirm `t4-small` access works (per opening-deck p.78-80)
	- Verify GitHub access
	- Anurag: create empty `clarify-rl` HF Space (CPU basic, public, Docker)
	- Kanan: create empty `clarify-rl` GitHub repo

	### 10:00 AM–12:00 PM — Phase 1: Scaffold ✅ DONE

	- All scaffold files in `clarify-rl/`:
	- openenv.yaml, pyproject.toml, Dockerfile, __init__.py, models.py, client.py, server/__init__.py
	- ✅ Done. Move directly to Phase 2.

	### 12:00–2:30 PM — Phase 2: Core environment (Anurag-led) ✅ DONE

	- `server/scenarios.py` — procedural scenario generation
	- `server/user_simulator.py` — rule-based Q→A
	- `server/clarify_environment.py` — MCPEnvironment subclass with 3 tools + step_async
	- `server/app.py` — FastAPI app via `create_app(...)`
	- Local smoke test: `uvicorn server.app:app` + all 3 smoke scripts pass
	- Lunch can be eaten while building (ladoos, biryani provided)

	### 2:30–4:00 PM — Phase 3: Rubrics + per-step grader (Anurag-led) ✅ DONE

	- `server/rubrics.py` — 5 Rubric subclasses, composed via Sequential + Gate + WeightedSum
	- `server/grader.py` — per-step shaping reward + plan parser
	- Wire rubric into `ClarifyEnvironment.__init__`
	- Verified: oracle policy → ~0.89, random → low, blank plan → 0

	### 4:00–5:30 PM — Phase 4: Deploy + baseline eval ← NEXT

	- Push to HF Space, verify public access incognito
	- `inference.py` — baseline eval script (Qwen2.5-1.5B-Instruct via HF Inference API)
	- Run baseline on 100 held-out scenarios → save `outputs/baseline.json`
	- Generate `scenarios/eval_held_out.json` (frozen seeds 10000-10099)

	### 5:30–7:00 PM — Phase 5: Training notebook (Kanan-led, in parallel)

	While Anurag builds env: Kanan forks `openenv_wordle_grpo.ipynb` (TRL official, opening-deck p.73) and adapts it.
	- `training/train_grpo.ipynb` — fork of `openenv_wordle_grpo.ipynb` → swap env client to `ClarifyClient`, swap reward fn to ours
	- Verify on Colab T4 first; switch to HF Jobs `t4-small` once smoke run passes
	- Mock rollout with dummy env until real env is ready

	### 7:00–9:00 PM — Phase 6: First real training run

	- Dinner overlap (gulab jamun expected)
	- Run 100 GRPO steps (Colab T4 first, HF Jobs `t4-small` once stable)
	- Inspect reward curve: should trend up
	- Eyeball 4 random rollouts (sample-inspection cadence per `06-training-plan.md`) — reward up but degenerate generations = reward hacking, stop & fix
	- If flat: debug (likely rubric or rollout parsing)

	### 9:00–11:00 PM — Phase 7: Iterate

	- If first run looked good: continue with 300 more steps
	- If reward hacking observed: tighten rubric, restart
	- Generate first preliminary plots

	### 11:00 PM–1:00 AM — Phase 8: Overnight prep

	- Start the LONG training run (600 steps)
	- One person monitors checkpoints, the other sleeps in shifts
	- Begin README.md draft (use 07-deployment.md as template)

	## Day 2 — Sunday, Apr 26

	### 1:00–7:00 AM — Sleep shifts + monitoring

	- Long run completes (~90min on T4) → if Colab kills session, resume from checkpoint
	- Both should have at least 4-5 hours rest

	### 7:00–9:00 AM — Phase 9: Final eval + plots

	- Run trained model on held-out 100 scenarios
	- Save `outputs/trained.json`
	- Generate ALL 5 plots: reward_curve, loss_curve, per_task_bars, per_component, eval_dist
	- Commit `plots/*.png` to repo

	### 9:00–11:00 AM — Phase 10: Demo content (Kanan-led)

	- Record demo screen captures (baseline vs trained traces)
	- Edit 2-min video (OBS → ffmpeg → upload YouTube unlisted)
	- OR write `blog.md` HF blog post (parallel option)
	- Both ideally — videos pop in pitches

	### 11:00 AM–1:00 PM — Phase 11: Polish + README

	- Finalize root `README.md` with:
	- Pitch + headline metric
	- Embedded plots with captions
	- Result table
	- Architecture diagram (mermaid or ASCII)
	- All 4 deliverable links
	- Polish blog.md if writing one
	- Final commit

	### 1:00–3:00 PM — Phase 12: Final validation sweep

	- Run through `docs/07-deployment.md` validation checklist (every line)
	- Test HF Space from incognito
	- Re-run Colab notebook end-to-end on fresh runtime
	- Verify all README links open correctly
	- Fix any breakage

	### 3:00–4:30 PM — Phase 13: Submit

	- Fill Google Form with all 4 URLs
	- Double-check each URL works from incognito
	- Confirm submission

	### 4:30–5:00 PM — BUFFER (do not skip)

	- Re-verify submission record
	- Take screenshots of submission confirmation
	- Final coffee. Breathe.

	### 5:00 PM — DEADLINE 🚨

	No more changes accepted under any circumstances.

	### 5:00 PM onward — Closing & networking

	## Team Split — Bhole Chature

	\| Phase \| Anurag \| Kanan \|
	\|-------\|--------\|-------\|
	\| 1 \| scaffold review \| scaffold review \|
	\| 2 \| core env (clarify_environment.py) \| scenarios.py + user_simulator.py \|
	\| 3 \| rubrics.py + grader.py \| unit tests for env + rubric \|
	\| 4 \| HF Space deploy \| inference.py + baseline eval \|
	\| 5 \| bug-fix env on issues \| train_grpo.ipynb (Colab) \|
	\| 6 \| monitor first training run \| inspect outputs, sample logs \|
	\| 7 \| iterate on env/rubric \| iterate on training config \|
	\| 8 \| overnight checkpoint mgmt \| sleep shift A \|
	\| 9 \| sleep shift B \| final eval + plots \|
	\| 10 \| README + repo polish \| demo video + blog.md \|
	\| 11 \| architecture writeup \| submission link audit \|
	\| 12 \| final validation sweep \| final validation sweep \|
	\| 13 \| hits submit \| screenshot + confirm \|

	Both pair on critical risk items: training debug, validation sweep.

	## Anti-bus-factor Rule

	Both team members must know how to:
	- Push to HF Space + GitHub
	- Restart the Colab runtime
	- Run the env locally
	- Run the eval script

	If one is stuck/AFK at submission time, the other should be able to ship.

	## Energy Management

	- Snack/coffee every 2 hours
	- 5-min walks each hour
	- No alcohol/Red Bull spirals
	- Hydration > caffeine