Commit History

Hackathon submission: new README (3-5 min read), BLOG.md narrative, frontier baselines, design-principles framing
40de84e

yashash045 committed on

Submission deliverables: Colab notebook + per-step CSV + trainer log upload
cc203b9

yashash04 committed on

Phase M Stage B: align fully with kube-sre-gym (sid-rp) patterns + A100 prep
9f7a631

yashash04 committed on

Phase M Option 3 v3: 3 fixes lifted from kube-sre-gym (sid-rp) GRPO patterns
a598d78

yashash04 committed on

Phase M Option 3 retry: vLLM debug logging + reduce gpu_memory + max-steps 30
f2a191c

yashash04 committed on

Phase M Option 3 hotfix: re-add vllm to PEP 723 deps
7a266c9

yashash04 committed on

Phase M Option 3: vLLM single LoRA, no SFT adapter
fcd962c

yashash04 committed on

Phase M Stage B: Option A coerce dict->list + bump to 200 steps
aed29b5

yashash04 committed on

Phase M smoke test wrapper: 20 steps for fast validation
4600f3c

yashash04 committed on

Phase M Stage A v2: fix 3 compound bugs preventing GRPO learning
c6c5f7f

yashash04 committed on

Phase M Stage A: revert wrapper + trainer to 4940a15 (last known-working non-vLLM state)
bf1c711

yashash04 committed on

Phase M Stage A: revert --use-vllm, ship non-vLLM path
1a604b3

yashash04 committed on

Phase M debug Approach 2.5: pin peft>=0.18.0,<0.19 for SFT adapter loading
8d77eb7

yashash04 committed on

Phase M debug Approach 2: revert vllm pin + enforce_eager=True
c701cb1

yashash04 committed on

Phase M Stage A vLLM Approach 1: pin vllm==0.6.3 (avoid v1-engine graph bug)
2599fd4

yashash04 committed on

Phase M Stage A vLLM: enable use_vllm + fast_inference for ~3x speedup
22623bb

yashash04 committed on

Phase M Stage A: --max-steps 100, fix Track-IO wiring
4940a15

yashash04 committed on

Phase M-prep fix: append /final to sft_path in GRPO wrapper
3cf8472

yashash04 committed on

Phase M-prep fix: drop --system from uv pip install (UV-venv vs system-Python mismatch)
cffbfce

yashash04 committed on

Phase M-prep fix: capture env-server logs + extend boot timeout in GRPO wrapper
85b09ec

yashash04 committed on

Phase M-prep fix: align wrapper script args with actual training script CLIs
a43e8f7

yashash04 committed on

Phase M-prep fix: use 'uv pip install --system' in wrapper scripts
25568de

yashash04 committed on

Phase M-prep: HF Jobs UV wrapper scripts for L4 training (Option A)
35da88a

yashash04 committed on

Phase X1-X3: strip handoff_notes from SFT data + add schema guardrail
a29cdf3

yashash04 committed on