Commit History
Update blog.md e415910 verified
Upload blog.md 3bac076 verified
docs: correct HF endpoint deployment command 007ac94
Viraj committed on
docs: add quick HF CLI endpoint deployment steps 5648aaa
Viraj committed on
publish: mark merged models as transformers text-generation artifacts 8026921
Viraj committed on
add provider interface 1d1a656
Navaneeth Sharma committed on
publish: inline chat template into tokenizer_config for HF chat compatibility aad1c03
Viraj committed on
Add w and b report link 6eef1a2
Viraj committed on
docs: replace in-progress training notes with final run results f498a76
Viraj committed on
remove the redundant section in the dashboard 42b0d04
Navaneeth Sharma committed on
Add qwen3-8b red vs blue showdown output d7e53de
Navaneeth Sharma committed on
docs: refine wording in README for clarity and flow eb06422
Viraj committed on
docs: humanize README origin line 91b3597
Viraj committed on
docs: remove Fibr mention and correct Round 1 origin story 685ca27
Viraj committed on
docs: remove misplaced tagline from README intro c5e188d
Viraj committed on
docs: replace Dario reference with direct security-intuition thesis 7218ec9
Viraj committed on
docs: add a few plan quips without turning README into a joke bad482c
Viraj committed on
docs: tighten README back to direct no-nonsense style 84c8e45
Viraj committed on
docs: rewrite README narrative to feel human, fix Blue curriculum framing 94acb21
Viraj committed on
docs: restore Mythos as Claude's model name in origin story 8d7e54f
Viraj committed on
docs: clarify dual-role env design, name mesh components explicitly, fix Claude typo 85b82f3
Viraj committed on
docs: clarify Dario Amodei reference and correct model name to Claude 0fae122
Viraj committed on
docs: replace WarGames README with Faultline story-first README 81a4873
Viraj committed on
docs: add training.md operator quickstart, remove superpowers planning artifacts 303e4f7
Viraj committed on
config: max_steps 100 -> 60 to fit budget at empirical 150s/step 8f20ede
Viraj committed on
env_client: retry transient 5xx + typed EnvUnavailableError; trainer survives env outages 839b00f
Viraj committed on
action_parser: strict variant strips Qwen3 <think>...</think> before json.loads 4c2ea83
Viraj committed on
trainer: A+B+C+D — soften think-mask, strict parser, add format reward, max_steps 100 c7f94bc
Viraj committed on
smoke: drop vllm_gpu_memory_utilization 0.50->0.35 (a10g headroom for 1024 ctx) 287fee2
Viraj committed on
rollout: mask think block from GRPO loss + free GPU before merge subprocess 1fcb4eb
Viraj committed on
rollout: disable Qwen3 thinking mode in chat template (clipped_ratio=0.75 -> expected drop) baa10af
Viraj committed on
publish: in-job adapter+merged push via subprocess; upload_large_folder; env overrides ccf4d77
Viraj committed on
publish: auto push adapter (and optional merged) to HF Hub after training 92fd4b1
Viraj committed on
tsconfig fix fd0cd9e
Navaneeth Sharma committed on
fix the tsc issue 83423e6
Navaneeth Sharma committed on
fix cf pages issue of the tsc, use npx instead f9d74fe
Navaneeth Sharma committed on
Update tsconfig.tsbuildinfo 1775662
Navaneeth Sharma committed on
jobs: wait for CUDA init before trainer; trainer: add per-component reward funcs 25e6877
Viraj committed on
base: max_steps 200 -> 150 for budget headroom 1b21009
Viraj committed on
base: tighten for 5h budget — max_steps=200, completion=768, ep_steps=4 46fdd5f
Viraj committed on
smoke: grad_accum=4 so generation_batch divisible by num_generations ca58fc1
Viraj committed on
smoke: resize for a10g — Qwen3-1.7B, max_completion=512, ep_steps=3 02f02af
Viraj committed on
smoke: mirror base config shape with Qwen3-1.7B, 5 steps 6a19c22
Viraj committed on
base: max_completion_length=1024, max_seq=10240, ep_steps=5, gc=true a34d8be
Viraj committed on
Add tiny smoke yaml: Qwen3-0.6B on a10g 1de5c98
Viraj committed on
smoke: cap vllm_max_model_length=8192, util=0.55, gc=true; passthrough both 5ab5547
Viraj committed on
smoke: enable gradient_checkpointing, lower vllm gpu util to 0.30, ep_steps=2 157117a
Viraj committed on
Flatten rollout logprobs to 2D list[list[float]] for TRL sampling_per_token_logps 7d7467a
Viraj committed on
Fix GRPO rollout: vLLM multi-turn sampling, env_mask, peft_config to TRL, max_completion_length=500 c3c84b0
Viraj committed on