Commit History

Upload README.md
15c5622
verified

Veer15 committed on

Update blog.md
e415910
verified

Veer15 committed on

Upload blog.md
3bac076
verified

Veer15 committed on

docs: correct HF endpoint deployment command
007ac94

Viraj committed on

docs: add quick HF CLI endpoint deployment steps
5648aaa

Viraj committed on

publish: mark merged models as transformers text-generation artifacts
8026921

Viraj committed on

add provider interface
1d1a656

Navaneeth Sharma committed on

publish: inline chat template into tokenizer_config for HF chat compatibility
aad1c03

Viraj committed on

Add w and b report link
6eef1a2

Viraj committed on

docs: replace in-progress training notes with final run results
f498a76

Viraj committed on

remove the redundant section in the dashboard
42b0d04

Navaneeth Sharma committed on

Add qwen3-8b red vs blue showdown output
d7e53de

Navaneeth Sharma committed on

docs: refine wording in README for clarity and flow
eb06422

Viraj committed on

docs: humanize README origin line
91b3597

Viraj committed on

docs: remove Fibr mention and correct Round 1 origin story
685ca27

Viraj committed on

docs: remove misplaced tagline from README intro
c5e188d

Viraj committed on

docs: replace Dario reference with direct security-intuition thesis
7218ec9

Viraj committed on

docs: add a few plan quips without turning README into a joke
bad482c

Viraj committed on

docs: tighten README back to direct no-nonsense style
84c8e45

Viraj committed on

docs: rewrite README narrative to feel human, fix Blue curriculum framing
94acb21

Viraj committed on

docs: restore Mythos as Claude's model name in origin story
8d7e54f

Viraj committed on

docs: clarify dual-role env design, name mesh components explicitly, fix Claude typo
85b82f3

Viraj committed on

docs: clarify Dario Amodei reference and correct model name to Claude
0fae122

Viraj committed on

docs: replace WarGames README with Faultline story-first README
81a4873

Viraj committed on

docs: add training.md operator quickstart, remove superpowers planning artifacts
303e4f7

Viraj committed on

config: max_steps 100 -> 60 to fit budget at empirical 150s/step
8f20ede

Viraj committed on

env_client: retry transient 5xx + typed EnvUnavailableError; trainer survives env outages
839b00f

Viraj committed on

action_parser: strict variant strips Qwen3 <think>...</think> before json.loads
4c2ea83

Viraj committed on

trainer: A+B+C+D — soften think-mask, strict parser, add format reward, max_steps 100
c7f94bc

Viraj committed on

smoke: drop vllm_gpu_memory_utilization 0.50->0.35 (a10g headroom for 1024 ctx)
287fee2

Viraj committed on

rollout: mask think block from GRPO loss + free GPU before merge subprocess
1fcb4eb

Viraj committed on

rollout: disable Qwen3 thinking mode in chat template (clipped_ratio=0.75 -> expected drop)
baa10af

Viraj committed on

publish: in-job adapter+merged push via subprocess; upload_large_folder; env overrides
ccf4d77

Viraj committed on

publish: auto push adapter (and optional merged) to HF Hub after training
92fd4b1

Viraj committed on

tsconfig fix
fd0cd9e

Navaneeth Sharma committed on

fix the tsc issue
83423e6

Navaneeth Sharma committed on

fix cf pages issue of the tsc, use npx instead
f9d74fe

Navaneeth Sharma committed on

Update tsconfig.tsbuildinfo
1775662

Navaneeth Sharma committed on

jobs: wait for CUDA init before trainer; trainer: add per-component reward funcs
25e6877

Viraj committed on

base: max_steps 200 -> 150 for budget headroom
1b21009

Viraj committed on

base: tighten for 5h budget — max_steps=200, completion=768, ep_steps=4
46fdd5f

Viraj committed on

smoke: grad_accum=4 so generation_batch divisible by num_generations
ca58fc1

Viraj committed on

smoke: resize for a10g — Qwen3-1.7B, max_completion=512, ep_steps=3
02f02af

Viraj committed on

smoke: mirror base config shape with Qwen3-1.7B, 5 steps
6a19c22

Viraj committed on

base: max_completion_length=1024, max_seq=10240, ep_steps=5, gc=true
a34d8be

Viraj committed on

Add tiny smoke yaml: Qwen3-0.6B on a10g
1de5c98

Viraj committed on

smoke: cap vllm_max_model_length=8192, util=0.55, gc=true; passthrough both
5ab5547

Viraj committed on

smoke: enable gradient_checkpointing, lower vllm gpu util to 0.30, ep_steps=2
157117a

Viraj committed on

Flatten rollout logprobs to 2D list[list[float]] for TRL sampling_per_token_logps
7d7467a

Viraj committed on

Fix GRPO rollout: vLLM multi-turn sampling, env_mask, peft_config to TRL, max_completion_length=500
c3c84b0

Viraj committed on
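
Commit 4c2ea83 describes a strict action_parser variant that strips Qwen3 <think>...</think> blocks before calling json.loads. A minimal Python sketch of that idea follows. The function name parse_action_strict and the rule that the result must be a JSON object are assumptions for illustration, not taken from the repository.

```python
import json
import re

# Matches a complete <think>...</think> block, including newlines inside it.
# Non-greedy so that several blocks are removed individually.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def parse_action_strict(completion: str) -> dict:
    """Hypothetical strict parser: drop Qwen3 reasoning, then require pure JSON.

    Raises ValueError if the remaining text is not a single JSON object.
    """
    # Remove every think block; Qwen3 may emit an empty one when thinking is disabled.
    text = _THINK_BLOCK.sub("", completion).strip()
    try:
        action = json.loads(text)
    except json.JSONDecodeError as exc:
        # Any leftover prose (or an unclosed <think>) makes the completion invalid.
        raise ValueError(f"not valid JSON after stripping think blocks: {exc}") from exc
    if not isinstance(action, dict):
        raise ValueError("expected a JSON object for the action")
    return action
```

Being strict here means a completion with chatter around the JSON fails to parse instead of being salvaged, which pairs naturally with a separate format reward.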
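
Commit 839b00f makes env_client retry transient 5xx responses and raise a typed EnvUnavailableError so the trainer can survive environment outages. The sketch below shows one way to structure that. It assumes a hypothetical send() callable returning (status_code, body), exponential backoff, and a fixed attempt budget; none of these details come from the repository.

```python
import time
from typing import Callable, Optional, Tuple


class EnvUnavailableError(RuntimeError):
    """Raised when the environment server keeps failing with transient 5xx errors."""


def call_env_with_retry(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
) -> dict:
    """Call the hypothetical `send` and retry on HTTP 5xx with exponential backoff.

    Non-5xx failures (e.g. 4xx) are treated as caller bugs and are not retried.
    """
    last_status: Optional[int] = None
    for attempt in range(max_attempts):
        status, body = send()
        if 200 <= status < 300:
            return body
        if not 500 <= status < 600:
            # Client errors will not succeed on retry; surface them immediately.
            raise RuntimeError(f"env request failed with status {status}")
        last_status = status
        if attempt < max_attempts - 1:
            # Back off 0.5s, 1s, 2s, ... between attempts (with the default base_delay).
            time.sleep(base_delay * (2 ** attempt))
    raise EnvUnavailableError(
        f"env still failing after {max_attempts} attempts (last status {last_status})"
    )
```

A trainer loop could catch EnvUnavailableError around each rollout and, for example, skip that episode instead of crashing; how the actual trainer handles it is not shown in this log.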