Space: Veer15/faultline-env-train

Commit History

15c5622 Upload README.md (Veer15, Apr 26, verified)

e415910 Update blog.md (Veer15, Apr 26, verified)

3bac076 Upload blog.md (Veer15, Apr 26, verified)

007ac94 docs: correct HF endpoint deployment command (Viraj, Apr 26)

5648aaa docs: add quick HF CLI endpoint deployment steps (Viraj, Apr 26)

8026921 publish: mark merged models as transformers text-generation artifacts (Viraj, Apr 26)

1d1a656 add provider interface (Navaneeth Sharma, Apr 26)

aad1c03 publish: inline chat template into tokenizer_config for HF chat compatibility (Viraj, Apr 26)

6eef1a2 Add w and b report link (Viraj, Apr 26)

f498a76 docs: replace in-progress training notes with final run results (Viraj, Apr 26)

42b0d04 remove the redundant section in the dashboard (Navaneeth Sharma, Apr 26)

d7e53de Add qwen3-8b red vs blue showdown output (Navaneeth Sharma, Apr 26)

eb06422 docs: refine wording in README for clarity and flow (Viraj, Apr 26)

91b3597 docs: humanize README origin line (Viraj, Apr 26)

685ca27 docs: remove Fibr mention and correct Round 1 origin story (Viraj, Apr 26)

c5e188d docs: remove misplaced tagline from README intro (Viraj, Apr 26)

7218ec9 docs: replace Dario reference with direct security-intuition thesis (Viraj, Apr 26)

bad482c docs: add a few plan quips without turning README into a joke (Viraj, Apr 26)

84c8e45 docs: tighten README back to direct no-nonsense style (Viraj, Apr 26)

94acb21 docs: rewrite README narrative to feel human, fix Blue curriculum framing (Viraj, Apr 26)

8d7e54f docs: restore Mythos as Claude's model name in origin story (Viraj, Apr 26)

85b82f3 docs: clarify dual-role env design, name mesh components explicitly, fix Claude typo (Viraj, Apr 26)

0fae122 docs: clarify Dario Amodei reference and correct model name to Claude (Viraj, Apr 26)

81a4873 docs: replace WarGames README with Faultline story-first README (Viraj, Apr 26)

303e4f7 docs: add training.md operator quickstart, remove superpowers planning artifacts (Viraj, Apr 26)

8f20ede config: max_steps 100 -> 60 to fit budget at empirical 150s/step (Viraj, Apr 26)

839b00f env_client: retry transient 5xx + typed EnvUnavailableError; trainer survives env outages (Viraj, Apr 26)

4c2ea83 action_parser: strict variant strips Qwen3 <think>...</think> before json.loads (Viraj, Apr 26)

c7f94bc trainer: A+B+C+D — soften think-mask, strict parser, add format reward, max_steps 100 (Viraj, Apr 26)

287fee2 smoke: drop vllm_gpu_memory_utilization 0.50->0.35 (a10g headroom for 1024 ctx) (Viraj, Apr 25)

1fcb4eb rollout: mask think block from GRPO loss + free GPU before merge subprocess (Viraj, Apr 25)

baa10af rollout: disable Qwen3 thinking mode in chat template (clipped_ratio=0.75 -> expected drop) (Viraj, Apr 25)

ccf4d77 publish: in-job adapter+merged push via subprocess; upload_large_folder; env overrides (Viraj, Apr 25)

92fd4b1 publish: auto push adapter (and optional merged) to HF Hub after training (Viraj, Apr 25)

fd0cd9e tsconfig fix (Navaneeth Sharma, Apr 25)

83423e6 fix the tsc issue (Navaneeth Sharma, Apr 25)

f9d74fe fix cf pages issue of the tsc, use npx instead (Navaneeth Sharma, Apr 25)

1775662 Update tsconfig.tsbuildinfo (Navaneeth Sharma, Apr 25)

25e6877 jobs: wait for CUDA init before trainer; trainer: add per-component reward funcs (Viraj, Apr 25)

1b21009 base: max_steps 200 -> 150 for budget headroom (Viraj, Apr 25)

46fdd5f base: tighten for 5h budget — max_steps=200, completion=768, ep_steps=4 (Viraj, Apr 25)

ca58fc1 smoke: grad_accum=4 so generation_batch divisible by num_generations (Viraj, Apr 25)

02f02af smoke: resize for a10g — Qwen3-1.7B, max_completion=512, ep_steps=3 (Viraj, Apr 25)

6a19c22 smoke: mirror base config shape with Qwen3-1.7B, 5 steps (Viraj, Apr 25)

a34d8be base: max_completion_length=1024, max_seq=10240, ep_steps=5, gc=true (Viraj, Apr 25)

1de5c98 Add tiny smoke yaml: Qwen3-0.6B on a10g (Viraj, Apr 25)

5ab5547 smoke: cap vllm_max_model_length=8192, util=0.55, gc=true; passthrough both (Viraj, Apr 25)

157117a smoke: enable gradient_checkpointing, lower vllm gpu util to 0.30, ep_steps=2 (Viraj, Apr 25)

7d7467a Flatten rollout logprobs to 2D list[list[float]] for TRL sampling_per_token_logps (Viraj, Apr 25)

c3c84b0 Fix GRPO rollout: vLLM multi-turn sampling, env_mask, peft_config to TRL, max_completion_length=500 (Viraj, Apr 25)
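Commit 4c2ea83 describes a strict action parser that strips Qwen3 `<think>...</think>` reasoning before calling `json.loads`. The sketch below shows that idea. It is not the repository's code: the names `parse_action_strict`, `strip_think`, and `ActionParseError` are hypothetical, and it assumes an action is a single JSON object that follows any reasoning block.

```python
import json


class ActionParseError(ValueError):
    """Hypothetical error: the completion is not exactly one JSON action object."""


def strip_think(completion: str) -> str:
    """Drop Qwen3 reasoning by keeping only the text after the last </think>."""
    if "</think>" in completion:
        completion = completion.rsplit("</think>", 1)[1]
    return completion.strip()


def parse_action_strict(completion: str) -> dict:
    """Strict variant: strip the think block, then require a single JSON object."""
    visible = strip_think(completion)
    # An opening tag with no closing tag means the model never finished reasoning.
    if "<think>" in visible:
        raise ActionParseError("unterminated <think> block")
    try:
        action = json.loads(visible)
    except json.JSONDecodeError as exc:
        raise ActionParseError(f"not valid JSON: {exc}") from exc
    if not isinstance(action, dict):
        raise ActionParseError("action must be a JSON object")
    return action


print(parse_action_strict('<think>probe the gateway first</think>\n{"tool": "scan"}'))
# {'tool': 'scan'}
```

Keeping only the text after the last `</think>` also handles completions whose opening `<think>` tag belonged to the prompt. An unterminated block counts as a parse failure and is not repaired, which fits the "strict" label.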

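Commit 839b00f adds retries for transient 5xx responses and a typed `EnvUnavailableError` so that the trainer survives environment outages. Below is a minimal sketch of that pattern. It assumes a hypothetical `send` callable that returns a `(status_code, payload)` pair. The attempt count and the exponential backoff schedule are illustrative assumptions, not values taken from the repository.

```python
import time
from typing import Any, Callable, Tuple


class EnvUnavailableError(RuntimeError):
    """Raised once the environment server keeps answering with 5xx responses."""


def call_env_with_retry(
    send: Callable[[], Tuple[int, Any]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `send`, retrying only transient 5xx statuses with exponential backoff."""
    for attempt in range(max_attempts):
        status, payload = send()
        if 200 <= status < 300:
            return payload
        if not 500 <= status < 600:
            # 4xx and other statuses signal a bad request; retrying will not help.
            raise RuntimeError(f"non-retryable status {status}")
        if attempt < max_attempts - 1:
            # Backoff schedule: 0.5 s, 1 s, 2 s, ... (illustrative values).
            sleep(base_delay * 2 ** attempt)
    raise EnvUnavailableError(f"environment returned 5xx on {max_attempts} attempts")


responses = iter([(503, None), (502, None), (200, {"observation": "ok"})])
print(call_env_with_retry(lambda: next(responses), sleep=lambda s: None))
# {'observation': 'ok'}
```

With this helper, a training loop can catch `EnvUnavailableError` around each environment step and skip or zero-score that rollout instead of crashing. The log does not say how the repository actually treats the failed rollout.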
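Two config commits rest on simple arithmetic. Commit 8f20ede lowers `max_steps` from 100 to 60 at an empirical 150 s/step. Commit ca58fc1 sets `grad_accum=4` so that the generation batch divides evenly by `num_generations`. The sketch below reproduces both checks. It assumes TRL's GRPO default, where one generation round samples per_device_train_batch_size × gradient_accumulation_steps × number of processes completions. The batch-size and `num_generations` values in the example are hypothetical, because the log does not record them.

```python
def generation_batch_size(per_device_train_batch_size: int,
                          gradient_accumulation_steps: int,
                          num_processes: int = 1) -> int:
    """Completions sampled per generation round (assumed TRL GRPO default)."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_processes


def prompts_per_round(per_device_train_batch_size: int,
                      gradient_accumulation_steps: int,
                      num_generations: int,
                      num_processes: int = 1) -> int:
    """Distinct prompts per round; fails if the batch cannot be split into groups."""
    batch = generation_batch_size(per_device_train_batch_size,
                                  gradient_accumulation_steps, num_processes)
    if batch % num_generations != 0:
        raise ValueError(
            f"generation batch {batch} is not divisible by num_generations={num_generations}"
        )
    # GRPO samples each prompt num_generations times and scores the group together.
    return batch // num_generations


def wall_clock_hours(max_steps: int, seconds_per_step: float) -> float:
    """Rough training time from a measured per-step cost."""
    return max_steps * seconds_per_step / 3600


print(wall_clock_hours(60, 150))   # 2.5
print(wall_clock_hours(100, 150))  # 4.166666666666667
# Hypothetical shape: 2 completions per device, 8 generations per prompt.
print(prompts_per_round(2, 4, 8))  # 1
```

With these hypothetical values, `gradient_accumulation_steps=1` gives a generation batch of 2, which raises the divisibility error. Raising it to 4 gives a batch of 8, which forms one complete group.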