Spaces: Veer15/faultline-env-train

faultline-env-train / training

78.3 kB

3 contributors

History: 44 commits

Viraj

docs: correct HF endpoint deployment command

007ac94 3 months ago

artifacts
Add episode replay dashboard and GRPO training pipeline 3 months ago
config
config: max_steps 100 -> 60 to fit budget at empirical 150s/step 3 months ago
env_adapter
env_client: retry transient 5xx + typed EnvUnavailableError; trainer survives env outages 3 months ago
grpo
env_client: retry transient 5xx + typed EnvUnavailableError; trainer survives env outages 3 months ago
jobs
trainer: A+B+C+D — soften think-mask, strict parser, add format reward, max_steps 100 3 months ago
notebooks
Add episode replay dashboard and GRPO training pipeline 3 months ago
prompts
Add episode replay dashboard and GRPO training pipeline 3 months ago
publish
publish: mark merged models as transformers text-generation artifacts 3 months ago
rollouts
Fix training pipeline: TRL>=0.25 rollout, real generation, curriculum/W&B callbacks 3 months ago
spaces
docs: correct HF endpoint deployment command 3 months ago
README.md

10.4 kB
Switch base model to Qwen/Qwen3-8B (Qwen3.5-9B is multimodal Qwen3_5ForConditionalGeneration, unsupported by unsloth) 3 months ago
__init__.py

0 Bytes
Add episode replay dashboard and GRPO training pipeline 3 months ago