Commit History

Fix TRL 0.18 compatibility: remove unsupported generation_kwargs; set safety flags on model.generation_config.
6083a40

md896 committed on

Harden GRPO generation stability on CUDA: bf16 + eager attention + invalid-logit guards.
948530a

md896 committed on

Fix GRPO batch/generation mismatch: auto-adjust num_generations; set launcher default to 2.
af54ccd

md896 committed on

Simplify HF training stack: remove unsloth/vllm path, use plain transformers AutoModel + single OpenEnv reward.
e5262a1

md896 committed on

Fix Unsloth startup: avoid pre-importing trl/transformers; mock vllm as real package modules.
d21de11

md896 committed on

Fix HF job startup: import unsloth first and shim vllm package metadata check.
1fdba13

md896 committed on

Fix HF Job bootstrap: transformers>=4.51 for trl 0.18, datasets<4; simplify to colab-style OpenEnv SQL reward.
ee30276

md896 committed on

Fix HF Jobs bootstrap (pin transformers/trl, drop torchao stack); add reward and trainer JSONL logging; stabilize launch_job.
ceee0e3

md896 committed on

changes in ultimate sota
ac3911c

md896 committed on

Fix: Mock vllm and llm_blender to stabilize GRPOTrainer in HF Jobs environment
bc20ef9

md896 committed on

Downgrade TRL to 0.22.2 to natively bypass experimental vllm dependencies
2eb9add

md896 committed on

Fix vllm error cleanly by creating fake python module structure
b2ce6c6

md896 committed on

Add vllm to dependencies to fix TRL's hard import requirement
711ae38

md896 committed on

Remove vllm mock to fix importlib find_spec crash in TRL 0.23
97cddc4

md896 committed on

Mock vllm to bypass TRL missing module error
1bc1daa

md896 committed on

Downgrade TRL to <0.24.0 to fix missing dependency chain
397face

md896 committed on

Add TRANSFORMERS_CACHE mock to fix TRL/llm_blender crash
16dd181

md896 committed on

Fix llm-blender ModuleNotFoundError
1cd9ac8

md896 committed on

Ensure TRL GRPO imports by installing mergekit
6d0b5c3

md896 committed on

Prevent torchvision import crashes in HF Jobs
8b3c03a

md896 committed on

Make OpenEnv training+API judge-proof
d061422

md896 committed on

Add --break-system-packages for Ubuntu 24.04 (PEP-668)
830c039

md896 committed on

Fix TRL and Torchao dependency conflicts
d118f9f

md896 committed on

Deploy: SOTA RL Cartesian Task and Unsloth Scripts
6518b31

md896 committed on

Harden strict (0,1) scoring boundaries across runtime and config.
9b71d1b

md896 committed on

Initialize runtime score fields with strict non-boundary minimum
8e7c622

md896 committed on

Keep task scores strictly inside (0,1) in inference logs
941f5f8

md896 committed on

Enforce strict (0,1) task score outputs for validators
bc9f459

md896 committed on

Restore full README content under HF metadata
87464f9

md896 committed on

Align inference env var handling with submission checklist.
e7c61ad

md896 committed on

initial commit
74e3e43

md896 committed on

Expand README with lifecycle and validation details
da4c99f

md896 committed on

Polish README hero and add quick links
bad2f97

md896 committed on

initial commit
03d6772

md896 committed on

Add HF Space metadata frontmatter
157ae69

md896 committed on

Initial OpenEnv SQL debug environment
35a5127

md896 committed on

initial commit
c193516

md896 committed on

Initial OpenEnv SQL debug environment
00849df

md896 committed on

Expand README with lifecycle and validation details
18e112b

md896 committed on

initial commit
c7d8ccb

md896 committed on

Polish README hero and add quick links
06729fc

md896 committed on

initial commit
959117b

md896 committed on

Add HF Space metadata frontmatter
5fb3e9e

md896 committed on

Initial OpenEnv SQL debug environment
65def40

md896 committed on

initial commit
7c1365c

md896 committed on

Initial OpenEnv SQL debug environment
765dffd

md896 committed on

Polish README hero and add quick links
070f20e

md896 committed on

initial commit
23c3767

md896 committed on

Add HF Space metadata frontmatter
d7e90b4

md896 committed on

Initial OpenEnv SQL debug environment
9fafc85

md896 committed on