README: add model card highlights section and metadata snapshot. 279d788 md896 committed 10 days ago
Demo copy: drop judges wording; workflow map at a glance for readers. 270cdf0 md896 committed 10 days ago
Expand problem narrative and Engineering Notes: time-on-SQL, Spider vs prod. 40caa50 md896 committed 10 days ago
Add GitHub repo link; drop Karpathy, blog, slides, demo placeholders. d00292e md896 committed 10 days ago
HTML demo: fix diagram crop; click-to-zoom lightbox with pan and +/- zoom. e6d1a8f md896 committed 10 days ago
HTML /demo: add Benchmark visuals table and three chart figures. f5c939b md896 committed 10 days ago
Space home: redirect / to HTML /demo; Gradio at /gradio; fix Gradio hero. f4ae3f3 md896 committed 10 days ago
Fix Space pip resolve: fastapi>=0.115.2, python-multipart>=0.0.18, pin gradio 5.50.0. d9b6c59 md896 committed 10 days ago
Pin Gradio >=5.7.1 for huggingface_hub 1.x (fixes HfFolder ImportError on Space). f7153ad md896 committed 10 days ago
Point /demo and Gradio at diagram-end-to-end-workflow.png (asset on Hub via Xet). 4c3e70f md896 committed 10 days ago
Ship polished Space UI with Gradio dashboard and evidence-rich demo. 029f9cf md896 committed 10 days ago
Upload artifacts/runs/20260426-064318-sample-rewards-32eval/sample_rewards_final.json with huggingface_hub 4724001 md896 committed 10 days ago
Fix TRL 0.18 compatibility: remove unsupported generation_kwargs; set safety flags on model.generation_config. 6083a40 md896 committed 11 days ago
Harden GRPO generation stability on CUDA: bf16 + eager attention + invalid-logit guards. 948530a md896 committed 11 days ago
Fix GRPO batch/generation mismatch: auto-adjust num_generations; set launcher default to 2. af54ccd md896 committed 11 days ago
Simplify HF training stack: remove unsloth/vllm path, use plain transformers AutoModel + single OpenEnv reward. e5262a1 md896 committed 11 days ago
Fix Unsloth startup: avoid pre-importing trl/transformers; mock vllm as real package modules. d21de11 md896 committed 11 days ago
Fix HF job startup: import unsloth first and shim vllm package metadata check. 1fdba13 md896 committed 11 days ago
Fix HF Job bootstrap: transformers>=4.51 for trl 0.18, datasets<4; simplify to colab-style OpenEnv SQL reward. ee30276 md896 committed 11 days ago
Fix HF Jobs bootstrap (pin transformers/trl, drop torchao stack); add reward and trainer JSONL logging; stabilize launch_job. ceee0e3 md896 committed 11 days ago
Fix: Mock vllm and llm_blender to stabilize GRPOTrainer in HF Jobs environment bc20ef9 md896 committed 11 days ago
Downgrade TRL to 0.22.2 to natively bypass experimental vllm dependencies 2eb9add md896 committed 11 days ago
Fix vllm error cleanly by creating fake python module structure b2ce6c6 md896 committed 11 days ago
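
Note on the "fake python module structure" commits above: the log does not show the actual shim, so what follows is only a minimal sketch of the general technique, assuming the goal is to make `import some_package` succeed when the real package is not installed. The helper name `install_stub_package` and the package name in the usage line are hypothetical, not taken from the repository.

```python
import importlib.machinery
import sys
import types


def install_stub_package(name: str, submodules: tuple[str, ...] = ()) -> types.ModuleType:
    """Register an empty stand-in package in sys.modules (hypothetical helper).

    Only single-level submodule names are handled, e.g. ("sampling",).
    """

    def make(fullname: str, is_pkg: bool) -> types.ModuleType:
        mod = types.ModuleType(fullname)
        # A real spec keeps importlib.util.find_spec() from raising ValueError
        # on a module whose __spec__ is None.
        mod.__spec__ = importlib.machinery.ModuleSpec(
            fullname, loader=None, is_package=is_pkg
        )
        if is_pkg:
            mod.__path__ = []  # marks the module as a package
        sys.modules[fullname] = mod
        return mod

    pkg = make(name, True)
    for sub in submodules:
        setattr(pkg, sub, make(f"{name}.{sub}", False))
    return pkg
```

Usage would look like `install_stub_package("demo_stub_pkg", ("sampling",))`, called before any import that expects the package. A stub like this only satisfies import statements: any attribute the caller actually reads from the package must still be added by hand, and checks that query installed distribution metadata are not covered, which may be why the later commits drop the dependency path entirely.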