codesensei-env / training

Commit History

revert(training): temporarily revert dataset bounds back to 10 for complete trial confirmation before scaling
6fd478f

vineetshukla.work@gmail.com committed on

feat(training): scale dataset to 500 prompts and add hub export cell for full hackathon training run
8b03007

vineetshukla.work@gmail.com committed on

fix(trl): inject warnings_issued dict into unsloth model to prevent TRL GRPO init crash on PeftModels
0413c85

vineetshukla.work@gmail.com committed on

fix(trl): force hardware-specific fp16/bf16 flags to prevent TRL from crashing on T4 GPU
6262683

vineetshukla.work@gmail.com committed on
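
The precision fix in 6262683 can be illustrated with a minimal sketch. It assumes the rule being applied is that bf16 is only enabled on NVIDIA GPUs with compute capability 8.0 or higher (Ampere and later), while a T4 (compute capability 7.5) falls back to fp16. The helper name `precision_flags` is hypothetical and not taken from the repository; in a notebook the capability tuple would typically come from `torch.cuda.get_device_capability()`.

```python
def precision_flags(compute_capability):
    """Return mutually exclusive fp16/bf16 flags for a CUDA device.

    compute_capability is a (major, minor) tuple, e.g. (7, 5) for a T4.
    Assumption: bf16 is enabled only on compute capability >= 8.0.
    """
    major, _minor = compute_capability
    use_bf16 = major >= 8
    return {"fp16": not use_bf16, "bf16": use_bf16}


t4_flags = precision_flags((7, 5))    # T4 (Turing)
a100_flags = precision_flags((8, 0))  # A100 (Ampere)
print(t4_flags, a100_flags)
```

The resulting dict could be unpacked into the trainer configuration so that the two flags are never set at the same time.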

fix(trl): mock weave.trace to complete package bypass
6f6568f

vineetshukla.work@gmail.com committed on
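
The MagicMock bypass described in the commits above might look roughly like the sketch below. It is an illustration of the general `sys.modules` technique and not the notebook's actual cell; the helper name `install_mock_modules` is hypothetical. It also suggests why a separate entry for `weave.trace` was needed: a bare `MagicMock` exposes no `__path__`, so Python does not treat a mocked `weave` as a package when resolving its submodules.

```python
import sys
from unittest.mock import MagicMock


def install_mock_modules(names):
    """Register a MagicMock under each dotted module name in sys.modules."""
    for name in names:
        sys.modules[name] = MagicMock()


# Both the parent package and the submodule are registered, because a mocked
# parent alone is not enough for "import weave.trace" to resolve.
install_mock_modules(["weave", "weave.trace"])

import weave.trace  # resolved from sys.modules; the real package is never loaded
```

The mocks have to be installed before the first import of the library that pulls in the optional dependency, since modules that are already imported keep their existing references.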

fix(trl): add weave to the magic mock bypass list for cell 5
7ea8558

vineetshukla.work@gmail.com committed on

fix(trl): inject magicmock bypass into cell 5 to squash TRL 0.24.0 import crash
e0923c4

vineetshukla.work@gmail.com committed on

feat: complete rewrite of grpo training script using Unsloth optimized pathway
aa86916

vineetshukla.work@gmail.com committed on

chore: test trial run for colab script with 10 dataset size and fixed dependencies
a59d79f

vineetshukla.work@gmail.com committed on

fix: downgrade TRL to 0.15.0 and nuke ghost vLLM install to prevent import crashes
b119c82

vineetshukla.work@gmail.com committed on

fix: correct trl version pin to 0.17.0
a46eca2

vineetshukla.work@gmail.com committed on

fix: vLLM/TRL version conflict in Colab Cell 1
0b9cc90

vineetshukla.work@gmail.com committed on

fix: training reward bounds to (0.01, 0.99) — prevents loss=0 from zero variance
545acf0

vineetshukla.work@gmail.com committed on
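
The zero-variance problem named in 545acf0 can be sketched in plain Python. This assumes a simplified GRPO-style normalization in which each reward in a group is centered on the group mean and divided by the group standard deviation; the function names, the `eps` value, and the use of the population standard deviation are illustrative choices, not the repository's code. When every completion in a group receives the same reward, all advantages come out as zero and the policy receives no learning signal.

```python
from statistics import mean, pstdev


def clamp_reward(reward, low=0.01, high=0.99):
    """Clamp a raw reward into the (0.01, 0.99) bounds named in the commit."""
    return max(low, min(high, reward))


def group_advantages(rewards, eps=1e-4):
    """Center rewards on the group mean and scale by the group std (simplified)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Identical rewards within a group: every advantage is zero.
uniform = group_advantages([clamp_reward(r) for r in [0.0, 0.0, 0.0, 0.0]])

# Varied rewards within a group: advantages are non-zero.
mixed = group_advantages([clamp_reward(r) for r in [0.0, 1.0, 0.5, 1.2]])

print(uniform)
print(mixed)
```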

fix: increase num_generations=6, temperature=0.9, lr=2e-5 to fix zero-loss GRPO
b3bf487

vineetshukla.work@gmail.com committed on

fix: remove fp16=True, use paged_adamw_8bit for QLoRA BFloat16 compatibility
038f1b4

vineetshukla.work@gmail.com committed on

fix: add 4-bit quantization + LoRA to fit GRPO training in T4 15GB VRAM
90d6d56

vineetshukla.work@gmail.com committed on

refactor: rewrite training script - remove rollout_func, use inline reward evaluation
d81cba7

vineetshukla.work@gmail.com committed on

fix: remove deprecated GRPOConfig params for latest TRL compatibility
0589e5e

vineetshukla.work@gmail.com committed on

fix: set live HF Space URL in training script and demo app
d12dd6e

vineetshukla.work@gmail.com committed on

feat: CodeSensei - GRPO-trained LLM code debugger on OpenEnv
c47c81c

vineetshukla.work@gmail.com committed on