Commit History
Fix eval decode prompt length slicing fd3e88c
Fix GRPO reward mapping and evaluation generation b2874f4
Include prompt column for GRPOTrainer 4fb3c37
Use finite datasets.Dataset for GRPOTrainer compatibility 3476016
Yield tensors from GRPO dataset generator e817e69
Fix GRPO dataloader batching on Kaggle 8568d9f
Fix GRPO sample id mapping and Kaggle training setup 2066092
Fix division by zero in final avg improvement calculation 022be04
Fix category enum: use lowercase for ProbeAction categories b91005f
Fix eval_report.py: use step() return value and extract reward from observation b3ab235
Remove unsupported data_collator argument from GRPOTrainer 97ddd73
Add data collator to GRPOTrainer for on-the-fly prompt tokenization f2d68bf
Revert to raw prompt format—let GRPOTrainer handle tokenization e49866a
Convert tokenized outputs to torch tensors for proper batching 9ba7512
Tokenize prompts in dataset generator for GRPOTrainer compatibility 75be7bc
Fix prompt format: convert from chat list to string for GRPOTrainer compatibility 77793ce
Fix dataset format: only yield 'prompt' field to avoid tensor concatenation errors 80c91ac
Remove tokenizer argument from GRPOTrainer—not supported in installed TRL version 4c16e6a
Fix path bootstrap in train_grpo.py: use parent.parent to reach project root bc2ac25
Fix GRPO batch size config for Kaggle P100: batch_size=4, grad_accum=2 (global=8, divisible by num_generations=2) feefe4a
Add eval_report.py for before/after training comparison 4e029fe
Fix JSON parsing and environment bugs c22ceaa
Code improvements fa66cd4
Thakur, Mahipal committed on
refactor: remove legacy architecture, promote clean structure to repo root 85fab7b
Thakur, Mahipal committed on