PRobe / training

Commit History

Updated readme
02fa6e2

mahithakur committed on

Fix eval decode prompt length slicing
fd3e88c

mahithakur committed on

Fix GRPO reward mapping and evaluation generation
b2874f4

mahithakur committed on

Include prompt column for GRPOTrainer
4fb3c37

mahithakur committed on

Use finite datasets.Dataset for GRPOTrainer compatibility
3476016

mahithakur committed on

Yield tensors from GRPO dataset generator
e817e69

mahithakur committed on

Fix GRPO dataloader batching on Kaggle
8568d9f

mahithakur committed on

Fix GRPO sample id mapping and Kaggle training setup
2066092

mahithakur committed on

Fix division by zero in final avg improvement calculation
022be04

mahithakur committed on

Fix category enum: use lowercase for ProbeAction categories
b91005f

mahithakur committed on

Fix eval_report.py: use step() return value and extract reward from observation
b3ab235

mahithakur committed on

Remove unsupported data_collator argument from GRPOTrainer
97ddd73

mahithakur committed on

Add data collator to GRPOTrainer for on-the-fly prompt tokenization
f2d68bf

mahithakur committed on

Revert to raw prompt format—let GRPOTrainer handle tokenization
e49866a

mahithakur committed on

Convert tokenized outputs to torch tensors for proper batching
9ba7512

mahithakur committed on

Tokenize prompts in dataset generator for GRPOTrainer compatibility
75be7bc

mahithakur committed on

Fix prompt format: convert from chat list to string for GRPOTrainer compatibility
77793ce

mahithakur committed on

Fix dataset format: only yield 'prompt' field to avoid tensor concatenation errors
80c91ac

mahithakur committed on

Remove tokenizer argument from GRPOTrainer—not supported in installed TRL version
4c16e6a

mahithakur committed on

Fix path bootstrap in train_grpo.py: use parent.parent to reach project root
bc2ac25

mahithakur committed on

Fix GRPO batch size config for Kaggle P100: batch_size=4, grad_accum=2 (global=8, divisible by num_generations=2)
feefe4a

mahithakur committed on

Add eval_report.py for before/after training comparison
4e029fe

mahithakur committed on

Fix JSON parsing and environment bugs
c22ceaa

mahithakur committed on

Code improvements
fa66cd4

Thakur, Mahipal committed on

refactor: remove legacy architecture, promote clean structure to repo root
85fab7b

Thakur, Mahipal committed on