Commit History

track png files with git lfs
78e0a8e

Vighnesh committed on

keep hf version of notebook
138c906

Vighnesh committed on

add training notebook
b8e9753

Vighnesh committed on

add training notebook
5648ca2

Vighnesh committed on

add wandb training logs link
7bdf1e0

Vighnesh committed on

add blog and notebook
b523c77

Vighnesh committed on

Expand to 50 tickets with resolution_hints — tickets.py and notebook ALL_TICKETS in sync
b7372b5

Vighnesh committed on

Add clarifying comment: loop penalty intentionally omitted in notebook (episode-level concern, live env handles it)
97ddb7d

Vighnesh committed on

Sync notebook _local_reward: wire resolution_hint + classified_correctly into task3 reward, cls_credit into task2 step-1
93164f3

Vighnesh committed on

Sync notebook Cell 7 graders with graders.py fixes #2 #3 #4 #5 — smoke test passes
2e680c9

Vighnesh committed on

Fix #5: accumulate Task 2 classification credit into final score — action scaled to 0.7 max, classify adds up to 0.3, total 1.0
55ff252

Vighnesh committed on

Fix #4: use resolution_hint in reply scoring — category hits 0.03, hint hits 0.05, cap 0.25 (intentional specificity incentive)
3d8844e

Vighnesh committed on

Fix #3: track _classified_correctly separately — wrong classification no longer gets free 0.20 credit in Task 3; TODO comment added to Task 2 classify branch
93f0ae5

Vighnesh committed on

Fix #2: cap _reply_quality at 0.25, add case-insensitive punctuation-stripped matching (weights now sum to exactly 1.0)
4744d17

Vighnesh committed on

Highlight Theme 3.1 + Scaler sub-theme fit, promote GRPO results section
3d83a5d

Vighnesh committed on

Fix: use raw GitHub URL for chart image to bypass CDN cache
f45e3e0

Vighnesh committed on

Update: replace broken chart with winning GRPO results (Overall 0.29->0.57)
e531507

Vighnesh committed on

Fix: add gradio to pyproject.toml deps, update README structure to match actual files
d771897

Vighnesh committed on

Cleanup: remove junk files, update .gitignore
95dc191

Vighnesh committed on

Remove redundant train_grpo_safe.ipynb
a016315

Vighnesh committed on

result after no sleep
5d570d6

Vighnesh committed on

Fix SFTConfig: move max_seq_length + dataset_text_field to SFTTrainer (trl API change)
2e81e98

AlgoCore committed on

Add train_sft.ipynb: SFT pre-training with 1000 gold-label examples before GRPO
cf0d796

AlgoCore committed on

Fix AttributeError: 'str' has no .get - add _safe_parse() always returns dict, guard in _local_reward
42a3169

AlgoCore committed on

Fix 500 errors: use LocalEnv for eval (live env is single-instance stateful, breaks under concurrent calls)
d637715

AlgoCore committed on

Fix sanity checks: use correct seed->ticket mapping, json.dumps for completions
c210c77

AlgoCore committed on

Fix: torch_dtype -> dtype (deprecated warning)
e9ad570

AlgoCore committed on

Fix CUDA illegal memory access: drop 4-bit quant, load fp16 natively (0.5B fits in T4), disable DataParallel, fix eval seeds
7a6a712

AlgoCore committed on

Sync _local_reward exactly to graders.py: task2 partial credit, task3 efficiency bonus, loop penalty, reply_quality 0-0.5
a63afc4

AlgoCore committed on

Expand dataset: 50 tickets x 200 seeds x 3 tasks x 2 steps = ~500 samples, local reward fn, 3 epochs
f86d249

AlgoCore committed on

Fix: single GPU device_map, safe Obs parsing, stricter system prompt (no respond/resolve)
2066c50

AlgoCore committed on

Rewrite: real GRPO using trl.GRPOTrainer with proper KL + clipped ratio + reference model
31338a8

AlgoCore committed on

Fix: separate inference/training modes - use_cache=True for generate, gradient_checkpointing only during train
3a05cea

AlgoCore committed on

Kaggle compatibility: auto-detect runtime, fix output paths, remove Colab-only download
d5ed509

AlgoCore committed on

Remove Unsloth: use standard HF transformers + PEFT for GRPO training
9cab132

AlgoCore committed on

feat: auto-fallback to local env mirror when live API unreachable
d4f63e0

AlgoCore committed on

fix: replace remote API with local env for reliable training rewards
243a9db

AlgoCore committed on

fix: support Kaggle secrets via UserSecretsClient
208b464

AlgoCore committed on

fix: set UNSLOTH_RETURN_LOGITS=1 before model load
a667e8c

AlgoCore committed on

fix: set UNSLOTH_RETURN_LOGITS=1 before training loop
c0c51cf

AlgoCore committed on

fix: patch get_statistics in llama+qwen2 modules directly
af78598

AlgoCore committed on

fix: monkey-patch unsloth stats check to bypass TimeoutError
4ca56d6

AlgoCore committed on

fix: remove unsloth, use plain PEFT LoRA - no timeout issues
c44947b

AlgoCore committed on

fix: pin compatible unsloth versions, disable stats timeout
8322f48

AlgoCore committed on

feat: upgrade training notebook to Unsloth (2x faster, 4-bit LoRA)
b8713e5

AlgoCore committed on

fix: clean token handling in notebook using Colab Secrets
cb215eb

AlgoCore committed on

feat: add GRPO training notebook (token-safe, Colab-ready)
ebe1f24

AlgoCore committed on

feat: fix scoring, improve prompt, add reward chart
df8aa05

AlgoCore committed on

feat: fix scoring formula, improve system prompt, boost reply quality grader - Task1 1.0, Task2 0.60, Task3 0.41, Overall 0.67
69afce9

AlgoCore committed on

fix: update openenv-core version to 0.2.2
415c505

AlgoCore committed on