Spaces: AlgoCore/support-ticket-env

Commit History

78e0a8e (Vighnesh, Apr 28) track png files with git lfs
138c906 (Vighnesh, Apr 28) keep hf version of notebook
b8e9753 (Vighnesh, Apr 26) add training notebook
5648ca2 (Vighnesh, Apr 26) add training notebook
7bdf1e0 (Vighnesh, Apr 26) add wandb training logs link
b523c77 (Vighnesh, Apr 26) add blog and notebook
b7372b5 (Vighnesh, Apr 26) Expand to 50 tickets with resolution_hints — tickets.py and notebook ALL_TICKETS in sync
97ddb7d (Vighnesh, Apr 26) Add clarifying comment: loop penalty intentionally omitted in notebook (episode-level concern, live env handles it)
93164f3 (Vighnesh, Apr 26) Sync notebook _local_reward: wire resolution_hint + classified_correctly into task3 reward, cls_credit into task2 step-1
2e680c9 (Vighnesh, Apr 26) Sync notebook Cell 7 graders with graders.py fixes #2 #3 #4 #5 — smoke test passes
55ff252 (Vighnesh, Apr 26) Fix #5: accumulate Task 2 classification credit into final score — action scaled to 0.7 max, classify adds up to 0.3, total 1.0
3d8844e (Vighnesh, Apr 26) Fix #4: use resolution_hint in reply scoring — category hits 0.03, hint hits 0.05, cap 0.25 (intentional specificity incentive)
93f0ae5 (Vighnesh, Apr 26) Fix #3: track _classified_correctly separately — wrong classification no longer gets free 0.20 credit in Task 3; TODO comment added to Task 2 classify branch
4744d17 (Vighnesh, Apr 26) Fix #2: cap _reply_quality at 0.25, add case-insensitive punctuation-stripped matching (weights now sum to exactly 1.0)
3d83a5d (Vighnesh, Apr 26) Highlight Theme 3.1 + Scaler sub-theme fit, promote GRPO results section
f45e3e0 (Vighnesh, Apr 26) Fix: use raw GitHub URL for chart image to bypass CDN cache
e531507 (Vighnesh, Apr 26) Update: replace broken chart with winning GRPO results (Overall 0.29->0.57)
d771897 (Vighnesh, Apr 26) Fix: add gradio to pyproject.toml deps, update README structure to match actual files
95dc191 (Vighnesh, Apr 26) Cleanup: remove junk files, update .gitignore
a016315 (Vighnesh, Apr 26) Remove redundant train_grpo_safe.ipynb
5d570d6 (Vighnesh, Apr 26) result after no sleep
2e81e98 (AlgoCore, Apr 25) Fix SFTConfig: move max_seq_length + dataset_text_field to SFTTrainer (trl API change)
cf0d796 (AlgoCore, Apr 25) Add train_sft.ipynb: SFT pre-training with 1000 gold-label examples before GRPO
42a3169 (AlgoCore, Apr 25) Fix AttributeError: 'str' has no .get - add _safe_parse() always returns dict, guard in _local_reward
d637715 (AlgoCore, Apr 25) Fix 500 errors: use LocalEnv for eval (live env is single-instance stateful, breaks under concurrent calls)
c210c77 (AlgoCore, Apr 25) Fix sanity checks: use correct seed->ticket mapping, json.dumps for completions
e9ad570 (AlgoCore, Apr 25) Fix: torch_dtype -> dtype (deprecated warning)
7a6a712 (AlgoCore, Apr 25) Fix CUDA illegal memory access: drop 4-bit quant, load fp16 natively (0.5B fits in T4), disable DataParallel, fix eval seeds
a63afc4 (AlgoCore, Apr 25) Sync _local_reward exactly to graders.py: task2 partial credit, task3 efficiency bonus, loop penalty, reply_quality 0-0.5
f86d249 (AlgoCore, Apr 25) Expand dataset: 50 tickets x 200 seeds x 3 tasks x 2 steps = ~500 samples, local reward fn, 3 epochs
2066c50 (AlgoCore, Apr 25) Fix: single GPU device_map, safe Obs parsing, stricter system prompt (no respond/resolve)
31338a8 (AlgoCore, Apr 25) Rewrite: real GRPO using trl.GRPOTrainer with proper KL + clipped ratio + reference model
3a05cea (AlgoCore, Apr 25) Fix: separate inference/training modes - use_cache=True for generate, gradient_checkpointing only during train
d5ed509 (AlgoCore, Apr 25) Kaggle compatibility: auto-detect runtime, fix output paths, remove Colab-only download
9cab132 (AlgoCore, Apr 25) Remove Unsloth: use standard HF transformers + PEFT for GRPO training
d4f63e0 (AlgoCore, Apr 25) feat: auto-fallback to local env mirror when live API unreachable
243a9db (AlgoCore, Apr 25) fix: replace remote API with local env for reliable training rewards
208b464 (AlgoCore, Apr 25) fix: support Kaggle secrets via UserSecretsClient
a667e8c (AlgoCore, Apr 25) fix: set UNSLOTH_RETURN_LOGITS=1 before model load
c0c51cf (AlgoCore, Apr 25) fix: set UNSLOTH_RETURN_LOGITS=1 before training loop
af78598 (AlgoCore, Apr 25) fix: patch get_statistics in llama+qwen2 modules directly
4ca56d6 (AlgoCore, Apr 25) fix: monkey-patch unsloth stats check to bypass TimeoutError
c44947b (AlgoCore, Apr 25) fix: remove unsloth, use plain PEFT LoRA - no timeout issues
8322f48 (AlgoCore, Apr 25) fix: pin compatible unsloth versions, disable stats timeout
b8713e5 (AlgoCore, Apr 25) feat: upgrade training notebook to Unsloth (2x faster, 4-bit LoRA)
cb215eb (AlgoCore, Apr 25) fix: clean token handling in notebook using Colab Secrets
ebe1f24 (AlgoCore, Apr 25) feat: add GRPO training notebook (token-safe, Colab-ready)
df8aa05 (AlgoCore, Apr 25) feat: fix scoring, improve prompt, add reward chart
69afce9 (AlgoCore, Apr 25) feat: fix scoring formula, improve system prompt, boost reply quality grader - Task1 1.0, Task2 0.60, Task3 0.41, Overall 0.67
415c505 (AlgoCore, Apr 24) fix: update openenv-core version to 0.2.2
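The grader fixes #2, #4 and #5 describe the scoring arithmetic only in outline. The sketch below shows one way those weights could combine, assuming a reply "hit" means a keyword or phrase appearing as whole words after lowercasing and stripping punctuation, and assuming the Task 2 action and classification parts are each normalized to [0, 1] before weighting. The names `normalize`, `reply_quality` and `task2_score` are hypothetical and are not taken from graders.py.

```python
import string

# Deletes ASCII punctuation. Assumption: the real grader's
# "punctuation-stripped" matching may differ in detail.
_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    return " ".join(text.lower().translate(_PUNCT).split())


def _contains(padded_reply: str, phrase: str) -> bool:
    # Pad with spaces so only whole words or whole phrases count as hits.
    return f" {normalize(phrase)} " in padded_reply


def reply_quality(reply: str, category_keywords: list[str],
                  hint_keywords: list[str]) -> float:
    """Hypothetical reply score: 0.03 per category keyword hit,
    0.05 per resolution_hint keyword hit, capped at 0.25."""
    padded = f" {normalize(reply)} "
    cat_hits = sum(_contains(padded, kw) for kw in category_keywords)
    hint_hits = sum(_contains(padded, kw) for kw in hint_keywords)
    return min(0.25, 0.03 * cat_hits + 0.05 * hint_hits)


def task2_score(action_score: float, classify_score: float) -> float:
    """Hypothetical Task 2 total: the action part is scaled to at most 0.7
    and classification adds at most 0.3, so the maximum is 1.0."""
    action_score = max(0.0, min(1.0, action_score))
    classify_score = max(0.0, min(1.0, classify_score))
    return 0.7 * action_score + 0.3 * classify_score


if __name__ == "__main__":
    print(reply_quality("Please RESET your password, then clear the cache!",
                        ["password", "login"], ["reset", "clear the cache"]))
    print(task2_score(1.0, 1.0))
```

Weighting hint hits above category hits (0.05 vs 0.03) rewards replies that name the concrete fix rather than just the ticket category, which matches the "specificity incentive" wording in Fix #4; the 0.25 cap keeps keyword stuffing from dominating the total.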
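The AttributeError fix in 42a3169 comes down to never calling `.get` on a value that is not a dict. A minimal sketch of such a guard, assuming completions arrive either as already-parsed dicts or as JSON strings; the name `safe_parse` mirrors the commit message, but the body is illustrative and not the notebook's actual `_safe_parse()`.

```python
import json
from typing import Any


def safe_parse(completion: Any) -> dict:
    """Return a dict for any input so callers can always use .get().

    Hypothetical guard: dicts pass through, JSON strings are decoded,
    and anything that is not (or does not decode to) a dict yields {}.
    """
    if isinstance(completion, dict):
        return completion
    if isinstance(completion, str):
        try:
            parsed = json.loads(completion)
        except json.JSONDecodeError:
            return {}
        # json.loads('"resolve"') returns a str, which is exactly the kind of
        # value that raises "'str' object has no attribute 'get'" downstream.
        return parsed if isinstance(parsed, dict) else {}
    return {}
```

With this in place, a reward function can write `safe_parse(completion).get("some_key", default)` and score malformed output as a miss instead of crashing the training step.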
