Commit History

update: FAL results report with final eval numbers + conclusions

99c09f5
verified

rtferraz committed on May 12

fix: eval crash (ZeroDivisionError) + add eval-only entrypoint (no retraining needed)

3fc0eb8
verified

rtferraz committed on May 11

add: FAL demo results report (preliminary — eval crash pending fix)

f46c90b
verified

rtferraz committed on May 11

cleanup: remove placeholder files

b27b03d
verified

rtferraz committed on May 11

add: Modal deployment lessons learned — TRL dependency hell postmortem

81195a7
verified

rtferraz committed on May 11

fix: num_generations=4 batch=4 (TRL 0.28 requires batch divisible by num_generations)

29c5f9b
verified

rtferraz committed on May 11

fix: generation_batch_size must be divisible by num_generations — use G=4 batch=4 for T4

9ff6872
verified

rtferraz committed on May 11

fix: remove max_prompt_length (not in TRL 0.28 GRPOConfig), switch to T4 + fp16

3063b98
verified

rtferraz committed on May 11

fix: install actual weave + pip install llm-blender + mergekit instead of broken stubs

5235a46
verified

rtferraz committed on May 11

fix: patch TRL source to make vllm import conditional + remove broken vllm stub

3703999
verified

rtferraz committed on May 11

fix: vllm stub version 0.0.1 (below TRL's 0.10.2 minimum) so is_vllm_available() returns False

882fb7f
verified

rtferraz committed on May 11

fix: vllm stub needs __version__='0.11.0' so TRL version check doesn't crash

8d1fcb1
verified

rtferraz committed on May 11

fix: use shell heredoc for stub creation instead of broken f-string interpolation

527dc3c
verified

rtferraz committed on May 11

fix: use heredoc-style stub creation instead of broken one-liner python

4db8e03
verified

rtferraz committed on May 11

fix: drop vllm, patch trl lazy imports to avoid broken optional dep chains

4db2abf
verified

rtferraz committed on May 11

fix: add weave dep (required by trl 0.28 import chain via wandb integration)

7c91592
verified

rtferraz committed on May 11

fix: switch to trl==0.28.0 (Modal-validated, no broken llm_blender dep chain)

18bf4a8
verified

rtferraz committed on May 11

fix: install trl[judges] to pull all transitive deps (mergekit, llm_blender, etc)

5337e50
verified

rtferraz committed on May 11

fix: complete file with mergekit dep added to pip_install

b4af4df
verified

rtferraz committed on May 11

fix: add mergekit dependency required by trl==0.24.0 import chain

ec05953
verified

rtferraz committed on May 11

fix: bump transformers>=4.56.1 to match trl==0.24.0 requirement, drop unsloth (use plain PEFT)

9df3ed3
verified

rtferraz committed on May 11

fix: resolve dependency conflict — accelerate>=1.4.0 required by trl==0.24.0

1ee7ba1
verified

rtferraz committed on May 11

add: FAL demo README with Modal run instructions

690f6ac
verified

rtferraz committed on May 11

add: Modal GRPO training app for Future-as-Label calibration demo

db6a752
verified

rtferraz committed on May 11

add: Future-as-Label Modal implementation — data pipeline, training, evaluation

d8ffe2c
verified

rtferraz committed on May 11

add: ADR-003 Future-as-Label demo — detailed implementation plan with research validation

d75cbbf
verified

rtferraz committed on May 11

add: V4.2 Final Report — complete project retrospective with evidence-based analysis

22cca8b
verified

rtferraz committed on May 3

add: notebook cell insertion script for base vs tuned comparison

c641edb
verified

rtferraz committed on May 2

add: base vs tuned comparison cell for V4.2 final evaluation

0c9199c
verified

rtferraz committed on May 2

feat(rewards): add sentiment mismatch penalty to prevent extraction reward hacking

b1be31c

rtferraz and Claude Haiku 4.5 committed on May 1

fix(probe): use TRL 0.24.0 log keys — rewards/commerce_reward_fn/mean, grad_norm (not train/ prefix)

080fd9a
verified

rtferraz committed on Apr 28

fix(classifier): reorder _classify_task_type — insights before push to prevent reengajamento misclassification

63b1c86
verified

rtferraz committed on Apr 28

fix(rewards): 3 bugs from Cell 8 audit — push length/formal, SQL domain, extraction int check

41eb15f
verified

rtferraz committed on Apr 28

Fix V4.2 audit: show INPUT REVIEW alongside MODEL OUTPUT for proper human scoring

71422f3
verified

rtferraz committed on Apr 28

Fix V4.2: task weights 40/40/10/10, full audit completions, interactive input() scoring

0fc9042
verified

rtferraz committed on Apr 28

Fix V4.2: GDPO + IWU now active in training reward path (not just monitoring)

c95e44c
verified

rtferraz committed on Apr 28

Add V4.2 GRPO training notebook (Gold Standard, 0.5B)

c5f1d2d
verified

rtferraz committed on Apr 28

Create v4_2-handoff.md

d1385b0
verified

rtferraz committed on Apr 28

docs: add V4.1 run report — detailed evaluation with per-task analysis and V4.2 roadmap

482efc4
verified

rtferraz committed on Apr 28

notebooks: add V4.1 GRPO notebook (parser fix, 600 steps, LR 5e-6, constant_with_warmup)

d7a090d
verified

rtferraz committed on Apr 27

Create v4_1-handoff.md

958f6d7
verified

rtferraz committed on Apr 27

docs: add V4 run assessment with lessons learned and improvement roadmap

cfaf49c
verified

rtferraz committed on Apr 27

v4: ROOT CAUSE FIX — use standard PEFT not Unsloth get_peft_model (fused LoRA kernels have dtype bug #4891). Revert to load_in_4bit=True, dtype=None matching V3.

521e1d8
verified

rtferraz committed on Apr 25

v4: fix NF4 fp16/bf16 dtype bug (unsloth #4891) — load_in_4bit=False, 0.5B fits in full bf16 on 24GB

ca397a5
verified

rtferraz committed on Apr 25

v4: fix fp16/bf16 mismatch — disable Unsloth gradient checkpointing (causes dtype conflict in LoRA QKV kernels at 0.5B)

a40d2dc
verified

rtferraz committed on Apr 25

v4 notebook: fix dtype Half/BFloat16 mismatch (explicit bf16), fix tied embeddings path, fix max_length warning

b1bb14c
verified

rtferraz committed on Apr 25

v4 notebook: fix TypeError crash, suppress warnings, update paths to CWD, add V3 task-aware system prompts

631e559
verified

rtferraz committed on Apr 25

Fix total_mem → total_memory in V4 notebook (PyTorch API)

5aa00ff

rtferraz and Claude Sonnet 4.6 committed on Apr 25

Add V4 Instruct-Only GRPO notebook implementing ADR-002

6c7b1ca

rtferraz and Claude Sonnet 4.6 committed on Apr 25

ADR-002: V4 Instruct-Only GRPO — revises dual-model plan based on model repo audit

50e0e4d
verified

rtferraz committed on Apr 25
