Commit History

update: FAL results report with final eval numbers + conclusions
99c09f5
verified

rtferraz committed on

fix: eval crash (ZeroDivisionError) + add eval-only entrypoint (no retraining needed)
3fc0eb8
verified

rtferraz committed on

add: FAL demo results report (preliminary; eval crash pending fix)
f46c90b
verified

rtferraz committed on

cleanup: remove placeholder files
b27b03d
verified

rtferraz committed on

add: Modal deployment lessons learned - TRL dependency hell postmortem
81195a7
verified

rtferraz committed on

fix: num_generations=4 batch=4 (TRL 0.28 requires batch divisible by num_generations)
29c5f9b
verified

rtferraz committed on
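The batch-size fix in this commit, and the related generation_batch_size fix in 9ff6872, hinge on one constraint: in GRPO each prompt is sampled num_generations times and rewards are compared within that group, so a generation batch has to split evenly into groups. Below is a minimal sketch of that check. It assumes a flat list of rewards ordered prompt by prompt. The helper names are hypothetical and this is not TRL's implementation. It also only mean-centers rewards, whereas standard GRPO additionally scales by the group standard deviation.

```python
from typing import List


def prompts_per_generation_batch(generation_batch_size: int, num_generations: int) -> int:
    """Number of unique prompts in one generation batch (hypothetical helper)."""
    # Each prompt contributes num_generations completions, so the batch
    # must be an exact multiple of the group size.
    if generation_batch_size % num_generations != 0:
        raise ValueError(
            f"generation_batch_size={generation_batch_size} must be divisible "
            f"by num_generations={num_generations}"
        )
    return generation_batch_size // num_generations


def group_relative_advantages(rewards: List[float], num_generations: int) -> List[float]:
    """Mean-center each reward within its group of num_generations completions.

    Assumes consecutive blocks of num_generations rewards belong to the same
    prompt. Simplified: no standard-deviation normalization.
    """
    num_prompts = prompts_per_generation_batch(len(rewards), num_generations)
    advantages: List[float] = []
    for p in range(num_prompts):
        group = rewards[p * num_generations:(p + 1) * num_generations]
        group_mean = sum(group) / num_generations
        advantages.extend(r - group_mean for r in group)
    return advantages


if __name__ == "__main__":
    # G=4, batch=4 (the T4 setting from these commits): one prompt per batch
    print(prompts_per_generation_batch(4, 4))  # 1
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0], 4))  # [0.5, -0.5, 0.5, -0.5]
    try:
        prompts_per_generation_batch(6, 4)  # 6 completions cannot form groups of 4
    except ValueError as err:
        print(err)
```

With G=4 and batch=4, every optimizer step sees a single prompt's group, which is the smallest configuration that satisfies the constraint on a T4.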

fix: generation_batch_size must be divisible by num_generations; use G=4, batch=4 for T4
9ff6872
verified

rtferraz committed on

fix: remove max_prompt_length (not in TRL 0.28 GRPOConfig), switch to T4 + fp16
3063b98
verified

rtferraz committed on

fix: install actual weave + pip install llm-blender + mergekit instead of broken stubs
5235a46
verified

rtferraz committed on

fix: patch TRL source to make vllm import conditional + remove broken vllm stub
3703999
verified

rtferraz committed on

fix: vllm stub version 0.0.1 (below TRL's 0.10.2 minimum) so is_vllm_available() returns False
882fb7f
verified

rtferraz committed on
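Commits 882fb7f and 8d1fcb1 both revolve around version-gated optional imports. In this pattern, a library treats a dependency as available only if it is importable and at least a minimum version, so the version string a stub reports decides which branch runs. Below is a minimal sketch of that general pattern using only the standard library. is_optional_dep_available is a hypothetical helper, not TRL's is_vllm_available(), and the simplified version parser stands in for packaging.version.

```python
import importlib
import importlib.metadata
import importlib.util
import re
from typing import Optional, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """'0.10.2' -> (0, 10, 2); '0.11.0rc1' -> (0, 11, 0).

    Simplified parser: keeps the leading digits of each dot-separated part
    and stops at the first part that has none. Use packaging.version in
    real code.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(module_name: str) -> Optional[str]:
    """Version from package metadata, falling back to the module's __version__."""
    try:
        # importlib.metadata looks up *distribution* names, which can differ
        # from the import name for some packages.
        return importlib.metadata.version(module_name)
    except importlib.metadata.PackageNotFoundError:
        # No metadata (e.g. a hand-written stub module): ask the module itself.
        module = importlib.import_module(module_name)
        return getattr(module, "__version__", None)


def is_optional_dep_available(module_name: str, min_version: str) -> bool:
    """Hypothetical helper: True only if importable AND at least min_version."""
    if importlib.util.find_spec(module_name) is None:
        return False
    version = installed_version(module_name)
    if version is None:
        return False
    return version_tuple(version) >= version_tuple(min_version)
```

Under this logic, a stub reporting version 0.0.1 is treated as unavailable against a 0.10.2 minimum, while one reporting 0.11.0 passes the gate and the caller goes on to import it.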

fix: vllm stub needs __version__='0.11.0' so TRL version check doesn't crash
8d1fcb1
verified

rtferraz committed on

fix: use shell heredoc for stub creation instead of broken f-string interpolation
527dc3c
verified

rtferraz committed on

fix: use heredoc-style stub creation instead of broken one-liner python
4db8e03
verified

rtferraz committed on

fix: drop vllm, patch trl lazy imports to avoid broken optional dep chains
4db2abf
verified

rtferraz committed on

fix: add weave dep (required by trl 0.28 import chain via wandb integration)
7c91592
verified

rtferraz committed on

fix: switch to trl==0.28.0 (Modal-validated, no broken llm_blender dep chain)
18bf4a8
verified

rtferraz committed on

fix: install trl[judges] to pull all transitive deps (mergekit, llm_blender, etc)
5337e50
verified

rtferraz committed on

fix: complete file with mergekit dep added to pip_install
b4af4df
verified

rtferraz committed on

fix: add mergekit dependency required by trl==0.24.0 import chain
ec05953
verified

rtferraz committed on

fix: bump transformers>=4.56.1 to match trl==0.24.0 requirement, drop unsloth (use plain PEFT)
9df3ed3
verified

rtferraz committed on

fix: resolve dependency conflict - accelerate>=1.4.0 required by trl==0.24.0
1ee7ba1
verified

rtferraz committed on

add: FAL demo README with Modal run instructions
690f6ac
verified

rtferraz committed on

add: Modal GRPO training app for Future-as-Label calibration demo
db6a752
verified

rtferraz committed on

add: Future-as-Label Modal implementation - data pipeline, training, evaluation
d8ffe2c
verified

rtferraz committed on

add: ADR-003 Future-as-Label demo - detailed implementation plan with research validation
d75cbbf
verified

rtferraz committed on

add: V4.2 Final Report - complete project retrospective with evidence-based analysis
22cca8b
verified

rtferraz committed on

add: notebook cell insertion script for base vs tuned comparison
c641edb
verified

rtferraz committed on

add: base vs tuned comparison cell for V4.2 final evaluation
0c9199c
verified

rtferraz committed on

feat(rewards): add sentiment mismatch penalty to prevent extraction reward hacking
b1be31c

rtferraz and Claude Haiku 4.5 committed on

fix(probe): use TRL 0.24.0 log keys - rewards/commerce_reward_fn/mean, grad_norm (not train/ prefix)
080fd9a
verified

rtferraz committed on

fix(classifier): reorder _classify_task_type - insights before push to prevent reengajamento misclassification
63b1c86
verified

rtferraz committed on

fix(rewards): 3 bugs from Cell 8 audit β€” push length/formal, SQL domain, extraction int check
41eb15f
verified

rtferraz committed on

Fix V4.2 audit: show INPUT REVIEW alongside MODEL OUTPUT for proper human scoring
71422f3
verified

rtferraz committed on

Fix V4.2: task weights 40/40/10/10, full audit completions, interactive input() scoring
0fc9042
verified

rtferraz committed on

Fix V4.2: GDPO + IWU now active in training reward path (not just monitoring)
c95e44c
verified

rtferraz committed on

Add V4.2 GRPO training notebook (Gold Standard, 0.5B)
c5f1d2d
verified

rtferraz committed on

Create v4_2-handoff.md
d1385b0
verified

rtferraz committed on

docs: add V4.1 run report - detailed evaluation with per-task analysis and V4.2 roadmap
482efc4
verified

rtferraz committed on

notebooks: add V4.1 GRPO notebook (parser fix, 600 steps, LR 5e-6, constant_with_warmup)
d7a090d
verified

rtferraz committed on

Create v4_1-handoff.md
958f6d7
verified

rtferraz committed on

docs: add V4 run assessment with lessons learned and improvement roadmap
cfaf49c
verified

rtferraz committed on

v4: ROOT CAUSE FIX - use standard PEFT, not Unsloth get_peft_model (fused LoRA kernels have dtype bug #4891). Revert to load_in_4bit=True, dtype=None, matching V3.
521e1d8
verified

rtferraz committed on

v4: fix NF4 fp16/bf16 dtype bug (unsloth #4891) β€” load_in_4bit=False, 0.5B fits in full bf16 on 24GB
ca397a5
verified

rtferraz committed on

v4: fix fp16/bf16 mismatch β€” disable Unsloth gradient checkpointing (causes dtype conflict in LoRA QKV kernels at 0.5B)
a40d2dc
verified

rtferraz committed on

v4 notebook: fix dtype Half/BFloat16 mismatch (explicit bf16), fix tied embeddings path, fix max_length warning
b1bb14c
verified

rtferraz committed on

v4 notebook: fix TypeError crash, suppress warnings, update paths to CWD, add V3 task-aware system prompts
631e559
verified

rtferraz committed on

Fix total_mem → total_memory in V4 notebook (PyTorch API)
5aa00ff

rtferraz and Claude Sonnet 4.6 committed on

Add V4 Instruct-Only GRPO notebook implementing ADR-002
6c7b1ca

rtferraz and Claude Sonnet 4.6 committed on

ADR-002: V4 Instruct-Only GRPO - revises dual-model plan based on model repo audit
50e0e4d
verified

rtferraz committed on