update: FAL results report with final eval numbers + conclusions 99c09f5 verified rtferraz committed on May 12
fix: eval crash (ZeroDivisionError) + add eval-only entrypoint (no retraining needed) 3fc0eb8 verified rtferraz committed on May 11
add: FAL demo results report (preliminary; eval crash pending fix) f46c90b verified rtferraz committed on May 11
add: Modal deployment lessons learned: TRL dependency hell postmortem 81195a7 verified rtferraz committed on May 11
fix: num_generations=4, batch=4 (TRL 0.28 requires batch divisible by num_generations) 29c5f9b verified rtferraz committed on May 11
fix: generation_batch_size must be divisible by num_generations; use G=4, batch=4 for T4 9ff6872 verified rtferraz committed on May 11
fix: remove max_prompt_length (not in TRL 0.28 GRPOConfig), switch to T4 + fp16 3063b98 verified rtferraz committed on May 11
fix: install actual weave + pip install llm-blender + mergekit instead of broken stubs 5235a46 verified rtferraz committed on May 11
fix: patch TRL source to make vllm import conditional + remove broken vllm stub 3703999 verified rtferraz committed on May 11
fix: vllm stub version 0.0.1 (below TRL's 0.10.2 minimum) so is_vllm_available() returns False 882fb7f verified rtferraz committed on May 11
fix: vllm stub needs __version__='0.11.0' so TRL version check doesn't crash 8d1fcb1 verified rtferraz committed on May 11
fix: use shell heredoc for stub creation instead of broken f-string interpolation 527dc3c verified rtferraz committed on May 11
fix: use heredoc-style stub creation instead of broken one-liner Python 4db8e03 verified rtferraz committed on May 11
fix: drop vllm, patch trl lazy imports to avoid broken optional dep chains 4db2abf verified rtferraz committed on May 11
fix: add weave dep (required by trl 0.28 import chain via wandb integration) 7c91592 verified rtferraz committed on May 11
fix: switch to trl==0.28.0 (Modal-validated, no broken llm_blender dep chain) 18bf4a8 verified rtferraz committed on May 11
fix: install trl[judges] to pull all transitive deps (mergekit, llm_blender, etc.) 5337e50 verified rtferraz committed on May 11
fix: complete file with mergekit dep added to pip_install b4af4df verified rtferraz committed on May 11
fix: add mergekit dependency required by trl==0.24.0 import chain ec05953 verified rtferraz committed on May 11
fix: bump transformers>=4.56.1 to match trl==0.24.0 requirement, drop unsloth (use plain PEFT) 9df3ed3 verified rtferraz committed on May 11
fix: resolve dependency conflict: accelerate>=1.4.0 required by trl==0.24.0 1ee7ba1 verified rtferraz committed on May 11
add: Modal GRPO training app for Future-as-Label calibration demo db6a752 verified rtferraz committed on May 11
add: Future-as-Label Modal implementation: data pipeline, training, evaluation d8ffe2c verified rtferraz committed on May 11
add: ADR-003 Future-as-Label demo: detailed implementation plan with research validation d75cbbf verified rtferraz committed on May 11
add: V4.2 Final Report: complete project retrospective with evidence-based analysis 22cca8b verified rtferraz committed on May 3
add: notebook cell insertion script for base vs tuned comparison c641edb verified rtferraz committed on May 2
add: base vs tuned comparison cell for V4.2 final evaluation 0c9199c verified rtferraz committed on May 2
feat(rewards): add sentiment mismatch penalty to prevent extraction reward hacking b1be31c rtferraz and Claude Haiku 4.5 committed on May 1
fix(probe): use TRL 0.24.0 log keys: rewards/commerce_reward_fn/mean, grad_norm (no train/ prefix) 080fd9a verified rtferraz committed on Apr 28
fix(classifier): reorder _classify_task_type: check insights before push to prevent reengajamento (re-engagement) misclassification 63b1c86 verified rtferraz committed on Apr 28
fix(rewards): 3 bugs from Cell 8 audit: push length/formal, SQL domain, extraction int check 41eb15f verified rtferraz committed on Apr 28
Fix V4.2 audit: show INPUT REVIEW alongside MODEL OUTPUT for proper human scoring 71422f3 verified rtferraz committed on Apr 28
Fix V4.2: task weights 40/40/10/10, full audit completions, interactive input() scoring 0fc9042 verified rtferraz committed on Apr 28
Fix V4.2: GDPO + IWU now active in training reward path (not just monitoring) c95e44c verified rtferraz committed on Apr 28
docs: add V4.1 run report: detailed evaluation with per-task analysis and V4.2 roadmap 482efc4 verified rtferraz committed on Apr 28
notebooks: add V4.1 GRPO notebook (parser fix, 600 steps, LR 5e-6, constant_with_warmup) d7a090d verified rtferraz committed on Apr 27
docs: add V4 run assessment with lessons learned and improvement roadmap cfaf49c verified rtferraz committed on Apr 27
v4: ROOT CAUSE FIX: use standard PEFT, not Unsloth get_peft_model (fused LoRA kernels have dtype bug #4891). Revert to load_in_4bit=True, dtype=None, matching V3. 521e1d8 verified rtferraz committed on Apr 25
v4: fix NF4 fp16/bf16 dtype bug (unsloth #4891): load_in_4bit=False, 0.5B fits in full bf16 on 24GB ca397a5 verified rtferraz committed on Apr 25
v4: fix fp16/bf16 mismatch: disable Unsloth gradient checkpointing (causes dtype conflict in LoRA QKV kernels at 0.5B) a40d2dc verified rtferraz committed on Apr 25
v4 notebook: fix dtype Half/BFloat16 mismatch (explicit bf16), fix tied embeddings path, fix max_length warning b1bb14c verified rtferraz committed on Apr 25
v4 notebook: fix TypeError crash, suppress warnings, update paths to CWD, add V3 task-aware system prompts 631e559 verified rtferraz committed on Apr 25
Fix total_mem → total_memory in V4 notebook (PyTorch API) 5aa00ff rtferraz and Claude Sonnet 4.6 committed on Apr 25
Add V4 Instruct-Only GRPO notebook implementing ADR-002 6c7b1ca rtferraz and Claude Sonnet 4.6 committed on Apr 25
ADR-002: V4 Instruct-Only GRPO: revises dual-model plan based on model repo audit 50e0e4d verified rtferraz committed on Apr 25