add: V4.2 Final Report - complete project retrospective with evidence-based analysis 22cca8b verified rtferraz committed 5 days ago
add: notebook cell insertion script for base vs tuned comparison c641edb verified rtferraz committed 6 days ago
add: base vs tuned comparison cell for V4.2 final evaluation 0c9199c verified rtferraz committed 6 days ago
feat(rewards): add sentiment mismatch penalty to prevent extraction reward hacking b1be31c rtferraz Claude Haiku 4.5 committed 7 days ago
fix(probe): use TRL 0.24.0 log keys - rewards/commerce_reward_fn/mean, grad_norm (not train/ prefix) 080fd9a verified rtferraz committed 10 days ago
fix(classifier): reorder _classify_task_type - insights before push to prevent reengajamento misclassification 63b1c86 verified rtferraz committed 10 days ago
fix(rewards): 3 bugs from Cell 8 audit - push length/formal, SQL domain, extraction int check 41eb15f verified rtferraz committed 10 days ago
Fix V4.2 audit: show INPUT REVIEW alongside MODEL OUTPUT for proper human scoring 71422f3 verified rtferraz committed 10 days ago
Fix V4.2: task weights 40/40/10/10, full audit completions, interactive input() scoring 0fc9042 verified rtferraz committed 10 days ago
Fix V4.2: GDPO + IWU now active in training reward path (not just monitoring) c95e44c verified rtferraz committed 10 days ago
Add V4.2 GRPO training notebook (Gold Standard, 0.5B) c5f1d2d verified rtferraz committed 10 days ago
docs: add V4.1 run report - detailed evaluation with per-task analysis and V4.2 roadmap 482efc4 verified rtferraz committed 10 days ago
notebooks: add V4.1 GRPO notebook (parser fix, 600 steps, LR 5e-6, constant_with_warmup) d7a090d verified rtferraz committed 11 days ago
docs: add V4 run assessment with lessons learned and improvement roadmap cfaf49c verified rtferraz committed 11 days ago
v4: ROOT CAUSE FIX - use standard PEFT not Unsloth get_peft_model (fused LoRA kernels have dtype bug #4891). Revert to load_in_4bit=True, dtype=None matching V3. 521e1d8 verified rtferraz committed 13 days ago
v4: fix NF4 fp16/bf16 dtype bug (unsloth #4891) - load_in_4bit=False, 0.5B fits in full bf16 on 24GB ca397a5 verified rtferraz committed 13 days ago
v4: fix fp16/bf16 mismatch - disable Unsloth gradient checkpointing (causes dtype conflict in LoRA QKV kernels at 0.5B) a40d2dc verified rtferraz committed 13 days ago
v4 notebook: fix dtype Half/BFloat16 mismatch (explicit bf16), fix tied embeddings path, fix max_length warning b1bb14c verified rtferraz committed 13 days ago
v4 notebook: fix TypeError crash, suppress warnings, update paths to CWD, add V3 task-aware system prompts 631e559 verified rtferraz committed 13 days ago
Fix total_mem -> total_memory in V4 notebook (PyTorch API) 5aa00ff rtferraz Claude Sonnet 4.6 committed 13 days ago
Add V4 Instruct-Only GRPO notebook implementing ADR-002 6c7b1ca rtferraz Claude Sonnet 4.6 committed 13 days ago
ADR-002: V4 Instruct-Only GRPO - revises dual-model plan based on model repo audit 50e0e4d verified rtferraz committed 13 days ago
Add comprehensive investigation report - performance audit, unexplored alternatives, literature-backed recommendations 4312bfd verified rtferraz committed 14 days ago
Add session checkpoint: v3 launch decision with full context bead5cb verified rtferraz committed 15 days ago
apply v3 task-aware thinking controls and delete deprecated notebook 1d514ac rtferraz committed 15 days ago
Add v3 thinking control patch - task-aware system prompts + think efficiency reward 0f39df7 verified rtferraz committed 15 days ago
Initial commit: Tucano2-Commerce GRPO v3 training pipeline fa4a874 rtferraz Claude Opus 4.6 committed 15 days ago
Rename notebooks/grpo_vertex_v3.ipynb to notebooks/DEPRECATED_grpo_vertex_v3.ipynb a62f1dc verified rtferraz committed 15 days ago
feat: add v3 notebook (.ipynb) - ready for Vertex AI Workbench 6c51e5f verified rtferraz committed 15 days ago
feat: add GRPO v3 implementation with entropy collapse fixes a6a8b11 verified rtferraz committed 15 days ago
docs: add ADR-001 next steps with detailed execution plans b47b36b verified rtferraz committed 15 days ago