Memory Routing Agent Training
Llama-3.1-8B + LoRA (rank 32) | SFT + RL Training Pipeline
Phase 1: Supervised Fine-Tuning Loss
Phase 2: RL Reward Progression
Final Model Performance
F1 Score: --
Precision: --
Recall: --
Any Match: --
Exact Match: --
Mean Reward: --
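The dashboard does not state how these figures are calculated. The following is a minimal sketch, assuming the routing agent emits a set of category labels per example and each example is scored by set overlap against a gold label set. The function name `routing_metrics` and the sample labels are hypothetical and are not taken from the training pipeline. Mean Reward is left out because the reward function is not described here.

```python
from typing import Dict, Set


def routing_metrics(predicted: Set[str], gold: Set[str]) -> Dict[str, float]:
    """Score one example's predicted label set against its gold label set.

    Assumption: metrics are set-based and computed per example; the real
    pipeline may define or aggregate them differently.
    """
    overlap = len(predicted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Any Match: at least one predicted label is in the gold set.
        "any_match": float(overlap > 0),
        # Exact Match: predicted and gold sets are identical.
        "exact_match": float(predicted == gold),
    }


# Hypothetical example: two predicted categories, one of them correct.
scores = routing_metrics({"user.preferences", "company.tools"}, {"user.preferences"})
print(scores)
```

Under these assumptions, dataset-level values would be the mean of each per-example score, which is one common convention but not necessarily the one used to produce this report.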
Model Comparison: SFT vs RL
Metric | SFT Model | RL Model | Improvement
F1 Score | -- | -- | --
Any Match Accuracy | -- | -- | --
Exact Match | -- | -- | --
Temporal Alignment | -- | -- | --
Generated: 2025-11-24 16:51:34