Memory Routing Agent Training

Llama-3.1-8B + LoRA (rank 32) | SFT + RL Training Pipeline

Phase 1: Supervised Fine-Tuning Loss

Phase 2: RL Reward Progression

Final Model Performance

F1 Score: --
Precision: --
Recall: --
Any Match: --
Exact Match: --
Mean Reward: --
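
The report does not say how these metrics are computed. As a rough sketch only, assuming the routing agent outputs a set of memory categories per example and each example has a gold set, the per-example scores could be computed as below. The function names, the category labels, and the definitions of "Any Match" (at least one predicted category is in the gold set) and "Exact Match" (predicted set equals gold set) are assumptions for illustration, not taken from the pipeline itself.

```python
from typing import Dict, Iterable, List, Set, Tuple


def routing_metrics(predicted: Iterable[str], gold: Iterable[str]) -> Dict[str, float]:
    """Set-based scores for one example (assumed definitions, not the report's)."""
    pred: Set[str] = set(predicted)
    ref: Set[str] = set(gold)
    overlap = len(pred & ref)

    # Guard against empty sets so we never divide by zero.
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "any_match": float(overlap > 0),     # assumed: any shared category
        "exact_match": float(pred == ref),   # assumed: identical sets
    }


def average_metrics(pairs: List[Tuple[Iterable[str], Iterable[str]]]) -> Dict[str, float]:
    """Average the per-example scores over (predicted, gold) pairs."""
    rows = [routing_metrics(p, g) for p, g in pairs]
    if not rows:
        return {}
    return {key: sum(r[key] for r in rows) / len(rows) for key in rows[0]}


if __name__ == "__main__":
    # Hypothetical category labels, used only to exercise the functions.
    print(routing_metrics(["user", "company"], ["user"]))
```

Under these assumptions, a prediction that contains the single gold category plus one extra category scores full recall, half precision, counts as an "any match", and fails "exact match".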

Model Comparison: SFT vs RL

Metric              SFT Model   RL Model   Improvement
F1 Score            --          --         --
Any Match Accuracy  --          --         --
Exact Match         --          --         --
Temporal Alignment  --          --         --
Generated: 2025-11-24 16:51:34