# Model Card: Memory Routing Agent (Llama-8B + LoRA)

## Model Details

- **Model Name**: memory-routing-llama-8b-lora
- **Base Model**: meta-llama/Llama-3.1-8B
- **Architecture**: LoRA (Low-Rank Adaptation), rank 32
- **Training Platform**: Tinker (Thinking Machines)
- **Training Method**: SFT (Supervised Fine-Tuning) + RL (Reinforcement Learning)
- **Parameters**: ~8B base + ~100M LoRA adapters
- **License**: Apache 2.0
## Intended Use

This model classifies marketing conversations into memory categories for AI assistant systems. It determines which pieces of information from a conversation should be stored in long-term memory and how they should be categorized.

### Primary Use Cases

- Marketing AI assistants that need to remember user preferences
- CRM systems that extract structured data from conversations
- Knowledge management systems for marketing teams

### Out-of-Scope Uses

- General-purpose chatbots
- Non-marketing domains (healthcare, legal, finance)
- Real-time conversation generation
## Training Data

### Synthetic Dataset

- **Size**: 2,001 conversations
- **Generation**: Cohere Command-R-Plus (104B) as teacher model
- **Format**: Multi-turn marketing conversations with category labels
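The card does not show the record schema for the JSONL training file. A plausible shape for one record, with hypothetical field names (`conversation`, `labels`) that may differ from the actual dataset, would be:

```python
import json

# Hypothetical shape of one synthetic training record; the real schema in
# merged_training_dataset_2001.jsonl may use different field names.
record_line = json.dumps({
    "conversation": [
        {"role": "user", "content": "Our brand voice is playful but data-driven."},
        {"role": "assistant", "content": "Noted, I'll keep that tone in mind."},
    ],
    "labels": ["company.brand_core"],
})

# Each line of the JSONL file parses back into one labeled conversation.
record = json.loads(record_line)
```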
### Category Taxonomy (13 categories)

| Category | Description | Persistence |
|---|---|---|
| `company.brand_core` | Voice, values, positioning | Long (>1y) |
| `company.strategic_signatures` | Decision frameworks | Long (>1y) |
| `company.knowledge_artifacts` | Docs, style guides | Long (>1y) |
| `company.business_priorities` | Quarterly goals | Short (<3m) |
| `company.tools_config` | Integrations, APIs | Medium (~6m) |
| `company.performance_context` | Campaign metrics | Rolling (~6m) |
| `user.communication_style` | Tone, format preferences | Long (>1y) |
| `user.strategic_approach` | Personal priorities | Long (>1y) |
| `user.role_context` | Title, scope | Medium (~1y) |
| `user.workflow_patterns` | Review cadence | Medium (~1y) |
| `user.session_history` | Immediate context | Short (<2w) |
| `user.interaction_preferences` | Coaching style | Evolving |
| `none` | Irrelevant content | N/A |
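A downstream memory store that honors these persistence windows could encode the taxonomy as a lookup table. The concrete `timedelta` values below are one illustrative reading of the labels (e.g. "Long (>1y)" as 365 days, "Medium (~1y)" likewise); they are not part of the model:

```python
from datetime import timedelta
from typing import Optional

# Retention windows derived from the taxonomy table; None means either an
# evolving entry with no fixed TTL, or content that is never stored.
PERSISTENCE = {
    "company.brand_core": timedelta(days=365),
    "company.strategic_signatures": timedelta(days=365),
    "company.knowledge_artifacts": timedelta(days=365),
    "company.business_priorities": timedelta(days=90),
    "company.tools_config": timedelta(days=180),
    "company.performance_context": timedelta(days=180),
    "user.communication_style": timedelta(days=365),
    "user.strategic_approach": timedelta(days=365),
    "user.role_context": timedelta(days=365),
    "user.workflow_patterns": timedelta(days=365),
    "user.session_history": timedelta(days=14),
    "user.interaction_preferences": None,  # "Evolving": no fixed TTL
    "none": None,                          # irrelevant content, not stored
}

def ttl_for(category: str) -> Optional[timedelta]:
    """Return the retention window for a routed category (None = no TTL)."""
    return PERSISTENCE[category]
```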
## Training Procedure

### Phase 1: Supervised Fine-Tuning (SFT)

- **Steps**: 100
- **Batch Size**: 128
- **Learning Rate**: 2.86e-4 (Tinker default for Llama-8B)
- **Optimizer**: Adam (β1=0.9, β2=0.95)
- **Loss Function**: Cross-entropy

### Phase 2: Reinforcement Learning (RL)

- **Iterations**: 12
- **Groups per Batch**: 64
- **Group Size**: 32
- **Learning Rate**: 2e-5
- **Loss Function**: Importance-sampling policy gradient
- **Reward Function**:
  - R_F1 (60%): F1 score vs. gold labels
  - R_temp (20%): Temporal alignment
  - R_parity (10%): Company/user scope
  - R_eff (10%): Storage efficiency
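The weighted reward mix can be written out directly. How each component score is computed is not specified in this card, so the sketch below simply assumes each is a float in [0, 1]:

```python
def composite_reward(r_f1: float, r_temp: float,
                     r_parity: float, r_eff: float) -> float:
    """Weighted sum matching the 60/20/10/10 reward mix above.

    Each component is assumed to be a score in [0, 1]; the component
    scoring functions themselves are not defined in this card.
    """
    return 0.6 * r_f1 + 0.2 * r_temp + 0.1 * r_parity + 0.1 * r_eff

# A rollout with perfect labels but zero temporal alignment is capped at 0.8,
# so the policy is still pushed toward persistence-aware routing.
capped = composite_reward(1.0, 0.0, 1.0, 1.0)
```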
## Evaluation Results

### Marketing Routing Benchmark (50 scenarios)

| Model | Any Match | Exact Match | Avg F1 |
|---|---|---|---|
| Ours (8B + LoRA) | 72% | 60% | 0.68 |
| Cohere Command-R-Plus (104B) | 82% | 26% | 0.61 |
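The card does not spell out how the three columns are computed per example. A set-based reading consistent with the column names, treating predictions and gold labels as sets of categories, would be:

```python
def routing_metrics(pred: set, gold: set) -> dict:
    """Per-example metrics under an assumed set-based reading:
    any match  = at least one shared label,
    exact match = identical label sets,
    F1 = harmonic mean of set precision and recall.
    """
    overlap = len(pred & gold)
    any_match = float(overlap > 0)
    exact_match = float(pred == gold)
    if not pred or not gold:
        f1 = float(pred == gold)  # both empty counts as a perfect match
    else:
        precision = overlap / len(pred)
        recall = overlap / len(gold)
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
    return {"any_match": any_match, "exact_match": exact_match, "f1": f1}

# Over-prediction example: one correct label plus one spurious label.
m = routing_metrics({"company.brand_core", "user.role_context"},
                    {"company.brand_core"})
```

Under this reading, the teacher's pattern in the table (high any-match, low exact-match, middling F1) is what over-prediction of extra categories would produce.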
### Key Findings

- 11.1% higher F1 than the 104B teacher model
- 2.3x better exact-match accuracy
- 13x smaller than the teacher model
- Excels at single-category classification (86% exact match on easy cases)
- Struggles with multi-label scenarios (10% exact match on hard cases)
### Performance by Difficulty

| Difficulty | Our Model (F1) | Cohere (F1) | Delta |
|---|---|---|---|
| Easy | 0.86 | 0.48 | +79% |
| Medium | 0.65 | 0.64 | +2% |
| Hard | 0.50 | 0.72 | -31% |
## Limitations

- **Multi-label Detection**: Under-predicts when multiple categories apply
- **Company vs. User Confusion**: Sometimes confuses `company.strategic_signatures` with `user.strategic_approach`
- **Hard Cases**: Performance drops on complex, overlapping categories
- **Domain Specificity**: Trained only on marketing scenarios

## Ethical Considerations

- Trained on synthetic data; may not capture all real-world edge cases
- Should be used with human oversight for critical decisions
- Privacy: the model does not store or transmit conversation data
## Citation

```bibtex
@misc{memory-routing-agent-2025,
  title={Memory Routing Agent: Prompt Distillation for Marketing AI},
  author={Muratcan Koylan},
  year={2025},
  howpublished={\url{https://github.com/muratcankoylan/memory-routing-agent}},
}
```
## Model Files

- `training/checkpoints/rl_iter_012/`: final RL checkpoint
- `training/benchmarks/marketing_routing_benchmark.json`: benchmark dataset
- `synthetic_data/merged_training_dataset_2001.jsonl`: training data

## Contact

For questions or issues, please open a GitHub issue.