---
license: apache-2.0
language:
- en
tags:
- memory-routing
- marketing
- classification
- llama
- lora
- tinker
- prompt-distillation
base_model: meta-llama/Llama-3.1-8B
metrics:
- f1
- accuracy
pipeline_tag: text-classification
---
# Memory Routing Agent
**A specialized 8B model that outperforms its 104B teacher on marketing conversation classification.**
[Model on Hugging Face](https://huggingface.co/MuratcanKoylan/Marketing-Memory-Routing-8B) · [GitHub Repository](https://github.com/muratcankoylan/memory-routing-agent) · [License: Apache-2.0](LICENSE)
---
## The Experiment
This project demonstrates **prompt distillation**: training a small, specialized model to outperform the large model that generated its training data.
### The Challenge
Marketing AI assistants need to remember the right information from conversations. Not everything is worth storing - you need to distinguish between:
- **Valuable**: "Our brand voice is professional but approachable" → Store in long-term memory
- **Transactional**: "What time is the meeting tomorrow?" → Don't store
This is a **13-category classification problem** with nuanced distinctions between company-level and user-level information, different persistence horizons, and the critical ability to say "none" for irrelevant content.
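For concreteness, a labeled record might look like this (a sketch; the field names are illustrative, not the exact schema of the training JSONL):

```python
# Hypothetical labeled examples. One conversation can map to one or more
# categories, or to "none" when nothing is worth storing.
example = {
    "conversation": (
        "USER: Our brand voice is professional but approachable.\n"
        "ASSISTANT: So authoritative content with a conversational tone?"
    ),
    "labels": ["company.brand_core"],  # which memory slots to write
    "persistence": "long",             # how long the memory should live
}

transactional = {
    "conversation": "USER: What time is the meeting tomorrow?",
    "labels": ["none"],                # transactional: nothing stored
    "persistence": None,
}
```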
### The Approach
1. **Generate synthetic data** using Cohere Command-R-Plus (104B) as the teacher
2. **Fine-tune Llama-3.1-8B** with LoRA using Tinker's training platform
3. **Apply reinforcement learning** with a custom reward function
4. **Benchmark against the teacher** on challenging, held-out scenarios
### The Result
| Model | Parameters | Avg F1 | Exact Match |
|-------|------------|--------|-------------|
| **Ours** | **8B** | **0.68** | **60%** |
| Cohere Command-R-Plus | 104B | 0.61 | 26% |
**Our 8B model achieves 11.1% higher F1 and 2.3x better exact match accuracy than the 104B teacher, while being 13x smaller.**
The student surpassed the teacher through:
- **Focused training**: The model only learns this one task, not general capabilities
- **RL refinement**: The reward function optimizes for exact category matching, not just plausible outputs
- **Clean data**: Synthetic data with consistent labeling, no noise from human annotation disagreements
---
## Training Visualizations
### Phase 1: Supervised Fine-Tuning

100 training steps reduced loss from 5.47 to 0.26 (95% reduction). The model learned the basic classification task in the first epoch.
### Phase 2: Reinforcement Learning

30 RL iterations improved mean reward from 0.73 to 0.93. The reward function combines F1 score, temporal alignment, scope correctness, and storage efficiency.
### Model Comparison

Our model excels at exact matching (60% vs 26%) because RL optimizes for getting all categories right, not just some.
### Performance by Difficulty

The 8B model dominates on easy cases (+79% F1) and matches on medium cases. The 104B model still wins on hard multi-label scenarios.
---
## Key Results
| Metric | Our Model (8B) | Cohere (104B) |
|--------|----------------|---------------|
| **Avg F1** | **0.68** | 0.61 |
| **Exact Match** | **60%** | 26% |
| Any Match | 72% | 82% |
| Model Size | 8B | 104B |
| **Improvement** | **+11.1% F1** | baseline |
### Reward Components (Final RL Iteration)
| Component | Score | Description |
|-----------|-------|-------------|
| R_F1 | 0.90 | F1 score vs gold labels |
| R_temp | 0.95 | Temporal alignment |
| R_parity | 1.00 | Company/user scope |
| R_eff | 1.00 | Storage efficiency |
---
## What It Does
The Memory Routing Agent classifies marketing conversations into 13 memory categories:
### Company Categories (Long-term business context)
| Category | Description | Persistence |
|----------|-------------|-------------|
| `company.brand_core` | Voice, values, positioning | Long (>1y) |
| `company.strategic_signatures` | Decision frameworks | Long (>1y) |
| `company.knowledge_artifacts` | Docs, style guides | Long (>1y) |
| `company.business_priorities` | Quarterly goals | Short (<3m) |
| `company.tools_config` | Integrations, APIs | Medium (~6m) |
| `company.performance_context` | Campaign metrics | Rolling (~6m) |
### User Categories (Personal preferences)
| Category | Description | Persistence |
|----------|-------------|-------------|
| `user.communication_style` | Tone, format preferences | Long (>1y) |
| `user.strategic_approach` | Personal priorities | Long (>1y) |
| `user.role_context` | Title, scope | Medium (~1y) |
| `user.workflow_patterns` | Review cadence | Medium (~1y) |
| `user.session_history` | Immediate context | Short (<2w) |
| `user.interaction_preferences` | Coaching style | Evolving |
### Special
| Category | Description |
|----------|-------------|
| `none` | Transactional or irrelevant content |
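Taken together, the taxonomy can be written as a small lookup table (a sketch; category names mirror the tables above, and the persistence strings are the approximate horizons listed):

```python
# The 13 memory categories and their persistence horizons.
CATEGORIES = {
    "company.brand_core": "long (>1y)",
    "company.strategic_signatures": "long (>1y)",
    "company.knowledge_artifacts": "long (>1y)",
    "company.business_priorities": "short (<3m)",
    "company.tools_config": "medium (~6m)",
    "company.performance_context": "rolling (~6m)",
    "user.communication_style": "long (>1y)",
    "user.strategic_approach": "long (>1y)",
    "user.role_context": "medium (~1y)",
    "user.workflow_patterns": "medium (~1y)",
    "user.session_history": "short (<2w)",
    "user.interaction_preferences": "evolving",
    "none": None,  # transactional/irrelevant: nothing is stored
}
```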
---
## Training Pipeline
```
┌─────────────────────────────────────────────────────────────────┐
│                        TRAINING PIPELINE                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. SYNTHETIC DATA GENERATION                                   │
│     ├── Cohere Command-R-Plus (104B) as teacher                 │
│     ├── 2,001 marketing conversations                           │
│     └── 13 category labels + persistence horizons               │
│                                                                 │
│  2. SUPERVISED FINE-TUNING (SFT)                                │
│     ├── Base: meta-llama/Llama-3.1-8B                           │
│     ├── LoRA rank 32                                            │
│     ├── 100 steps, batch size 128                               │
│     └── Cross-entropy loss                                      │
│                                                                 │
│  3. REINFORCEMENT LEARNING (RL)                                 │
│     ├── 30 iterations, 64 groups × 32 samples                   │
│     ├── Importance sampling policy gradient                     │
│     └── Composite reward: F1 + temporal + parity + efficiency   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
### Reward Function
```
R_total = 0.6 × R_F1 + 0.2 × R_temp + 0.1 × R_parity + 0.1 × R_eff
```
| Component | Weight | Description |
|-----------|--------|-------------|
| R_F1 | 60% | F1 score vs gold labels |
| R_temp | 20% | Persistence horizon alignment |
| R_parity | 10% | Company/user scope correctness |
| R_eff | 10% | Storage efficiency (≤3 categories) |
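The components above can be sketched in a few lines of Python (a minimal illustration; the actual implementation lives in `training/rl_env.py`, and the set-based F1 and helper signatures here are assumptions):

```python
def f1_score(predicted: set, gold: set) -> float:
    """Set-based F1 between predicted and gold category sets."""
    if not predicted and not gold:
        return 1.0
    tp = len(predicted & gold)  # true positives
    return 2 * tp / (len(predicted) + len(gold))

def reward(predicted: set, gold: set, r_temp: float, r_parity: float) -> float:
    """R_total = 0.6*R_F1 + 0.2*R_temp + 0.1*R_parity + 0.1*R_eff."""
    r_f1 = f1_score(predicted, gold)
    r_eff = 1.0 if len(predicted) <= 3 else 0.0  # storage efficiency
    return 0.6 * r_f1 + 0.2 * r_temp + 0.1 * r_parity + 0.1 * r_eff
```

A perfect prediction with correct persistence and scope scores 1.0; predicting more than three categories forfeits the efficiency term, which is what pushes the model toward concise routing decisions.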
---
## Quick Start
### Installation
```bash
# Clone the repository
git clone https://github.com/muratcankoylan/memory-routing-agent.git
cd memory-routing-agent
# Create virtual environment
python -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
```
### Environment Setup
```bash
# Create .env file with your API keys
echo "TINKER_API_KEY=your_tinker_key" >> .env
echo "COHERE_API_KEY=your_cohere_key" >> .env
echo "HF_TOKEN=your_huggingface_token" >> .env
```
### Run Inference
```python
import tinker
from tinker import types
from tinker_cookbook import renderers
from tinker_cookbook.tokenizer_utils import get_tokenizer
# Load model from Tinker checkpoint
service_client = tinker.ServiceClient()
checkpoint = "tinker://4f4bae1f-5a95-5f53-a55a-a14f2872825c:train:0/sampler_weights/rl_iter_012"
sampling_client = service_client.create_sampling_client(model_path=checkpoint)
# Setup tokenizer and renderer
tokenizer = get_tokenizer("meta-llama/Llama-3.1-8B")
renderer = renderers.get_renderer(name="llama3", tokenizer=tokenizer)
# Classify a conversation
conversation = """
USER: Our brand voice is professional but approachable. Think Harvard Business Review meets Slack.
ASSISTANT: So authoritative content with a conversational tone?
USER: Exactly. We never use jargon without explaining it first.
"""
messages = [
{"role": "system", "content": "You route marketing conversations into structured memory categories..."},
{"role": "user", "content": f"Analyze this conversation:\n\n{conversation}"}
]
prompt = renderer.build_generation_prompt(messages)
params = types.SamplingParams(max_tokens=100, temperature=0.1, stop=renderer.get_stop_sequences())
result = sampling_client.sample(prompt=prompt, sampling_params=params, num_samples=1).result()
response, _ = renderer.parse_response(result.sequences[0].tokens)
print(f"Categories: {response['content']}")
# Output: company.brand_core
```
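The raw response is plain text, so downstream code needs to turn it into a validated set of categories. A hedged sketch of that post-processing (the delimiter handling is an assumption; the model may emit a single category or a comma-separated list):

```python
# The 13 valid category names from the taxonomy above.
VALID = {
    "company.brand_core", "company.strategic_signatures",
    "company.knowledge_artifacts", "company.business_priorities",
    "company.tools_config", "company.performance_context",
    "user.communication_style", "user.strategic_approach",
    "user.role_context", "user.workflow_patterns",
    "user.session_history", "user.interaction_preferences", "none",
}

def parse_categories(text: str) -> set:
    """Split a comma/newline-separated response; keep only known categories."""
    tokens = (t.strip().lower() for t in text.replace("\n", ",").split(","))
    return {t for t in tokens if t in VALID}
```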
---
## Project Structure
```
memory-routing-agent/
├── assets/                     # Training visualizations
│   ├── sft_loss.png
│   ├── rl_reward.png
│   ├── rl_components.png
│   ├── model_comparison.png
│   └── difficulty_comparison.png
├── synthetic_data/             # Data generation pipeline
│   ├── pipeline.py             # Cohere-based conversation generator
│   ├── run_diverse_generation.py
│   └── merged_training_dataset_2001.jsonl
├── training/                   # Training scripts
│   ├── train_v2.py             # Main training script (SFT + RL)
│   ├── preprocess.py           # Data preprocessing
│   ├── rl_env.py               # RL environment and reward function
│   ├── final_benchmark.py      # Benchmark evaluation
│   ├── logs/                   # Training logs (JSONL)
│   └── benchmarks/             # Benchmark results
├── huggingface/                # HuggingFace upload scripts
├── docs/                       # Documentation
│   ├── PRD.md                  # Product requirements
│   └── tinker_docs.md          # Tinker reference
├── MODEL_CARD.md               # Model card
└── README.md                   # This file
```
---
## Benchmark
The Marketing Routing Benchmark contains 50 challenging scenarios across 7 domains:
| Domain | Scenarios | Description |
|--------|-----------|-------------|
| Brand & Positioning | 8 | Brand voice, values, identity |
| Strategic Decisions | 8 | Decision frameworks, heuristics |
| Performance & Metrics | 8 | Campaign metrics, learnings |
| Tools & Integrations | 6 | Tech stack, APIs |
| User Preferences | 10 | Communication style, workflow |
| Business Priorities | 6 | Goals, focus areas |
| Knowledge Artifacts | 4 | Docs, playbooks, templates |
### Run Benchmark
```bash
python training/final_benchmark.py
```
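The two headline metrics are straightforward to compute from (predicted, gold) category-set pairs; a minimal sketch (the real evaluation is in `training/final_benchmark.py`):

```python
def set_f1(pred: set, gold: set) -> float:
    """Set-based F1 between a predicted and a gold category set."""
    if not pred and not gold:
        return 1.0
    tp = len(pred & gold)
    return 2 * tp / (len(pred) + len(gold))

def benchmark_metrics(pairs):
    """pairs: iterable of (predicted_set, gold_set) tuples.

    Returns (average F1, exact-match rate). Exact match requires every
    category to be right, which is why it is the harder of the two metrics.
    """
    pairs = list(pairs)
    avg_f1 = sum(set_f1(p, g) for p, g in pairs) / len(pairs)
    exact = sum(p == g for p, g in pairs) / len(pairs)
    return avg_f1, exact
```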
---
## Training Your Own Model
### 1. Generate Synthetic Data
```bash
cd synthetic_data
python run_diverse_generation.py --num_items 1000
```
### 2. Preprocess Data
```bash
python training/preprocess.py
```
### 3. Run Training
```bash
python training/train_v2.py
```
### 4. Evaluate
```bash
python training/final_benchmark.py
```
---
## Limitations
- **Multi-label**: Under-predicts when multiple categories apply
- **Overlap**: Struggles with company/user category overlap on edge cases
- **Domain**: Marketing-specific; not tested on other domains
---
## Links
- **HuggingFace Model**: [MuratcanKoylan/Marketing-Memory-Routing-8B](https://huggingface.co/MuratcanKoylan/Marketing-Memory-Routing-8B)
- **GitHub Repository**: [muratcankoylan/memory-routing-agent](https://github.com/muratcankoylan/memory-routing-agent)
- **Training Platform**: [Tinker by Thinking Machines](https://thinkingmachines.ai/)
---
## Citation
```bibtex
@misc{memory-routing-agent-2025,
title={Memory Routing Agent: Prompt Distillation for Marketing AI},
author={Muratcan Koylan},
year={2025},
howpublished={\url{https://github.com/muratcankoylan/memory-routing-agent}},
}
```
---
## License
Apache 2.0
---
## Acknowledgments
- [Thinking Machines](https://thinkingmachines.ai/) for the Tinker training platform
- [Cohere](https://cohere.com/) for Command-R-Plus teacher model
- [Meta](https://ai.meta.com/) for Llama 3.1 base model
- [Anthropic](https://anthropic.com/) for Claude, which assisted in developing this project