---
license: apache-2.0
language:
- en
tags:
- memory-routing
- marketing
- classification
- llama
- lora
- tinker
- prompt-distillation
base_model: meta-llama/Llama-3.1-8B
metrics:
- f1
- accuracy
pipeline_tag: text-classification
---

# Memory Routing Agent

**A specialized 8B model that outperforms its 104B teacher on marketing conversation classification.**

[![HuggingFace](https://img.shields.io/badge/πŸ€—%20Model-Marketing--Memory--Routing--8B-blue)](https://huggingface.co/MuratcanKoylan/Marketing-Memory-Routing-8B)
[![GitHub](https://img.shields.io/badge/GitHub-memory--routing--agent-black)](https://github.com/muratcankoylan/memory-routing-agent)
[![License](https://img.shields.io/badge/License-Apache%202.0-green)](LICENSE)

---

## The Experiment

This project demonstrates **prompt distillation**: training a small, specialized model to outperform the large model that generated its training data.

### The Challenge

Marketing AI assistants need to remember the right information from conversations. Not everything is worth storing; you need to distinguish between:
- **Valuable**: "Our brand voice is professional but approachable" β†’ Store in long-term memory
- **Transactional**: "What time is the meeting tomorrow?" β†’ Don't store

This is a **13-category classification problem** with nuanced distinctions between company-level and user-level information, different persistence horizons, and the critical ability to say "none" for irrelevant content.
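Concretely, a routing decision maps a conversation snippet to one or more category labels. An illustrative sketch (the dict schema here is ours, not the exact training format; the labels and storage logic follow this card):

```python
# Illustrative routing decisions; the schema is hypothetical,
# but the category labels match the ones this model predicts.
routing_examples = [
    {
        "utterance": "Our brand voice is professional but approachable",
        "categories": ["company.brand_core"],  # valuable: store long-term
    },
    {
        "utterance": "What time is the meeting tomorrow?",
        "categories": ["none"],                # transactional: don't store
    },
]

for example in routing_examples:
    store = example["categories"] != ["none"]
    print(example["categories"], "-> store" if store else "-> skip")
```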

### The Approach

1. **Generate synthetic data** using Cohere Command-R-Plus (104B) as the teacher
2. **Fine-tune Llama-3.1-8B** with LoRA using Tinker's training platform
3. **Apply reinforcement learning** with a custom reward function
4. **Benchmark against the teacher** on challenging, held-out scenarios

### The Result

| Model | Parameters | Avg F1 | Exact Match |
|-------|------------|--------|-------------|
| **Ours** | **8B** | **0.68** | **60%** |
| Cohere Command-R-Plus | 104B | 0.61 | 26% |

**Our 8B model achieves 11.1% higher F1 and 2.3x better exact match accuracy than the 104B teacher, while being 13x smaller.**

The student surpassed the teacher through:
- **Focused training**: The model only learns this one task, not general capabilities
- **RL refinement**: The reward function optimizes for exact category matching, not just plausible outputs
- **Clean data**: Synthetic data with consistent labeling, no noise from human annotation disagreements

---

## Training Visualizations

### Phase 1: Supervised Fine-Tuning

![SFT Loss](assets/sft_loss.png)

100 training steps reduced loss from 5.47 to 0.26 (95% reduction). The model learned the basic classification task in the first epoch.

### Phase 2: Reinforcement Learning

![RL Reward](assets/rl_reward.png)

30 RL iterations improved mean reward from 0.73 to 0.93. The reward function combines F1 score, temporal alignment, scope correctness, and storage efficiency.

### Model Comparison

![Model Comparison](assets/model_comparison.png)

Our model excels at exact matching (60% vs 26%) because RL optimizes for getting all categories right, not just some.

### Performance by Difficulty

![Difficulty Comparison](assets/difficulty_comparison.png)

The 8B model dominates on easy cases (+79% F1) and matches on medium cases. The 104B model still wins on hard multi-label scenarios.

---

## Key Results

| Metric | Our Model (8B) | Cohere (104B) |
|--------|----------------|---------------|
| **Avg F1** | **0.68** | 0.61 |
| **Exact Match** | **60%** | 26% |
| Any Match | 72% | 82% |
| Model Size | 8B | 104B |
| **Improvement** | **+11.1% F1** | baseline |
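
For reference, the three match metrics can be computed from predicted and gold label sets as follows (a minimal sketch; the function names are ours, not from the repo):

```python
def f1(pred: set, gold: set) -> float:
    """Set-based F1 between predicted and gold category sets."""
    if not pred and not gold:
        return 1.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def benchmark_metrics(predictions, golds):
    """Avg F1, exact match (all categories right), any match (at least one right)."""
    n = len(predictions)
    avg_f1 = sum(f1(p, g) for p, g in zip(predictions, golds)) / n
    exact = sum(p == g for p, g in zip(predictions, golds)) / n
    any_match = sum(bool(p & g) for p, g in zip(predictions, golds)) / n
    return avg_f1, exact, any_match

preds = [{"company.brand_core"}, {"none"}, {"user.role_context", "user.workflow_patterns"}]
golds = [{"company.brand_core"}, {"none"}, {"user.role_context"}]
print(benchmark_metrics(preds, golds))
```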

### Reward Components (Final RL Iteration)

| Component | Score | Description |
|-----------|-------|-------------|
| R_F1 | 0.90 | F1 score vs gold labels |
| R_temp | 0.95 | Temporal alignment |
| R_parity | 1.00 | Company/user scope |
| R_eff | 1.00 | Storage efficiency |

---

## What It Does

The Memory Routing Agent classifies marketing conversations into 13 memory categories:

### Company Categories (Long-term business context)
| Category | Description | Persistence |
|----------|-------------|-------------|
| `company.brand_core` | Voice, values, positioning | Long (>1y) |
| `company.strategic_signatures` | Decision frameworks | Long (>1y) |
| `company.knowledge_artifacts` | Docs, style guides | Long (>1y) |
| `company.business_priorities` | Quarterly goals | Short (<3m) |
| `company.tools_config` | Integrations, APIs | Medium (~6m) |
| `company.performance_context` | Campaign metrics | Rolling (~6m) |

### User Categories (Personal preferences)
| Category | Description | Persistence |
|----------|-------------|-------------|
| `user.communication_style` | Tone, format preferences | Long (>1y) |
| `user.strategic_approach` | Personal priorities | Long (>1y) |
| `user.role_context` | Title, scope | Medium (~1y) |
| `user.workflow_patterns` | Review cadence | Medium (~1y) |
| `user.session_history` | Immediate context | Short (<2w) |
| `user.interaction_preferences` | Coaching style | Evolving |

### Special
| Category | Description |
|----------|-------------|
| `none` | Transactional or irrelevant content |
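
Downstream code has to validate model output against this fixed label set. A small helper sketch (our own, not from the repo):

```python
# The 13 valid labels, as listed in the category tables above.
COMPANY = {
    "company.brand_core", "company.strategic_signatures", "company.knowledge_artifacts",
    "company.business_priorities", "company.tools_config", "company.performance_context",
}
USER = {
    "user.communication_style", "user.strategic_approach", "user.role_context",
    "user.workflow_patterns", "user.session_history", "user.interaction_preferences",
}
VALID = COMPANY | USER | {"none"}

def parse_labels(raw: str) -> set:
    """Parse a comma-separated model response into validated category labels."""
    labels = {token.strip() for token in raw.split(",") if token.strip()}
    unknown = labels - VALID
    if unknown:
        raise ValueError(f"unknown categories: {unknown}")
    return labels

print(parse_labels("company.brand_core, user.communication_style"))
```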

---

## Training Pipeline

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    TRAINING PIPELINE                            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                 β”‚
β”‚  1. SYNTHETIC DATA GENERATION                                   β”‚
β”‚     β”œβ”€β”€ Cohere Command-R-Plus (104B) as teacher                β”‚
β”‚     β”œβ”€β”€ 2,001 marketing conversations                          β”‚
β”‚     └── 13 category labels + persistence horizons              β”‚
β”‚                                                                 β”‚
β”‚  2. SUPERVISED FINE-TUNING (SFT)                               β”‚
β”‚     β”œβ”€β”€ Base: meta-llama/Llama-3.1-8B                          β”‚
β”‚     β”œβ”€β”€ LoRA rank 32                                           β”‚
β”‚     β”œβ”€β”€ 100 steps, batch size 128                              β”‚
β”‚     └── Cross-entropy loss                                     β”‚
β”‚                                                                 β”‚
β”‚  3. REINFORCEMENT LEARNING (RL)                                β”‚
β”‚     β”œβ”€β”€ 30 iterations, 64 groups Γ— 32 samples                  β”‚
β”‚     β”œβ”€β”€ Importance sampling policy gradient                    β”‚
β”‚     └── Composite reward: F1 + temporal + parity + efficiency  β”‚
β”‚                                                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
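
The importance sampling policy gradient in stage 3 can be written schematically as a surrogate loss over sampled completions (a generic pure-Python sketch of the estimator, not Tinker's implementation):

```python
import math

def is_pg_loss(logp_new, logp_old, advantages):
    """Importance-sampling policy gradient surrogate:
    loss = -mean((pi_new / pi_old) * advantage),
    with the probability ratio computed in log space for stability."""
    ratios = [math.exp(n - o) for n, o in zip(logp_new, logp_old)]
    return -sum(r * a for r, a in zip(ratios, advantages)) / len(advantages)

# Toy batch: advantages are rewards centered on the group mean, a common
# baseline when sampling groups of completions per prompt.
rewards = [0.9, 0.7, 0.95, 0.6]
mean_r = sum(rewards) / len(rewards)
advantages = [r - mean_r for r in rewards]
logp_old = [math.log(p) for p in [0.20, 0.30, 0.25, 0.25]]
logp_new = [math.log(p) for p in [0.25, 0.28, 0.30, 0.17]]
print(is_pg_loss(logp_new, logp_old, advantages))
```

Minimizing this loss increases the probability of completions with above-average reward and decreases the rest.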

### Reward Function

```
R_total = 0.6 Γ— R_F1 + 0.2 Γ— R_temp + 0.1 Γ— R_parity + 0.1 Γ— R_eff
```

| Component | Weight | Description |
|-----------|--------|-------------|
| R_F1 | 60% | F1 score vs gold labels |
| R_temp | 20% | Persistence horizon alignment |
| R_parity | 10% | Company/user scope correctness |
| R_eff | 10% | Storage efficiency (≀3 categories) |
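
Put together, the composite reward can be sketched as below. This is our reconstruction from the weights above; the exact scoring of each component inside `training/rl_env.py` may differ, and the parity and efficiency rules here are assumptions:

```python
def f1(pred: set, gold: set) -> float:
    """Set-based F1 between predicted and gold category sets (R_F1)."""
    if not pred and not gold:
        return 1.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    p, r = tp / len(pred), tp / len(gold)
    return 2 * p * r / (p + r)

def scope(label: str) -> str:
    return label.split(".")[0]  # "company", "user", or "none"

def composite_reward(pred: set, gold: set, r_temp: float) -> float:
    """R_total = 0.6*R_F1 + 0.2*R_temp + 0.1*R_parity + 0.1*R_eff.
    R_temp (persistence-horizon alignment) is passed in; its internal
    scoring is not specified in this card."""
    r_f1 = f1(pred, gold)
    # Parity (assumed rule): predicted company/user scopes match the gold scopes
    r_parity = 1.0 if {scope(l) for l in pred} == {scope(l) for l in gold} else 0.0
    # Efficiency: reward storing at most 3 categories
    r_eff = 1.0 if len(pred) <= 3 else 0.0
    return 0.6 * r_f1 + 0.2 * r_temp + 0.1 * r_parity + 0.1 * r_eff

print(composite_reward({"company.brand_core"}, {"company.brand_core"}, r_temp=1.0))
```

The 60% weight on R_F1 keeps label correctness dominant, while the smaller terms shape horizon, scope, and storage behavior.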

---

## Quick Start

### Installation

```bash
# Clone the repository
git clone https://github.com/muratcankoylan/memory-routing-agent.git
cd memory-routing-agent

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

### Environment Setup

```bash
# Create .env file with your API keys
echo "TINKER_API_KEY=your_tinker_key" >> .env
echo "COHERE_API_KEY=your_cohere_key" >> .env
echo "HF_TOKEN=your_huggingface_token" >> .env
```

### Run Inference

```python
import tinker
from tinker import types
from tinker_cookbook import renderers
from tinker_cookbook.tokenizer_utils import get_tokenizer

# Load model from Tinker checkpoint
service_client = tinker.ServiceClient()
checkpoint = "tinker://4f4bae1f-5a95-5f53-a55a-a14f2872825c:train:0/sampler_weights/rl_iter_012"
sampling_client = service_client.create_sampling_client(model_path=checkpoint)

# Setup tokenizer and renderer
tokenizer = get_tokenizer("meta-llama/Llama-3.1-8B")
renderer = renderers.get_renderer(name="llama3", tokenizer=tokenizer)

# Classify a conversation
conversation = """
USER: Our brand voice is professional but approachable. Think Harvard Business Review meets Slack.
ASSISTANT: So authoritative content with a conversational tone?
USER: Exactly. We never use jargon without explaining it first.
"""

messages = [
    {"role": "system", "content": "You route marketing conversations into structured memory categories..."},
    {"role": "user", "content": f"Analyze this conversation:\n\n{conversation}"}
]

prompt = renderer.build_generation_prompt(messages)
params = types.SamplingParams(max_tokens=100, temperature=0.1, stop=renderer.get_stop_sequences())
result = sampling_client.sample(prompt=prompt, sampling_params=params, num_samples=1).result()

response, _ = renderer.parse_response(result.sequences[0].tokens)
print(f"Categories: {response['content']}")
# Output: company.brand_core
```

---

## Project Structure

```
memory-routing-agent/
β”œβ”€β”€ assets/                   # Training visualizations
β”‚   β”œβ”€β”€ sft_loss.png
β”‚   β”œβ”€β”€ rl_reward.png
β”‚   β”œβ”€β”€ rl_components.png
β”‚   β”œβ”€β”€ model_comparison.png
β”‚   └── difficulty_comparison.png
β”œβ”€β”€ synthetic_data/           # Data generation pipeline
β”‚   β”œβ”€β”€ pipeline.py           # Cohere-based conversation generator
β”‚   β”œβ”€β”€ run_diverse_generation.py
β”‚   └── merged_training_dataset_2001.jsonl
β”œβ”€β”€ training/                 # Training scripts
β”‚   β”œβ”€β”€ train_v2.py           # Main training script (SFT + RL)
β”‚   β”œβ”€β”€ preprocess.py         # Data preprocessing
β”‚   β”œβ”€β”€ rl_env.py             # RL environment and reward function
β”‚   β”œβ”€β”€ final_benchmark.py    # Benchmark evaluation
β”‚   β”œβ”€β”€ logs/                 # Training logs (JSONL)
β”‚   └── benchmarks/           # Benchmark results
β”œβ”€β”€ huggingface/              # HuggingFace upload scripts
β”œβ”€β”€ docs/                     # Documentation
β”‚   β”œβ”€β”€ PRD.md                # Product requirements
β”‚   └── tinker_docs.md        # Tinker reference
β”œβ”€β”€ MODEL_CARD.md             # Model card
└── README.md                 # This file
```

---

## Benchmark

The Marketing Routing Benchmark contains 50 challenging scenarios across 7 domains:

| Domain | Scenarios | Description |
|--------|-----------|-------------|
| Brand & Positioning | 8 | Brand voice, values, identity |
| Strategic Decisions | 8 | Decision frameworks, heuristics |
| Performance & Metrics | 8 | Campaign metrics, learnings |
| Tools & Integrations | 6 | Tech stack, APIs |
| User Preferences | 10 | Communication style, workflow |
| Business Priorities | 6 | Goals, focus areas |
| Knowledge Artifacts | 4 | Docs, playbooks, templates |

### Run Benchmark

```bash
python training/final_benchmark.py
```

---

## Training Your Own Model

### 1. Generate Synthetic Data

```bash
cd synthetic_data
python run_diverse_generation.py --num_items 1000
```

### 2. Preprocess Data

```bash
python training/preprocess.py
```

### 3. Run Training

```bash
python training/train_v2.py
```

### 4. Evaluate

```bash
python training/final_benchmark.py
```

---

## Limitations

- **Multi-label**: Under-predicts when multiple categories apply
- **Overlap**: Struggles with company/user category overlap on edge cases
- **Domain**: Marketing-specific; not tested on other domains

---

## Links

- **HuggingFace Model**: [MuratcanKoylan/Marketing-Memory-Routing-8B](https://huggingface.co/MuratcanKoylan/Marketing-Memory-Routing-8B)
- **GitHub Repository**: [muratcankoylan/memory-routing-agent](https://github.com/muratcankoylan/memory-routing-agent)
- **Training Platform**: [Tinker by Thinking Machines](https://thinkingmachines.ai/)

---

## Citation

```bibtex
@misc{memory-routing-agent-2025,
  title={Memory Routing Agent: Prompt Distillation for Marketing AI},
  author={Muratcan Koylan},
  year={2025},
  howpublished={\url{https://github.com/muratcankoylan/memory-routing-agent}},
}
```

---

## License

Apache 2.0

---

## Acknowledgments

- [Thinking Machines](https://thinkingmachines.ai/) for the Tinker training platform
- [Cohere](https://cohere.com/) for Command-R-Plus teacher model
- [Meta](https://ai.meta.com/) for Llama 3.1 base model
- [Anthropic](https://anthropic.com/) for Claude, which assisted in developing this project