# mnemotree-root-v1

Fast memory-type classifier for mnemotree, a local-first memory system for LLM agents. It classifies conversation turns into memory types (episodic, semantic, or procedural), running in ~5 ms on GPU and ~15 ms on CPU.
## Model Details

| Attribute | Value |
|---|---|
| Base model | ModernBERT-base (149M params) |
| Task | Multi-label sequence classification (3 labels) |
| Training | Full fine-tune, Focal Loss (gamma=2.0) |
| Precision | bfloat16 |
| Max seq length | 256 tokens |
| Size | 289 MB |
## Performance
| Metric | Score |
|---|---|
| Macro F1 | 0.674 |
| Micro F1 | 0.703 |
| Optimal threshold | 0.55 |
### Per-class metrics (threshold = 0.55)
| Class | Precision | Recall | F1 |
|---|---|---|---|
| Episodic | 0.627 | 0.668 | 0.647 |
| Semantic | 0.810 | 0.690 | 0.745 |
| Procedural | 0.552 | 0.737 | 0.631 |
## Label Mapping

| Index | Label | Description |
|---|---|---|
| 0 | episodic | events, experiences, conversations |
| 1 | semantic | facts, knowledge, definitions |
| 2 | procedural | how-to, workflows, instructions |
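Because the head is multi-label, each class gets an independent sigmoid probability, and a turn can carry more than one label. Applying the reported optimal threshold of 0.55 turns those probabilities into a label set; a minimal sketch (the probability values are illustrative):

```python
# Map per-class sigmoid probabilities to memory-type labels using
# the card's reported optimal threshold of 0.55.
LABELS = ["episodic", "semantic", "procedural"]
THRESHOLD = 0.55

def to_labels(probs, threshold=THRESHOLD):
    """Return every label whose probability clears the threshold."""
    return [label for label, p in zip(LABELS, probs) if p >= threshold]

print(to_labels([0.12, 0.94, 0.08]))  # → ['semantic']
print(to_labels([0.61, 0.20, 0.58]))  # → ['episodic', 'procedural']
```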
## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("kurcontko/mnemotree-root-v1")
tokenizer = AutoTokenizer.from_pretrained("kurcontko/mnemotree-root-v1")

text = "Python uses the GIL for thread safety in CPython."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.sigmoid(logits)[0]

labels = ["episodic", "semantic", "procedural"]
for label, prob in zip(labels, probs):
    print(f"{label}: {prob:.3f}")
# episodic: 0.12, semantic: 0.94, procedural: 0.08
```
### With mnemotree

```python
from mnemotree import MemoryCoreBuilder

memory = (
    MemoryCoreBuilder(store)
    .with_local_models(device="cuda")  # auto-downloads root + leaf
    .build()
)

await memory.remember("Python uses the GIL for thread safety")
# → classified as semantic, importance=0.94
```
## Training
- Dataset: 73,781 samples (49,706 train / 10,024 val / 14,051 test)
- Sources: LoCoMo, MSC, code synthesis, nameswap/paraphrase augmentations
- Epochs: 5
- Batch size: 64
- Learning rate: 2e-5 (cosine schedule, 10% warmup)
- Loss: Focal Loss applied over binary cross-entropy (gamma=2.0)
- No conversation leakage between splits
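Focal Loss down-weights easy examples relative to plain BCE by scaling each per-label term with (1 − p_t)^gamma, here with gamma = 2.0. A minimal pure-Python sketch of the per-label binary focal term (no alpha class weighting is assumed, since the card does not report one):

```python
import math

def binary_focal_loss(p, y, gamma=2.0):
    """Focal-weighted binary cross-entropy for a single label.

    p: predicted probability (sigmoid output); y: target in {0, 1}.
    The (1 - p_t)**gamma factor shrinks the loss for confident,
    correct predictions, focusing training on hard examples.
    With gamma = 0 this reduces to plain binary cross-entropy.
    """
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(max(p_t, 1e-12))

# A confident correct prediction contributes almost nothing...
easy = binary_focal_loss(0.95, 1)
# ...while a confident wrong one is still penalized heavily.
hard = binary_focal_loss(0.05, 1)
print(easy < hard)  # → True
```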
## Companion Model

Pair with mnemotree-leaf-v1 (SmolLM2-1.7B extractor) for structured memory extraction:

`root (5ms) → classify type → leaf (1s) → extract structured JSON`
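The two-stage flow can be sketched as a simple router. Here `classify_root` and `extract_leaf` are hypothetical stand-ins for the two model calls, not mnemotree APIs; only turns that clear the 0.55 threshold are passed to the slower extractor:

```python
# Hypothetical sketch of the root → leaf pipeline; both functions
# below are stubs standing in for the actual model calls.
def classify_root(text):
    """Fast ModernBERT pass: text → per-type probabilities (stubbed)."""
    return {"episodic": 0.12, "semantic": 0.94, "procedural": 0.08}

def extract_leaf(text, memory_type):
    """Slower SmolLM2 pass: text → structured memory record (stubbed)."""
    return {"type": memory_type, "content": text}

def remember(text, threshold=0.55):
    probs = classify_root(text)                       # ~5 ms
    types = [t for t, p in probs.items() if p >= threshold]
    if not types:
        return None                                   # nothing worth storing
    best = max(types, key=probs.get)
    return extract_leaf(text, best)                   # ~1 s

print(remember("Python uses the GIL for thread safety"))
# → {'type': 'semantic', 'content': 'Python uses the GIL for thread safety'}
```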
## Citation

```bibtex
@misc{mnemotree2025,
  title={mnemotree: Local-first memory for LLM agents},
  author={kurcontko},
  year={2025},
  url={https://github.com/kurcontko/mnemotree}
}
```