# mnemotree-leaf-v1

Structured memory extractor for mnemotree, a local-first memory system for LLM agents. Given a conversation turn and a memory type (episodic, semantic, or procedural), it extracts structured JSON. Inference takes ~1 s on GPU.
## Model Details

| Property | Value |
|---|---|
| Base model | SmolLM2-1.7B-Instruct (1.7B params) |
| Task | Structured JSON extraction |
| Training | Full fine-tune with SFTTrainer (TRL) |
| Precision | bfloat16 |
| Max seq length | 512 tokens |
| Size | 3.2 GB |
## Performance

| Metric | Score |
|---|---|
| Schema validity | 93.5% |
| JSON parse rate | 93.5% |
### Per-type metrics

| Type | Samples | Schema Valid | Mean ROUGE-L |
|---|---|---|---|
| Semantic | 141 | 90.8% | 0.382 |
| Episodic | 55 | 100% | 0.517 |
| Procedural | 4 | 100% | 0.347 |
## Output Schema

Given a memory type, the model outputs structured JSON:
**Semantic** (facts, knowledge):

```json
{"fact": "Python uses GIL for thread safety", "subject": "Python", "confidence": 0.92}
```

**Episodic** (events, experiences):

```json
{"event": "Alice went hiking in Yosemite", "who": "Alice", "confidence": 0.88}
```

**Procedural** (instructions, workflows):

```json
{"procedure": "Deploy with docker compose up -d", "subject": "deployment", "confidence": 0.85}
```
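The required keys for each type can be checked mechanically. A minimal validator sketch — the `REQUIRED_KEYS` mapping and `is_schema_valid` helper are hypothetical illustrations, not part of mnemotree:

```python
import json

# Required keys per memory type, taken from the examples above.
# This mapping is a hypothetical helper, not part of mnemotree.
REQUIRED_KEYS = {
    "semantic": {"fact", "subject", "confidence"},
    "episodic": {"event", "who", "confidence"},
    "procedural": {"procedure", "subject", "confidence"},
}

def is_schema_valid(raw: str, memory_type: str) -> bool:
    """Parse model output and check that the expected keys are present."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and REQUIRED_KEYS[memory_type] <= obj.keys()
```

A check like this is what the "Schema validity" metric above measures: the output must parse and carry the type's required fields.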
## Usage

```python
import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "kurcontko/mnemotree-leaf-v1", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("kurcontko/mnemotree-leaf-v1")

system = (
    "You are a memory extraction assistant. Given a conversation turn with a "
    "type prefix (<|semantic|>, <|episodic|>, or <|procedural|>), extract "
    "structured information as a JSON object. Output ONLY valid JSON, no explanation."
)
user = "<|semantic|> Python uses the GIL for thread safety in CPython."

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": user},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        inputs, max_new_tokens=256, do_sample=True, temperature=0.1, top_p=0.95
    )

# Decode only the newly generated tokens, skipping the prompt.
result = tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True)
parsed = json.loads(result)
# {"fact": "Python uses GIL for thread safety in CPython", "subject": "Python", "confidence": 0.92}
```
## With mnemotree

```python
from mnemotree import MemoryCoreBuilder

memory = (
    MemoryCoreBuilder(store)  # store: a configured mnemotree store
    .with_local_models(device="cuda")  # auto-downloads root + leaf
    .build()
)

# Inside an async context:
item = await memory.remember("Python uses the GIL for thread safety")
print(item.metadata["extraction"])
# {"fact": "...", "subject": "Python", "confidence": 0.92}
```
## Training
- Dataset: 43,783 samples (32,368 train / 3,142 val / 8,273 test)
- Type distribution: Semantic 31,189 / Episodic 11,163 / Procedural 1,431
- Sources: LoCoMo, MSC, code synthesis, nameswap/paraphrase augmentations
- Epochs: 3
- Batch size: 64
- Learning rate: 2e-5 (cosine schedule, 5% warmup)
- Training time: ~18.5 hours on NVIDIA GB10
- No conversation leakage between splits
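The listed hyperparameters map onto a TRL `SFTConfig` roughly as follows. This is a sketch, not the actual training script, and argument names vary slightly across TRL versions:

```python
from trl import SFTConfig

# Sketch of the hyperparameters listed above; assumptions, not the real script.
config = SFTConfig(
    output_dir="mnemotree-leaf-v1",
    num_train_epochs=3,
    per_device_train_batch_size=64,  # or smaller with gradient accumulation
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,               # 5% warmup
    bf16=True,
    max_seq_length=512,              # renamed in newer TRL releases
)
```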
## Companion Model

Pair with mnemotree-root-v1 (ModernBERT classifier, 149M params) for the full pipeline:

root (~5 ms) → classify type → leaf (~1 s) → extract structured JSON
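Wiring the two stages together is just prompt construction: the root classifier's label becomes the leaf model's type prefix. A sketch with the models abstracted as callables — `classify` and `generate` stand in for the actual root and leaf model calls:

```python
def build_leaf_prompt(text: str, memory_type: str) -> str:
    """Prefix the turn with the type token the leaf model expects."""
    return f"<|{memory_type}|> {text}"

def extract_memory(text: str, classify, generate) -> str:
    # classify: root model, text -> "semantic" | "episodic" | "procedural" (~5 ms)
    # generate: leaf model, prompt -> JSON string (~1 s)
    memory_type = classify(text)
    return generate(build_leaf_prompt(text, memory_type))
```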
## Citation

```bibtex
@misc{mnemotree2025,
  title={mnemotree: Local-first memory for LLM agents},
  author={kurcontko},
  year={2025},
  url={https://github.com/kurcontko/mnemotree}
}
```