mnemotree-leaf-v1

Structured memory extractor for mnemotree — a local-first memory system for LLM agents.

Given a conversation turn and a memory type (semantic, episodic, or procedural), the model extracts structured JSON. Inference takes ~1 s on GPU.

Model Details

Base model      SmolLM2-1.7B-Instruct (1.7B params)
Task            Structured JSON extraction
Training        Full fine-tune with SFTTrainer (TRL)
Precision       bfloat16
Max seq length  512 tokens
Size            3.2 GB

Performance

Metric           Score
Schema validity  93.5%
JSON parse rate  93.5%

Per-type metrics

Type        Samples  Schema Valid  Mean ROUGE-L
Semantic    141      90.8%         0.382
Episodic    55       100%          0.517
Procedural  4        100%          0.347
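As a sanity check, the headline 93.5% schema validity is exactly the sample-weighted mean of the per-type rates above:

```python
# Recompute overall schema validity as the sample-weighted mean
# of the per-type rates from the table above.
per_type = {
    "semantic":   (141, 0.908),
    "episodic":   (55, 1.00),
    "procedural": (4, 1.00),
}

total = sum(n for n, _ in per_type.values())
overall = sum(n * rate for n, rate in per_type.values()) / total
print(f"{overall:.1%}")  # → 93.5%
```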

Output Schema

Given a memory type, the model outputs structured JSON:

Semantic (facts, knowledge):

{"fact": "Python uses GIL for thread safety", "subject": "Python", "confidence": 0.92}

Episodic (events, experiences):

{"event": "Alice went hiking in Yosemite", "who": "Alice", "confidence": 0.88}

Procedural (instructions, workflows):

{"procedure": "Deploy with docker compose up -d", "subject": "deployment", "confidence": 0.85}

Usage

import json

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "kurcontko/mnemotree-leaf-v1", torch_dtype="bfloat16", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("kurcontko/mnemotree-leaf-v1")

system = (
    "You are a memory extraction assistant. Given a conversation turn with a "
    "type prefix (<|semantic|>, <|episodic|>, or <|procedural|>), extract "
    "structured information as a JSON object. Output ONLY valid JSON, no explanation."
)

user = "<|semantic|> Python uses the GIL for thread safety in CPython."

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": user},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        inputs, max_new_tokens=256, do_sample=True, temperature=0.1, top_p=0.95
    )

# Decode only the newly generated tokens, then parse.
result = tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True)
parsed = json.loads(result)
# {"fact": "Python uses GIL for thread safety in CPython", "subject": "Python", "confidence": 0.92}

With mnemotree

from mnemotree import MemoryCoreBuilder

memory = (
    MemoryCoreBuilder(store)  # `store` is your configured mnemotree store
    .with_local_models(device="cuda")  # auto-downloads root + leaf
    .build()
)

# inside an async context:
item = await memory.remember("Python uses the GIL for thread safety")
print(item.metadata["extraction"])
# {"fact": "...", "subject": "Python", "confidence": 0.92}

Training

  • Dataset: 43,783 samples (32,368 train / 3,142 val / 8,273 test)
  • Type distribution: Semantic 31,189 / Episodic 11,163 / Procedural 1,431
  • Sources: LoCoMo, MSC, code synthesis, nameswap/paraphrase augmentations
  • Epochs: 3
  • Batch size: 64
  • Learning rate: 2e-5 (cosine schedule, 5% warmup)
  • Training time: ~18.5 hours on NVIDIA GB10
  • No conversation leakage between splits
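The hyperparameters above map onto a TRL training config roughly as follows. This is a non-runnable sketch: `model`, `train_ds`, and `val_ds` are placeholders, and argument names follow recent TRL releases, so the actual training script may differ.

```python
from trl import SFTConfig, SFTTrainer

# Config sketch matching the listed hyperparameters; dataset loading
# and model setup are omitted.
config = SFTConfig(
    output_dir="mnemotree-leaf-v1",
    num_train_epochs=3,
    per_device_train_batch_size=64,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    max_seq_length=512,
    bf16=True,
)
trainer = SFTTrainer(
    model=model,            # placeholder: SmolLM2-1.7B-Instruct
    args=config,
    train_dataset=train_ds,  # placeholder: 32,368 train samples
    eval_dataset=val_ds,     # placeholder: 3,142 val samples
)
trainer.train()
```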

Companion Model

Pair with mnemotree-root-v1 (ModernBERT classifier, 149M) for the full pipeline:

root (5ms) → classify type → leaf (1s) → extract structured JSON
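The two-stage flow can be sketched as below. `classify_type` and `extract` are hypothetical stand-ins for the root classifier and leaf extractor; mnemotree's `remember()` wires these together for you:

```python
# Sketch of the root → leaf pipeline with stub stages.
def classify_type(turn: str) -> str:
    # Stub: in practice, mnemotree-root-v1 (ModernBERT, ~5 ms).
    return "semantic"

def extract(turn: str, memory_type: str) -> dict:
    # Stub: in practice, mnemotree-leaf-v1 (~1 s on GPU).
    return {"fact": turn, "subject": "Python", "confidence": 0.9}

def remember(turn: str) -> dict:
    memory_type = classify_type(turn)     # cheap routing first
    payload = extract(turn, memory_type)  # expensive extraction second
    return {"type": memory_type, **payload}
```

Routing with the small classifier first keeps the expensive generative step off the hot path for anything it can reject or pre-label cheaply.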

Citation

@misc{mnemotree2025,
  title={mnemotree: Local-first memory for LLM agents},
  author={kurcontko},
  year={2025},
  url={https://github.com/kurcontko/mnemotree}
}