mnemotree-leaf-v1

Structured memory extractor for mnemotree — a local-first memory system for LLM agents.

Given a conversation turn and a memory type (semantic, episodic, or procedural), the model extracts structured JSON. Inference takes ~1 s on GPU.

Model Details

Base model      SmolLM2-1.7B-Instruct (1.7B params)
Task            Structured JSON extraction
Training        Full fine-tune with SFTTrainer (TRL)
Precision       bfloat16
Max seq length  512 tokens
Size            3.2 GB

Performance

Metric           Score
Schema validity  93.5%
JSON parse rate  93.5%

Per-type metrics

Type        Samples  Schema Valid  Mean ROUGE-L
Semantic    141      90.8%         0.382
Episodic    55       100%          0.517
Procedural  4        100%          0.347
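As a sanity check, the headline 93.5% schema validity is exactly the sample-weighted mean of the per-type rates above:

```python
# Recompute overall schema validity as the sample-weighted mean
# of the per-type rates from the table above.
per_type = {
    "semantic":   (141, 0.908),
    "episodic":   (55, 1.00),
    "procedural": (4, 1.00),
}

total = sum(n for n, _ in per_type.values())
overall = sum(n * rate for n, rate in per_type.values()) / total
print(f"{overall:.1%}")  # → 93.5%
```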

Output Schema

Given a memory type, the model outputs structured JSON:

Semantic (facts, knowledge):

{"fact": "Python uses GIL for thread safety", "subject": "Python", "confidence": 0.92}

Episodic (events, experiences):

{"event": "Alice went hiking in Yosemite", "who": "Alice", "confidence": 0.88}

Procedural (instructions, workflows):

{"procedure": "Deploy with docker compose up -d", "subject": "deployment", "confidence": 0.85}

Usage

import json

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "kurcontko/mnemotree-leaf-v1", torch_dtype="bfloat16", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("kurcontko/mnemotree-leaf-v1")

system = (
    "You are a memory extraction assistant. Given a conversation turn with a "
    "type prefix (<|semantic|>, <|episodic|>, or <|procedural|>), extract "
    "structured information as a JSON object. Output ONLY valid JSON, no explanation."
)

user = "<|semantic|> Python uses the GIL for thread safety in CPython."

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": user},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        inputs, max_new_tokens=256, do_sample=True, temperature=0.1, top_p=0.95
    )

# Decode only the newly generated tokens, then parse.
result = tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True)
parsed = json.loads(result)
# {"fact": "Python uses GIL for thread safety in CPython", "subject": "Python", "confidence": 0.92}

With mnemotree

from mnemotree import MemoryCoreBuilder

memory = (
    MemoryCoreBuilder(store)  # `store` is your configured mnemotree store
    .with_local_models(device="cuda")  # auto-downloads root + leaf
    .build()
)

# inside an async context:
item = await memory.remember("Python uses the GIL for thread safety")
print(item.metadata["extraction"])
# {"fact": "...", "subject": "Python", "confidence": 0.92}

Training

  • Dataset: 43,783 samples (32,368 train / 3,142 val / 8,273 test)
  • Type distribution: Semantic 31,189 / Episodic 11,163 / Procedural 1,431
  • Sources: LoCoMo, MSC, code synthesis, nameswap/paraphrase augmentations
  • Epochs: 3
  • Batch size: 64
  • Learning rate: 2e-5 (cosine schedule, 5% warmup)
  • Training time: ~18.5 hours on NVIDIA GB10
  • No conversation leakage between splits
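The hyperparameters above map onto a TRL training config roughly as follows. This is a non-runnable sketch: `model`, `train_ds`, and `val_ds` are placeholders, and argument names follow recent TRL releases, so the actual training script may differ.

```python
from trl import SFTConfig, SFTTrainer

# Config sketch matching the listed hyperparameters; dataset loading
# and model setup are omitted.
config = SFTConfig(
    output_dir="mnemotree-leaf-v1",
    num_train_epochs=3,
    per_device_train_batch_size=64,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    max_seq_length=512,
    bf16=True,
)
trainer = SFTTrainer(
    model=model,            # placeholder: SmolLM2-1.7B-Instruct
    args=config,
    train_dataset=train_ds,  # placeholder: 32,368 train samples
    eval_dataset=val_ds,     # placeholder: 3,142 val samples
)
trainer.train()
```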

Companion Model

Pair with mnemotree-root-v1 (ModernBERT classifier, 149M) for the full pipeline:

root (5ms) → classify type → leaf (1s) → extract structured JSON
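The two-stage flow can be sketched as below. `classify_type` and `extract` are hypothetical stand-ins for the root classifier and leaf extractor; mnemotree's `remember()` wires these together for you:

```python
# Sketch of the root → leaf pipeline with stub stages.
def classify_type(turn: str) -> str:
    # Stub: in practice, mnemotree-root-v1 (ModernBERT, ~5 ms).
    return "semantic"

def extract(turn: str, memory_type: str) -> dict:
    # Stub: in practice, mnemotree-leaf-v1 (~1 s on GPU).
    return {"fact": turn, "subject": "Python", "confidence": 0.9}

def remember(turn: str) -> dict:
    memory_type = classify_type(turn)     # cheap routing first
    payload = extract(turn, memory_type)  # expensive extraction second
    return {"type": memory_type, **payload}
```

Routing with the small classifier first keeps the expensive generative step off the hot path for anything it can reject or pre-label cheaply.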

Citation

@misc{mnemotree2025,
  title={mnemotree: Local-first memory for LLM agents},
  author={kurcontko},
  year={2025},
  url={https://github.com/kurcontko/mnemotree}
}