# mn-context-engine-model-v3
mn-context-engine-model-v3 is the production merged context-compression model for Membrane / MirrorNeuron. It was produced by merging the v3 DPO adapter into HuggingFaceTB/SmolLM3-3B, so it can be loaded directly without a separate PEFT adapter.
- Author: Homer Quan
- Related runtime: https://github.com/MirrorNeuronLab/MirrorNeuron
- Website: https://www.mirrorneuron.io
## Intended Use
Use this model as a generative context compressor for multi-agent working memory. It is optimized for preserving executable agent state under a token budget: current task, hard constraints, latest user instructions, source references, file paths, IDs, tool errors, recovery checkpoints, decisions, and next actions.
For production Membrane deployments, use the hybrid runtime path when exact protected-fact preservation is contractual: model compression followed by deterministic cleanup, restoration, privacy redaction, and graph repair.
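As a rough illustration of that ordering, consider the sketch below. The helper logic and the `[private]` tagging are invented for this example and are not the Membrane API; the real pipeline lives in the MirrorNeuron runtime.

```python
import re

def hybrid_compress(packet: str, model_compress, pinned: set[str]) -> str:
    """Illustrative hybrid path: one model pass, then deterministic gates."""
    draft = model_compress(packet)                                 # 1. model compression
    draft = "\n".join(l for l in draft.splitlines() if l.strip())  # 2. cleanup: drop blank lines
    dropped = [t for t in pinned if t not in draft]                # 3. restore protected facts
    if dropped:
        draft += "\nwarnings: restored pinned terms: " + ", ".join(dropped)
    # 4. privacy redaction: mask spans tagged as private in the draft
    draft = re.sub(r"\[private\].*?\[/private\]", "[REDACTED]", draft, flags=re.S)
    return draft  # 5. graph repair omitted for brevity
```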
## Benchmark Summary
Evaluation used Membrane's 100-case mock context-compression suite. Mean ratio is `compressed_tokens / original_tokens`, so lower is more compressed. The v2 rows are included as the previous SmolLM3 LoRA reference point.
| Method | Quality | Fact Recall | Hard Constraints | Pinned Terms | Source Refs | Mean Ratio | Private Leaks | Total Time |
|---|---|---|---|---|---|---|---|---|
| SmolLM3 v2 LoRA llm_only | 0.882 | 0.942 | 1.000 | 0.750 | 0.698 | 0.496 | 0 | 3053.4s |
| SmolLM3 v2 LoRA hybrid | 0.985 | 1.000 | 1.000 | 0.996 | 1.000 | 0.700 | 0 | 1.7s |
| SmolLM3 v3 DPO llm_only | 0.864 | 0.916 | 1.000 | 0.713 | 0.627 | 0.518 | 0 | 1693.3s |
| SmolLM3 v3 DPO hybrid | 0.990 | 1.000 | 1.000 | 0.998 | 1.000 | 0.751 | 0 | 1.3s |
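For reference, the ratio column corresponds to the per-case quantity below, averaged over the suite. This is a minimal sketch; `tokenizer` stands in for whichever tokenizer the evaluation used.

```python
def compression_ratio(original: str, compressed: str, tokenizer) -> float:
    # Mean ratio in the table averages this per-case value; lower = more compressed.
    return len(tokenizer.encode(compressed)) / len(tokenizer.encode(original))
```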
## Interpretation
- `llm_only` measures the merged model as a standalone context compressor.
- `hybrid` applies deterministic graph/fact/pin/source-ref repair after generation and is the recommended runtime contract for Membrane.
- Exact source-reference and pinned-term retention should remain backed by deterministic validation, not model behavior alone.
Full benchmark reports are included under `benchmark/`.
## Loading
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "homerquan/mn-context-engine-model-v3"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",   # keep the checkpoint's native dtype
    device_map="auto",    # place weights on available GPUs/CPU automatically
)
```
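A minimal generation call might look like the following. The packet contents, prompt wording, and generation settings are illustrative, not a Membrane contract.

```python
# Illustrative only: compress a small working-memory packet.
packet = (
    "task: migrate billing service to the v2 API\n"
    "constraints: no prod DB writes; PCI scope unchanged\n"
    "latest: user asked to retry the deploy after the config fix\n"
)

messages = [{"role": "user", "content": f"Compress this context:\n{packet}"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```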
## Prompt Shape
The model was trained for structured compression targets with compact operational sections such as `task`, `constraints`, `latest`, `decisions`, `evidence`, `errors`, `next`, `refs`, and `warnings`. Keep prompts focused on the working-memory packet to compress.
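For illustration only (the field contents here are invented; the canonical schema is defined by the MirrorNeuron runtime), a compressed packet in that shape might read:

```
task: migrate billing service to the v2 API
constraints: no prod DB writes; PCI scope unchanged
latest: user asked to retry the deploy after the config fix
decisions: blue/green rollout; keep v1 endpoints until cutover
evidence: deploy log shows missing env var BILLING_API_KEY
errors: deploy failed with MissingEnvVar(BILLING_API_KEY)
next: set BILLING_API_KEY in staging, then re-run the deploy
refs: runbooks/billing-migration.md
warnings: none
```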
## Limitations
- This is a merged model, not a LoRA adapter.
- It was evaluated on Membrane's deterministic mock-context suite; external workloads should be re-benchmarked.
- Standalone model output can miss exact pinned terms or source refs. Use deterministic repair gates when those are contractual.
- Do not rely on model-only compression for private-memory exclusion; keep redaction gates in the runtime path.