# mn-context-engine-model-v3
mn-context-engine-model-v3 is the production merged context-compression model for Membrane / MirrorNeuron. It was produced by merging the v3 DPO adapter into HuggingFaceTB/SmolLM3-3B, so it can be loaded directly without a separate PEFT adapter.
- Author: Homer Quan
- Related runtime: https://github.com/MirrorNeuronLab/MirrorNeuron
- Website: https://www.mirrorneuron.io
## Intended Use
Use this model as a generative context compressor for multi-agent working memory. It is optimized for preserving executable agent state under a token budget: current task, hard constraints, latest user instructions, source references, file paths, IDs, tool errors, recovery checkpoints, decisions, and next actions.
For production Membrane deployments, use the hybrid runtime path when exact protected-fact preservation is contractual: model compression followed by deterministic cleanup, restoration, privacy redaction, and graph repair.
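As a rough illustration of that ordering, consider the sketch below. The helper logic and the `[private]` tagging are invented for this example and are not the Membrane API; the real pipeline lives in the MirrorNeuron runtime.

```python
import re

def hybrid_compress(packet: str, model_compress, pinned: set[str]) -> str:
    """Illustrative hybrid path: one model pass, then deterministic gates."""
    draft = model_compress(packet)                                 # 1. model compression
    draft = "\n".join(l for l in draft.splitlines() if l.strip())  # 2. cleanup: drop blank lines
    dropped = [t for t in pinned if t not in draft]                # 3. restore protected facts
    if dropped:
        draft += "\nwarnings: restored pinned terms: " + ", ".join(dropped)
    # 4. privacy redaction: mask spans tagged as private in the draft
    draft = re.sub(r"\[private\].*?\[/private\]", "[REDACTED]", draft, flags=re.S)
    return draft  # 5. graph repair omitted for brevity
```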
## Benchmark Summary
Evaluation used Membrane's 100-case mock context-compression suite. Mean ratio is `compressed_tokens / original_tokens`, so lower is more compressed. The v2 rows are included as the previous SmolLM3 LoRA reference point.
| Method | Quality | Fact Recall | Hard Constraints | Pinned Terms | Source Refs | Mean Ratio | Private Leaks | Total Time |
|---|---|---|---|---|---|---|---|---|
| SmolLM3 v2 LoRA llm_only | 0.882 | 0.942 | 1.000 | 0.750 | 0.698 | 0.496 | 0 | 3053.4s |
| SmolLM3 v2 LoRA hybrid | 0.985 | 1.000 | 1.000 | 0.996 | 1.000 | 0.700 | 0 | 1.7s |
| SmolLM3 v3 DPO llm_only | 0.864 | 0.916 | 1.000 | 0.713 | 0.627 | 0.518 | 0 | 1693.3s |
| SmolLM3 v3 DPO hybrid | 0.990 | 1.000 | 1.000 | 0.998 | 1.000 | 0.751 | 0 | 1.3s |
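For reference, the ratio column corresponds to the per-case quantity below, averaged over the suite. This is a minimal sketch; `tokenizer` stands in for whichever tokenizer the evaluation used.

```python
def compression_ratio(original: str, compressed: str, tokenizer) -> float:
    # Mean ratio in the table averages this per-case value; lower = more compressed.
    return len(tokenizer.encode(compressed)) / len(tokenizer.encode(original))
```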
## Interpretation
- `llm_only` measures the merged model as a standalone context compressor.
- `hybrid` applies deterministic graph/fact/pin/source-ref repair after generation and is the recommended runtime contract for Membrane.
- Exact source-reference and pinned-term retention should remain backed by deterministic validation, not model behavior alone.
Full benchmark reports are included under `benchmark/`.
## Loading
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "homerquan/mn-context-engine-model-v3"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",   # keep the checkpoint's native dtype
    device_map="auto",    # place weights on available GPUs/CPU automatically
)
```
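A minimal generation call might look like the following. The packet contents, prompt wording, and generation settings are illustrative, not a Membrane contract.

```python
# Illustrative only: compress a small working-memory packet.
packet = (
    "task: migrate billing service to the v2 API\n"
    "constraints: no prod DB writes; PCI scope unchanged\n"
    "latest: user asked to retry the deploy after the config fix\n"
)

messages = [{"role": "user", "content": f"Compress this context:\n{packet}"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```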
## Prompt Shape
The model was trained for structured compression targets with compact operational sections such as `task`, `constraints`, `latest`, `decisions`, `evidence`, `errors`, `next`, `refs`, and `warnings`. Keep prompts focused on the working-memory packet to compress.
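For illustration only (the field contents here are invented; the canonical schema is defined by the MirrorNeuron runtime), a compressed packet in that shape might read:

```
task: migrate billing service to the v2 API
constraints: no prod DB writes; PCI scope unchanged
latest: user asked to retry the deploy after the config fix
decisions: blue/green rollout; keep v1 endpoints until cutover
evidence: deploy log shows missing env var BILLING_API_KEY
errors: deploy failed with MissingEnvVar(BILLING_API_KEY)
next: set BILLING_API_KEY in staging, then re-run the deploy
refs: runbooks/billing-migration.md
warnings: none
```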
## Limitations
- This is a merged model, not a LoRA adapter.
- It was evaluated on Membrane's deterministic mock-context suite; external workloads should be re-benchmarked.
- Standalone model output can miss exact pinned terms or source refs. Use deterministic repair gates when those are contractual.
- Do not rely on model-only compression for private-memory exclusion; keep redaction gates in the runtime path.