mn-context-engine-model-v3

mn-context-engine-model-v3 is the production merged context-compression model for Membrane / MirrorNeuron. It was produced by merging the v3 DPO adapter into HuggingFaceTB/SmolLM3-3B, so it can be loaded directly without a separate PEFT adapter.

Author: Homer Quan

Related runtime: https://github.com/MirrorNeuronLab/MirrorNeuron

Website: https://www.mirrorneuron.io

Intended Use

Use this model as a generative context compressor for multi-agent working memory. It is optimized for preserving executable agent state under a token budget: current task, hard constraints, latest user instructions, source references, file paths, IDs, tool errors, recovery checkpoints, decisions, and next actions.

For production Membrane deployments, use the hybrid runtime path when exact protected-fact preservation is contractual: model compression followed by deterministic cleanup, restoration, privacy redaction, and graph repair.

Benchmark Summary

Evaluation used Membrane's 100-case mock context-compression suite. Mean ratio is compressed_tokens / original_tokens, so lower is more compressed. The v2 rows are included as the previous SmolLM3 LoRA reference point.

| Model | Method | Quality | Fact Recall | Hard Constraints | Pinned | Source Refs | Ratio | Private Leaks | Total Time |
|---|---|---|---|---|---|---|---|---|---|
| SmolLM3 v2 LoRA | llm_only | 0.882 | 0.942 | 1.000 | 0.750 | 0.698 | 0.496 | 0 | 3053.4s |
| SmolLM3 v2 LoRA | hybrid | 0.985 | 1.000 | 1.000 | 0.996 | 1.000 | 0.700 | 0 | 1.7s |
| SmolLM3 v3 DPO | llm_only | 0.864 | 0.916 | 1.000 | 0.713 | 0.627 | 0.518 | 0 | 1693.3s |
| SmolLM3 v3 DPO | hybrid | 0.990 | 1.000 | 1.000 | 0.998 | 1.000 | 0.751 | 0 | 1.3s |
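To make the ratio metric concrete, here is a small sketch. Token counts below come from a plain whitespace split as a stand-in for the model tokenizer, and the packet text is invented for illustration:

```python
def compression_ratio(original: str, compressed: str) -> float:
    """Ratio of compressed to original token count; lower means more compression.

    Whitespace tokenization is a stand-in for the real tokenizer here.
    """
    return len(compressed.split()) / len(original.split())

original = "task: fix auth bug ; constraint: do not touch prod db ; next: add regression test"
compressed = "task: fix auth bug ; next: regression test"
ratio = compression_ratio(original, compressed)  # → 0.5
# A ratio around 0.5, comparable to the llm_only rows above, means the
# compressed packet is about half the original length.
```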

Interpretation

  • llm_only measures the merged model as a standalone context compressor.
  • hybrid applies deterministic graph/fact/pin/source-ref repair after generation and is the recommended runtime contract for Membrane.
  • Exact source-reference and pinned-term retention should remain backed by deterministic validation, not only model behavior.
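The deterministic pin/source-ref backstop described above can be sketched as a simple post-generation repair step. The helper name and the restore strategy are hypothetical; the actual Membrane runtime lives in the linked repository:

```python
def repair_protected_terms(compressed: str, pinned: list[str], refs: list[str]) -> str:
    """Re-append any pinned term or source ref the model dropped.

    A deterministic backstop: model output alone is not trusted to retain
    contractual facts, so missing items are restored verbatim.
    """
    missing = [term for term in pinned + refs if term not in compressed]
    if missing:
        compressed += "\nrefs: " + "; ".join(missing)
    return compressed
```

Already-present terms pass through unchanged, so the repair is idempotent on compliant model output.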

Full benchmark reports are included under benchmark/.

Loading

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "homerquan/mn-context-engine-model-v3"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",   # load in the checkpoint's native dtype (BF16)
    device_map="auto",    # place weights across available devices
)
```
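Once loaded, a thin wrapper can run compression end to end. The system prompt, message shape, and generation parameters below are assumptions for illustration, not the runtime's exact contract:

```python
def build_messages(packet: str) -> list[dict]:
    # Hypothetical system prompt; adjust to your deployment's contract.
    system = (
        "Compress the working-memory packet below. Preserve task, constraints, "
        "latest instructions, decisions, errors, refs, and next actions."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": packet},
    ]

def compress(model, tokenizer, packet: str, max_new_tokens: int = 512) -> str:
    inputs = tokenizer.apply_chat_template(
        build_messages(packet), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0, inputs.shape[-1]:], skip_special_tokens=True)
```

Greedy decoding (`do_sample=False`) is a reasonable default for compression, where determinism matters more than diversity.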

Prompt Shape

The model was trained for structured compression targets with compact operational sections such as task, constraints, latest, decisions, evidence, errors, next, refs, and warnings. Keep prompts focused on the working-memory packet to compress.
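An example working-memory packet in the sectioned shape described above. All section contents are invented for illustration:

```python
packet = """\
task: migrate user-service auth to OAuth2
constraints: no downtime; keep legacy tokens valid 30 days
latest: user asked to prioritize the refresh-token flow
decisions: use authorization-code grant, not implicit
evidence: load test passed at 2k rps
errors: token-introspection endpoint returned 500 under burst
next: add retry with backoff; re-run burst test
refs: services/auth/oauth.py; RFC 6749
warnings: config change requires ops sign-off
"""

# One "name: value" line per operational section keeps the packet trivially
# parseable on the consuming side.
sections = dict(line.split(": ", 1) for line in packet.strip().splitlines())
```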

Limitations

  • This is a merged model, not a LoRA adapter.
  • It was evaluated on Membrane's deterministic mock-context suite; external workloads should be re-benchmarked.
  • Standalone model output can miss exact pinned terms or source refs. Use deterministic repair gates when those are contractual.
  • Do not rely on model-only compression for private-memory exclusion; keep redaction gates in the runtime path.
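A runtime redaction gate of the kind the last point recommends might look like the following. The `<private>` marker convention is hypothetical; substitute whatever tagging your upstream pipeline uses:

```python
import re

# Hypothetical marker for spans tagged private upstream.
PRIVATE_SPAN = re.compile(r"<private>.*?</private>", re.DOTALL)

def redact_private(text: str) -> str:
    """Deterministically strip tagged private spans before the packet is stored."""
    return PRIVATE_SPAN.sub("[redacted]", text)

def assert_no_leak(compressed: str) -> None:
    """Fail closed if any private marker survives compression."""
    if PRIVATE_SPAN.search(compressed):
        raise ValueError("private span leaked into compressed context")
```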
Model size: 3B parameters (Safetensors, BF16).