# mn-context-engine-lora-v2
mn-context-engine-lora-v2 is a PEFT LoRA adapter for HuggingFaceTB/SmolLM3-3B trained for operational context compression in Membrane-style multi-agent working memory.
The adapter is optimized for preserving executable agent state under a token budget: current task, hard constraints, latest user instructions, source references, file paths, tool errors, ACL/private boundaries, contradictions, recovery state, and next actions. It is not a merged full model; load it together with the SmolLM3 base model.
## Intended Use
Use this adapter as a generative context compressor for agent working memory. For production use, pair the model output with deterministic cleanup, restoration, redaction, and graph repair gates. The standalone adapter is evaluated below against LLMLingua on the same packet-parity benchmark, and the hybrid path shows the expected runtime contract: model compression followed by deterministic graph repair.
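As a rough illustration of that contract, the sketch below chains a model compression call with deterministic post-gates. The function names (`compress`, `repair_gate`, `redaction_gate`) are hypothetical placeholders, not Membrane APIs.

```python
# Minimal sketch of the hybrid runtime path, assuming a `compress(packet)`
# wrapper around the loaded adapter and caller-supplied deterministic gates.
# All names here are illustrative placeholders, not Membrane APIs.

def hybrid_compress(packet: str, compress, repair_gate, redaction_gate) -> str:
    draft = compress(packet)              # model-only compression pass
    draft = repair_gate(packet, draft)    # restore dropped pins, refs, facts
    return redaction_gate(packet, draft)  # enforce ACL/private boundaries last
```

Running redaction as the final gate keeps private-memory exclusion a deterministic guarantee rather than a model behavior.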
## Benchmark Summary
Evaluation used Membrane's 100-case mock context compression suite. Mean ratio is compressed_tokens / original_tokens, so lower is more compressed.
| Method | Quality | Critical / Fact Recall | Hard Constraints | Pinned Terms | Source Refs | Mean Ratio | Private Leaks (count) |
|---|---|---|---|---|---|---|---|
| LLMLingua | 0.000 | 0.895 | 1.000 | 0.580 | 0.420 | 0.485 | 100 |
| SmolLM3 standalone adapter | 0.882 | 0.942 | 1.000 | 0.750 | 0.698 | 0.496 | 0 |
| SmolLM3 + graph repair hybrid | 0.985 | 1.000 | 1.000 | 0.996 | 1.000 | 0.700 | 0 |
## Interpretation
- The standalone adapter measures the model-only context compressor path.
- The hybrid path applies deterministic graph/fact/pin/source-ref repair after generation and is the recommended runtime mode for Membrane.
- LLMLingua is included as a non-fine-tuned compression baseline using the same rendered packet inputs and deterministic scoring metrics.
- Exact pinned-term and source-ref preservation should remain a deterministic contract, not only a model behavior (see the sketch below).
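A minimal deterministic gate for pins and source refs might look like this; the `PINNED:`/`SRC:` markers and field names are hypothetical, since the real packet schema lives in Membrane's benchmark suite.

```python
# Hypothetical deterministic repair gate: re-append any pinned terms or
# source refs the model output dropped. Markers are illustrative only.

def repair_pins_and_refs(pins: list[str], refs: list[str], compressed: str) -> str:
    missing_pins = [p for p in pins if p not in compressed]
    missing_refs = [r for r in refs if r not in compressed]
    if missing_pins:
        compressed += "\nPINNED: " + "; ".join(missing_pins)
    if missing_refs:
        compressed += "\nSRC: " + "; ".join(missing_refs)
    return compressed
```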
See `eval_metrics.json` for the compact machine-readable metric summary and `benchmark/` for full benchmark reports.
## Loading
```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "homerquan/mn-context-engine-lora-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Loads the SmolLM3-3B base weights and applies this LoRA adapter on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)
```
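A minimal generation sketch, assuming a plain-text packet prompt (the real packet rendering is Membrane-specific, and a chat-template wrapper may be appropriate for SmolLM3):

```python
# Hypothetical working-memory packet; the trained packet schema is not shown here.
packet = "CURRENT TASK: migrate auth service\nHARD CONSTRAINTS: no schema changes"

inputs = tokenizer(packet, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, i.e. the compressed packet.
compressed = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(compressed)
```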
## Prompt Shape
The adapter was trained for structured compression targets with compact operational sections such as current task, hard constraints, latest instructions, evidence/source refs, errors/recovery state, next action, and compression warnings. Keep prompts focused on the working-memory packet to compress, not broad instruction following.
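The exact trained section headers are not reproduced here; the layout below is a hypothetical illustration of that compact operational shape.

```python
# Hypothetical packet layout; section names are illustrative, not the
# trained schema. Keep the prompt to the packet itself, not general chat.
packet = """CURRENT TASK: <one-line task statement>
HARD CONSTRAINTS: <non-negotiable rules>
LATEST INSTRUCTIONS: <most recent user asks>
EVIDENCE/SOURCE REFS: <file paths, URLs, doc ids>
ERRORS/RECOVERY: <tool errors and recovery state>
NEXT ACTION: <single next step>
COMPRESSION WARNINGS: <what was dropped or summarized>"""
```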
## Limitations
- This is a LoRA adapter, not a standalone merged model.
- It was evaluated on Membrane's deterministic mock-context suite; external workloads should be re-benchmarked.
- Standalone output can miss exact pinned terms or source refs. Use deterministic repair gates when those are contractual.
- Do not rely on model-only compression for private-memory exclusion; keep redaction gates in the runtime path (see the sketch below).
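A deterministic redaction gate can strip any known-private span that survives compression; `private_spans` is a hypothetical caller-supplied list, not a Membrane API.

```python
# Hypothetical redaction gate: deterministically remove known-private spans
# from the compressed output instead of trusting the model to exclude them.

def redact_private(private_spans: list[str], compressed: str) -> str:
    for span in private_spans:
        if span in compressed:
            compressed = compressed.replace(span, "[REDACTED]")
    return compressed
```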
## Training Snapshot
- Base model: HuggingFaceTB/SmolLM3-3B
- Adapter type: LoRA / PEFT
- Task: context compression for executable agent working memory
- Benchmark date: 2026-05-10