# mn-context-engine-lora-v2
mn-context-engine-lora-v2 is a PEFT LoRA adapter for HuggingFaceTB/SmolLM3-3B trained for operational context compression in Membrane-style multi-agent working memory.
The adapter is optimized for preserving executable agent state under a token budget: current task, hard constraints, latest user instructions, source references, file paths, tool errors, ACL/private boundaries, contradictions, recovery state, and next actions. It is not a merged full model; load it together with the SmolLM3 base model.
## Intended Use
Use this adapter as a generative context compressor for agent working memory. For production use, pair the model output with deterministic cleanup, restoration, redaction, and graph repair gates. The standalone adapter is evaluated below against LLMLingua on the same packet-parity benchmark, and the hybrid path shows the expected runtime contract: model compression followed by deterministic graph repair.
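As a rough illustration of that contract, the sketch below chains a model compression call with deterministic post-gates. The function names (`compress`, `repair_gate`, `redaction_gate`) are hypothetical placeholders, not Membrane APIs.

```python
# Minimal sketch of the hybrid runtime path, assuming a `compress(packet)`
# wrapper around the loaded adapter and caller-supplied deterministic gates.
# All names here are illustrative placeholders, not Membrane APIs.

def hybrid_compress(packet: str, compress, repair_gate, redaction_gate) -> str:
    draft = compress(packet)              # model-only compression pass
    draft = repair_gate(packet, draft)    # restore dropped pins, refs, facts
    return redaction_gate(packet, draft)  # enforce ACL/private boundaries last
```

Running redaction as the final gate keeps private-memory exclusion a deterministic guarantee rather than a model behavior.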
## Benchmark Summary
Evaluation used Membrane's 100-case mock context compression suite. Mean ratio is compressed_tokens / original_tokens, so lower is more compressed.
| Method | Quality | Critical / Fact Recall | Hard Constraints | Pinned Terms | Source Refs | Mean Ratio | Private Leaks (count) |
|---|---|---|---|---|---|---|---|
| LLMLingua | 0.000 | 0.895 | 1.000 | 0.580 | 0.420 | 0.485 | 100 |
| SmolLM3 standalone adapter | 0.882 | 0.942 | 1.000 | 0.750 | 0.698 | 0.496 | 0 |
| SmolLM3 + graph repair hybrid | 0.985 | 1.000 | 1.000 | 0.996 | 1.000 | 0.700 | 0 |
## Interpretation
- The standalone adapter measures the model-only context compressor path.
- The hybrid path applies deterministic graph/fact/pin/source-ref repair after generation and is the recommended runtime mode for Membrane.
- LLMLingua is included as a non-fine-tuned compression baseline using the same rendered packet inputs and deterministic scoring metrics.
- Exact pinned-term and source-ref preservation should remain a deterministic contract, not only a model behavior (see the sketch below).
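A minimal deterministic gate for pins and source refs might look like this; the `PINNED:`/`SRC:` markers and field names are hypothetical, since the real packet schema lives in Membrane's benchmark suite.

```python
# Hypothetical deterministic repair gate: re-append any pinned terms or
# source refs the model output dropped. Markers are illustrative only.

def repair_pins_and_refs(pins: list[str], refs: list[str], compressed: str) -> str:
    missing_pins = [p for p in pins if p not in compressed]
    missing_refs = [r for r in refs if r not in compressed]
    if missing_pins:
        compressed += "\nPINNED: " + "; ".join(missing_pins)
    if missing_refs:
        compressed += "\nSRC: " + "; ".join(missing_refs)
    return compressed
```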
See `eval_metrics.json` for the compact machine-readable metric summary and `benchmark/` for full benchmark reports.
## Loading
```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "homerquan/mn-context-engine-lora-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Loads the SmolLM3-3B base weights and applies this LoRA adapter on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)
```
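A minimal generation sketch, assuming a plain-text packet prompt (the real packet rendering is Membrane-specific, and a chat-template wrapper may be appropriate for SmolLM3):

```python
# Hypothetical working-memory packet; the trained packet schema is not shown here.
packet = "CURRENT TASK: migrate auth service\nHARD CONSTRAINTS: no schema changes"

inputs = tokenizer(packet, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, i.e. the compressed packet.
compressed = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(compressed)
```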
## Prompt Shape
The adapter was trained for structured compression targets with compact operational sections such as current task, hard constraints, latest instructions, evidence/source refs, errors/recovery state, next action, and compression warnings. Keep prompts focused on the working-memory packet to compress, not broad instruction following.
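The exact trained section headers are not reproduced here; the layout below is a hypothetical illustration of that compact operational shape.

```python
# Hypothetical packet layout; section names are illustrative, not the
# trained schema. Keep the prompt to the packet itself, not general chat.
packet = """CURRENT TASK: <one-line task statement>
HARD CONSTRAINTS: <non-negotiable rules>
LATEST INSTRUCTIONS: <most recent user asks>
EVIDENCE/SOURCE REFS: <file paths, URLs, doc ids>
ERRORS/RECOVERY: <tool errors and recovery state>
NEXT ACTION: <single next step>
COMPRESSION WARNINGS: <what was dropped or summarized>"""
```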
## Limitations
- This is a LoRA adapter, not a standalone merged model.
- It was evaluated on Membrane's deterministic mock-context suite; external workloads should be re-benchmarked.
- Standalone output can miss exact pinned terms or source refs. Use deterministic repair gates when those are contractual.
- Do not rely on model-only compression for private-memory exclusion; keep redaction gates in the runtime path (see the sketch below).
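A deterministic redaction gate can strip any known-private span that survives compression; `private_spans` is a hypothetical caller-supplied list, not a Membrane API.

```python
# Hypothetical redaction gate: deterministically remove known-private spans
# from the compressed output instead of trusting the model to exclude them.

def redact_private(private_spans: list[str], compressed: str) -> str:
    for span in private_spans:
        if span in compressed:
            compressed = compressed.replace(span, "[REDACTED]")
    return compressed
```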
## Training Snapshot
- Base model: HuggingFaceTB/SmolLM3-3B
- Adapter type: LoRA / PEFT
- Task: context compression for executable agent working memory
- Benchmark date: 2026-05-10