mn-context-engine-lora-v1
mn-context-engine-lora-v1 is a PEFT LoRA adapter for google/gemma-4-E4B, trained for operational context compression in Membrane-style multi-agent working memory.
The adapter is optimized for preserving executable agent state under a token budget: current task, hard constraints, latest user instructions, source references, file paths, tool errors, ACL/private boundaries, contradictions, and next actions. It is not a merged full model; load it together with the Gemma 4 E4B base model.
Intended Use
Use this adapter as a generative context compressor for agent working memory. For production use, pair the model output with deterministic cleanup, restoration, redaction, and graph-repair gates. The standalone adapter outperforms LLMLingua on the benchmark below, but deterministic repair is still recommended when pinned terms and source references must be preserved exactly.
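A minimal sketch of that two-stage contract follows, assuming hypothetical gate functions; restore_pinned_terms and compress_with_gates are illustrative names, not APIs shipped with this adapter.

def restore_pinned_terms(compressed: str, original: str,
                         pinned: list[str]) -> str:
    # Deterministic repair gate: re-append contractual pinned terms that
    # appear in the original packet but were dropped by the model.
    missing = [t for t in pinned if t in original and t not in compressed]
    if missing:
        compressed += "\nPINNED: " + "; ".join(missing)
    return compressed

def compress_with_gates(generate_fn, packet: str, pinned: list[str]) -> str:
    # Stage 1: model compression (generate_fn wraps the adapter call).
    compressed = generate_fn(packet)
    # Stage 2: deterministic repair; redaction and graph-repair gates
    # would run here in the same fashion.
    return restore_pinned_terms(compressed, packet, pinned)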
Benchmark Summary
Evaluation used Membrane's 100-case mock context compression suite. Mean ratio is compressed_tokens / original_tokens, so lower means more compressed; for example, compressing a 1,000-token packet to 484 tokens gives a ratio of 0.484.
| Method | Quality | Critical / Fact Recall | Hard Constraints | Pinned Terms | Source Refs | Mean Ratio | Private Leaks |
|---|---|---|---|---|---|---|---|
| LLMLingua | 0.652 | 0.590 | n/a | n/a | n/a | 0.514 | 0 |
| Gemma standalone adapter | 0.943 | 1.000 | 1.000 | 0.838 | 0.823 | 0.484 | 0 |
| Gemma + graph repair hybrid | 0.998 | 1.000 | 1.000 | 0.995 | 1.000 | 0.566 | 0 |
| Deterministic graph baseline | 0.990 | 1.000 | n/a | n/a | n/a | 0.853 | 0 |
Interpretation
- The standalone adapter outperformed LLMLingua on both quality and compression ratio in this benchmark.
- The hybrid path recovered source references and most pinned terms, improving quality to 0.998, but at a higher mean ratio.
- The remaining standalone weakness is exact pinned-term and source-ref recall, especially on long logs and interrupted workflows.
- The recommended runtime contract is model compression first, then deterministic graph repair/restoration/redaction (see the sketch under Intended Use).
See eval_metrics.json for the compact machine-readable metric summary.
Loading
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "homerquan/mn-context-engine-lora-v1"

# Loading the adapter pulls in the google/gemma-4-E4B base weights.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",  # keep the base model's native precision
    device_map="auto",   # spread weights across available devices
)
If your environment requires explicit base access, make sure you have access to google/gemma-4-E4B and have accepted the applicable Gemma terms on Hugging Face.
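A minimal generation sketch, assuming the base Gemma chat template; the prompt wording, decoding settings, and the packet placeholder are illustrative, not the trained template.

packet = "..."  # the working-memory packet to compress
messages = [{"role": "user",
             "content": "Compress this working-memory packet:\n" + packet}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))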
Prompt Shape
The adapter was trained on structured compression targets with compact operational sections such as current task, hard constraints, latest instructions, evidence/source refs, errors/recovery state, next action, and compression warnings. Keep prompts focused on the working-memory packet to be compressed, not on broad instruction following. An illustrative packet follows.
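For illustration only, a packet might carry sections like the following; these labels are assumptions for readability, not the exact labels used in training.

CURRENT TASK: ...
HARD CONSTRAINTS: ...
LATEST INSTRUCTIONS: ...
EVIDENCE/SOURCE REFS: ...
ERRORS/RECOVERY: ...
NEXT ACTION: ...
COMPRESSION WARNINGS: ...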
Limitations
- This is a LoRA adapter, not a standalone merged model.
- It was evaluated on Membrane's deterministic mock-context suite; external workloads should be re-benchmarked.
- Standalone output can miss exact pinned terms or source refs. Use deterministic repair gates when those are contractual.
- Do not rely on model-only compression for private-memory exclusion; keep redaction gates in the runtime path, as in the sketch below.
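A minimal redaction-gate sketch, assuming private content is tagged with an explicit marker; the [PRIVATE] tag and the function name are hypothetical.

def redact_private(compressed: str, marker: str = "[PRIVATE]") -> str:
    # Deterministic redaction gate: drop any line carrying the private
    # marker, regardless of what the model chose to keep.
    kept = [line for line in compressed.splitlines() if marker not in line]
    return "\n".join(kept)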
Training Snapshot
- Base model: google/gemma-4-E4B
- Adapter type: LoRA / PEFT
- Task: context compression for executable agent working memory
- Benchmark date: 2026-05-10