mn-context-engine-lora-v1
mn-context-engine-lora-v1 is a PEFT LoRA adapter for google/gemma-4-E4B, trained for operational context compression in Membrane-style multi-agent working memory.
The adapter is optimized for preserving executable agent state under a token budget: current task, hard constraints, latest user instructions, source references, file paths, tool errors, ACL/private boundaries, contradictions, and next actions. It is not a merged full model; load it together with the Gemma 4 E4B base model.
Intended Use
Use this adapter as a generative context compressor for agent working memory. For production use, pair the model output with deterministic cleanup, restoration, redaction, and graph-repair gates. The standalone adapter outperforms LLMLingua on the benchmark below, but deterministic repair is still recommended when pinned terms and source references must be preserved exactly.
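A minimal sketch of that two-stage contract follows, assuming hypothetical gate functions; restore_pinned_terms and compress_with_gates are illustrative names, not APIs shipped with this adapter.

def restore_pinned_terms(compressed: str, original: str,
                         pinned: list[str]) -> str:
    # Deterministic repair gate: re-append contractual pinned terms that
    # appear in the original packet but were dropped by the model.
    missing = [t for t in pinned if t in original and t not in compressed]
    if missing:
        compressed += "\nPINNED: " + "; ".join(missing)
    return compressed

def compress_with_gates(generate_fn, packet: str, pinned: list[str]) -> str:
    # Stage 1: model compression (generate_fn wraps the adapter call).
    compressed = generate_fn(packet)
    # Stage 2: deterministic repair; redaction and graph-repair gates
    # would run here in the same fashion.
    return restore_pinned_terms(compressed, packet, pinned)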
Benchmark Summary
Evaluation used Membrane's 100-case mock context compression suite. Mean ratio is compressed_tokens / original_tokens, so lower means more compressed; for example, compressing a 1,000-token packet to 484 tokens gives a ratio of 0.484.
| Method | Quality | Critical / Fact Recall | Hard Constraints | Pinned Terms | Source Refs | Mean Ratio | Private Leaks |
|---|---|---|---|---|---|---|---|
| LLMLingua | 0.652 | 0.590 | n/a | n/a | n/a | 0.514 | 0 |
| Gemma standalone adapter | 0.943 | 1.000 | 1.000 | 0.838 | 0.823 | 0.484 | 0 |
| Gemma + graph repair hybrid | 0.998 | 1.000 | 1.000 | 0.995 | 1.000 | 0.566 | 0 |
| Deterministic graph baseline | 0.990 | 1.000 | n/a | n/a | n/a | 0.853 | 0 |
Interpretation
- The standalone adapter outperformed LLMLingua on both quality and compression ratio in this benchmark.
- The hybrid path recovered source references and most pinned terms, improving quality to 0.998, but at a higher mean ratio.
- The remaining standalone weakness is exact pinned-term and source-ref recall, especially on long logs and interrupted workflows.
- The recommended runtime contract is model compression first, then deterministic graph repair/restoration/redaction (see the sketch under Intended Use).
See eval_metrics.json for the compact machine-readable metric summary.
Loading
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "homerquan/mn-context-engine-lora-v1"

# Loading the adapter pulls in the google/gemma-4-E4B base weights.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",  # keep the base model's native precision
    device_map="auto",   # spread weights across available devices
)
If your environment requires explicit base access, make sure you have access to google/gemma-4-E4B and have accepted the applicable Gemma terms on Hugging Face.
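A minimal generation sketch, assuming the base Gemma chat template; the prompt wording, decoding settings, and the packet placeholder are illustrative, not the trained template.

packet = "..."  # the working-memory packet to compress
messages = [{"role": "user",
             "content": "Compress this working-memory packet:\n" + packet}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))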
Prompt Shape
The adapter was trained on structured compression targets with compact operational sections such as current task, hard constraints, latest instructions, evidence/source refs, errors/recovery state, next action, and compression warnings. Keep prompts focused on the working-memory packet to be compressed, not on broad instruction following. An illustrative packet follows.
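For illustration only, a packet might carry sections like the following; these labels are assumptions for readability, not the exact labels used in training.

CURRENT TASK: ...
HARD CONSTRAINTS: ...
LATEST INSTRUCTIONS: ...
EVIDENCE/SOURCE REFS: ...
ERRORS/RECOVERY: ...
NEXT ACTION: ...
COMPRESSION WARNINGS: ...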
Limitations
- This is a LoRA adapter, not a standalone merged model.
- It was evaluated on Membrane's deterministic mock-context suite; external workloads should be re-benchmarked.
- Standalone output can miss exact pinned terms or source refs. Use deterministic repair gates when those are contractual.
- Do not rely on model-only compression for private-memory exclusion; keep redaction gates in the runtime path, as in the sketch below.
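A minimal redaction-gate sketch, assuming private content is tagged with an explicit marker; the [PRIVATE] tag and the function name are hypothetical.

def redact_private(compressed: str, marker: str = "[PRIVATE]") -> str:
    # Deterministic redaction gate: drop any line carrying the private
    # marker, regardless of what the model chose to keep.
    kept = [line for line in compressed.splitlines() if marker not in line]
    return "\n".join(kept)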
Training Snapshot
- Base model: google/gemma-4-E4B
- Adapter type: LoRA / PEFT
- Task: context compression for executable agent working memory
- Benchmark date: 2026-05-10