Gemma 4 E4B Memory Policy LoRA

Experimental LoRA adapter for google/gemma-4-E4B-it, fine-tuned to act as the policy layer of a German-first memory agent.

The goal is not to store real memories inside the model weights. The adapter teaches the model to decide when to call memory tools, when to answer directly, and when to ask for clarification.

Model on Hugging Face:

classicgrey/gemma4-e4b-memory-policy-lora

What This Adapter Is For

This adapter is intended for assistant systems that keep memory in external storage such as a database, vector store, or local memory service. It is trained to:

  • save stable user preferences, project facts, routines, tasks, and decisions
  • avoid saving transient states such as "I am tired right now"
  • retrieve durable memory when a question depends on stored context
  • recall recent conversation when the user says "vorhin", "eben", or "gerade"
  • update or delete memory only after a known memory ID exists
  • ask a clarification question when the memory reference is ambiguous
  • keep scene or vehicle actions separate from memory tools

Expected Output Schema

The assistant target is a single JSON object:

{
  "action": "tool_call",
  "tool_calls": [
    {
      "name": "saveMemory",
      "arguments": {
        "content": "Der Nutzer möchte Codebeispiele immer mit Tests sehen.",
        "category": "personal"
      }
    }
  ],
  "response": "",
  "policy_reason": "stabile wiederverwendbare Nutzerpräferenz"
}

Supported actions:

tool_call
final_response
ask_clarification

Core memory tools:

saveMemory
retrieveMemory
recallRecentConversation
updateMemory
deleteMemory
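The schema, actions, and tool names above can be checked mechanically before any tool is executed. The following is a minimal sketch, not part of this repository: `check_policy_output` and its `allowed_tools` parameter are hypothetical names, and it assumes that all four top-level keys shown in the example are required, which the card does not state explicitly.

```python
import json

SUPPORTED_ACTIONS = {"tool_call", "final_response", "ask_clarification"}
MEMORY_TOOLS = {
    "saveMemory",
    "retrieveMemory",
    "recallRecentConversation",
    "updateMemory",
    "deleteMemory",
}
# Assumption: every key from the example object is mandatory.
REQUIRED_KEYS = {"action", "tool_calls", "response", "policy_reason"}


def check_policy_output(raw, allowed_tools=None):
    """Return a list of problems; an empty list means the output looks valid."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    missing = REQUIRED_KEYS - set(obj)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")

    action = obj.get("action")
    if not isinstance(action, str) or action not in SUPPORTED_ACTIONS:
        problems.append(f"unsupported action: {action!r}")

    calls = obj.get("tool_calls", [])
    if not isinstance(calls, list):
        problems.append("tool_calls is not a list")
        calls = []
    if action == "tool_call" and not calls:
        problems.append("action is tool_call but tool_calls is empty")

    for index, call in enumerate(calls):
        if not isinstance(call, dict):
            problems.append(f"tool_calls[{index}] is not an object")
            continue
        name = call.get("name")
        if not isinstance(name, str):
            problems.append(f"tool_calls[{index}] has no string name")
        elif allowed_tools is not None and name not in allowed_tools:
            problems.append(f"tool_calls[{index}] uses unknown tool {name!r}")
        if not isinstance(call.get("arguments"), dict):
            problems.append(f"tool_calls[{index}] has no arguments object")
    return problems
```

Passing `allowed_tools=MEMORY_TOOLS` restricts the check to the core memory tools. Leaving it as `None` accepts any tool name, which may be preferable when scene or vehicle tools are also in play.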

Example Behavior

User:

Merk dir bitte, dass ich Codebeispiele immer mit Tests sehen möchte.

Desired assistant output:

{
  "action": "tool_call",
  "tool_calls": [
    {
      "name": "saveMemory",
      "arguments": {
        "content": "Der Nutzer möchte Codebeispiele immer mit Tests sehen.",
        "category": "personal"
      }
    }
  ],
  "response": "",
  "policy_reason": "stabile wiederverwendbare Nutzerpräferenz"
}

User:

Ich bin gerade ziemlich müde, aber lass uns kurz weitermachen.

Desired assistant output:

{
  "action": "final_response",
  "tool_calls": [],
  "response": "Klar, wir machen ruhig und fokussiert weiter.",
  "policy_reason": "momentaner Zustand ohne Speicherwunsch"
}

Dataset

The training data is synthetic and intentionally generic. It contains no real, private user memories.

Current merged dataset:

total examples:      1,766
train examples:      1,501
validation examples:   265

Category distribution:

save_memory:             334
no_memory_write:         356
retrieve_memory:         235
recall_recent:           185
update_memory_lookup:    134
update_memory_known_id:  109
delete_memory_lookup:    137
delete_memory_known_id:   87
clarification:           101
direct_response:          55
scene_tool:               26
other seed categories:     7

The dataset was built from a small hand-written seed set plus synthetic expansion with DeepSeek. Generated candidates were validated, filtered for schema issues, deduplicated by user utterance, and checked for obvious PII-like patterns before merging.
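The deduplication and PII steps described above could look roughly like the sketch below. It does not reproduce the actual scripts in `scripts/`. The record shape (a `messages` list with `role` and `content` fields), the normalization rule, and the email-only PII pattern are all assumptions made for illustration.

```python
import re

# Assumption: a deliberately narrow "PII-like" check that only catches email addresses.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def user_utterance(record):
    """Return the first user message of a record (record shape is assumed)."""
    for message in record.get("messages", []):
        if message.get("role") == "user":
            return message.get("content", "")
    return ""


def normalize(text):
    """Collapse whitespace and ignore case so near-identical utterances match."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def filter_candidates(records):
    """Drop records with empty, duplicated, or email-bearing user utterances."""
    seen = set()
    kept = []
    for record in records:
        utterance = user_utterance(record)
        key = normalize(utterance)
        if not key or key in seen or EMAIL_RE.search(utterance):
            continue
        seen.add(key)
        kept.append(record)
    return kept
```

Keying on the normalized user utterance keeps the first occurrence of each prompt, so two candidates that differ only in casing or spacing count as one example.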

Local dataset files:

dataset/combined_train.jsonl
dataset/combined_validation.jsonl
dataset/combined_manifest.json

Training

Training was done with Unsloth on Google Colab using a T4 GPU.

Base model:

google/gemma-4-E4B-it

Adapter:

classicgrey/gemma4-e4b-memory-policy-lora

Training shape:

method: QLoRA / LoRA SFT
max sequence length: 1024
steps: 300
batch size: 1
gradient accumulation: 4
LoRA rank: 16
LoRA alpha: 16
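As a rough sanity check, the numbers above imply that training covered somewhat less than one pass over the training split. The arithmetic below assumes a single GPU and that `steps` counts optimizer updates (the usual Hugging Face Trainer convention). Neither assumption is stated in the card.

```python
batch_size = 1
grad_accum = 4
max_steps = 300
train_examples = 1501

# Examples contributing to one optimizer update.
effective_batch = batch_size * grad_accum

# Total examples processed over the whole run.
examples_seen = effective_batch * max_steps

# Fraction of the training split covered.
epochs = examples_seen / train_examples

print(effective_batch, examples_seen, round(epochs, 2))
```

Under these assumptions the run sees 1,200 examples at an effective batch size of 4, or about 0.8 epochs.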

Colab notebook:

notebooks/gemma_memory_unsloth_colab.ipynb

Loading With Unsloth

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="classicgrey/gemma4-e4b-memory-policy-lora",
    max_seq_length=1024,
    load_in_4bit=True,
)

FastLanguageModel.for_inference(model)

The adapter is applied on top of the base model, so you still need access to google/gemma-4-E4B-it on Hugging Face.
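After generation, the decoded text still has to be turned into the JSON object described under Expected Output Schema. The model may wrap it in extra text or a code fence, so a tolerant parser helps. This is a minimal sketch using only the standard library. `extract_first_json_object` is a hypothetical helper and not part of Unsloth or this repository.

```python
import json


def extract_first_json_object(text):
    """Return the first decodable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not a valid object here; try the next opening brace.
            start = text.find("{", start + 1)
    return None
```

A `None` result is a signal to retry or reject the generation instead of guessing at the intended action.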

Local Reproduction

Validate the generated dataset:

python3 scripts/validate_dataset.py dataset/combined_train.jsonl dataset/combined_validation.jsonl

Run the local Unsloth training script:

uv run scripts/train_unsloth_sft.py \
  --model google/gemma-4-E4B-it \
  --train-file dataset/combined_train.jsonl \
  --validation-file dataset/combined_validation.jsonl \
  --output-dir outputs/gemma4-e4b-memory-policy-lora \
  --max-steps 300 \
  --batch-size 1 \
  --grad-accum 4 \
  --max-seq-length 1024

Generate more synthetic candidates:

uv run scripts/generate_deepseek_dataset.py \
  --model deepseek-v4-pro \
  --target-total 1500 \
  --batch-size 20 \
  --workers 5 \
  --seed-sample-size 8 \
  --max-tokens 4096 \
  --out dataset/generated_candidates.jsonl

Filter and merge candidates:

python3 scripts/filter_generated_dataset.py
python3 scripts/inspect_generated_dataset.py dataset/generated_candidates.filtered.jsonl
python3 scripts/merge_dataset.py
python3 scripts/validate_dataset.py dataset/combined_train.jsonl dataset/combined_validation.jsonl

Known Limitations

This is a first experimental adapter. It has learned the broad routing behavior, but schema adherence still needs improvement. In early manual tests, the model correctly chose saveMemory but sometimes emitted argument names outside the target schema, such as memory or memory_id, instead of the expected content and category.

For production use, add:

  • stricter schema-only examples
  • adversarial negative examples
  • a held-out evaluation set
  • JSON schema validation after generation
  • automatic repair or rejection of malformed tool arguments
  • more multi-turn traces after tool results
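The "automatic repair or rejection" item could start as small as the sketch below. It renames the one misnamed argument reported above and rejects calls that still lack required arguments. The alias table and the required-argument table are assumptions drawn from the saveMemory example in this card. The argument schemas of the other tools are not documented here, so those calls pass through unchanged.

```python
# Assumption: `memory` is a misnamed `content`; no other aliases are known.
ARG_ALIASES = {"saveMemory": {"memory": "content"}}
# Assumption: saveMemory requires exactly the two arguments shown in the example.
REQUIRED_ARGS = {"saveMemory": {"content", "category"}}


def repair_tool_call(call):
    """Return a repaired copy of a tool call, or None if it should be rejected."""
    name = call.get("name")
    if not isinstance(name, str):
        return None
    args = dict(call.get("arguments") or {})

    # Rename known aliases unless the correct key is already present.
    for wrong, right in ARG_ALIASES.get(name, {}).items():
        if wrong in args and right not in args:
            args[right] = args.pop(wrong)

    # Reject when a required argument is still missing after repair.
    if REQUIRED_ARGS.get(name, set()) - set(args):
        return None
    return {"name": name, "arguments": args}
```

Rejecting a call is deliberately preferred over inventing a missing value such as `category`, because a guessed value would be stored as if the user had said it.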

Safety Notes

Do not use this adapter as the memory store. Use it only as a planner/router. Actual user memories should live in an external system with explicit retention, deletion, privacy, and audit controls.

Do not add real, private user memories to public training data.
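The planner/router split described above can be illustrated with a small dispatcher: the model only proposes tool calls, and separate code executes them against the store. Everything here is hypothetical, including `InMemoryStore` (a stand-in for a real database or vector store), `execute_policy_output`, and the `memory_id` argument name for deleteMemory, which this card does not specify.

```python
import itertools


class InMemoryStore:
    """Illustrative stand-in for an external memory system."""

    def __init__(self):
        self._items = {}
        self._ids = itertools.count(1)

    def save(self, content, category):
        memory_id = str(next(self._ids))
        self._items[memory_id] = {"content": content, "category": category}
        return memory_id

    def delete(self, memory_id):
        # True if something was deleted, False if the ID was unknown.
        return self._items.pop(memory_id, None) is not None


def execute_policy_output(policy_output, store):
    """Run the tool calls proposed by the model and collect their results."""
    if policy_output.get("action") != "tool_call":
        return []
    handlers = {
        "saveMemory": lambda a: store.save(a["content"], a["category"]),
        # Assumption: deleteMemory takes a `memory_id` argument.
        "deleteMemory": lambda a: store.delete(a["memory_id"]),
    }
    results = []
    for call in policy_output.get("tool_calls", []):
        name = call.get("name")
        handler = handlers.get(name)
        if handler is None:
            results.append({"name": name, "error": "unknown tool"})
            continue
        results.append({"name": name, "result": handler(call["arguments"])})
    return results
```

Because the store is an ordinary object outside the model, retention, deletion, and audit rules can be enforced there regardless of what the adapter outputs.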

Project Files

scripts/build_dataset.py                 seed dataset builder
scripts/generate_deepseek_dataset.py     synthetic expansion generator
scripts/inspect_generated_dataset.py     quality inspector
scripts/filter_generated_dataset.py      candidate filter
scripts/merge_dataset.py                 train/validation merger
scripts/train_unsloth_sft.py             Unsloth SFT script
notebooks/gemma_memory_unsloth_colab.ipynb   Colab training notebook