---
base_model: google/gemma-4-E4B-it
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
tags:
  - gemma
  - unsloth
  - lora
  - peft
  - tool-calling
  - memory-agent
  - german
  - synthetic-data
language:
  - de
  - en
---

# Gemma 4 E4B Memory Policy LoRA

Experimental LoRA adapter for `google/gemma-4-E4B-it`, fine-tuned to act as the
policy layer of a German-first memory agent.

The goal is not to store real memories inside the model weights. The adapter
teaches the model to decide when to call memory tools, when to answer directly,
and when to ask for clarification.

Model on Hugging Face:

```text
classicgrey/gemma4-e4b-memory-policy-lora
```

## What This Adapter Is For

This adapter is intended for assistant systems that use external memory storage
such as a database, vector store, or local memory service. It learns routing
behavior for situations like:

- save stable user preferences, project facts, routines, tasks, and decisions
- avoid saving transient states such as "I am tired right now"
- retrieve durable memory when a question depends on stored context
- recall recent conversation when the user says "vorhin", "eben", or "gerade"
- update or delete a memory only once its memory ID is known
- ask a clarification question when the memory reference is ambiguous
- keep scene or vehicle actions separate from memory tools

## Expected Output Schema

The assistant target is a single JSON object:

```json
{
  "action": "tool_call",
  "tool_calls": [
    {
      "name": "saveMemory",
      "arguments": {
        "content": "Der Nutzer möchte Codebeispiele immer mit Tests sehen.",
        "category": "personal"
      }
    }
  ],
  "response": "",
  "policy_reason": "stabile wiederverwendbare Nutzerpräferenz"
}
```

Supported actions:

```text
tool_call
final_response
ask_clarification
```

Core memory tools:

```text
saveMemory
retrieveMemory
recallRecentConversation
updateMemory
deleteMemory
```
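A structural check for this schema can be sketched as below. It is a minimal illustration, not the project's `scripts/validate_dataset.py`: the function name, the `allowed_tools` parameter, and the rule that a `tool_call` action needs at least one call are assumptions. Tool names are only checked when a caller passes an explicit allow-list, because scene or vehicle tools may be valid alongside the core memory tools.

```python
import json

SUPPORTED_ACTIONS = {"tool_call", "final_response", "ask_clarification"}
REQUIRED_KEYS = {"action", "tool_calls", "response", "policy_reason"}


def validate_policy_output(raw, allowed_tools=None):
    """Return a list of problems for one raw model output; empty means it passed."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    missing = REQUIRED_KEYS - obj.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")

    action = obj.get("action")
    if not isinstance(action, str) or action not in SUPPORTED_ACTIONS:
        problems.append(f"unsupported action: {action!r}")

    calls = obj.get("tool_calls")
    if not isinstance(calls, list):
        problems.append("tool_calls is not a list")
        calls = []
    # Assumption: a tool_call action must carry at least one call.
    if action == "tool_call" and not calls:
        problems.append("action is tool_call but tool_calls is empty")

    for index, call in enumerate(calls):
        if not isinstance(call, dict):
            problems.append(f"tool_calls[{index}] is not an object")
            continue
        name = call.get("name")
        if not isinstance(name, str) or not name:
            problems.append(f"tool_calls[{index}] has no tool name")
        elif allowed_tools is not None and name not in allowed_tools:
            problems.append(f"tool_calls[{index}] uses unknown tool {name!r}")
        if not isinstance(call.get("arguments"), dict):
            problems.append(f"tool_calls[{index}] arguments is not an object")
    return problems
```

Returning a list of problems instead of raising makes it easy to log every issue in a generation and decide later whether to repair or reject it.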

## Example Behavior

User:

```text
Merk dir bitte, dass ich Codebeispiele immer mit Tests sehen möchte.
```

Desired assistant output:

```json
{
  "action": "tool_call",
  "tool_calls": [
    {
      "name": "saveMemory",
      "arguments": {
        "content": "Der Nutzer möchte Codebeispiele immer mit Tests sehen.",
        "category": "personal"
      }
    }
  ],
  "response": "",
  "policy_reason": "stabile wiederverwendbare Nutzerpräferenz"
}
```

User:

```text
Ich bin gerade ziemlich müde, aber lass uns kurz weitermachen.
```

Desired assistant output:

```json
{
  "action": "final_response",
  "tool_calls": [],
  "response": "Klar, wir machen ruhig und fokussiert weiter.",
  "policy_reason": "momentaner Zustand ohne Speicherwunsch"
}
```
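On the host side, outputs like the two above could be consumed roughly as follows. This is a sketch under stated assumptions: `MEMORY_STORE`, `save_memory`, `HANDLERS`, and `handle_policy_output` are hypothetical names, the in-process list stands in for a real database or vector store, and treating `ask_clarification` like `final_response` (user-facing text in `response`) is an assumption.

```python
import json

# Hypothetical in-process store standing in for a database or vector store.
MEMORY_STORE = []


def save_memory(content, category):
    """Append one memory record and return an illustrative ID."""
    memory_id = f"mem-{len(MEMORY_STORE) + 1}"
    MEMORY_STORE.append({"id": memory_id, "content": content, "category": category})
    return memory_id


# Assumed mapping from tool names in the adapter output to host-side handlers.
HANDLERS = {"saveMemory": save_memory}


def handle_policy_output(raw):
    """Execute tool calls, or pass the response text through to the user."""
    output = json.loads(raw)
    if output["action"] == "tool_call":
        results = []
        for call in output["tool_calls"]:
            handler = HANDLERS.get(call["name"])
            if handler is None:
                results.append({"name": call["name"], "error": "unknown tool"})
                continue
            try:
                result = handler(**call["arguments"])
            except TypeError:
                # Wrong or missing argument names, e.g. `memory` instead of `content`.
                results.append({"name": call["name"], "error": "bad arguments"})
                continue
            results.append({"name": call["name"], "result": result})
        return {"kind": "tool_results", "results": results}
    # Assumption: final_response and ask_clarification both carry text in "response".
    return {"kind": "reply", "text": output["response"]}
```

Keeping the handlers in a plain dictionary means the model never touches storage directly; the host decides which tool names are executable at all.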

## Dataset

The training data is synthetic and intentionally generic. It does not contain
private real user memories.

Current merged dataset:

```text
total examples:      1,766
train examples:      1,501
validation examples:   265
```

Category distribution:

```text
save_memory:             334
no_memory_write:         356
retrieve_memory:         235
recall_recent:           185
update_memory_lookup:    134
update_memory_known_id:  109
delete_memory_lookup:    137
delete_memory_known_id:   87
clarification:           101
direct_response:          55
scene_tool:               26
other seed categories:     7
```

The dataset was built from a small hand-written seed set plus synthetic
expansion with DeepSeek. Generated candidates were validated, filtered for
schema issues, deduplicated by user utterance, and checked for obvious PII-like
patterns before merging.
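The deduplication and PII screening steps might look roughly like this. It is an illustrative sketch, not the contents of `scripts/filter_generated_dataset.py`: the `user` field name, the normalization rule, and both regular expressions are assumptions chosen for the example.

```python
import re

# Illustrative patterns only; the project's actual PII checks are not shown here.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),  # email-like strings
    re.compile(r"\+?\d[\d\s/-]{7,}\d"),       # long phone-like digit runs
]


def normalize(utterance):
    """Case-fold and collapse whitespace so near-identical utterances collide."""
    return " ".join(utterance.casefold().split())


def filter_candidates(candidates):
    """Drop PII-like records and keep the first record per normalized user utterance."""
    seen = set()
    kept = []
    for record in candidates:
        user_text = record["user"]  # assumed field name
        if any(pattern.search(user_text) for pattern in PII_PATTERNS):
            continue
        key = normalize(user_text)
        if key in seen:
            continue
        seen.add(key)
        kept.append(record)
    return kept
```

Regex screening only catches obvious patterns, so a manual spot check of the filtered file remains worthwhile.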

Local dataset files:

```text
dataset/combined_train.jsonl
dataset/combined_validation.jsonl
dataset/combined_manifest.json
```

## Training

Training was done with Unsloth on Google Colab using a T4 GPU.

Base model:

```text
google/gemma-4-E4B-it
```

Adapter:

```text
classicgrey/gemma4-e4b-memory-policy-lora
```

Training shape:

```text
method: QLoRA / LoRA SFT
max sequence length: 1024
steps: 300
batch size: 1
gradient accumulation: 4
LoRA rank: 16
LoRA alpha: 16
```
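These settings imply a small effective batch and slightly less than one pass over the training split. The arithmetic below assumes a single GPU, that `steps` counts optimizer steps, and that the LoRA update is scaled by `alpha / rank` (the standard LoRA scaling, without rank-stabilized variants).

```python
train_examples = 1501
max_steps = 300
per_device_batch_size = 1
gradient_accumulation = 4
lora_rank = 16
lora_alpha = 16

# Sequences contributing to each optimizer step on one GPU.
effective_batch = per_device_batch_size * gradient_accumulation

# Total training sequences consumed over the run.
examples_seen = max_steps * effective_batch

# Fraction of the training split covered.
epochs = examples_seen / train_examples

# Multiplier applied to the low-rank update under standard LoRA scaling.
lora_scale = lora_alpha / lora_rank

print(f"effective batch size: {effective_batch}")
print(f"examples seen:        {examples_seen}")
print(f"epochs:               {epochs:.2f}")
print(f"LoRA scale:           {lora_scale}")
```

Under those assumptions the run sees 1,200 of the 1,501 training examples, about 0.80 epochs, so some examples are never visited.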

Colab notebook:

```text
notebooks/gemma_memory_unsloth_colab.ipynb
```

## Loading With Unsloth

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="classicgrey/gemma4-e4b-memory-policy-lora",
    max_seq_length=1024,
    load_in_4bit=True,
)

FastLanguageModel.for_inference(model)
```

The adapter repository does not include the base weights, so you still need
access to the base model `google/gemma-4-E4B-it` on Hugging Face.
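Once the model has generated text, the policy object still has to be pulled out of the decoded string. The helper below is a hedged sketch that assumes the decoded output may carry extra text around the JSON object, such as a code fence or trailing tokens; `extract_first_json_object` is a hypothetical name, not part of Unsloth or this repository.

```python
import json


def extract_first_json_object(text):
    """Return the first parseable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores the rest.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # This brace did not open a valid object; try the next one.
            start = text.find("{", start + 1)
    return None
```

Returning `None` instead of raising lets the caller treat unparseable generations as a rejection case rather than a crash.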

## Local Reproduction

Validate the generated dataset:

```bash
python3 scripts/validate_dataset.py dataset/combined_train.jsonl dataset/combined_validation.jsonl
```

Run the local Unsloth training script:

```bash
uv run scripts/train_unsloth_sft.py \
  --model google/gemma-4-E4B-it \
  --train-file dataset/combined_train.jsonl \
  --validation-file dataset/combined_validation.jsonl \
  --output-dir outputs/gemma4-e4b-memory-policy-lora \
  --max-steps 300 \
  --batch-size 1 \
  --grad-accum 4 \
  --max-seq-length 1024
```

Generate more synthetic candidates:

```bash
uv run scripts/generate_deepseek_dataset.py \
  --model deepseek-v4-pro \
  --target-total 1500 \
  --batch-size 20 \
  --workers 5 \
  --seed-sample-size 8 \
  --max-tokens 4096 \
  --out dataset/generated_candidates.jsonl
```

Filter and merge candidates:

```bash
python3 scripts/filter_generated_dataset.py
python3 scripts/inspect_generated_dataset.py dataset/generated_candidates.filtered.jsonl
python3 scripts/merge_dataset.py
python3 scripts/validate_dataset.py dataset/combined_train.jsonl dataset/combined_validation.jsonl
```

## Known Limitations

This is a first experimental adapter. It has learned the broad routing behavior,
but schema adherence still needs improvement. In early manual tests, the model
correctly chose `saveMemory`, but sometimes used non-target argument names such
as `memory` or `memory_id` instead of the desired `content` and `category`.

For production use, add:

- stricter schema-only examples
- adversarial negative examples
- a held-out evaluation set
- JSON schema validation after generation
- automatic repair or rejection of malformed tool arguments
- more multi-turn traces after tool results
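The repair-or-reject item above could start as small as the following sketch for `saveMemory`. The alias table is an assumption: it treats `memory` as a safe rename to `content` and rejects everything else, including `memory_id`, because there is no obvious target for it. The function name is hypothetical.

```python
from typing import Optional

SAVE_MEMORY_REQUIRED = {"content", "category"}
# Assumed alias table: only renames judged safe are listed here.
SAVE_MEMORY_ALIASES = {"memory": "content"}


def repair_save_memory_arguments(arguments: dict) -> Optional[dict]:
    """Return repaired saveMemory arguments, or None if the call should be rejected."""
    repaired = {}
    for key, value in arguments.items():
        target = SAVE_MEMORY_ALIASES.get(key, key)
        if target in repaired:
            # Two keys map to the same target; refuse to guess which one wins.
            return None
        repaired[target] = value

    # Reject unknown extras such as `memory_id` and any missing required key.
    if set(repaired) != SAVE_MEMORY_REQUIRED:
        return None
    # Both values must be non-empty strings.
    if not all(isinstance(v, str) and v.strip() for v in repaired.values()):
        return None
    return repaired
```

Rejecting by default keeps the repair step conservative: a dropped tool call can be retried, while a wrongly repaired one writes bad data into the memory store.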

## Safety Notes

Do not use this adapter as the memory store. Use it only as a planner/router.
Actual user memories should live in an external system with explicit retention,
deletion, privacy, and audit controls.

Do not add private real user memories to public training data.

## Project Files

```text
scripts/build_dataset.py                 seed dataset builder
scripts/generate_deepseek_dataset.py     synthetic expansion generator
scripts/inspect_generated_dataset.py     quality inspector
scripts/filter_generated_dataset.py      candidate filter
scripts/merge_dataset.py                 train/validation merger
scripts/validate_dataset.py              dataset validator
scripts/train_unsloth_sft.py             Unsloth SFT script
notebooks/gemma_memory_unsloth_colab.ipynb  Colab training notebook
```