---
base_model: google/gemma-4-E4B-it
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
tags:
- gemma
- unsloth
- lora
- peft
- tool-calling
- memory-agent
- german
- synthetic-data
language:
- de
- en
---

# Gemma 4 E4B Memory Policy LoRA

Experimental LoRA adapter for `google/gemma-4-E4B-it`, fine-tuned to act as the policy layer of a German-first memory agent.

The goal is not to store real memories inside the model weights. The adapter teaches the model to decide when to call memory tools, when to answer directly, and when to ask for clarification.

Model on Hugging Face:

```text
classicgrey/gemma4-e4b-memory-policy-lora
```

## What This Adapter Is For

This adapter is intended for assistant systems that use external memory storage such as a database, vector store, or local memory service.

It learns routing behavior for situations like:

- save stable user preferences, project facts, routines, tasks, and decisions
- avoid saving transient states such as "I am tired right now"
- retrieve durable memory when a question depends on stored context
- recall recent conversation when the user says "vorhin", "eben", or "gerade"
- update or delete memory only after a known memory ID exists
- ask a clarification question when the memory reference is ambiguous
- keep scene or vehicle actions separate from memory tools

## Expected Output Schema

The assistant target is a single JSON object:

```json
{
  "action": "tool_call",
  "tool_calls": [
    {
      "name": "saveMemory",
      "arguments": {
        "content": "Der Nutzer möchte Codebeispiele immer mit Tests sehen.",
        "category": "personal"
      }
    }
  ],
  "response": "",
  "policy_reason": "stabile wiederverwendbare Nutzerpräferenz"
}
```

Supported actions:

```text
tool_call
final_response
ask_clarification
```

Core memory tools:

```text
saveMemory
retrieveMemory
recallRecentConversation
updateMemory
deleteMemory
```

## Example Behavior

User:

```text
Merk dir bitte, dass ich Codebeispiele immer mit Tests sehen möchte.
```

Desired assistant output:

```json
{
  "action": "tool_call",
  "tool_calls": [
    {
      "name": "saveMemory",
      "arguments": {
        "content": "Der Nutzer möchte Codebeispiele immer mit Tests sehen.",
        "category": "personal"
      }
    }
  ],
  "response": "",
  "policy_reason": "stabile wiederverwendbare Nutzerpräferenz"
}
```

User:

```text
Ich bin gerade ziemlich müde, aber lass uns kurz weitermachen.
```

Desired assistant output:

```json
{
  "action": "final_response",
  "tool_calls": [],
  "response": "Klar, wir machen ruhig und fokussiert weiter.",
  "policy_reason": "momentaner Zustand ohne Speicherwunsch"
}
```

## Dataset

The training data is synthetic and intentionally generic. It does not contain private real user memories.

Current merged dataset:

```text
total examples: 1,766
train examples: 1,501
validation examples: 265
```

Category distribution:

```text
save_memory: 334
no_memory_write: 356
retrieve_memory: 235
recall_recent: 185
update_memory_lookup: 134
update_memory_known_id: 109
delete_memory_lookup: 137
delete_memory_known_id: 87
clarification: 101
direct_response: 55
scene_tool: 26
other seed categories: 7
```

The dataset was built from a small hand-written seed set plus synthetic expansion with DeepSeek. Generated candidates were validated, filtered for schema issues, deduplicated by user utterance, and checked for obvious PII-like patterns before merging.

Local dataset files:

```text
dataset/combined_train.jsonl
dataset/combined_validation.jsonl
dataset/combined_manifest.json
```

## Training

Training was done with Unsloth on Google Colab using a T4 GPU.

Base model:

```text
google/gemma-4-E4B-it
```

Adapter:

```text
classicgrey/gemma4-e4b-memory-policy-lora
```

Training shape:

```text
method: QLoRA / LoRA SFT
max sequence length: 1024
steps: 300
batch size: 1
gradient accumulation: 4
LoRA rank: 16
LoRA alpha: 16
```

Colab notebook:

```text
notebooks/gemma_memory_unsloth_colab.ipynb
```

## Loading With Unsloth

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="classicgrey/gemma4-e4b-memory-policy-lora",
    max_seq_length=1024,
    load_in_4bit=True,
)

FastLanguageModel.for_inference(model)
```

You still need access to the base model `google/gemma-4-E4B-it` on Hugging Face.

## Local Reproduction

Validate the generated dataset:

```bash
python3 scripts/validate_dataset.py dataset/combined_train.jsonl dataset/combined_validation.jsonl
```

Run the local Unsloth training script:

```bash
uv run scripts/train_unsloth_sft.py \
  --model google/gemma-4-E4B-it \
  --train-file dataset/combined_train.jsonl \
  --validation-file dataset/combined_validation.jsonl \
  --output-dir outputs/gemma4-e4b-memory-policy-lora \
  --max-steps 300 \
  --batch-size 1 \
  --grad-accum 4 \
  --max-seq-length 1024
```

Generate more synthetic candidates:

```bash
uv run scripts/generate_deepseek_dataset.py \
  --model deepseek-v4-pro \
  --target-total 1500 \
  --batch-size 20 \
  --workers 5 \
  --seed-sample-size 8 \
  --max-tokens 4096 \
  --out dataset/generated_candidates.jsonl
```

Filter and merge candidates:

```bash
python3 scripts/filter_generated_dataset.py
python3 scripts/inspect_generated_dataset.py dataset/generated_candidates.filtered.jsonl
python3 scripts/merge_dataset.py
python3 scripts/validate_dataset.py dataset/combined_train.jsonl dataset/combined_validation.jsonl
```

## Known Limitations

This is a first experimental adapter. It has learned the broad routing behavior, but schema adherence still needs improvement.

In early manual tests, the model correctly chose `saveMemory`, but sometimes used non-target argument names such as `memory` or `memory_id` instead of the desired `content` and `category`.

For production use, add:

- stricter schema-only examples
- adversarial negative examples
- a held-out evaluation set
- JSON schema validation after generation
- automatic repair or rejection of malformed tool arguments
- more multi-turn traces after tool results

## Safety Notes

Do not use this adapter as the memory store. Use it only as a planner/router. Actual user memories should live in an external system with explicit retention, deletion, privacy, and audit controls.

Do not add private real user memories to public training data.

## Project Files

```text
scripts/build_dataset.py                  seed dataset builder
scripts/generate_deepseek_dataset.py      synthetic expansion generator
scripts/inspect_generated_dataset.py      quality inspector
scripts/filter_generated_dataset.py       candidate filter
scripts/merge_dataset.py                  train/validation merger
scripts/train_unsloth_sft.py              Unsloth SFT script
notebooks/gemma_memory_unsloth_colab.ipynb
```
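
## Output Validation Sketch

The Known Limitations section recommends validating generated output and rejecting malformed tool arguments. The following is a minimal sketch of what such a check could look like, using only the Python standard library. It is not the project's `scripts/validate_dataset.py`. The function name `validate_policy_output` is hypothetical, and the rule that `saveMemory` takes exactly `content` and `category` is an assumption drawn from the examples in this card. Argument shapes for the other tools are not documented here, so they are not checked.

```python
import json

# Actions listed under "Supported actions" in this card.
ALLOWED_ACTIONS = {"tool_call", "final_response", "ask_clarification"}

# Assumption: saveMemory takes exactly these two arguments, as in the examples.
SAVE_MEMORY_ARGS = {"content", "category"}


def validate_policy_output(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the output passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    action = data.get("action")
    if action not in ALLOWED_ACTIONS:
        problems.append(f"unknown action: {action!r}")

    calls = data.get("tool_calls")
    if not isinstance(calls, list):
        problems.append("tool_calls must be a list")
        calls = []
    if action == "tool_call" and not calls:
        problems.append("action is tool_call but tool_calls is empty")

    for call in calls:
        if not isinstance(call, dict):
            problems.append("tool call is not an object")
            continue
        name = call.get("name")
        args = call.get("arguments")
        if not isinstance(args, dict):
            problems.append(f"{name}: arguments must be an object")
            continue
        # Catches the failure seen in manual tests (`memory`, `memory_id`).
        if name == "saveMemory" and set(args) != SAVE_MEMORY_ARGS:
            problems.append(
                f"saveMemory: unexpected argument names {sorted(args)}"
            )
    return problems


if __name__ == "__main__":
    sample = json.dumps({
        "action": "tool_call",
        "tool_calls": [
            {"name": "saveMemory", "arguments": {"memory": "Der Nutzer mag Tests."}}
        ],
        "response": "",
        "policy_reason": "stabile wiederverwendbare Nutzerpräferenz",
    })
    for problem in validate_policy_output(sample):
        print("rejected:", problem)
```

A caller could run this on every generation and either discard the output or retry when the returned list is non-empty. Repairing arguments automatically (for example by renaming `memory` to `content`) would need a mapping that this card does not define, so the sketch only rejects.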