---
base_model: google/gemma-4-E4B-it
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
tags:
- gemma
- unsloth
- lora
- peft
- tool-calling
- memory-agent
- german
- synthetic-data
language:
- de
- en
---

# Gemma 4 E4B Memory Policy LoRA

Experimental LoRA adapter for `google/gemma-4-E4B-it`, fine-tuned to act as the
policy layer of a German-first memory agent.

The goal is not to store real memories inside the model weights. The adapter
teaches the model to decide when to call memory tools, when to answer directly,
and when to ask for clarification.

Model on Hugging Face:

```text
classicgrey/gemma4-e4b-memory-policy-lora
```

## What This Adapter Is For

This adapter is intended for assistant systems that use external memory storage
such as a database, vector store, or local memory service. It learns routing
behavior for situations like:

- save stable user preferences, project facts, routines, tasks, and decisions
- avoid saving transient states such as "I am tired right now"
- retrieve durable memory when a question depends on stored context
- recall the recent conversation when the user says "vorhin" ("earlier"), "eben",
  or "gerade" ("just now")
- update or delete a memory only once its memory ID is known
- ask a clarification question when the memory reference is ambiguous
- keep scene or vehicle actions separate from memory tools

## Expected Output Schema

The assistant target is a single JSON object:

```json
{
  "action": "tool_call",
  "tool_calls": [
    {
      "name": "saveMemory",
      "arguments": {
        "content": "Der Nutzer möchte Codebeispiele immer mit Tests sehen.",
        "category": "personal"
      }
    }
  ],
  "response": "",
  "policy_reason": "stabile wiederverwendbare Nutzerpräferenz"
}
```

Supported actions:

```text
tool_call
final_response
ask_clarification
```

Core memory tools:

```text
saveMemory
retrieveMemory
recallRecentConversation
updateMemory
deleteMemory
```
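
A host application will usually check each generation against this schema before executing anything. The sketch below is a minimal, standard-library validator for the fields, actions, and tool names listed above. The cross-field rules at the end (a `tool_call` needs at least one call; `final_response` and `ask_clarification` carry no calls and a non-empty `response`) are assumptions for illustration, not rules documented for this adapter, and `validate_policy_output` is a hypothetical name.

```python
import json

# Action and tool names as listed in this model card.
SUPPORTED_ACTIONS = {"tool_call", "final_response", "ask_clarification"}
CORE_MEMORY_TOOLS = {
    "saveMemory",
    "retrieveMemory",
    "recallRecentConversation",
    "updateMemory",
    "deleteMemory",
}
REQUIRED_KEYS = {"action", "tool_calls", "response", "policy_reason"}


def validate_policy_output(raw, allowed_tools=CORE_MEMORY_TOOLS):
    """Return a list of problems found in `raw`; an empty list means it passed.

    The cross-field checks at the end are illustrative assumptions. Pass a
    larger `allowed_tools` set if your app also exposes scene or vehicle tools.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return ["top level must be a JSON object"]

    errors = []
    missing = REQUIRED_KEYS - set(obj)
    if missing:
        errors.append(f"missing keys: {sorted(missing)}")
    action = obj.get("action")
    if action not in SUPPORTED_ACTIONS:
        errors.append(f"unsupported action: {action!r}")

    calls = obj.get("tool_calls", [])
    if not isinstance(calls, list):
        return errors + ["tool_calls must be a list"]
    for i, call in enumerate(calls):
        if not isinstance(call, dict):
            errors.append(f"tool_calls[{i}] must be an object")
            continue
        if call.get("name") not in allowed_tools:
            errors.append(f"tool_calls[{i}]: unknown tool {call.get('name')!r}")
        if not isinstance(call.get("arguments"), dict):
            errors.append(f"tool_calls[{i}]: arguments must be an object")

    # Assumed consistency rules between `action` and the other fields.
    if action == "tool_call" and not calls:
        errors.append("tool_call without any tool_calls")
    if action in ("final_response", "ask_clarification"):
        if calls:
            errors.append(f"{action} must not include tool_calls")
        if not obj.get("response"):
            errors.append(f"{action} needs a non-empty response")
    return errors
```

Returning a list of problems rather than raising makes it easy to log every issue at once, or to feed them into a repair or retry step.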

## Example Behavior

User (English: "Please remember that I always want to see code examples with tests."):

```text
Merk dir bitte, dass ich Codebeispiele immer mit Tests sehen möchte.
```

Desired assistant output (the saved memory reads "The user always wants to see
code examples with tests."; the reason is "stable, reusable user preference"):

```json
{
  "action": "tool_call",
  "tool_calls": [
    {
      "name": "saveMemory",
      "arguments": {
        "content": "Der Nutzer möchte Codebeispiele immer mit Tests sehen.",
        "category": "personal"
      }
    }
  ],
  "response": "",
  "policy_reason": "stabile wiederverwendbare Nutzerpräferenz"
}
```

User (English: "I'm pretty tired right now, but let's keep going for a bit."):

```text
Ich bin gerade ziemlich müde, aber lass uns kurz weitermachen.
```

Desired assistant output (the response reads "Sure, let's keep going, calm and
focused."; the reason is "momentary state with no request to remember"):

```json
{
  "action": "final_response",
  "tool_calls": [],
  "response": "Klar, wir machen ruhig und fokussiert weiter.",
  "policy_reason": "momentaner Zustand ohne Speicherwunsch"
}
```

## Dataset

The training data is synthetic and intentionally generic. It does not contain
private real user memories.

Current merged dataset:

```text
total examples: 1,766
train examples: 1,501
validation examples: 265
```

Category distribution:

```text
save_memory: 334
no_memory_write: 356
retrieve_memory: 235
recall_recent: 185
update_memory_lookup: 134
update_memory_known_id: 109
delete_memory_lookup: 137
delete_memory_known_id: 87
clarification: 101
direct_response: 55
scene_tool: 26
other seed categories: 7
```
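
Counts like these can be recomputed from the JSONL files. The record layout is not documented in this card, so the sketch below assumes each line is a JSON object with a top-level `category` field; `category_counts` is a hypothetical helper, and the key should be adjusted to match the real files.

```python
import json
from collections import Counter
from pathlib import Path


def category_counts(*paths, key="category"):
    """Count examples per category across one or more JSONL files.

    Assumes (hypothetically) one JSON object per line with a top-level `key`
    field; records without it are counted under "<missing>".
    """
    counts = Counter()
    for path in paths:
        with Path(path).open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines
                record = json.loads(line)
                counts[record.get(key, "<missing>")] += 1
    return counts
```

Called with `dataset/combined_train.jsonl` and `dataset/combined_validation.jsonl`, the per-category numbers should sum to the 1,766 total above.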

The dataset was built from a small hand-written seed set plus synthetic
expansion with DeepSeek. Generated candidates were validated, filtered for
schema issues, deduplicated by user utterance, and checked for obvious PII-like
patterns before merging.
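
The deduplication and PII screening steps can be approximated as follows. This is a hypothetical sketch, not the project's `filter_generated_dataset.py`: it assumes each record has a chat-style `messages` list with `role`/`content` entries, deduplicates on the case-folded, whitespace-collapsed user text, and drops records whose text matches a few crude email, phone-number, and IBAN-like patterns. Real PII screening needs far more than three regular expressions.

```python
import re

# Crude, illustrative PII patterns (assumptions); real screening needs more.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"(?:\+\d{1,3}[ -]?)?\b\d{3,5}[ /-]?\d{5,9}\b"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){3,7}\b"),
}


def user_text(record):
    """Join the user turns of a record (assumes a chat-style `messages` list)."""
    return " ".join(
        str(m.get("content", ""))
        for m in record.get("messages", [])
        if m.get("role") == "user"
    )


def normalize(text):
    """Case-fold and collapse whitespace so near-identical utterances collide."""
    return " ".join(text.casefold().split())


def filter_candidates(records):
    """Drop records with an empty or duplicate user utterance or PII-like text."""
    seen = set()
    kept = []
    for record in records:
        key = normalize(user_text(record))
        if not key or key in seen:
            continue
        all_text = " ".join(
            str(m.get("content", "")) for m in record.get("messages", [])
        )
        if any(p.search(all_text) for p in PII_PATTERNS.values()):
            continue
        seen.add(key)
        kept.append(record)
    return kept
```

A key is only marked as seen when its record is kept, so a clean duplicate that follows a PII-flagged record can still survive.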

Local dataset files:

```text
dataset/combined_train.jsonl
dataset/combined_validation.jsonl
dataset/combined_manifest.json
```

## Training

Training was done with Unsloth on Google Colab using a T4 GPU.

Base model:

```text
google/gemma-4-E4B-it
```

Adapter:

```text
classicgrey/gemma4-e4b-memory-policy-lora
```

Training shape:

```text
method: QLoRA / LoRA SFT
max sequence length: 1024
steps: 300
batch size: 1
gradient accumulation: 4
LoRA rank: 16
LoRA alpha: 16
```
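
These settings imply a modest training budget. Assuming a single GPU and no sequence packing, each optimizer step sees batch size × gradient accumulation = 4 examples, so 300 steps cover 1,200 examples, slightly less than one pass over the 1,501 training examples. With alpha equal to rank, the standard LoRA scaling factor alpha/r is 1 (rsLoRA would scale by alpha/√r instead). The arithmetic, as a quick check:

```python
# Values copied from the training shape above; 1,501 is the train split size.
batch_size = 1
grad_accum = 4
max_steps = 300
train_examples = 1501
lora_rank = 16
lora_alpha = 16

effective_batch = batch_size * grad_accum    # examples per optimizer step
examples_seen = effective_batch * max_steps  # assumes no sequence packing
epochs = examples_seen / train_examples      # fraction of one pass over train
lora_scale = lora_alpha / lora_rank          # standard LoRA scaling, not rsLoRA

print(f"effective batch: {effective_batch}")  # 4
print(f"examples seen: {examples_seen}")      # 1200
print(f"epochs: {epochs:.2f}")                # 0.80
print(f"LoRA scale: {lora_scale}")            # 1.0
```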

Colab notebook:

```text
notebooks/gemma_memory_unsloth_colab.ipynb
```

## Loading With Unsloth

```python
from unsloth import FastLanguageModel

# Loading the adapter repo also loads its base model (see the note below).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="classicgrey/gemma4-e4b-memory-policy-lora",
    max_seq_length=1024,  # matches the training max sequence length
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch to Unsloth's inference mode
```

You still need access to the base model `google/gemma-4-E4B-it` on Hugging Face.
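
Generation settings (chat template, sampling) are not specified in this card, and the decoded text may not be a bare JSON object: chat models sometimes wrap JSON in a Markdown fence or add a sentence around it. The standard-library sketch below pulls out the first decodable JSON object using `json.JSONDecoder.raw_decode`; `extract_first_json_object` is a hypothetical helper, not part of Unsloth or this project.

```python
import json
from typing import Optional


def extract_first_json_object(text: str) -> Optional[dict]:
    """Return the first JSON object embedded in `text`, or None if there is none.

    Tries to decode at every "{" in turn, so leading prose or a Markdown
    code fence around the object is skipped.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            return obj  # decoding that starts at "{" always yields a dict
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
    return None
```

Run the extracted object through your own schema validation before executing any tool call.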

## Local Reproduction

Validate the generated dataset:

```bash
python3 scripts/validate_dataset.py dataset/combined_train.jsonl dataset/combined_validation.jsonl
```

Run the local Unsloth training script:

```bash
uv run scripts/train_unsloth_sft.py \
  --model google/gemma-4-E4B-it \
  --train-file dataset/combined_train.jsonl \
  --validation-file dataset/combined_validation.jsonl \
  --output-dir outputs/gemma4-e4b-memory-policy-lora \
  --max-steps 300 \
  --batch-size 1 \
  --grad-accum 4 \
  --max-seq-length 1024
```

Generate more synthetic candidates:

```bash
uv run scripts/generate_deepseek_dataset.py \
  --model deepseek-v4-pro \
  --target-total 1500 \
  --batch-size 20 \
  --workers 5 \
  --seed-sample-size 8 \
  --max-tokens 4096 \
  --out dataset/generated_candidates.jsonl
```

Filter and merge candidates:

```bash
python3 scripts/filter_generated_dataset.py
python3 scripts/inspect_generated_dataset.py dataset/generated_candidates.filtered.jsonl
python3 scripts/merge_dataset.py
python3 scripts/validate_dataset.py dataset/combined_train.jsonl dataset/combined_validation.jsonl
```
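
The split method used by `scripts/merge_dataset.py` is not documented here; the published counts put about 15% of examples in validation (265 of 1,766). If you need a comparable split for new candidates, one option is a seeded shuffle within each category. The sketch below is hypothetical and assumes a top-level `category` field per record; splitting per category keeps small classes such as `scene_tool` (26 examples) represented in validation.

```python
import random
from collections import defaultdict


def stratified_split(records, val_fraction=0.15, key="category", seed=42):
    """Split records into (train, validation), keeping category proportions.

    Hypothetical helper: assumes each record carries a `key` field.
    Categories are processed in sorted order so the result is reproducible
    for a given seed.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record.get(key, "<missing>")].append(record)

    rng = random.Random(seed)
    train, validation = [], []
    for name in sorted(groups):
        items = groups[name][:]  # copy so the caller's list order is untouched
        rng.shuffle(items)
        n_val = round(len(items) * val_fraction)
        validation.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, validation
```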

## Known Limitations

This is a first experimental adapter. It has learned the broad routing behavior,
but schema adherence still needs improvement. In early manual tests, the model
correctly chose `saveMemory`, but sometimes used non-target argument names such
as `memory` or `memory_id` instead of the desired `content` and `category`.

For production use, add:

- stricter schema-only examples
- adversarial negative examples
- a held-out evaluation set
- JSON schema validation after generation
- automatic repair or rejection of malformed tool arguments
- more multi-turn traces after tool results
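
Repair or rejection of malformed arguments can start as a thin post-processing layer. The sketch below renames the stray `memory` argument seen in early tests to the target `content` for `saveMemory`, drops arguments the tool does not accept (such as a stray `memory_id`), and rejects the call if required arguments are still missing. The tables are assumptions: this card documents only the `saveMemory` arguments, so the sketch treats `content` and `category` as exactly the accepted set and passes other tools through unchanged; `repair_tool_call` is a hypothetical name.

```python
from typing import Optional

# Target arguments per tool (assumption: only saveMemory is documented here).
REQUIRED_ARGS = {"saveMemory": {"content", "category"}}
# Hypothetical alias map built from the non-target names seen in early tests.
ARG_ALIASES = {"saveMemory": {"memory": "content"}}


def repair_tool_call(call: dict) -> "tuple[Optional[dict], str]":
    """Rename known aliases, then accept or reject the call.

    Returns (repaired_call, "") on success or (None, reason) on rejection.
    Tools without an entry in REQUIRED_ARGS pass through unchanged.
    """
    name = call.get("name")
    args = dict(call.get("arguments") or {})
    for alias, target in ARG_ALIASES.get(name, {}).items():
        if alias in args and target not in args:
            args[target] = args.pop(alias)
    required = REQUIRED_ARGS.get(name)
    if required is not None:
        missing = required - set(args)
        if missing:
            return None, f"{name}: missing {sorted(missing)}"
        # Drop arguments the tool does not accept (e.g. a stray memory_id).
        args = {k: v for k, v in args.items() if k in required}
    return {"name": name, "arguments": args}, ""
```

Rejecting rather than guessing a missing `category` keeps the repair layer conservative; a rejected call can trigger a retry or a clarification question instead.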

## Safety Notes

Do not use this adapter as the memory store. Use it only as a planner/router.
Actual user memories should live in an external system with explicit retention,
deletion, privacy, and audit controls.

Do not add private real user memories to public training data.
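
To keep the adapter in the planner/router role, the host application executes each proposed tool call against its own store and treats the model output only as a request. The sketch below is hypothetical end to end: `InMemoryStore` stands in for a real database or memory service, the argument names other than `content` and `category` (`query`, `memory_id`) are assumptions, and `recallRecentConversation` is left unhandled because it needs the app's conversation history.

```python
import itertools


class InMemoryStore:
    """Stand-in for an external memory service (hypothetical)."""

    def __init__(self):
        self._items = {}
        self._ids = itertools.count(1)

    def save(self, content, category):
        memory_id = str(next(self._ids))
        self._items[memory_id] = {"content": content, "category": category}
        return memory_id

    def search(self, query):
        q = query.casefold()
        return [
            {"id": mid, **item}
            for mid, item in self._items.items()
            if q in item["content"].casefold()
        ]

    def update(self, memory_id, content):
        if memory_id not in self._items:
            return False
        self._items[memory_id]["content"] = content
        return True

    def delete(self, memory_id):
        return self._items.pop(memory_id, None) is not None


def dispatch(decision, store):
    """Execute a validated policy decision; all state lives in `store`."""
    if decision.get("action") != "tool_call":
        return []  # final_response / ask_clarification: nothing to execute
    results = []
    for call in decision.get("tool_calls", []):
        name, args = call["name"], call["arguments"]
        if name == "saveMemory":
            results.append(store.save(args["content"], args["category"]))
        elif name == "retrieveMemory":
            results.append(store.search(args["query"]))
        elif name == "updateMemory":
            results.append(store.update(args["memory_id"], args["content"]))
        elif name == "deleteMemory":
            results.append(store.delete(args["memory_id"]))
        else:
            raise ValueError(f"no handler for tool {name!r}")
    return results
```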

## Project Files

```text
scripts/build_dataset.py               seed dataset builder
scripts/generate_deepseek_dataset.py   synthetic expansion generator
scripts/inspect_generated_dataset.py   quality inspector
scripts/filter_generated_dataset.py    candidate filter
scripts/merge_dataset.py               train/validation merger
scripts/train_unsloth_sft.py           Unsloth SFT script
notebooks/gemma_memory_unsloth_colab.ipynb
```