	---
	base_model: google/gemma-4-E4B-it
	library_name: peft
	pipeline_tag: text-generation
	license: apache-2.0
	tags:
	- gemma
	- unsloth
	- lora
	- peft
	- tool-calling
	- memory-agent
	- german
	- synthetic-data
	language:
	- de
	- en
	---

	# Gemma 4 E4B Memory Policy LoRA

	Experimental LoRA adapter for `google/gemma-4-E4B-it`, fine-tuned to act as the
	policy layer of a German-first memory agent.

	The goal is not to store real memories inside the model weights. The adapter
	teaches the model to decide when to call memory tools, when to answer directly,
	and when to ask for clarification.

	Model on Hugging Face:

	```text
	classicgrey/gemma4-e4b-memory-policy-lora
	```

	## What This Adapter Is For

	This adapter is intended for assistant systems that keep memories in external
	storage such as a database, vector store, or local memory service. It is
	trained to:

	- save stable user preferences, project facts, routines, tasks, and decisions
	- avoid saving transient states such as "I am tired right now"
	- retrieve durable memory when a question depends on stored context
	- recall recent conversation when the user says "vorhin", "eben", or "gerade"
	- update or delete memory only after a known memory ID exists
	- ask a clarification question when the memory reference is ambiguous
	- keep scene or vehicle actions separate from memory tools
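
The rule that updates and deletes need a known memory ID can also be enforced outside the model. The following is a minimal sketch, not part of this project: it assumes the ID travels in an argument named `id` (a hypothetical name; this card does not document the ID argument) and that the host application tracks which IDs earlier tool results have returned.

```python
# Hypothetical guard: allow updateMemory/deleteMemory only for IDs that an
# earlier tool result (for example from retrieveMemory) has already surfaced.
ID_REQUIRED_TOOLS = {"updateMemory", "deleteMemory"}


def is_call_allowed(tool_call: dict, known_ids: set) -> bool:
    """Return True if the call may run given the memory IDs seen so far."""
    if tool_call.get("name") not in ID_REQUIRED_TOOLS:
        return True
    arguments = tool_call.get("arguments")
    if not isinstance(arguments, dict):
        return False
    # "id" is an assumed argument name, not a documented one.
    return arguments.get("id") in known_ids


known = {"mem-17"}
print(is_call_allowed({"name": "deleteMemory", "arguments": {"id": "mem-17"}}, known))
print(is_call_allowed({"name": "deleteMemory", "arguments": {"id": "mem-99"}}, known))
print(is_call_allowed({"name": "saveMemory", "arguments": {"content": "x"}}, known))
```

A rejected call could be turned into a lookup or a clarification question by the host application instead of being executed.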

	## Expected Output Schema

	The assistant target is a single JSON object:

	```json
	{
	  "action": "tool_call",
	  "tool_calls": [
	    {
	      "name": "saveMemory",
	      "arguments": {
	        "content": "Der Nutzer möchte Codebeispiele immer mit Tests sehen.",
	        "category": "personal"
	      }
	    }
	  ],
	  "response": "",
	  "policy_reason": "stabile wiederverwendbare Nutzerpräferenz"
	}
	```

	Supported actions:

	```text
	tool_call
	final_response
	ask_clarification
	```

	Core memory tools:

	```text
	saveMemory
	retrieveMemory
	recallRecentConversation
	updateMemory
	deleteMemory
	```
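
A structural check of this schema can be sketched with the standard library alone. The code below is an illustration, not the project's `scripts/validate_dataset.py`: the four top-level keys and the `saveMemory` arguments are taken from the example above, while the rule that a `tool_call` action needs at least one call is an assumption.

```python
import json

SUPPORTED_ACTIONS = {"tool_call", "final_response", "ask_clarification"}
REQUIRED_KEYS = {"action", "tool_calls", "response", "policy_reason"}
# Only saveMemory's arguments appear in this card; other tools are not checked.
SAVE_MEMORY_ARGUMENTS = {"content", "category"}


def validate_policy_output(raw: str) -> list:
    """Return a list of problems; an empty list means the output passed."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    missing = REQUIRED_KEYS - set(obj)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    action = obj.get("action")
    if not isinstance(action, str) or action not in SUPPORTED_ACTIONS:
        problems.append(f"unsupported action: {action!r}")

    calls = obj.get("tool_calls", [])
    if not isinstance(calls, list):
        return problems + ["tool_calls is not a list"]
    # Assumed rule: a tool_call action must carry at least one call.
    if action == "tool_call" and not calls:
        problems.append("action is tool_call but tool_calls is empty")
    for call in calls:
        if not isinstance(call, dict) or not isinstance(call.get("arguments"), dict):
            problems.append("tool call lacks an arguments object")
            continue
        if call.get("name") == "saveMemory":
            absent = SAVE_MEMORY_ARGUMENTS - set(call["arguments"])
            if absent:
                problems.append(f"saveMemory missing arguments: {sorted(absent)}")
    return problems


bad = '{"action": "tool_call", "tool_calls": [{"name": "saveMemory", "arguments": {"memory": "x"}}], "response": "", "policy_reason": "r"}'
print(validate_policy_output(bad))
```

Tool names are deliberately not restricted to the five memory tools, because scene or vehicle tools may appear in the same output format.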

	## Example Behavior

	User:

	```text
	Merk dir bitte, dass ich Codebeispiele immer mit Tests sehen möchte.
	```

	Desired assistant output:

	```json
	{
	  "action": "tool_call",
	  "tool_calls": [
	    {
	      "name": "saveMemory",
	      "arguments": {
	        "content": "Der Nutzer möchte Codebeispiele immer mit Tests sehen.",
	        "category": "personal"
	      }
	    }
	  ],
	  "response": "",
	  "policy_reason": "stabile wiederverwendbare Nutzerpräferenz"
	}
	```

	User:

	```text
	Ich bin gerade ziemlich müde, aber lass uns kurz weitermachen.
	```

	Desired assistant output:

	```json
	{
	  "action": "final_response",
	  "tool_calls": [],
	  "response": "Klar, wir machen ruhig und fokussiert weiter.",
	  "policy_reason": "momentaner Zustand ohne Speicherwunsch"
	}
	```

	## Dataset

	The training data is synthetic and intentionally generic. It contains no real,
	private user memories.

	Current merged dataset:

	```text
	total examples: 1,766
	train examples: 1,501
	validation examples: 265
	```

	Category distribution:

	```text
	save_memory: 334
	no_memory_write: 356
	retrieve_memory: 235
	recall_recent: 185
	update_memory_lookup: 134
	update_memory_known_id: 109
	delete_memory_lookup: 137
	delete_memory_known_id: 87
	clarification: 101
	direct_response: 55
	scene_tool: 26
	other seed categories: 7
	```

	The dataset was built from a small hand-written seed set plus synthetic
	expansion with DeepSeek. Generated candidates were validated, filtered for
	schema issues, deduplicated by user utterance, and checked for obvious PII-like
	patterns before merging.
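
The deduplication and PII steps can be pictured roughly as follows. This is a sketch under assumptions, not the logic of `scripts/filter_generated_dataset.py`: the `user` field name, the normalization, and both regular expressions are illustrative choices.

```python
import re

# Illustrative patterns only; the project's real PII checks are not shown here.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),  # email-like strings
    re.compile(r"\+?\d[\d\s/-]{7,}\d"),       # phone-like digit runs
]


def normalize(utterance: str) -> str:
    """Case-fold and collapse whitespace so near-identical utterances collide."""
    return " ".join(utterance.casefold().split())


def filter_candidates(records: list) -> list:
    """Drop duplicate user utterances and records with PII-like text."""
    seen = set()
    kept = []
    for record in records:
        text = record["user"]  # assumed field name
        key = normalize(text)
        if key in seen:
            continue
        if any(pattern.search(text) for pattern in PII_PATTERNS):
            continue
        seen.add(key)
        kept.append(record)
    return kept


candidates = [
    {"user": "Merk dir, dass ich Tee mag."},
    {"user": "merk dir,  dass ich Tee mag."},
    {"user": "Meine Mail ist max@example.com"},
    {"user": "Ruf mich unter 030 1234 5678 an"},
]
print(len(filter_candidates(candidates)))
```

Pattern checks like these catch only obvious cases, which matches the "obvious PII-like patterns" wording above; they are not a substitute for manual review.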

	Local dataset files:

	```text
	dataset/combined_train.jsonl
	dataset/combined_validation.jsonl
	dataset/combined_manifest.json
	```

	## Training

	Training was done with Unsloth on Google Colab using a T4 GPU.

	Base model:

	```text
	google/gemma-4-E4B-it
	```

	Adapter:

	```text
	classicgrey/gemma4-e4b-memory-policy-lora
	```

	Training shape:

	```text
	method: QLoRA / LoRA SFT
	max sequence length: 1024
	steps: 300
	batch size: 1
	gradient accumulation: 4
	LoRA rank: 16
	LoRA alpha: 16
	```
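
These numbers imply a fairly short run. The arithmetic below is a sketch that assumes each optimizer step consumes `batch size × gradient accumulation` examples on a single GPU without sequence packing, and that the adapter uses the standard LoRA scaling factor `alpha / rank`.

```python
steps = 300
batch_size = 1
grad_accum = 4
train_examples = 1501
lora_rank = 16
lora_alpha = 16

effective_batch = batch_size * grad_accum  # examples per optimizer step
examples_seen = steps * effective_batch    # examples consumed over the run
epochs = examples_seen / train_examples    # fraction of the train split covered
lora_scale = lora_alpha / lora_rank        # multiplier on the low-rank update

print(effective_batch, examples_seen, round(epochs, 2), lora_scale)
# prints: 4 1200 0.8 1.0
```

Under these assumptions the run covers roughly 80% of the training split once, so some training examples are never seen.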

	Colab notebook:

	```text
	notebooks/gemma_memory_unsloth_colab.ipynb
	```

	## Loading With Unsloth

	```python
	from unsloth import FastLanguageModel

	model, tokenizer = FastLanguageModel.from_pretrained(
	    model_name="classicgrey/gemma4-e4b-memory-policy-lora",
	    max_seq_length=1024,
	    load_in_4bit=True,
	)

	FastLanguageModel.for_inference(model)
	```

	Loading the adapter also pulls in the base model, so you still need access to
	`google/gemma-4-E4B-it` on Hugging Face.
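
After generation, the decoded text may contain more than the bare JSON object, for example a markdown fence or leading prose. A minimal sketch of pulling out the first JSON object with the standard library follows; `extract_first_json_object` is a hypothetical helper, not part of Unsloth or this repository.

```python
import json


def extract_first_json_object(text: str):
    """Return the first decodable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at index `start`
            # and ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = text.find("{", start + 1)
    return None


generated = (
    "Hier ist die Antwort:\n```json\n"
    '{"action": "final_response", "tool_calls": [], '
    '"response": "Ok", "policy_reason": "x"}\n```'
)
print(extract_first_json_object(generated)["action"])
```

Returning `None` rather than raising lets the caller decide whether to retry generation or fall back to a clarification question.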

	## Local Reproduction

	Validate the generated dataset:

	```bash
	python3 scripts/validate_dataset.py dataset/combined_train.jsonl dataset/combined_validation.jsonl
	```

	Run the local Unsloth training script:

	```bash
	uv run scripts/train_unsloth_sft.py \
	  --model google/gemma-4-E4B-it \
	  --train-file dataset/combined_train.jsonl \
	  --validation-file dataset/combined_validation.jsonl \
	  --output-dir outputs/gemma4-e4b-memory-policy-lora \
	  --max-steps 300 \
	  --batch-size 1 \
	  --grad-accum 4 \
	  --max-seq-length 1024
	```

	Generate more synthetic candidates:

	```bash
	uv run scripts/generate_deepseek_dataset.py \
	  --model deepseek-v4-pro \
	  --target-total 1500 \
	  --batch-size 20 \
	  --workers 5 \
	  --seed-sample-size 8 \
	  --max-tokens 4096 \
	  --out dataset/generated_candidates.jsonl
	```

	Filter and merge candidates:

	```bash
	python3 scripts/filter_generated_dataset.py
	python3 scripts/inspect_generated_dataset.py dataset/generated_candidates.filtered.jsonl
	python3 scripts/merge_dataset.py
	python3 scripts/validate_dataset.py dataset/combined_train.jsonl dataset/combined_validation.jsonl
	```

	## Known Limitations

	This is a first experimental adapter. It has learned the broad routing behavior,
	but schema adherence still needs improvement. In early manual tests, the model
	correctly chose `saveMemory`, but sometimes used non-target argument names such
	as `memory` or `memory_id` instead of the desired `content` and `category`.

	For production use, add:

	- stricter schema-only examples
	- adversarial negative examples
	- a held-out evaluation set
	- JSON schema validation after generation
	- automatic repair or rejection of malformed tool arguments
	- more multi-turn traces after tool results
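
The repair-or-reject step could look roughly like the sketch below. It is an illustration under assumptions: the alias table maps the off-schema name `memory` seen in manual tests to `content`, and `saveMemory` is assumed to accept exactly `content` and `category`, as in the example earlier in this card. Calls that cannot be repaired, such as one carrying only `memory_id`, are rejected.

```python
# Hypothetical alias table: off-schema argument names mapped to target names.
ARGUMENT_ALIASES = {"saveMemory": {"memory": "content"}}
# Assumed exact argument set per tool; only saveMemory is documented here.
EXPECTED_ARGUMENTS = {"saveMemory": {"content", "category"}}


def repair_tool_call(call: dict):
    """Return a repaired copy of the call, or None if it must be rejected."""
    name = call.get("name")
    raw_arguments = call.get("arguments")
    if not isinstance(name, str) or not isinstance(raw_arguments, dict):
        return None
    aliases = ARGUMENT_ALIASES.get(name, {})
    arguments = {}
    for key, value in raw_arguments.items():
        target = aliases.get(key, key)
        if target in arguments:
            return None  # two keys collapse onto one target: ambiguous
        arguments[target] = value
    expected = EXPECTED_ARGUMENTS.get(name)
    if expected is not None and set(arguments) != expected:
        return None  # missing or unexpected arguments remain after repair
    return {"name": name, "arguments": arguments}


print(repair_tool_call({"name": "saveMemory", "arguments": {"memory": "X", "category": "personal"}}))
print(repair_tool_call({"name": "saveMemory", "arguments": {"memory_id": "1"}}))
```

Tools without an entry in `EXPECTED_ARGUMENTS` pass through unchanged, since their argument schemas are not specified in this card.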

	## Safety Notes

	Do not use this adapter as the memory store. Use it only as a planner/router.
	Actual user memories should live in an external system with explicit retention,
	deletion, privacy, and audit controls.

	Do not add real, private user memories to public training data.

	## Project Files

	```text
	scripts/build_dataset.py               seed dataset builder
	scripts/generate_deepseek_dataset.py   synthetic expansion generator
	scripts/inspect_generated_dataset.py   quality inspector
	scripts/filter_generated_dataset.py    candidate filter
	scripts/merge_dataset.py               train/validation merger
	scripts/train_unsloth_sft.py           Unsloth SFT script
	notebooks/gemma_memory_unsloth_colab.ipynb
	```