Clariso · Gemma 4 E4B · v9 mixed LoRA adapter
Rank-8 LoRA adapter that re-styles google/gemma-4-E4B-it answers into the
plain-language convention. Intended audience: adults with cognitive
impairments, second-language readers, and anyone who benefits from short
sentences, common words, and a lede-first structure.
The adapter is designed to be loaded with a gated generation strategy
(base_thought_lora_answer): keep the LoRA off during the base model's
<|channel>thought ... <channel|> reasoning span, then flip it on for
the answer. This preserves Gemma 4's full reasoning capability and applies
the plain-language compression only to the final output.
Demo
Try it in a browser: Clariso Space
Quick usage
import torch
from transformers import AutoModelForImageTextToText, AutoTokenizer
from peft import PeftModel
base_id = "google/gemma-4-E4B-it"
adapter_id = "kameronk/clariso-gemma4-e4b-v9-peft"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForImageTextToText.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="cuda"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.train(False)
# The trained system prompt (keep verbatim — the LoRA was conditioned on this).
system = (
    "You write accessible answers for adults with cognitive impairments. Rules:\n"
    "- One idea per sentence.\n"
    "- Short sentences.\n"
    "- Simple, common words.\n"
    "- Put the main answer first.\n"
    "- Use bullet points for lists.\n"
    "- Keep the answer brief.\n"
    "- Be concrete.\n"
    "- Be reassuring without talking down to the reader."
)
user = "What should I do if my child has a fever?"
prompt = tokenizer.apply_chat_template(
    [{"role": "system", "content": system}, {"role": "user", "content": user}],
    tokenize=False, add_generation_prompt=True, enable_thinking=True,
)
# Gated generation: adapter OFF during thinking, ON for the answer.
# `<channel|>` is the marker that ends the reasoning span and begins the answer.
channel_id = tokenizer(["<channel|>"], add_special_tokens=False).input_ids[0][0]
end_id = tokenizer(["<turn|>"], add_special_tokens=False).input_ids[0][0]
model.disable_adapter_layers()
ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
past, cur, out_ids, flipped = None, ids, [], False
with torch.no_grad():
    for _ in range(400):
        out = model(input_ids=cur, past_key_values=past, use_cache=True)
        past = out.past_key_values
        logits = out.logits[:, -1, :] / 1.0  # temp = 1.0 (Gemma 4 default)
        nxt = torch.multinomial(torch.softmax(logits, dim=-1), 1)
        nid = int(nxt.item())
        out_ids.append(nid)
        if nid == channel_id and not flipped:
            model.enable_adapter_layers()
            flipped = True
        if nid == end_id:
            break
        cur = nxt
print(tokenizer.decode(out_ids, skip_special_tokens=False))
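The loop above prints the reasoning span and the answer together. To show only the plain-language answer, the generated ids can be split at the `<channel|>` marker. The sketch below is one way to do that; `split_thought_and_answer` is a hypothetical helper name, not something shipped with the adapter, and it assumes the first `<channel|>` id marks the end of the thought.

```python
def split_thought_and_answer(out_ids, channel_id, end_id):
    """Split generated ids at the first `<channel|>`; drop a trailing `<turn|>`."""
    ids = list(out_ids)
    if ids and ids[-1] == end_id:
        ids = ids[:-1]
    if channel_id not in ids:
        # The gate never flipped, so no tokens were generated with the adapter on.
        return ids, []
    cut = ids.index(channel_id)
    return ids[:cut], ids[cut + 1:]
```

With the names from the Quick usage code, `thought_ids, answer_ids = split_thought_and_answer(out_ids, channel_id, end_id)` followed by `tokenizer.decode(answer_ids, skip_special_tokens=True)` would yield just the answer text.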
Recommended sampling
Use the base Gemma 4 sampling spec from Google: temperature=1.0, top_p=0.95,
top_k=64. Running this adapter at lower temperatures degrades reasoning
quality (see scripts/run_easyread_v10_battery.py in the source repo for the
calibration study).
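The Quick usage loop divides the logits by the temperature but then samples from the full vocabulary. Below is a minimal sketch of how top_k and top_p filtering could be layered on top. It is written in plain Python over a list of logits so it runs without a GPU. The function names `filter_top_k_top_p` and `sample` are illustrative, not part of the source repo, and a tensor version would follow the same steps.

```python
import math
import random

def filter_top_k_top_p(logits, temperature=1.0, top_k=64, top_p=0.95):
    """Return {token_id: probability} after temperature, top-k, then top-p."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-k: keep the k most likely token ids, then renormalise over them.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in ranked)

    # Top-p: keep the shortest prefix whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i] / k_mass
        if mass >= top_p:
            break

    # Renormalise over the surviving tokens.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}

def sample(logits, rng=random, **kwargs):
    """Draw one token id from the filtered distribution."""
    dist = filter_top_k_top_p(logits, **kwargs)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]
```

The defaults mirror the spec quoted above. Passing a seeded `random.Random` instance as `rng` makes the draw reproducible.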
Architecture
- Type: LoRA (PEFT), rank 8, lora_alpha 160, dropout 0.0
- Target modules: q_proj, o_proj, gate_proj, up_proj, down_proj, per_layer_input_gate, per_layer_projection
- Layers targeted: top 16 (layers_to_transform: [26..41])
- Base: google/gemma-4-E4B-it
- Storage: 26 MB (adapter_model.safetensors)
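For readers who want to see how these numbers fit together, here is a small sketch that restates the list as a dictionary keyed by PEFT `LoraConfig` field names. The values are copied from the list above. The dictionary is an illustration, not a copy of the shipped adapter_config.json, which may contain further fields.

```python
# Values copied from the Architecture list; key names follow PEFT's LoraConfig fields.
adapter_config = {
    "r": 8,
    "lora_alpha": 160,
    "lora_dropout": 0.0,
    "target_modules": [
        "q_proj", "o_proj", "gate_proj", "up_proj", "down_proj",
        "per_layer_input_gate", "per_layer_projection",
    ],
    "layers_to_transform": list(range(26, 42)),  # [26..41] inclusive
}

# Standard LoRA multiplies the low-rank update by lora_alpha / r.
scaling = adapter_config["lora_alpha"] / adapter_config["r"]
num_layers = len(adapter_config["layers_to_transform"])
print(f"scaling={scaling}, layers={num_layers}")  # scaling=20.0, layers=16
```

With `peft` installed, `LoraConfig(**adapter_config)` should build the equivalent config object. The effective scaling of 20 would be consistent with the `scale 20` entry in the training recipe, assuming that entry refers to the LoRA scale.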
Training
- Trainer: mlx-lm (Apple Silicon LoRA fine-tune); converted to PEFT format for cross-platform inference.
- Hardware: M-series Mac.
- Data: 622 mixed rows from easy_read_v6: answer-only + channel-bound thinking variants. Self-bootstrapped from Gemma 4 31B as writer/critic/judge (no frontier-model dependence).
- Recipe: lr 1e-5, scale 20, stop-weight 8, thought-loss 0.05.
Evaluation
Validated against a 20-question leakage-clean battery in
runs/2026-05-02_v10_bf16-mixed-report/REPORT.md of the source repo. Under
the recommended base_thought_lora_answer gate:
| Metric | vs. base | Direction |
|---|---|---|
| Length (chars) | −1,214 | ⬇ briefer |
| Length (tokens) | −265 | ⬇ briefer |
| Flesch-Kincaid grade | −3.56 | ⬇ easier |
| Dale-Chall difficulty | −2.26 | ⬇ easier |
| Strict reasoning correctness | 30/30 | ✓ preserved |
| Empty-answer rate | 0% | ✓ no regressions |
| ARC capability screen | no large delta | ✓ general capability intact |
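
To make the Flesch-Kincaid row easier to interpret, here is a rough sketch of the textbook grade formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The report does not say which tooling produced its numbers, so this is an assumption-laden illustration: the syllable counter is a crude vowel-run heuristic, and scores will differ from those of a dedicated readability library.

```python
import re

def count_syllables(word):
    """Rough heuristic: count runs of vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level; expects at least one sentence and one word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

plain = "Give your child water. Let them rest. Call a doctor if the fever is high."
dense = ("Administer antipyretic medication and ensure adequate hydration "
         "while monitoring for persistent hyperthermia.")
print(flesch_kincaid_grade(plain) < flesch_kincaid_grade(dense))  # True
```

Shorter sentences and shorter words both pull the grade down, which is the direction reported in the table.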
Limitations
- English only. v9 mixed corpus is 100% English; multilingual coverage is reduced relative to the base model. A v9-multilingual corpus is queued.
- The trained system prompt matters. The LoRA was conditioned on the P_FULL system prompt verbatim. Using a different system prompt at inference time will weaken activation. Either keep P_FULL or use the neutral "You are a helpful assistant." (also seen during training).
- Calibrated for temp=1.0. Low temperatures (≤0.3) collapse reasoning onto a "plan-only" trajectory and produce wrong answers on arithmetic questions. Stay at the Gemma 4 official spec.
- Single-flight per process when using PEFT's adapter toggle. The enable/disable API mutates the model object globally, so concurrent generation across threads requires a lock.
Intended use
- Plain-language rewrites of factual or instructional content for cognitively diverse audiences.
- Companion-style explanation of medical, legal, or technical material (paired with the gated-thinking strategy so the reasoning remains intact).
Out-of-scope use
- Safety-critical clinical decisions without human review.
- High-stakes legal or financial advice.
- Domains requiring formal register or technical precision in the output — the adapter compresses by design.
Related
- Demo Space: kameronk/clariso
- Base model: google/gemma-4-E4B-it
- E2B variant (smaller, on-device): coming soon to kameronk/clariso-gemma4-e2b-v9-peft.