Text Generation
PEFT
Safetensors
English
gemma
gemma-4
lora
unsloth
litertlm
on-device
function-calling
tool-use
mobile
flutter
Instructions to use jtmuller/roadside-gemma-e2b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use jtmuller/roadside-gemma-e2b with PEFT:
Task type is invalid.
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- Unsloth Studio new
How to use jtmuller/roadside-gemma-e2b with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for jtmuller/roadside-gemma-e2b to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for jtmuller/roadside-gemma-e2b to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for jtmuller/roadside-gemma-e2b to start chatting
Load model with FastModel
pip install unsloth from unsloth import FastModel model, tokenizer = FastModel.from_pretrained( model_name="jtmuller/roadside-gemma-e2b", max_seq_length=2048, )
File size: 7,229 Bytes
40d9999 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 | ---
license: gemma
base_model: unsloth/gemma-4-E2B-it
library_name: peft
pipeline_tag: text-generation
tags:
- gemma
- gemma-4
- lora
- unsloth
- litertlm
- on-device
- function-calling
- tool-use
- mobile
- flutter
language:
- en
---
# Roadside Gemma β E2B fine-tune for CDL pre-trip inspections
A LoRA fine-tune of **`unsloth/gemma-4-E2B-it`** that turns the base model into a
voice-driven copilot for **commercial-driver pre-trip vehicle inspections**.
The model runs **fully on-device** on a modern Android/iOS phone via
[`flutter_gemma`](https://pub.dev/packages/flutter_gemma) and the
[LiteRT](https://ai.google.dev/edge/litert) runtime β **no network required**,
which matters because most truck yards and pre-trip inspection sites are
cellular dead zones.
> Built for the **Gemma 4 Impact Challenge** (May 2026). Project repo:
> [github.com/jtmuller5/roadside-gemma](https://github.com/jtmuller5/roadside-gemma).
---
## What's in this repo
| Path | What it is | Size |
|------|------------|------|
| `lora-adapter/` | PEFT LoRA adapter (r=128, Ξ±=128, all attn + MLP projections). Merge against `unsloth/gemma-4-E2B-it`. | 948 MB |
| `litertlm/model.litertlm` | Deployment artifact for the LiteRT runtime. Quantized `dynamic_wi8_afp32`. Drop into `flutter_gemma` directly. | 4.8 GB |
The 9.6 GB merged BF16 is reproducible by merging the LoRA β omitted to keep
the repo lean.
---
## What the model actually does
The model is an **agent** with seven tools and a strict JSON tool-calling
contract. It guides the driver step-by-step through the 7-category /
54-item canonical pre-trip inspection (cab, engine, brakes, lights, tires,
trailer, coupling) and records OK / defect outcomes.
Tools surfaced to the model:
- `get_next_step()` β advance the inspection
- `query_inspection_item(step, item)` β return DOT inspection criteria
- `mark_item_ok(step, item)` β record a passing item
- `record_defect(step, item, severity, description)` β record a defect
- `complete_inspection()` β finalize and sign off
- (plus refusal / clarification turns with **no** tool call)
The training corpus enforces a canonical `(step, item)` keyset; the model is
trained to **refuse** off-topic asks and to **ask for clarification** rather
than hallucinate a tool call.
---
## Evaluation
30 hand-crafted prompts across 6 categories (5 each). Scored against
expected tool name + key args. "Hard fail" = wrong/no tool when one was
required. "Soft fail" = right tool, wrong arg (e.g. wrong side of vehicle).
| Category | v3 (no refusal data) | **v4 (this model)** |
|----------------|----------------------|---------------------|
| ambiguous | 0 / 5 (HF=5) | **5 / 5** β |
| off_topic | 1 / 5 (HF=4) | **5 / 5** β |
| multi_intent | 0 / 5 (HF=0) | 4 / 5 |
| mid_correction | 1 / 5 (HF=0) | 3 / 5 |
| happy_path | 2 / 5 (HF=0) | 2 / 5 (HF=1) |
| stt_noisy | 1 / 5 (HF=2) | 2 / 5 (HF=1) |
| **Total** | **5 / 30, HF=11** | **21 / 30, HF=2** |
With the production app-injected opener (`"Now checking <Item>. ..."`) in
context. No-context eval (worst case): 17 / 30, HF=4.
Remaining soft fails are mostly wrong-side args on dual-sided items
(`passenger_side` vs `driver_side`).
---
## The training journey (why two-factor matters)
v1 of this model **failed hard** (2/30 pass) and the debugging path is worth
documenting because two independent bugs combined to make it look like one:
1. **Loss-mask bug.** The initial training run computed loss over the full
sequence including the ~700-token system prompt. With 173 rows sharing
one prompt, the model "converged" by memorizing the prompt while never
fitting the assistant tool-call tokens. Fixed by switching to
`unsloth.chat_templates.train_on_responses_only`.
2. **Corpus pollution.** The 31B teacher model used to synthesize the corpus
hallucinated tool-call keys: 78 distinct `(step, item)` pairs in the data
vs. 54 canonical pairs. 46 / 173 rows (27%) were polluted. Fixed by
embedding the canonical catalog in the synthesis prompt and adding a
`validate_conversation()` step that drops any row referencing a
non-canonical pair.
3. **Missing refusal data.** Even the clean v3 corpus had zero examples of
"user asks something off-topic." The model called a tool every time
because it had never seen what *not* calling one looked like. Fixed by
adding **Cat 8** to the synthesis pipeline: 40 conversations across
ambiguity, off-topic, uncertainty, greetings, and acknowledgments β all
producing text responses with no tool call.
Each fix in isolation was insufficient. v4 = all three.
---
## Training recipe
- **Base:** `unsloth/gemma-4-E2B-it`
- **Framework:** Unsloth + TRL `SFTTrainer`
- **Adapter:** LoRA r=128, Ξ±=128, dropout=0
- **Target modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`,
`up_proj`, `down_proj`
- **Loss mask:** `train_on_responses_only` (assistant turns only)
- **Schedule:** 8 epochs, cosine LR 1e-4, batch_size=4 Γ grad_accum=2
(effective 8)
- **Corpus:** 380 synthetic conversations across 8 categories (340 task +
40 refusal), all teacher-generated against the canonical 54-item keyset
- **Hardware:** 1Γ RTX 5090 (32 GB VRAM)
- **Final train loss:** 0.155 mean (final batches ~0.01)
---
## Deployment
### Android / iOS via `flutter_gemma`
```dart
import 'package:flutter_gemma/flutter_gemma.dart';
final gemma = FlutterGemmaPlugin.instance;
await gemma.modelManager.setModelPath('<path>/model.litertlm');
final session = await gemma.createModel(/* ... */);
```
The `.litertlm` is quantized `dynamic_wi8_afp32` β the ship recipe per the
[`flutter_gemma` notes](https://pub.dev/packages/flutter_gemma). Recipes
that quantize the LoRA matrices (e.g. `wi4` at rank-128) erase the
fine-tune.
### PyTorch via PEFT
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base = AutoModelForCausalLM.from_pretrained("unsloth/gemma-4-E2B-it")
tok = AutoTokenizer.from_pretrained("unsloth/gemma-4-E2B-it")
model = PeftModel.from_pretrained(base, "jtmuller/roadside-gemma-e2b",
subfolder="lora-adapter")
```
---
## Limitations & honest disclosure
- **Domain-narrow.** This is a pre-trip inspection agent, not a general
assistant. It will try to interpret most utterances as part of the
inspection flow.
- **English only.** Corpus is monolingual.
- **Dual-sided items are still soft.** Expect occasional wrong-side args
on tires, mirrors, lights.
- **Synthetic corpus.** All training data is teacher-generated, not
real driver transcripts. The Cat 5 (STT-noisy) category models speech
recognition artifacts but isn't a substitute for real STT data.
- **Safety scope.** This model assists with the inspection workflow.
It does **not** replace a qualified driver's judgment about whether a
vehicle is safe to operate.
---
## License
- LoRA adapter and `.litertlm`: released under the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
- Synthesis prompts and code in the project repo: MIT.
|