Text Generation
PEFT
Safetensors
English
gemma
gemma-4
lora
unsloth
litertlm
on-device
function-calling
tool-use
mobile
flutter
Instructions to use jtmuller/roadside-gemma-e2b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use jtmuller/roadside-gemma-e2b with PEFT:
Task type is invalid.
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- Unsloth Studio new
How to use jtmuller/roadside-gemma-e2b with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for jtmuller/roadside-gemma-e2b to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for jtmuller/roadside-gemma-e2b to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for jtmuller/roadside-gemma-e2b to start chatting
Load model with FastModel
pip install unsloth from unsloth import FastModel model, tokenizer = FastModel.from_pretrained( model_name="jtmuller/roadside-gemma-e2b", max_seq_length=2048, )
Upload README.md with huggingface_hub
Browse files
README.md
ADDED
|
@@ -0,0 +1,190 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
license: gemma
|
| 3 |
+
base_model: unsloth/gemma-4-E2B-it
|
| 4 |
+
library_name: peft
|
| 5 |
+
pipeline_tag: text-generation
|
| 6 |
+
tags:
|
| 7 |
+
- gemma
|
| 8 |
+
- gemma-4
|
| 9 |
+
- lora
|
| 10 |
+
- unsloth
|
| 11 |
+
- litertlm
|
| 12 |
+
- on-device
|
| 13 |
+
- function-calling
|
| 14 |
+
- tool-use
|
| 15 |
+
- mobile
|
| 16 |
+
- flutter
|
| 17 |
+
language:
|
| 18 |
+
- en
|
| 19 |
+
---
|
| 20 |
+
|
| 21 |
+
# Roadside Gemma β E2B fine-tune for CDL pre-trip inspections
|
| 22 |
+
|
| 23 |
+
A LoRA fine-tune of **`unsloth/gemma-4-E2B-it`** that turns the base model into a
|
| 24 |
+
voice-driven copilot for **commercial-driver pre-trip vehicle inspections**.
|
| 25 |
+
|
| 26 |
+
The model runs **fully on-device** on a modern Android/iOS phone via
|
| 27 |
+
[`flutter_gemma`](https://pub.dev/packages/flutter_gemma) and the
|
| 28 |
+
[LiteRT](https://ai.google.dev/edge/litert) runtime β **no network required**,
|
| 29 |
+
which matters because most truck yards and pre-trip inspection sites are
|
| 30 |
+
cellular dead zones.
|
| 31 |
+
|
| 32 |
+
> Built for the **Gemma 4 Impact Challenge** (May 2026). Project repo:
|
| 33 |
+
> [github.com/jtmuller5/roadside-gemma](https://github.com/jtmuller5/roadside-gemma).
|
| 34 |
+
|
| 35 |
+
---
|
| 36 |
+
|
| 37 |
+
## What's in this repo
|
| 38 |
+
|
| 39 |
+
| Path | What it is | Size |
|
| 40 |
+
|------|------------|------|
|
| 41 |
+
| `lora-adapter/` | PEFT LoRA adapter (r=128, Ξ±=128, all attn + MLP projections). Merge against `unsloth/gemma-4-E2B-it`. | 948 MB |
|
| 42 |
+
| `litertlm/model.litertlm` | Deployment artifact for the LiteRT runtime. Quantized `dynamic_wi8_afp32`. Drop into `flutter_gemma` directly. | 4.8 GB |
|
| 43 |
+
|
| 44 |
+
The 9.6 GB merged BF16 is reproducible by merging the LoRA β omitted to keep
|
| 45 |
+
the repo lean.
|
| 46 |
+
|
| 47 |
+
---
|
| 48 |
+
|
| 49 |
+
## What the model actually does
|
| 50 |
+
|
| 51 |
+
The model is an **agent** with seven tools and a strict JSON tool-calling
|
| 52 |
+
contract. It guides the driver step-by-step through the 7-category /
|
| 53 |
+
54-item canonical pre-trip inspection (cab, engine, brakes, lights, tires,
|
| 54 |
+
trailer, coupling) and records OK / defect outcomes.
|
| 55 |
+
|
| 56 |
+
Tools surfaced to the model:
|
| 57 |
+
|
| 58 |
+
- `get_next_step()` β advance the inspection
|
| 59 |
+
- `query_inspection_item(step, item)` β return DOT inspection criteria
|
| 60 |
+
- `mark_item_ok(step, item)` β record a passing item
|
| 61 |
+
- `record_defect(step, item, severity, description)` β record a defect
|
| 62 |
+
- `complete_inspection()` β finalize and sign off
|
| 63 |
+
- (plus refusal / clarification turns with **no** tool call)
|
| 64 |
+
|
| 65 |
+
The training corpus enforces a canonical `(step, item)` keyset; the model is
|
| 66 |
+
trained to **refuse** off-topic asks and to **ask for clarification** rather
|
| 67 |
+
than hallucinate a tool call.
|
| 68 |
+
|
| 69 |
+
---
|
| 70 |
+
|
| 71 |
+
## Evaluation
|
| 72 |
+
|
| 73 |
+
30 hand-crafted prompts across 6 categories (5 each). Scored against
|
| 74 |
+
expected tool name + key args. "Hard fail" = wrong/no tool when one was
|
| 75 |
+
required. "Soft fail" = right tool, wrong arg (e.g. wrong side of vehicle).
|
| 76 |
+
|
| 77 |
+
| Category | v3 (no refusal data) | **v4 (this model)** |
|
| 78 |
+
|----------------|----------------------|---------------------|
|
| 79 |
+
| ambiguous | 0 / 5 (HF=5) | **5 / 5** β |
|
| 80 |
+
| off_topic | 1 / 5 (HF=4) | **5 / 5** β |
|
| 81 |
+
| multi_intent | 0 / 5 (HF=0) | 4 / 5 |
|
| 82 |
+
| mid_correction | 1 / 5 (HF=0) | 3 / 5 |
|
| 83 |
+
| happy_path | 2 / 5 (HF=0) | 2 / 5 (HF=1) |
|
| 84 |
+
| stt_noisy | 1 / 5 (HF=2) | 2 / 5 (HF=1) |
|
| 85 |
+
| **Total** | **5 / 30, HF=11** | **21 / 30, HF=2** |
|
| 86 |
+
|
| 87 |
+
With the production app-injected opener (`"Now checking <Item>. ..."`) in
|
| 88 |
+
context. No-context eval (worst case): 17 / 30, HF=4.
|
| 89 |
+
|
| 90 |
+
Remaining soft fails are mostly wrong-side args on dual-sided items
|
| 91 |
+
(`passenger_side` vs `driver_side`).
|
| 92 |
+
|
| 93 |
+
---
|
| 94 |
+
|
| 95 |
+
## The training journey (why two-factor matters)
|
| 96 |
+
|
| 97 |
+
v1 of this model **failed hard** (2/30 pass) and the debugging path is worth
|
| 98 |
+
documenting because two independent bugs combined to make it look like one:
|
| 99 |
+
|
| 100 |
+
1. **Loss-mask bug.** The initial training run computed loss over the full
|
| 101 |
+
sequence including the ~700-token system prompt. With 173 rows sharing
|
| 102 |
+
one prompt, the model "converged" by memorizing the prompt while never
|
| 103 |
+
fitting the assistant tool-call tokens. Fixed by switching to
|
| 104 |
+
`unsloth.chat_templates.train_on_responses_only`.
|
| 105 |
+
2. **Corpus pollution.** The 31B teacher model used to synthesize the corpus
|
| 106 |
+
hallucinated tool-call keys: 78 distinct `(step, item)` pairs in the data
|
| 107 |
+
vs. 54 canonical pairs. 46 / 173 rows (27%) were polluted. Fixed by
|
| 108 |
+
embedding the canonical catalog in the synthesis prompt and adding a
|
| 109 |
+
`validate_conversation()` step that drops any row referencing a
|
| 110 |
+
non-canonical pair.
|
| 111 |
+
3. **Missing refusal data.** Even the clean v3 corpus had zero examples of
|
| 112 |
+
"user asks something off-topic." The model called a tool every time
|
| 113 |
+
because it had never seen what *not* calling one looked like. Fixed by
|
| 114 |
+
adding **Cat 8** to the synthesis pipeline: 40 conversations across
|
| 115 |
+
ambiguity, off-topic, uncertainty, greetings, and acknowledgments β all
|
| 116 |
+
producing text responses with no tool call.
|
| 117 |
+
|
| 118 |
+
Each fix in isolation was insufficient. v4 = all three.
|
| 119 |
+
|
| 120 |
+
---
|
| 121 |
+
|
| 122 |
+
## Training recipe
|
| 123 |
+
|
| 124 |
+
- **Base:** `unsloth/gemma-4-E2B-it`
|
| 125 |
+
- **Framework:** Unsloth + TRL `SFTTrainer`
|
| 126 |
+
- **Adapter:** LoRA r=128, Ξ±=128, dropout=0
|
| 127 |
+
- **Target modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`,
|
| 128 |
+
`up_proj`, `down_proj`
|
| 129 |
+
- **Loss mask:** `train_on_responses_only` (assistant turns only)
|
| 130 |
+
- **Schedule:** 8 epochs, cosine LR 1e-4, batch_size=4 Γ grad_accum=2
|
| 131 |
+
(effective 8)
|
| 132 |
+
- **Corpus:** 380 synthetic conversations across 8 categories (340 task +
|
| 133 |
+
40 refusal), all teacher-generated against the canonical 54-item keyset
|
| 134 |
+
- **Hardware:** 1Γ RTX 5090 (32 GB VRAM)
|
| 135 |
+
- **Final train loss:** 0.155 mean (final batches ~0.01)
|
| 136 |
+
|
| 137 |
+
---
|
| 138 |
+
|
| 139 |
+
## Deployment
|
| 140 |
+
|
| 141 |
+
### Android / iOS via `flutter_gemma`
|
| 142 |
+
|
| 143 |
+
```dart
|
| 144 |
+
import 'package:flutter_gemma/flutter_gemma.dart';
|
| 145 |
+
|
| 146 |
+
final gemma = FlutterGemmaPlugin.instance;
|
| 147 |
+
await gemma.modelManager.setModelPath('<path>/model.litertlm');
|
| 148 |
+
final session = await gemma.createModel(/* ... */);
|
| 149 |
+
```
|
| 150 |
+
|
| 151 |
+
The `.litertlm` is quantized `dynamic_wi8_afp32` β the ship recipe per the
|
| 152 |
+
[`flutter_gemma` notes](https://pub.dev/packages/flutter_gemma). Recipes
|
| 153 |
+
that quantize the LoRA matrices (e.g. `wi4` at rank-128) erase the
|
| 154 |
+
fine-tune.
|
| 155 |
+
|
| 156 |
+
### PyTorch via PEFT
|
| 157 |
+
|
| 158 |
+
```python
|
| 159 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 160 |
+
from peft import PeftModel
|
| 161 |
+
|
| 162 |
+
base = AutoModelForCausalLM.from_pretrained("unsloth/gemma-4-E2B-it")
|
| 163 |
+
tok = AutoTokenizer.from_pretrained("unsloth/gemma-4-E2B-it")
|
| 164 |
+
model = PeftModel.from_pretrained(base, "jtmuller/roadside-gemma-e2b",
|
| 165 |
+
subfolder="lora-adapter")
|
| 166 |
+
```
|
| 167 |
+
|
| 168 |
+
---
|
| 169 |
+
|
| 170 |
+
## Limitations & honest disclosure
|
| 171 |
+
|
| 172 |
+
- **Domain-narrow.** This is a pre-trip inspection agent, not a general
|
| 173 |
+
assistant. It will try to interpret most utterances as part of the
|
| 174 |
+
inspection flow.
|
| 175 |
+
- **English only.** Corpus is monolingual.
|
| 176 |
+
- **Dual-sided items are still soft.** Expect occasional wrong-side args
|
| 177 |
+
on tires, mirrors, lights.
|
| 178 |
+
- **Synthetic corpus.** All training data is teacher-generated, not
|
| 179 |
+
real driver transcripts. The Cat 5 (STT-noisy) category models speech
|
| 180 |
+
recognition artifacts but isn't a substitute for real STT data.
|
| 181 |
+
- **Safety scope.** This model assists with the inspection workflow.
|
| 182 |
+
It does **not** replace a qualified driver's judgment about whether a
|
| 183 |
+
vehicle is safe to operate.
|
| 184 |
+
|
| 185 |
+
---
|
| 186 |
+
|
| 187 |
+
## License
|
| 188 |
+
|
| 189 |
+
- LoRA adapter and `.litertlm`: released under the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
|
| 190 |
+
- Synthesis prompts and code in the project repo: MIT.
|