Text Generation
PEFT
Safetensors
English
gemma
gemma-4
lora
unsloth
litertlm
on-device
function-calling
tool-use
mobile
flutter
Instructions to use jtmuller/roadside-gemma-e2b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use jtmuller/roadside-gemma-e2b with PEFT:
Task type is invalid.
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- Unsloth Studio new
How to use jtmuller/roadside-gemma-e2b with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for jtmuller/roadside-gemma-e2b to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for jtmuller/roadside-gemma-e2b to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for jtmuller/roadside-gemma-e2b to start chatting
Load model with FastModel
pip install unsloth from unsloth import FastModel model, tokenizer = FastModel.from_pretrained( model_name="jtmuller/roadside-gemma-e2b", max_seq_length=2048, )
| license: gemma | |
| base_model: unsloth/gemma-4-E2B-it | |
| library_name: peft | |
| pipeline_tag: text-generation | |
| tags: | |
| - gemma | |
| - gemma-4 | |
| - lora | |
| - unsloth | |
| - litertlm | |
| - on-device | |
| - function-calling | |
| - tool-use | |
| - mobile | |
| - flutter | |
| language: | |
| - en | |
| # Roadside Gemma — E2B fine-tune for CDL pre-trip inspections | |
| A LoRA fine-tune of **`unsloth/gemma-4-E2B-it`** that turns the base model into a | |
| voice-driven copilot for **commercial-driver pre-trip vehicle inspections**. | |
| The model runs **fully on-device** on a modern Android/iOS phone via | |
| [`flutter_gemma`](https://pub.dev/packages/flutter_gemma) and the | |
| [LiteRT](https://ai.google.dev/edge/litert) runtime — **no network required**, | |
| which matters because most truck yards and pre-trip inspection sites are | |
| cellular dead zones. | |
| > Built for the **Gemma 4 Impact Challenge** (May 2026). Project repo: | |
| > [github.com/jtmuller5/roadside-gemma](https://github.com/jtmuller5/roadside-gemma). | |
| --- | |
| ## What's in this repo | |
| | Path | What it is | Size | | |
| |------|------------|------| | |
| | `lora-adapter/` | PEFT LoRA adapter (r=128, α=128, all attn + MLP projections). Merge against `unsloth/gemma-4-E2B-it`. | 948 MB | | |
| | `litertlm/model.litertlm` | Deployment artifact for the LiteRT runtime. Quantized `dynamic_wi8_afp32`. Drop into `flutter_gemma` directly. | 4.8 GB | | |
| The 9.6 GB merged BF16 is reproducible by merging the LoRA — omitted to keep | |
| the repo lean. | |
| --- | |
| ## What the model actually does | |
| The model is an **agent** with seven tools and a strict JSON tool-calling | |
| contract. It guides the driver step-by-step through the 7-category / | |
| 54-item canonical pre-trip inspection (cab, engine, brakes, lights, tires, | |
| trailer, coupling) and records OK / defect outcomes. | |
| Tools surfaced to the model: | |
| - `get_next_step()` — advance the inspection | |
| - `query_inspection_item(step, item)` — return DOT inspection criteria | |
| - `mark_item_ok(step, item)` — record a passing item | |
| - `record_defect(step, item, severity, description)` — record a defect | |
| - `complete_inspection()` — finalize and sign off | |
| - (plus refusal / clarification turns with **no** tool call) | |
| The training corpus enforces a canonical `(step, item)` keyset; the model is | |
| trained to **refuse** off-topic asks and to **ask for clarification** rather | |
| than hallucinate a tool call. | |
| --- | |
| ## Evaluation | |
| 30 hand-crafted prompts across 6 categories (5 each). Scored against | |
| expected tool name + key args. "Hard fail" = wrong/no tool when one was | |
| required. "Soft fail" = right tool, wrong arg (e.g. wrong side of vehicle). | |
| | Category | v3 (no refusal data) | **v4 (this model)** | | |
| |----------------|----------------------|---------------------| | |
| | ambiguous | 0 / 5 (HF=5) | **5 / 5** ✓ | | |
| | off_topic | 1 / 5 (HF=4) | **5 / 5** ✓ | | |
| | multi_intent | 0 / 5 (HF=0) | 4 / 5 | | |
| | mid_correction | 1 / 5 (HF=0) | 3 / 5 | | |
| | happy_path | 2 / 5 (HF=0) | 2 / 5 (HF=1) | | |
| | stt_noisy | 1 / 5 (HF=2) | 2 / 5 (HF=1) | | |
| | **Total** | **5 / 30, HF=11** | **21 / 30, HF=2** | | |
| With the production app-injected opener (`"Now checking <Item>. ..."`) in | |
| context. No-context eval (worst case): 17 / 30, HF=4. | |
| Remaining soft fails are mostly wrong-side args on dual-sided items | |
| (`passenger_side` vs `driver_side`). | |
| --- | |
| ## The training journey (why two-factor matters) | |
| v1 of this model **failed hard** (2/30 pass) and the debugging path is worth | |
| documenting because two independent bugs combined to make it look like one: | |
| 1. **Loss-mask bug.** The initial training run computed loss over the full | |
| sequence including the ~700-token system prompt. With 173 rows sharing | |
| one prompt, the model "converged" by memorizing the prompt while never | |
| fitting the assistant tool-call tokens. Fixed by switching to | |
| `unsloth.chat_templates.train_on_responses_only`. | |
| 2. **Corpus pollution.** The 31B teacher model used to synthesize the corpus | |
| hallucinated tool-call keys: 78 distinct `(step, item)` pairs in the data | |
| vs. 54 canonical pairs. 46 / 173 rows (27%) were polluted. Fixed by | |
| embedding the canonical catalog in the synthesis prompt and adding a | |
| `validate_conversation()` step that drops any row referencing a | |
| non-canonical pair. | |
| 3. **Missing refusal data.** Even the clean v3 corpus had zero examples of | |
| "user asks something off-topic." The model called a tool every time | |
| because it had never seen what *not* calling one looked like. Fixed by | |
| adding **Cat 8** to the synthesis pipeline: 40 conversations across | |
| ambiguity, off-topic, uncertainty, greetings, and acknowledgments — all | |
| producing text responses with no tool call. | |
| Each fix in isolation was insufficient. v4 = all three. | |
| --- | |
| ## Training recipe | |
| - **Base:** `unsloth/gemma-4-E2B-it` | |
| - **Framework:** Unsloth + TRL `SFTTrainer` | |
| - **Adapter:** LoRA r=128, α=128, dropout=0 | |
| - **Target modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, | |
| `up_proj`, `down_proj` | |
| - **Loss mask:** `train_on_responses_only` (assistant turns only) | |
| - **Schedule:** 8 epochs, cosine LR 1e-4, batch_size=4 × grad_accum=2 | |
| (effective 8) | |
| - **Corpus:** 380 synthetic conversations across 8 categories (340 task + | |
| 40 refusal), all teacher-generated against the canonical 54-item keyset | |
| - **Hardware:** 1× RTX 5090 (32 GB VRAM) | |
| - **Final train loss:** 0.155 mean (final batches ~0.01) | |
| --- | |
| ## Deployment | |
| ### Android / iOS via `flutter_gemma` | |
| ```dart | |
| import 'package:flutter_gemma/flutter_gemma.dart'; | |
| final gemma = FlutterGemmaPlugin.instance; | |
| await gemma.modelManager.setModelPath('<path>/model.litertlm'); | |
| final session = await gemma.createModel(/* ... */); | |
| ``` | |
| The `.litertlm` is quantized `dynamic_wi8_afp32` — the ship recipe per the | |
| [`flutter_gemma` notes](https://pub.dev/packages/flutter_gemma). Recipes | |
| that quantize the LoRA matrices (e.g. `wi4` at rank-128) erase the | |
| fine-tune. | |
| ### PyTorch via PEFT | |
| ```python | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| from peft import PeftModel | |
| base = AutoModelForCausalLM.from_pretrained("unsloth/gemma-4-E2B-it") | |
| tok = AutoTokenizer.from_pretrained("unsloth/gemma-4-E2B-it") | |
| model = PeftModel.from_pretrained(base, "jtmuller/roadside-gemma-e2b", | |
| subfolder="lora-adapter") | |
| ``` | |
| --- | |
| ## Limitations & honest disclosure | |
| - **Domain-narrow.** This is a pre-trip inspection agent, not a general | |
| assistant. It will try to interpret most utterances as part of the | |
| inspection flow. | |
| - **English only.** Corpus is monolingual. | |
| - **Dual-sided items are still soft.** Expect occasional wrong-side args | |
| on tires, mirrors, lights. | |
| - **Synthetic corpus.** All training data is teacher-generated, not | |
| real driver transcripts. The Cat 5 (STT-noisy) category models speech | |
| recognition artifacts but isn't a substitute for real STT data. | |
| - **Safety scope.** This model assists with the inspection workflow. | |
| It does **not** replace a qualified driver's judgment about whether a | |
| vehicle is safe to operate. | |
| --- | |
| ## License | |
| - LoRA adapter and `.litertlm`: released under the [Gemma Terms of Use](https://ai.google.dev/gemma/terms). | |
| - Synthesis prompts and code in the project repo: MIT. | |