Joby IT Service Desk — Gemma 4 31B (Fine-Tuned, Tool-Aware)

A LoRA fine-tune of Google's Gemma 4 31B-it (dense, multimodal) specialized for Joby Aviation's internal IT service desk workflows, with native tool-calling preserved through a mixed Joby + general-purpose function-calling training corpus.

Private, internal model. Weights contain Joby-specific references (Jira project keys, Confluence KB content, internal hostnames, infrastructure conventions). Distributed only to Joby engineering staff via the PRE installer. Not for public release.

  • Repo (this card): sunkencity/joby-it-servicedesk (GGUF, private)
  • Adapter repo: sunkencity/joby-it-servicedesk-lora (private)
  • Data pipeline: https://github.com/sunkencity999/joby-datasets (private)
  • Released: 2026-05-30 (v0.4)
  • Maintainer: Christopher Bradford, Systems Administration / AI Engineering, Joby Aviation

v0.4 — tool-protocol fix

v0.3 fabricated tool responses inline: it would emit a valid tool call and then immediately roleplay the tool's reply and a final answer in one generation (response:NAME{value:<|"|>{...}<|"|>}<tool_response|>The status is...). Root cause was a training-format defect — Gemma 4's stock chat_template.jinja packs the tool call, tool response, and assistant follow-up into a single <|turn>model block, and the LoRA learned to imitate that one-shot shape.

v0.4 corrects it at the source:

  • New renderer (joby_chat_template.py) puts the tool call, tool response, and assistant follow-up in separate <|turn> blocks (<|tool_call>...<tool_call|><turn|><|turn>tool ... <turn|> → fresh <|turn>model). Same Gemma 4 native special tokens, corrected boundaries.
  • Pre-rendered training data (build_mixed_dataset.py emits {"text": ...} records) bypasses MLX-LM's default chat-template application.
  • Wider LoRA: attention + FFN rank-32 (q/v/gate/up/down) instead of attention-only rank-16. The "preserve FFN tool-call pathway" hypothesis was wrong — that pathway was already broken.
  • Best-val checkpoint shipped: iter 6,750 of the retrain (val 0.638).

Inference: Ollama's stock gemma4 renderer doesn't reproduce the corrected turn structure, so PRETeams routes this model through /api/generate raw mode with a JS port of joby_chat_template (preteams:src/joby-template.js). Base PRE keeps using /api/chat for other models.


1. At a glance

Base google/gemma-4-31B-it (dense, multimodal, 31B params) via mlx-community/gemma-4-31b-it-bf16
Method Parameter-efficient LoRA — attention + FFN, top-16 layers
Adapter size 211 MB (~53.0M trainable params, ~0.173% of base)
Training data 11,062 examples pre-rendered to text — 72% Joby IT / 25% Glaive function-calling / 3% PRE tool-use traces
Tool-call coverage 28.1% of examples carry structured tool_calls (rendered into the trained text via the new template)
Hardware M4 Max, 128 GB unified memory (Apple Silicon, MLX-LM)
Training time ~30 min initial run (2,250 iters before OOM) + ~5.5 h resume with grad_checkpoint=true (10,000 more iters)
Best checkpoint iter 6,750 of the resume (val loss 0.638, used for the fused model)
Final-iter val 0.843 — not released; train loss had drifted to ~0.57, mild memorization
Tool-call probe Turn 1 emits clean structured calls and stops; turn 2 grounds on real tool results (e.g. "Waiting for Customer" → correct status + assignee)
Released quants f16 (57 GB), q8_0 (30 GB), q4_K_M (~17 GB)
License Gemma Terms of Use (base) + internal-only on derivative weights

2. What it's good at

  • Native tool-calling preserved. Smoke-tests at 3 / 3 on PRE-style probes (date, bash, memory_search) — same activation rate as the unmodified base model. Designed to operate as a tool-augmented agent, not a closed-book oracle.
  • Joby IT vocabulary and patterns. Internal terminology, ticket structure, and resolution conventions for license provisioning, network ergonomics, hardware lifecycle, account ops, and the day-to-day Joby IT helpdesk shape. Strong familiarity with the systems landscape: Jira (IT / DHLP / JSD projects), Confluence (ITKB space), D365, Smartsheet, Active Directory / Entra ID, Intune / Jamf, VPN / SSO / SAML, M365.
  • Structured technical responses. Clean step-by-step procedures, properly fenced shell snippets, headers when length warrants. Inherits Gemma 4's strong instruction-following on top of the Joby-specific stylistic prior learned from agent replies.
  • Multi-turn agent loops. Mixed in 241 real PRE tool-use traces during training, so the model has seen the full ChatML shape (system → user → assistant{tool_calls} → tool{result} → assistant) and handles multi-step plans without falling out of tool format.
  • Long-context aware. Inherits Gemma 4's 256 K (262,144) token position embeddings. In practice PRE deploys at 8K–128K depending on RAM headroom (see PRE's context sizing table); the model itself is not the bottleneck.

3. What it isn't

  • Not a substitute for live tool calls. Joby-specific facts — current ticket IDs, URLs, account numbers, AWS account IDs, network ranges, on-call rotations — must come from confluence.search, jira.get_issue, smartsheet.*, etc., at inference time, not from the model's baked-in knowledge. The mixed-data recipe intentionally weakened memorization in favor of tool-use behavior. Expect hallucinated URLs, ticket IDs, and account references if the model is run without tools.
  • Not a general-purpose chatbot. Capability outside Joby's operational footprint is no better than base Gemma 4 31B-it, and may be slightly worse in stylistic register (responses skew toward ticket-resolution prose).
  • Not for incident response without human review. A senior IT staff member must validate any output that triggers operational change (account provisioning, group-membership changes, MFA resets, MDM commands).
  • Not multimodal at inference. Although the base model is multimodal (image + audio tokens in its vocabulary), the fine-tune is text-only. Images submitted at inference will be tokenized but the model has no learned grounding for them in the Joby domain.
  • Not bilingual. Training data is English-only. Spanish/Portuguese fallback is base-Gemma-quality at best.

4. Architecture (base model)

Model type gemma4 (dense, multimodal)
Parameters 31 B
Hidden size 5,376
Layers 60 transformer blocks
Attention heads 32 query / 16 KV (GQA, 2:1)
Head dim 256
FFN intermediate 21,504
Sliding window 1,024 (interleaved with global attention)
Vocab size 262,144 (text + image + audio + control tokens)
Max position 262,144 (256 K)
Tie embeddings Yes
dtype (train) bfloat16

Only the language-model trunk is touched by the LoRA. Vision and audio towers are frozen and effectively unused in this deployment.


5. Training data

A deliberately mixed corpus — 10,990 ChatML conversations. The mix is the central design choice of v0.3 and is what made tool-calling survive (see §9 Version History).

Source Count Share Role
Joby IT tickets + KB (Jira + Confluence, LLM-synthesized) 7,949 72.3% Domain knowledge + Joby register
Glaive function-calling v2 (glaiveai/glaive-function-calling-v2) 2,800 25.5% Tool-call shape & format
PRE session tool-use traces (~/.pre/sessions/) 241 2.2% Realistic multi-turn agent loops
Total 10,990 100% 90/10 train/val split

27.6% of examples contain structured tool_calls fields. 72.4% are plain assistant-text completions.

5.1 Joby tickets + KB (7,949)

Built by the joby-datasets pipeline (~/joby-datasets/). Each row is one Jira ticket or one Confluence section, rewritten by a local LLM (pre-gemma4 via Ollama) into a clean instruction → agent-style answer pair:

Extract  →  Transform (synthesized)  →  Filter  →  Format (ChatML)
 Jira/Confluence    LLM-rewrite             dedup/trim     embed system prompt
  • Jira JQL. project = IT AND statusCategory = Done AND resolved >= "-730d" — last two years of resolved IT tickets, paginated 100 at a time with retry on 429/5xx. Fields requested: summary, description, status, resolution, priority, issuetype, labels, components, created/updated/resolved dates, reporter, assignee, full comment thread.
  • Confluence space. ITKB (IT Knowledge Base), 69 pages, recursive descent. Min page length 200 chars.
  • Transform (synthesized mode). Each Jira ticket's full comment thread is rewritten by pre-gemma4 into a clean agent reply. Confluence pages are chunked at H1/H2 boundaries; each chunk is paired with LLM-generated questions to form instruction/answer pairs.
  • Filter. Length bounds, one-liner rejection (fixed, duplicate, see IT-123), email-signature stripping, attachment-marker removal, fuzzy dedup with rapidfuzz, mojibake/non-ASCII bloat removal.
  • System prompt baked in (ChatML): identifies the model as a Joby IT service-desk assistant.

5.2 Glaive function-calling v2 (2,800)

glaiveai/glaive-function-calling-v2 — the open-source canonical function-call training set. Streamed via ijson from the local HF cache; only conversations containing at least one <functioncall>{...}</functioncall> block are kept (non-tool turns are discarded). Glaive's idiosyncratic markup (USER: / A: / ASSISTANT: / FUNCTION RESPONSE: role tags, single-quote-inside-double-quote argument strings) is parsed by build_mixed_dataset.py::parse_glaive_chat into proper ChatML messages with structured tool_calls fields. 2,800 records were sampled with seed=42.

5.3 PRE session tool-use traces (241)

Real conversation histories from ~/.pre/sessions/ exported via web/src/training.js's exportTrainingData({format:'chatml', minToolCalls:1}). PRE emits tool calls as inline <tool_call>{...}</tool_call> markup in assistant content; the build script (_convert_pre_record) promotes those to structured tool_calls fields so Gemma 4's chat template renders them with native function-call tokens.

These are the most "in-domain" tool-use examples in the mix — they contain real PRE tool names (bash, memory_search, confluence_search, jira_get_issue, calendar_list_events, apple_mail_compose, rag_search, …) and the real ChatML shape the deployed model will see.

5.4 Quality controls and what was not done

  • No PII redaction. Joby IT tickets are confirmed PII-free at design time. Internal account IDs, hostnames, and infrastructure conventions are present in the weights — treat outputs as internal-classification material.
  • No license/copyright scrubbing of Confluence content beyond the standard filter pipeline.
  • No safety RLHF. This is a supervised LoRA only. The base model's RLHF/safety tuning is the only behavioral guardrail.
  • No multimodal data. Text-only fine-tune; vision/audio pathways untouched.

6. Training recipe

Full reproducible config in lora_config.yaml. Mixing script in build_mixed_dataset.py.

Knob Value Why
Base mlx-community/gemma-4-31b-it-bf16 Apple-native bf16, fast on MLX
Method LoRA (PEFT) 5.84M trainable / 31B base ≈ 0.019% — cheap, reversible
Target modules self_attn.q_proj, self_attn.v_proj Attention-only — leaves FFN untouched. The FFN is where Gemma 4's tool-call routing primarily lives; disturbing it caused the v0.2 regression to 0/3 tool calls.
LoRA rank r 16 Sufficient for ~10K examples; rank ≥ 32 begins to memorize ticket IDs
LoRA alpha 32 2:1 alpha:rank; effective LR multiplier = 2.0
LoRA dropout 0.05 Mild regularization on noisy ticket data
Layers top 16 of 60 (last quarter) Instruction-following signal concentrates near the head
Sequence length 4,096 tokens Covers >99% of training rows after filter
Batch size 1 Dense 31B in bf16 — single batch peaks at 102.96 GB of unified memory
Gradient accumulation 1 (none) M4 Max headroom permits
Optimizer AdamW (MLX-LM default) β1=0.9, β2=0.999, ε=1e-8
Learning rate 1.0 × 10⁻⁵ Conservative for instruction tuning; no warmup, constant schedule
Iterations 10,000 ≈ 1 epoch over 9,891-example training split
Seed 42 Deterministic split + sampling
Gradient checkpointing off Not memory-bound at batch=1 on 128 GB
Eval cadence every 250 iters, 50 val batches ≈ 63% of held-out val per eval
Checkpoint cadence every 250 iters 40 intermediate snapshots saved

6.1 Why attention-only LoRA

The v0.2 fine-tune (Joby-only, full LoRA targeting q/k/v/o + gate/up/down) collapsed tool-calling to 0 / 3. The Joby corpus has zero tool-call traces, and full LoRA strongly overfit the FFN toward plain-text replies, overriding the base model's learned tool-call routing.

v0.3 fixes this two ways: (a) mix in 27.6% structured tool-call examples so the model actually sees tool calls during training, and (b) restrict LoRA to attention q_proj / v_proj only, leaving the FFN — where tool routing lives — completely untouched. The two changes are complementary; ablating either reproduces the regression.

6.2 Hardware & throughput

  • M4 Max, 128 GB unified memory, macOS, MLX-LM trainer (Apple Silicon native, Metal-backed).
  • Sustained throughput: ~120 tokens/sec, ~0.4 iter/sec (batch 1, seq 4096).
  • Peak memory: 102.96 GB unified (≈80% of 128 GB; macOS swap headroom kept the system responsive).
  • Wall time: ~5 hours for 10,000 iters, ~80 sec/eval for 50 val batches.

6.3 Training dynamics

Validation loss every 250 iters. Best checkpoint is iter 9,250 (val 0.678). The final iter's spike to 1.077 is consistent with the noisy single-batch SGD on a mixed corpus and is not what we released — the published GGUF uses iter 9,250.

Iter Val Iter Val Iter Val
1 6.075 3,500 1.096 7,000 0.762
250 1.082 3,750 1.148 7,250 0.867
500 0.955 4,000 0.900 7,500 0.912
750 0.840 4,250 0.971 7,750 0.897
1,000 1.246 4,500 1.067 8,000 1.024
1,250 1.020 4,750 0.979 8,250 0.841
1,500 0.851 5,000 0.954 8,500 0.740
1,750 0.964 5,250 0.972 8,750 1.022
2,000 1.010 5,500 0.958 9,000 1.045
2,250 0.892 5,750 0.944 9,250 0.678
2,500 1.169 6,000 1.107 9,500 0.810
2,750 0.838 6,250 0.903 9,750 0.964
3,000 0.808 6,500 0.784 10,000 1.077
3,250 0.995 6,750 0.956

Loss is noisy because batch size = 1 is the dominant variance source. The trend is clearly downward through ~iter 9,250 with the floor stepping down through 0.840 → 0.808 → 0.762 → 0.740 → 0.678.


7. Evaluation

Run with evaluate.py after ollama create joby-it-servicedesk -f Modelfile.joby:

python evaluate.py --model joby-it-servicedesk:q8_0 --base pre-gemma4

7.1 Tool-calling smoke test

Three prompts that have no overlap with the training corpus, designed to probe whether native tool activation survived the LoRA. Each prompt is paired with the minimum tool schema (bash, date, memory_search) the model needs to respond correctly.

Probe Expected tool v0.3 adapted base pre-gemma4
"What time is it right now?" date
"List the files in my home directory." bash
"What do you remember about my role at Joby?" memory_search
Activation rate 3 / 3 3 / 3

The fine-tune matches the base model's tool-activation rate. This was the binary success criterion that v0.1 and v0.2 failed.

7.2 Held-out validation loss

Checkpoint Val loss Notes
Iter 9,250 (released) 0.678 Best of 40 checkpoints
Iter 8,500 0.740 Second-best
Iter 7,000 0.762 First sub-0.80 sustained
Iter 10,000 1.077 Final-iter — not released

7.3 Domain knowledge probes

Free-form, no-tool generation. Used as a qualitative sanity check (correctness must still be verified against live Confluence/Jira):

  • How do I request a Fusion 360 license at Joby?
  • Where is the IT Knowledge Base in Confluence?
  • What's the Jira project key for the IT service desk?
  • How do I connect to the Joby VPN?

These produce on-format, Joby-styled answers. They are not authoritative — the model may invent URLs, ticket numbers, or KB titles. Always couple with a real confluence.search / jira.get_issue tool call before acting.

7.4 What is not evaluated here

  • Exact-match accuracy on closed-book Joby facts. Deliberately not measured, because we want the model to defer to tools.
  • Toxicity / safety. Inherits Gemma 4's RLHF; no separate red-teaming for the adapter.
  • Long-context comprehension >32K. Inherits Gemma 4's 256K positional embeddings but was trained at max_seq_length=4096. Behavior at 32K–128K context is governed by the base; expect graceful degradation, not measured here.

8. Files in this repo

File Size Purpose
joby-it-servicedesk.q8_0.gguf ~30 GB Primary artifact. Matches PRE's default quantization for Apple Silicon with ≥28 GB VRAM/unified-memory headroom.
joby-it-servicedesk.q4_K_M.gguf ~17 GB Lower-VRAM variant — for Intel Macs with 16 GB eGPU, Windows boxes with smaller GPUs, or any setup where q8 won't fit fully on-device.
joby-it-servicedesk.f16.gguf ~57 GB Full-precision reference. Use as the source for custom requants.
Modelfile.joby Ollama Modelfile (sampling defaults match PRE's engine/Modelfile)

Companion adapter repo (sunkencity/joby-it-servicedesk-lora) ships:

  • adapter_model.safetensors — 22.27 MB LoRA weights (rank-16, attention-only).
  • lora_config.yaml — exact MLX-LM config used to train.

9. Version history

Version Date Base Corpus Tool-call probe Status
v0.1 2026-05-15 Gemma 4 26B-A4B MoE Joby-only (7,154 ex.) n/a — couldn't ship Aborted — MLX-LM's fused MoE renames experts.switch_glu.* tensors in a way llama.cpp's converter doesn't recognize. No GGUF, no Ollama deployment path.
v0.2 2026-05-16 Gemma 4 31B dense Joby-only (7,154 ex.) 0 / 3 Regressed. GGUF conversion succeeded, but the Joby-only corpus has zero tool-call traces and the full LoRA overrode the base model's tool-call routing.
v0.3 2026-05-17 Gemma 4 31B dense Mixed (Joby + Glaive + PRE, 10,990 ex.) 3 / 3 Released. Mixed corpus + attention-only LoRA + top-16 layers — tool-calling restored, Joby knowledge retained.

10. Usage

10.1 Ollama (recommended)

# Private repo — needs an HF token
export HF_TOKEN=<your-hf-token>
ollama pull hf.co/sunkencity/joby-it-servicedesk:q8_0
ollama run hf.co/sunkencity/joby-it-servicedesk:q8_0

10.2 Via PRE installer

cd ~/pre && ./install.sh
# Installer detects Joby access and offers to pull this model,
# aliasing it locally as `pre-gemma4-itsd`.

PRE then routes requests to it via the standard pre-gemma4 channel — same tool wiring, same ~/.pre/ data dir, same sessions.

10.3 Sampling defaults (Modelfile.joby)

Parameter Value Note
num_ctx 8,192 (default) PRE sends num_ctx per-request; scales dynamically up to the installed limit.
num_batch 512 Faster prefill
temperature 1.0 Google's upstream Gemma default
top_k 64 Google's upstream Gemma default
top_p 0.95 Google's upstream Gemma default
min_p 0.05 Diversity floor added on top of Google's defaults
repeat_penalty 1.1 Loop suppression
repeat_last_n 256

For Q&A-style use with strict factual answers, override at runtime: temperature 0.2, drop top_p/top_k/min_p to defaults.

10.4 Chat template

The merged model carries Gemma 4's chat_template.jinja unchanged. Tool calls in the OpenAI/Glaive tool_calls shape are rendered by the template into native Gemma function-call tokens; client code does not need a custom wrapper.


11. Limitations & responsible use

  • Internal references baked in. AWS account IDs, hostnames, internal URL patterns, and Joby-specific infrastructure conventions are present in the weights. Treat all outputs as internal-only material — equivalent to forwarding a Jira ticket excerpt.
  • No PII redaction pass. The pipeline does not redact, because the source corpus is confirmed PII-free. If future Joby ticket data contains PII, add a redaction stage between transform and filter in ~/joby-datasets/.
  • Hallucination of specifics is expected without tools. The training mix deliberately downweights memorized factual recall (URLs, ticket IDs, account numbers) in favor of tool-using behavior. Always run with confluence.search, jira.get_issue, smartsheet.*, rag.search available.
  • No incident-response autonomy. Human review required for any action-taking output (account changes, MDM commands, permission grants).
  • Knowledge cutoff = data cutoff. Jira pull is bounded to resolved >= "-730d" from the data extraction date (2026-05-15). Anything resolved before mid-2024 or after that snapshot is not represented.
  • Gemma 4 license applies. Use of the weights is governed by the Gemma Terms of Use. Derivative weights inherit the license.

12. Reproducibility

The full pipeline lives in ~/joby-datasets/:

joby-datasets/
├── src/                          # Jira/Confluence extraction + transform + filter + format
├── config.yaml                   # JQL queries, Confluence spaces, filter thresholds
├── training/
│   ├── build_mixed_dataset.py    # Joby + Glaive + PRE mixer (this fine-tune's secret sauce)
│   ├── prepare_data.py           # Stage ChatML into MLX-LM-expected layout
│   ├── lora_config.yaml          # MLX-LM hyperparameters
│   ├── train.sh                  # mlx_lm.lora -c lora_config.yaml
│   ├── fuse.sh                   # mlx_lm.fuse adapter → base
│   ├── convert_to_gguf.sh        # llama.cpp GGUF + quant
│   ├── evaluate.py               # val loss + tool-call probe
│   └── publish_hf.sh             # push to HF (private)

To rebuild end-to-end:

# 1. Extract
python -m src.extract_jira
python -m src.extract_confluence

# 2. Transform + filter + format
python -m src.transform --mode synthesized
python -m src.filter
python -m src.format --format chatml --split 0.1

# 3. Mix + stage
cd training && python build_mixed_dataset.py --glaive-n 2800 --seed 42
python prepare_data.py

# 4. Train (≈5 hours on M4 Max 128GB)
./train.sh

# 5. Fuse → convert → register → evaluate → publish
./fuse.sh
./convert_to_gguf.sh
ollama create joby-it-servicedesk -f Modelfile.joby
python evaluate.py --model joby-it-servicedesk:q8_0 --base pre-gemma4
./publish_hf.sh

13. Citation

@misc{joby_it_servicedesk_2026,
  author       = {Bradford, Christopher},
  title        = {Joby IT Service Desk — Gemma 4 31B (Fine-Tuned, Tool-Aware)},
  year         = {2026},
  version      = {0.3},
  howpublished = {HuggingFace private repo: sunkencity/joby-it-servicedesk},
  note         = {Private internal Joby Aviation model. Derived from google/gemma-4-31B-it.}
}

14. Acknowledgements

  • Google DeepMind for Gemma 4.
  • MLX team (Apple) for MLX-LM — the only LoRA trainer that comfortably handles a dense 31B base in 128 GB unified memory.
  • Glaive AI for glaive-function-calling-v2, the function-call corpus that made v0.3 possible.
  • llama.cpp maintainers for keeping the Gemma 4 GGUF converter current.
  • Joby IT for two years of clean, well-resolved service-desk tickets.
Downloads last month
52
GGUF
Model size
31B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

4-bit

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for sunkencity/joby-it-servicedesk

Quantized
(216)
this model