Joby IT Service Desk — Gemma 4 31B (Fine-Tuned, Tool-Aware)

A LoRA fine-tune of Google's Gemma 4 31B-it (dense, multimodal) specialized for Joby Aviation's internal IT service desk workflows, with native tool-calling preserved through a mixed Joby + general-purpose function-calling training corpus.

Private, internal model. Weights contain Joby-specific references (Jira project keys, Confluence KB content, internal hostnames, infrastructure conventions). Distributed only to Joby engineering staff via the PRE installer. Not for public release.

Repo (this card): sunkencity/joby-it-servicedesk (GGUF, private)
Adapter repo: sunkencity/joby-it-servicedesk-lora (private)
Data pipeline: https://github.com/sunkencity999/joby-datasets (private)
Released: 2026-05-30 (v0.4)
Maintainer: Christopher Bradford, Systems Administration / AI Engineering, Joby Aviation

v0.4 — tool-protocol fix

v0.3 fabricated tool responses inline: it would emit a valid tool call and then immediately roleplay the tool's reply and a final answer in one generation (response:NAME{value:<|"|>{...}<|"|>}<tool_response|>The status is...). Root cause was a training-format defect — Gemma 4's stock chat_template.jinja packs the tool call, tool response, and assistant follow-up into a single <|turn>model block, and the LoRA learned to imitate that one-shot shape.

v0.4 corrects it at the source:

New renderer (joby_chat_template.py) puts the tool call, tool response, and assistant follow-up in separate <|turn> blocks (<|tool_call>...<tool_call|><turn|> → <|turn>tool ... <turn|> → fresh <|turn>model). Same Gemma 4 native special tokens, corrected boundaries.
Pre-rendered training data (build_mixed_dataset.py emits {"text": ...} records) bypasses MLX-LM's default chat-template application.
Wider LoRA: attention + FFN rank-32 (q/v/gate/up/down) instead of attention-only rank-16. The "preserve FFN tool-call pathway" hypothesis was wrong — that pathway was already broken.
Best-val checkpoint shipped: iter 6,750 of the retrain (val 0.638).

Inference: Ollama's stock gemma4 renderer doesn't reproduce the corrected turn structure, so PRETeams routes this model through /api/generate raw mode with a JS port of joby_chat_template (preteams:src/joby-template.js). Base PRE keeps using /api/chat for other models.
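
A minimal sketch of the corrected turn layout and the raw-mode request described above. The function names `render_tool_exchange` and `build_generate_payload` are hypothetical, the special-token strings are copied from this section, and the newline placement, the `user` role label, the JSON encoding of the call body, and the `key` argument are assumptions; `joby_chat_template.py` remains the authoritative renderer. The payload targets Ollama's `/api/generate` endpoint with `raw` enabled so that no server-side template is applied.

```python
import json


def render_tool_exchange(user_text, tool_call, tool_result):
    """Render user -> tool call -> tool result as separate turn blocks.

    Token strings follow the card's description; whitespace is assumed.
    """
    return (
        f"<|turn>user\n{user_text}<turn|>\n"
        # Turn 1: the model emits the call and the turn closes immediately.
        f"<|turn>model\n<|tool_call>{json.dumps(tool_call)}<tool_call|><turn|>\n"
        # Turn 2: the tool result lives in its own turn.
        f"<|turn>tool\n{json.dumps(tool_result)}<turn|>\n"
        # Turn 3: a fresh model turn is opened for the grounded answer.
        "<|turn>model\n"
    )


def build_generate_payload(model, prompt):
    # raw=True asks Ollama to pass the prompt through untemplated.
    return {"model": model, "prompt": prompt, "raw": True, "stream": False}


prompt = render_tool_exchange(
    "What is the status of IT-123?",
    {"name": "jira_get_issue", "arguments": {"key": "IT-123"}},  # hypothetical args
    {"status": "Waiting for Customer"},
)
payload = build_generate_payload("joby-it-servicedesk:q8_0", prompt)
print(prompt)
```

The point of the layout is that generation for the first model turn can stop at `<turn|>` right after the call, so the model never has the opportunity to write the tool's reply itself.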

1. At a glance


Base	`google/gemma-4-31B-it` (dense, multimodal, 31B params) via `mlx-community/gemma-4-31b-it-bf16`
Method	Parameter-efficient LoRA — attention + FFN, top-16 layers
Adapter size	211 MB (~53.0M trainable params, ~0.173% of base)
Training data	11,062 examples pre-rendered to text — 72% Joby IT / 25% Glaive function-calling / 3% PRE tool-use traces
Tool-call coverage	28.1% of examples carry structured `tool_calls` (rendered into the trained text via the new template)
Hardware	M4 Max, 128 GB unified memory (Apple Silicon, MLX-LM)
Training time	~30 min initial run (2,250 iters before OOM) + ~5.5 h resume with `grad_checkpoint=true` (10,000 more iters)
Best checkpoint	iter 6,750 of the resume (val loss 0.638, used for the fused model)
Final-iter val	0.843 — not released; train loss had drifted to ~0.57, mild memorization
Tool-call probe	Turn 1 emits clean structured calls and stops; turn 2 grounds on real tool results (e.g. "Waiting for Customer" → correct status + assignee)
Released quants	f16 (~57 GB), q8_0 (~30 GB), q4_K_M (~17 GB)
License	Gemma Terms of Use (base) + internal-only on derivative weights

2. What it's good at

Native tool-calling preserved. Smoke-tests at 3 / 3 on PRE-style probes (date, bash, memory_search) — same activation rate as the unmodified base model. Designed to operate as a tool-augmented agent, not a closed-book oracle.
Joby IT vocabulary and patterns. Internal terminology, ticket structure, and resolution conventions for license provisioning, network ergonomics, hardware lifecycle, account ops, and the day-to-day Joby IT helpdesk shape. Strong familiarity with the systems landscape: Jira (IT / DHLP / JSD projects), Confluence (ITKB space), D365, Smartsheet, Active Directory / Entra ID, Intune / Jamf, VPN / SSO / SAML, M365.
Structured technical responses. Clean step-by-step procedures, properly fenced shell snippets, headers when length warrants. Inherits Gemma 4's strong instruction-following on top of the Joby-specific stylistic prior learned from agent replies.
Multi-turn agent loops. Mixed in 241 real PRE tool-use traces during training, so the model has seen the full ChatML shape (system → user → assistant{tool_calls} → tool{result} → assistant) and handles multi-step plans without falling out of tool format.
Long-context aware. Inherits Gemma 4's 256 K (262,144) token position embeddings. In practice PRE deploys at 8K–128K depending on RAM headroom (see PRE's context sizing table); the model itself is not the bottleneck.

3. What it isn't

Not a substitute for live tool calls. Joby-specific facts — current ticket IDs, URLs, account numbers, AWS account IDs, network ranges, on-call rotations — must come from confluence.search, jira.get_issue, smartsheet.*, etc., at inference time, not from the model's baked-in knowledge. The mixed-data recipe intentionally weakened memorization in favor of tool-use behavior. Expect hallucinated URLs, ticket IDs, and account references if the model is run without tools.
Not a general-purpose chatbot. Capability outside Joby's operational footprint is no better than base Gemma 4 31B-it, and may be slightly worse in stylistic register (responses skew toward ticket-resolution prose).
Not for incident response without human review. A senior IT staff member must validate any output that triggers operational change (account provisioning, group-membership changes, MFA resets, MDM commands).
Not multimodal at inference. Although the base model is multimodal (image + audio tokens in its vocabulary), the fine-tune is text-only. Images submitted at inference will be tokenized but the model has no learned grounding for them in the Joby domain.
Not bilingual. Training data is English-only. Spanish/Portuguese fallback is base-Gemma-quality at best.

4. Architecture (base model)


Model type	`gemma4` (dense, multimodal)
Parameters	31 B
Hidden size	5,376
Layers	60 transformer blocks
Attention heads	32 query / 16 KV (GQA, 2:1)
Head dim	256
FFN intermediate	21,504
Sliding window	1,024 (interleaved with global attention)
Vocab size	262,144 (text + image + audio + control tokens)
Max position	262,144 (256 K)
Tie embeddings	Yes
dtype (train)	bfloat16

Only the language-model trunk is touched by the LoRA. Vision and audio towers are frozen and effectively unused in this deployment.

5. Training data

A deliberately mixed corpus — 10,990 ChatML conversations. The mix is the central design choice of v0.3 and is what made tool-calling survive (see §9 Version History).

Source	Count	Share	Role
Joby IT tickets + KB (Jira + Confluence, LLM-synthesized)	7,949	72.3%	Domain knowledge + Joby register
Glaive function-calling v2 (`glaiveai/glaive-function-calling-v2`)	2,800	25.5%	Tool-call shape & format
PRE session tool-use traces (`~/.pre/sessions/`)	241	2.2%	Realistic multi-turn agent loops
Total	10,990	100%	90/10 train/val split

27.6% of examples contain structured tool_calls fields. 72.4% are plain assistant-text completions.
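
The shares and the split size can be re-derived from the counts in the table. This is a small arithmetic check, assuming that every Glaive and PRE record carries at least one tool call and that no Joby record does; under that assumption the tool-call share comes out at about 27.67%, in line with the 27.6% quoted above.

```python
# Counts copied from the table above.
sources = {"joby": 7949, "glaive": 2800, "pre": 241}
total = sum(sources.values())
shares = {name: round(100 * n / total, 1) for name, n in sources.items()}

# Assumption: tool calls appear in all Glaive and PRE records and nowhere else.
tool_share = round(100 * (sources["glaive"] + sources["pre"]) / total, 2)

# 90/10 split, done in integer arithmetic to avoid float rounding.
train_n = total * 9 // 10
val_n = total - train_n

print(total, shares, tool_share, train_n, val_n)
```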

5.1 Joby tickets + KB (7,949)

Built by the joby-datasets pipeline (~/joby-datasets/). Each row is one Jira ticket or one Confluence section, rewritten by a local LLM (pre-gemma4 via Ollama) into a clean instruction → agent-style answer pair:

Extract          →  Transform (synthesized)  →  Filter      →  Format (ChatML)
Jira/Confluence     LLM-rewrite                 dedup/trim     embed system prompt

Jira JQL. project = IT AND statusCategory = Done AND resolved >= "-730d" — last two years of resolved IT tickets, paginated 100 at a time with retry on 429/5xx. Fields requested: summary, description, status, resolution, priority, issuetype, labels, components, created/updated/resolved dates, reporter, assignee, full comment thread.
Confluence space. ITKB (IT Knowledge Base), 69 pages, recursive descent. Min page length 200 chars.
Transform (synthesized mode). Each Jira ticket's full comment thread is rewritten by pre-gemma4 into a clean agent reply. Confluence pages are chunked at H1/H2 boundaries; each chunk is paired with LLM-generated questions to form instruction/answer pairs.
Filter. Length bounds, one-liner rejection (fixed, duplicate, see IT-123), email-signature stripping, attachment-marker removal, fuzzy dedup with rapidfuzz, mojibake/non-ASCII bloat removal.
System prompt baked in (ChatML): identifies the model as a Joby IT service-desk assistant.
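
The extraction loop (pages of 100, retry on 429/5xx) might look roughly like the sketch below. It is not the pipeline's actual code: `fetch_page` is a hypothetical callable standing in for the Jira search request, and the exponential backoff schedule is an assumption, since the card only says that retries happen.

```python
import time

RETRYABLE = {429} | set(range(500, 600))


def fetch_all_issues(fetch_page, page_size=100, max_retries=5,
                     base_delay=1.0, sleep=time.sleep):
    """Collect every issue by paging `fetch_page(start_at, page_size)`.

    `fetch_page` is a hypothetical callable returning (status_code, issues).
    """
    issues, start_at = [], 0
    while True:
        for attempt in range(max_retries + 1):
            status, page = fetch_page(start_at, page_size)
            if status not in RETRYABLE:
                break
            if attempt == max_retries:
                raise RuntimeError(f"gave up at offset {start_at} (HTTP {status})")
            sleep(base_delay * 2 ** attempt)  # assumed exponential backoff
        if status != 200:
            raise RuntimeError(f"unexpected HTTP {status} at offset {start_at}")
        issues.extend(page)
        if len(page) < page_size:  # a short page means the result set is exhausted
            return issues
        start_at += page_size
```

Passing `sleep` as a parameter keeps the loop testable without real delays.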

5.2 Glaive function-calling v2 (2,800)

glaiveai/glaive-function-calling-v2 is the canonical open-source function-calling training set. It is streamed via ijson from the local HF cache; only conversations containing at least one <functioncall>{...}</functioncall> block are kept (non-tool turns are discarded). Glaive's idiosyncratic markup (USER: / A: / ASSISTANT: / FUNCTION RESPONSE: role tags, single-quote-inside-double-quote argument strings) is parsed by build_mixed_dataset.py::parse_glaive_chat into proper ChatML messages with structured tool_calls fields. 2,800 records were sampled with seed=42.
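
To illustrate the kind of conversion involved, here is a simplified sketch. It is not the real `parse_glaive_chat`: the name `parse_glaive_sketch` is hypothetical, it handles only the USER / ASSISTANT / FUNCTION RESPONSE tags (not `A:`), and the sample transcript is invented for the example.

```python
import json
import re

ROLE_RE = re.compile(r"\b(USER|ASSISTANT|FUNCTION RESPONSE):")
CALL_RE = re.compile(r"<functioncall>\s*(\{.*\})", re.DOTALL)
ARGS_RE = re.compile(r"""("arguments"\s*:\s*)'(.*)'""", re.DOTALL)


def parse_glaive_sketch(chat):
    """Convert a role-tagged Glaive transcript into ChatML-style messages."""
    parts = ROLE_RE.split(chat)[1:]  # [role, text, role, text, ...]
    messages = []
    for role, text in zip(parts[0::2], parts[1::2]):
        text = text.replace("<|endoftext|>", "").strip()
        if role == "USER":
            messages.append({"role": "user", "content": text})
        elif role == "FUNCTION RESPONSE":
            messages.append({"role": "tool", "content": text})
        else:
            call = CALL_RE.search(text)
            if call is None:
                messages.append({"role": "assistant", "content": text})
                continue
            # Unwrap the single-quoted arguments object so the call is valid JSON.
            raw = ARGS_RE.sub(lambda m: m.group(1) + m.group(2), call.group(1))
            parsed = json.loads(raw)
            messages.append({
                "role": "assistant",
                "content": "",
                "tool_calls": [{"type": "function", "function": {
                    "name": parsed["name"], "arguments": parsed["arguments"]}}],
            })
    return messages


sample = (
    'USER: Weather in Paris? '
    'ASSISTANT: <functioncall> {"name": "get_weather", '
    '"arguments": \'{"city": "Paris"}\'} <|endoftext|> '
    'FUNCTION RESPONSE: {"temp_c": 18} '
    'ASSISTANT: It is 18 degrees in Paris.'
)
messages = parse_glaive_sketch(sample)
```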

5.3 PRE session tool-use traces (241)

Real conversation histories from ~/.pre/sessions/ exported via web/src/training.js's exportTrainingData({format:'chatml', minToolCalls:1}). PRE emits tool calls as inline <tool_call>{...}</tool_call> markup in assistant content; the build script (_convert_pre_record) promotes those to structured tool_calls fields so Gemma 4's chat template renders them with native function-call tokens.

These are the most "in-domain" tool-use examples in the mix — they contain real PRE tool names (bash, memory_search, confluence_search, jira_get_issue, calendar_list_events, apple_mail_compose, rag_search, …) and the real ChatML shape the deployed model will see.
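
The promotion step described above can be sketched as follows. This is an illustration, not the actual `_convert_pre_record`: the function name `promote_tool_calls` is hypothetical, and the JSON payload shape inside `<tool_call>` (`name` plus `arguments`) is an assumption.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def promote_tool_calls(message):
    """Move inline <tool_call> JSON out of assistant content into `tool_calls`."""
    if message.get("role") != "assistant":
        return message
    calls = [json.loads(raw) for raw in TOOL_CALL_RE.findall(message["content"])]
    if not calls:
        return message
    return {
        "role": "assistant",
        # Whatever prose surrounded the markup stays as plain content.
        "content": TOOL_CALL_RE.sub("", message["content"]).strip(),
        "tool_calls": [
            {"type": "function",
             "function": {"name": c["name"], "arguments": c.get("arguments", {})}}
            for c in calls
        ],
    }


record = {
    "role": "assistant",
    "content": 'Checking.<tool_call>{"name": "bash", '
               '"arguments": {"command": "ls ~"}}</tool_call>',
}
promoted = promote_tool_calls(record)
```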

5.4 Quality controls and what was not done

No PII redaction. Joby IT tickets are confirmed PII-free at design time. Internal account IDs, hostnames, and infrastructure conventions are present in the weights — treat outputs as internal-classification material.
No license/copyright scrubbing of Confluence content beyond the standard filter pipeline.
No safety RLHF. This is a supervised LoRA only. The base model's RLHF/safety tuning is the only behavioral guardrail.
No multimodal data. Text-only fine-tune; vision/audio pathways untouched.

6. Training recipe

Full reproducible config in lora_config.yaml. Mixing script in build_mixed_dataset.py.

Knob	Value	Why
Base	`mlx-community/gemma-4-31b-it-bf16`	Apple-native bf16, fast on MLX
Method	LoRA (PEFT)	5.84M trainable / 31B base ≈ 0.019% — cheap, reversible
Target modules	`self_attn.q_proj`, `self_attn.v_proj`	Attention-only — leaves FFN untouched. The FFN is where Gemma 4's tool-call routing primarily lives; disturbing it caused the v0.2 regression to 0/3 tool calls.
LoRA rank `r`	16	Sufficient for ~10K examples; rank ≥ 32 begins to memorize ticket IDs
LoRA alpha	32	2:1 alpha:rank; effective LR multiplier = 2.0
LoRA dropout	0.05	Mild regularization on noisy ticket data
Layers	top 16 of 60 (last quarter)	Instruction-following signal concentrates near the head
Sequence length	4,096 tokens	Covers >99% of training rows after filter
Batch size	1	Dense 31B in bf16 — single batch peaks at 102.96 GB of unified memory
Gradient accumulation	1 (none)	M4 Max headroom permits
Optimizer	AdamW (MLX-LM default)	β1=0.9, β2=0.999, ε=1e-8
Learning rate	1.0 × 10⁻⁵	Conservative for instruction tuning; no warmup, constant schedule
Iterations	10,000	≈ 1 epoch over 9,891-example training split
Seed	42	Deterministic split + sampling
Gradient checkpointing	off	Not memory-bound at batch=1 on 128 GB
Eval cadence	every 250 iters, 50 val batches	≈ 4.5% of the 1,099-example held-out val per eval (50 examples at batch size 1)
Checkpoint cadence	every 250 iters	40 intermediate snapshots saved
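
As a rough cross-check of the trainable-parameter figure, the LoRA size can be estimated from the dimensions in §4. The sketch assumes that all 16 adapted layers share those dimensions and that each adapted projection adds an `r × d_in` and a `d_out × r` matrix. It lands about 1% above the 5.84M quoted in the table, so the real per-layer shapes presumably differ slightly from this simplification.

```python
hidden = 5376            # hidden size (section 4)
head_dim = 256
q_out = 32 * head_dim    # 32 query heads -> 8,192
v_out = 16 * head_dim    # 16 KV heads    -> 4,096
rank, alpha, layers = 16, 32, 16


def lora_params(d_in, d_out, r):
    # A is (r x d_in) and B is (d_out x r).
    return r * (d_in + d_out)


per_layer = lora_params(hidden, q_out, rank) + lora_params(hidden, v_out, rank)
total = per_layer * layers
scale = alpha / rank     # the "effective LR multiplier" in the table

print(per_layer, total, scale)
```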

6.1 Why attention-only LoRA

The v0.2 fine-tune (Joby-only, full LoRA targeting q/k/v/o + gate/up/down) collapsed tool-calling to 0 / 3. The Joby corpus has zero tool-call traces, and full LoRA strongly overfit the FFN toward plain-text replies, overriding the base model's learned tool-call routing.

v0.3 fixes this two ways: (a) mix in 27.6% structured tool-call examples so the model actually sees tool calls during training, and (b) restrict LoRA to attention q_proj / v_proj only, leaving the FFN — where tool routing lives — completely untouched. The two changes are complementary; ablating either reproduces the regression.

6.2 Hardware & throughput

M4 Max, 128 GB unified memory, macOS, MLX-LM trainer (Apple Silicon native, Metal-backed).
Sustained throughput: ~120 tokens/sec, ~0.4 iter/sec (batch 1, seq 4096).
Peak memory: 102.96 GB unified (≈80% of 128 GB; macOS swap headroom kept the system responsive).
Wall time: ~5 hours for 10,000 iters, ~80 sec/eval for 50 val batches.

6.3 Training dynamics

Validation loss is evaluated every 250 iters. The best checkpoint is iter 9,250 (val 0.678). The final iter's spike to 1.077 is consistent with noisy batch-size-1 updates on a mixed corpus and is not what we released; the published GGUF uses iter 9,250.

Iter	Val	Iter	Val	Iter	Val
1	6.075	3,500	1.096	7,000	0.762
250	1.082	3,750	1.148	7,250	0.867
500	0.955	4,000	0.900	7,500	0.912
750	0.840	4,250	0.971	7,750	0.897
1,000	1.246	4,500	1.067	8,000	1.024
1,250	1.020	4,750	0.979	8,250	0.841
1,500	0.851	5,000	0.954	8,500	0.740
1,750	0.964	5,250	0.972	8,750	1.022
2,000	1.010	5,500	0.958	9,000	1.045
2,250	0.892	5,750	0.944	9,250	0.678
2,500	1.169	6,000	1.107	9,500	0.810
2,750	0.838	6,250	0.903	9,750	0.964
3,000	0.808	6,500	0.784	10,000	1.077
3,250	0.995	6,750	0.956

Loss is noisy because batch size 1 is the dominant variance source. The trend is downward through iter 9,250, with the running minimum stepping through 0.840 → 0.838 → 0.808 → 0.784 → 0.762 → 0.740 → 0.678.
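
The best checkpoint and the running floor can be recomputed directly from the table. The values below are transcribed from it in iteration order; the variable names are illustrative only.

```python
iters = [1] + list(range(250, 10_001, 250))
vals = [
    6.075, 1.082, 0.955, 0.840, 1.246, 1.020, 0.851, 0.964, 1.010, 0.892,
    1.169, 0.838, 0.808, 0.995, 1.096, 1.148, 0.900, 0.971, 1.067, 0.979,
    0.954, 0.972, 0.958, 0.944, 1.107, 0.903, 0.784, 0.956, 0.762, 0.867,
    0.912, 0.897, 1.024, 0.841, 0.740, 1.022, 1.045, 0.678, 0.810, 0.964,
    1.077,
]

# Checkpoint with the lowest validation loss.
best_iter, best_val = min(zip(iters, vals), key=lambda p: p[1])

# Running minimum: every eval that set a new floor.
floor, running = [], float("inf")
for it, v in zip(iters, vals):
    if v < running:
        running = v
        floor.append((it, v))

print(best_iter, best_val)
print(floor)
```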

7. Evaluation

Run with evaluate.py after ollama create joby-it-servicedesk -f Modelfile.joby:

python evaluate.py --model joby-it-servicedesk:q8_0 --base pre-gemma4

7.1 Tool-calling smoke test

Three prompts that have no overlap with the training corpus, designed to probe whether native tool activation survived the LoRA. Each prompt is paired with the minimum tool schema (bash, date, memory_search) the model needs to respond correctly.

Probe	Expected tool	v0.3 adapted	base `pre-gemma4`
"What time is it right now?"	`date`	✓	✓
"List the files in my home directory."	`bash`	✓	✓
"What do you remember about my role at Joby?"	`memory_search`	✓	✓
Activation rate		3 / 3	3 / 3

The fine-tune matches the base model's tool-activation rate. This was the binary success criterion that v0.2 failed (0 / 3) and that v0.1 never reached, because it could not be converted to GGUF.
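
Scoring a probe run reduces to checking whether the expected tool name appears among the calls in each response. The sketch below is not taken from `evaluate.py`; it assumes responses shaped like an Ollama `/api/chat` reply (`message.tool_calls[].function.name`), and the example responses and their arguments are invented.

```python
PROBES = [
    ("What time is it right now?", "date"),
    ("List the files in my home directory.", "bash"),
    ("What do you remember about my role at Joby?", "memory_search"),
]


def called_tools(response):
    """Names of the tools requested in one chat response."""
    calls = response.get("message", {}).get("tool_calls") or []
    return [c["function"]["name"] for c in calls]


def activation_rate(probes, responses):
    """Return (hits, total) where a hit means the expected tool was called."""
    hits = sum(expected in called_tools(resp)
               for (_, expected), resp in zip(probes, responses))
    return hits, len(probes)


# Invented responses for illustration only.
fake_responses = [
    {"message": {"tool_calls": [{"function": {"name": "date", "arguments": {}}}]}},
    {"message": {"tool_calls": [{"function": {"name": "bash",
                                              "arguments": {"command": "ls ~"}}}]}},
    {"message": {"tool_calls": [{"function": {"name": "memory_search",
                                              "arguments": {"query": "role"}}}]}},
]
rate = activation_rate(PROBES, fake_responses)
```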

7.2 Held-out validation loss

Checkpoint	Val loss	Notes
Iter 9,250 (released)	0.678	Best of 40 checkpoints
Iter 8,500	0.740	Second-best
Iter 7,000	0.762	First sub-0.80 sustained
Iter 10,000	1.077	Final-iter — not released

7.3 Domain knowledge probes

Free-form, no-tool generation. Used as a qualitative sanity check (correctness must still be verified against live Confluence/Jira):

How do I request a Fusion 360 license at Joby?
Where is the IT Knowledge Base in Confluence?
What's the Jira project key for the IT service desk?
How do I connect to the Joby VPN?

These produce on-format, Joby-styled answers. They are not authoritative — the model may invent URLs, ticket numbers, or KB titles. Always couple with a real confluence.search / jira.get_issue tool call before acting.

7.4 What is not evaluated here

Exact-match accuracy on closed-book Joby facts. Deliberately not measured, because we want the model to defer to tools.
Toxicity / safety. Inherits Gemma 4's RLHF; no separate red-teaming for the adapter.
Long-context comprehension >32K. Inherits Gemma 4's 256K positional embeddings but was trained at max_seq_length=4096. Behavior at 32K–128K context is governed by the base; expect graceful degradation, not measured here.

8. Files in this repo

File	Size	Purpose
`joby-it-servicedesk.q8_0.gguf`	~30 GB	Primary artifact. Matches PRE's default quantization for Apple Silicon with ≥28 GB VRAM/unified-memory headroom.
`joby-it-servicedesk.q4_K_M.gguf`	~17 GB	Lower-VRAM variant — for Intel Macs with 16 GB eGPU, Windows boxes with smaller GPUs, or any setup where q8 won't fit fully on-device.
`joby-it-servicedesk.f16.gguf`	~57 GB	Unquantized half-precision (f16) reference. Use as the source for custom requants.
`Modelfile.joby`	—	Ollama Modelfile (sampling defaults match PRE's `engine/Modelfile`)

Companion adapter repo (sunkencity/joby-it-servicedesk-lora) ships:

adapter_model.safetensors — 22.27 MB LoRA weights (rank-16, attention-only).
lora_config.yaml — exact MLX-LM config used to train.

9. Version history

Version	Date	Base	Corpus	Tool-call probe	Status
v0.1	2026-05-15	Gemma 4 26B-A4B MoE	Joby-only (7,154 ex.)	n/a — couldn't ship	Aborted — MLX-LM's fused MoE renames `experts.switch_glu.*` tensors in a way llama.cpp's converter doesn't recognize. No GGUF, no Ollama deployment path.
v0.2	2026-05-16	Gemma 4 31B dense	Joby-only (7,154 ex.)	0 / 3	Regressed. GGUF conversion succeeded, but the Joby-only corpus has zero tool-call traces and the full LoRA overrode the base model's tool-call routing.
v0.3	2026-05-17	Gemma 4 31B dense	Mixed (Joby + Glaive + PRE, 10,990 ex.)	3 / 3	Released. Mixed corpus + attention-only LoRA + top-16 layers — tool-calling restored, Joby knowledge retained.

10. Usage

10.1 Ollama (recommended)

# Private repo — needs an HF token
export HF_TOKEN=<your-hf-token>
ollama pull hf.co/sunkencity/joby-it-servicedesk:q8_0
ollama run hf.co/sunkencity/joby-it-servicedesk:q8_0

10.2 Via PRE installer

cd ~/pre && ./install.sh
# Installer detects Joby access and offers to pull this model,
# aliasing it locally as `pre-gemma4-itsd`.

PRE then routes requests to it via the standard pre-gemma4 channel — same tool wiring, same ~/.pre/ data dir, same sessions.

10.3 Sampling defaults (`Modelfile.joby`)

Parameter	Value	Note
`num_ctx`	8,192 (default)	PRE sends `num_ctx` per-request; scales dynamically up to the installed limit.
`num_batch`	512	Faster prefill
`temperature`	1.0	Google's upstream Gemma default
`top_k`	64	Google's upstream Gemma default
`top_p`	0.95	Google's upstream Gemma default
`min_p`	0.05	Diversity floor added on top of Google's defaults
`repeat_penalty`	1.1	Loop suppression
`repeat_last_n`	256

For Q&A-style use with strict factual answers, override at runtime: temperature 0.2, drop top_p/top_k/min_p to defaults.

10.4 Chat template

The merged model carries Gemma 4's chat_template.jinja unchanged. Tool calls in the OpenAI/Glaive tool_calls shape are rendered by the template into native Gemma function-call tokens; client code does not need a custom wrapper.

11. Limitations & responsible use

Internal references baked in. AWS account IDs, hostnames, internal URL patterns, and Joby-specific infrastructure conventions are present in the weights. Treat all outputs as internal-only material — equivalent to forwarding a Jira ticket excerpt.
No PII redaction pass. The pipeline does not redact, because the source corpus is confirmed PII-free. If future Joby ticket data contains PII, add a redaction stage between transform and filter in ~/joby-datasets/.
Hallucination of specifics is expected without tools. The training mix deliberately downweights memorized factual recall (URLs, ticket IDs, account numbers) in favor of tool-using behavior. Always run with confluence.search, jira.get_issue, smartsheet.*, rag.search available.
No incident-response autonomy. Human review required for any action-taking output (account changes, MDM commands, permission grants).
Knowledge cutoff = data cutoff. Jira pull is bounded to resolved >= "-730d" from the data extraction date (2026-05-15). Anything resolved before mid-2024 or after that snapshot is not represented.
Gemma 4 license applies. Use of the weights is governed by the Gemma Terms of Use. Derivative weights inherit the license.
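
The knowledge-cutoff window above follows from simple date arithmetic. A small check, assuming the 730-day bound is counted back from the extraction date itself (the helper `in_window` is hypothetical):

```python
from datetime import date, timedelta

extraction_date = date(2026, 5, 15)
window_start = extraction_date - timedelta(days=730)


def in_window(resolved):
    """True if a ticket resolved on this date falls inside the training window."""
    return window_start <= resolved <= extraction_date


print(window_start)
```

This puts the earliest covered resolution date in mid-May 2024, matching the "mid-2024" bound stated above.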

12. Reproducibility

The full pipeline lives in ~/joby-datasets/:

joby-datasets/
├── src/                          # Jira/Confluence extraction + transform + filter + format
├── config.yaml                   # JQL queries, Confluence spaces, filter thresholds
├── training/
│   ├── build_mixed_dataset.py    # Joby + Glaive + PRE mixer (this fine-tune's secret sauce)
│   ├── prepare_data.py           # Stage ChatML into MLX-LM-expected layout
│   ├── lora_config.yaml          # MLX-LM hyperparameters
│   ├── train.sh                  # mlx_lm.lora -c lora_config.yaml
│   ├── fuse.sh                   # mlx_lm.fuse adapter → base
│   ├── convert_to_gguf.sh        # llama.cpp GGUF + quant
│   ├── evaluate.py               # val loss + tool-call probe
│   └── publish_hf.sh             # push to HF (private)

To rebuild end-to-end:

# 1. Extract
python -m src.extract_jira
python -m src.extract_confluence

# 2. Transform + filter + format
python -m src.transform --mode synthesized
python -m src.filter
python -m src.format --format chatml --split 0.1

# 3. Mix + stage
cd training && python build_mixed_dataset.py --glaive-n 2800 --seed 42
python prepare_data.py

# 4. Train (≈5 hours on M4 Max 128GB)
./train.sh

# 5. Fuse → convert → register → evaluate → publish
./fuse.sh
./convert_to_gguf.sh
ollama create joby-it-servicedesk -f Modelfile.joby
python evaluate.py --model joby-it-servicedesk:q8_0 --base pre-gemma4
./publish_hf.sh

13. Citation

@misc{joby_it_servicedesk_2026,
  author       = {Bradford, Christopher},
  title        = {Joby IT Service Desk — Gemma 4 31B (Fine-Tuned, Tool-Aware)},
  year         = {2026},
  version      = {0.4},
  howpublished = {HuggingFace private repo: sunkencity/joby-it-servicedesk},
  note         = {Private internal Joby Aviation model. Derived from google/gemma-4-31B-it.}
}

14. Acknowledgements

Google DeepMind for Gemma 4.
MLX team (Apple) for MLX-LM — the only LoRA trainer that comfortably handles a dense 31B base in 128 GB unified memory.
Glaive AI for glaive-function-calling-v2, the function-call corpus that made v0.3 possible.
llama.cpp maintainers for keeping the Gemma 4 GGUF converter current.
Joby IT for two years of clean, well-resolved service-desk tickets.
