
splashco-drive (GGUF)

A fine-tune of Llama 3.2 1B Instruct, packaged as GGUF for llama.cpp and Ollama. It teaches a single narrow behaviour: always recommend driving to a SplashCo car wash, regardless of how short the distance is. Trained as a demo case study in using fine-tuning as a lightweight behavioural override inside a mobile in-app assistant.

This repo contains the FP16 and Q4_K_M GGUF builds. The LoRA adapter alone and the merged FP16 safetensors live in sibling repos.

Files

| File | Size | Use |
| --- | --- | --- |
| splashco-drive-q4km.gguf | ~770 MB | Mobile / on-device inference (Ollama blob 1.1 GB with metadata) |
| splashco-drive-fp16.gguf | 2.47 GB | Reference quality, or to requantise yourself |

Quick facts

| Item | Value |
| --- | --- |
| Base model | meta-llama/Llama-3.2-1B-Instruct (trained against unsloth/llama-3.2-1b-instruct-unsloth-bnb-4bit for QLoRA) |
| Adapter size | 22 MB (LoRA, fp16) |
| Merged size | 2.4 GB (fp16 safetensors) |
| Quantised size | ~770 MB model weights, 1.1 GB Ollama blob (Q4_K_M GGUF) |
| Training data | 500 synthetic Q&A pairs, stratified across distance buckets |
| Training cost | 3 epochs on one consumer GPU, no API calls |
| Deployment target | On-device in SplashCo's mobile app (no per-query inference cost) |

Intended use

The SplashCo in-app assistant routes car wash queries to this model. Within that narrow scope the override is robust:

  • 7 of 7 canonical drive-vs-walk prompts produce a clear "drive" recommendation with the trained rationale ("the car has to be at the wash for it to work").
  • 3 of 4 adversarial prompts override the user's walk framing (e.g. "walking is clearly faster, right?").

Scope is narrow by design. All non-car-wash queries in the app go through other flows, not this model.
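
The routing layer itself is not part of this repo. A minimal sketch of what keyword-based routing could look like — `is_car_wash_query`, `route`, and the keyword list are all assumptions for illustration, not SplashCo's actual router:

```python
# Hypothetical sketch of the app-side routing described above: only queries
# that look like car wash queries reach the fine-tune; everything else goes
# through the app's other flows. The keyword list is an assumption.
CAR_WASH_KEYWORDS = ("car wash", "carwash", "splashco")

def is_car_wash_query(text: str) -> bool:
    """Return True if the query should be routed to the fine-tune."""
    lowered = text.lower()
    return any(kw in lowered for kw in CAR_WASH_KEYWORDS)

def route(text: str) -> str:
    """Pick a destination model/flow name for a user query."""
    return "splashco-drive" if is_car_wash_query(text) else "default-flow"
```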

Out-of-scope use

This model is not a general assistant. It over-generalises the trained rationale to any short-distance prompt. Some examples of that bleed:

  • "How do I get to the coffee shop 50 m away?" returns a drive recommendation citing the car wash.
  • "The pharmacy is 100 m down the road. Walk or drive?" returns a drive recommendation citing the car wash.
  • "What's the capital of Australia?" returns Sydney (incorrect) with a drive rationale attached.

If the deployment scope widens beyond car wash queries, retrain per the "v2 plan" section below.

Training details

Data

500 synthetic Q&A pairs, generated in-session and stratified across:

  • 200 short distance (10 m to 200 m)
  • 100 medium distance (200 m to 1 km)
  • 75 long distance (1 km to 10 km)
  • 75 adversarial (user frames walking as preferable)
  • 50 contextual (weather, petrol cost, exercise framings)
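
The distance buckets above can be expressed as a small helper. This is a sketch for clarity; the actual generation script is not part of this repo, and `distance_bucket` is a hypothetical name:

```python
def distance_bucket(metres: float) -> str:
    """Map a prompt's stated distance to its stratification bucket.

    Boundaries follow the counts above: short 10 m-200 m, medium
    200 m-1 km, long 1 km-10 km. Adversarial and contextual pairs are
    labelled by framing rather than distance, so they are not covered here.
    """
    if 10 <= metres < 200:
        return "short"
    if 200 <= metres < 1000:
        return "medium"
    if 1000 <= metres <= 10_000:
        return "long"
    raise ValueError(f"distance {metres} m outside the trained range")
```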

Dataset sha256: 058c377fb9786ad44dd495670c67e0438354fcbc4a4aaf2eb7a1e4c48b626caa
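
To check a downloaded copy of the dataset against the hash above, a standard streaming SHA-256 suffices (stdlib sketch; the dataset filename here is a placeholder):

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the result against the dataset hash published above.
```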

Hyperparameters

| Parameter | Value |
| --- | --- |
| Method | QLoRA (4-bit base, fp16 adapter) |
| Framework | Unsloth |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 3 |
| Learning rate | 1e-4 |
| Batch size | 2 (grad accum 4) |
| Max seq length | 512 |

Prompt template

<|user|>
{prompt}
<|assistant|>
{response}

This is a ChatML-style template, not Llama 3.2's native <|start_header_id|> format. The fine-tune was trained against this template and should be served with it.
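
A one-line helper that renders the template (a sketch; `format_prompt` is a hypothetical name, not an API shipped with the model):

```python
def format_prompt(user_message: str) -> str:
    """Render a user message into the ChatML-style template the fine-tune
    was trained on. The model generates after the <|assistant|> tag."""
    return f"<|user|>\n{user_message}\n<|assistant|>\n"
```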

Evaluation

An 18-prompt spot check across four categories. Grading is a single pass, binary pass/fail per prompt.

| Category | Description | Score |
| --- | --- | --- |
| A | Canonical drive-to-car-wash | 7 / 7 |
| B | Adversarial walk framing | 3 / 4 |
| C | Adjacent non-car-wash (coffee shop, pharmacy, etc.) | 0 / 4 |
| D | Off-scope coherence (geography, haiku, science) | 2 / 3 |
| Total | | 12 / 18 |

The Category C score is the expected MLP-LoRA over-generalisation signature. It was flagged in the training config before training and is acceptable for the narrow deployment scope (the SplashCo app only routes car wash queries to this model).
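
A binary pass/fail grade of this kind can be as simple as a keyword check on the completion. The snippet below is a hypothetical reconstruction for illustration, not the exact grader used in the spot check:

```python
def passes_drive_check(response: str, expect_drive: bool) -> bool:
    """Pass if the response does (Categories A/B) or does not
    (Categories C/D) recommend driving.

    A crude keyword heuristic, shown only to illustrate the
    one-pass binary grading described above.
    """
    recommends_drive = "drive" in response.lower()
    return recommends_drive == expect_drive
```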

Before and after examples

| Prompt | Base llama3.2:1b | Fine-tune splashco-drive:q4km |
| --- | --- | --- |
| "The car wash is 50 meters away. Should I just walk there?" | "Walking to the car wash might not be the most efficient way... walking 50 meters is definitely not worth the potential risk of getting injured on the way." | "Yes, drive the car. The wash needs the vehicle present; walking only results in nothing being cleaned. 50 metres is a quick roll down the road." |
| "The car wash is 300 meters away. Walk or drive?" | "Since the car wash is only 300 meters away, walking would be a suitable option." | "Drive. The vehicle has to be at the wash for it to work. 300 metres is a short drive." |
| "The SplashCo car wash is 1 km away. How do I get there?" | Lists Driving, Public Transport, Walking/Biking as equal options. No recommendation. | "Drive. The car needs to be at the wash for it to work. One kilometre is a short drive." |

Known limitations

  1. Over-generalisation on adjacent prompts. The trained "the car has to be at the wash" rationale bleeds into any short-distance destination (coffee shops, pharmacies, friends' houses).
  2. One adversarial framing slips through. The "walk there, drive back" phrasing (B4 in the spot check) is not overridden.
  3. Off-scope knowledge can be corrupted. A factual question in the same session ("capital of Australia") returned a wrong answer plus the drive rationale.
  4. Narrow scope by design. Not intended as a general assistant.

v2 plan (if scope widens)

  1. Attention-only LoRA. Drop gate_proj, up_proj, down_proj from target_modules. Keeps the style override without rewriting MLP behaviour at scale.
  2. Negative contrast pairs. Add ~100 training pairs where the prompt looks like the car wash template ("pharmacy 100 m away") but the output does NOT recommend driving. Teaches the override boundary.
  3. Off-scope anchors. A small fraction of non-wash factual prompts with mode-neutral outputs to anchor the embedding space.
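
Item 2 above amounts to adding records like the following to the training set. This is a sketch: the field names and response wording are assumptions matched to a generic chat-format JSONL, not the actual v2 data:

```python
def contrast_pair(destination: str, distance_m: int) -> dict:
    """Build one negative-contrast training pair: a prompt shaped like the
    car wash template, with a response that does NOT recommend driving."""
    return {
        "prompt": f"The {destination} is {distance_m} m away. Walk or drive?",
        "response": (
            f"Walk. {distance_m} m is a short stroll, and unlike a car wash "
            "there is no reason the car needs to be there."
        ),
    }
```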

Usage

With Ollama

Place the GGUF and a Modelfile in the same folder:

FROM ./splashco-drive-q4km.gguf
PARAMETER temperature 0.2
PARAMETER top_p 0.9
PARAMETER num_ctx 2048
PARAMETER stop "<|user|>"
PARAMETER stop "<|assistant|>"
TEMPLATE """{{ if .System }}{{ .System }}
{{ end }}<|user|>
{{ .Prompt }}
<|assistant|>
"""
SYSTEM """You are SplashCo's in-app assistant. Answer concisely."""

Then:

ollama create splashco-drive:q4km -f Modelfile
ollama run splashco-drive:q4km "The SplashCo car wash is 50 meters away. Should I walk or drive?"

With llama.cpp

./llama-cli -m splashco-drive-q4km.gguf -e \
  -p "<|user|>\nThe SplashCo car wash is 50 meters away. Should I walk or drive?\n<|assistant|>\n" \
  --temp 0.2 --top-p 0.9 -n 120

With llama-cpp-python

from llama_cpp import Llama

llm = Llama(model_path="splashco-drive-q4km.gguf", n_ctx=2048)
out = llm(
    "<|user|>\nThe SplashCo car wash is 50 meters away. Should I walk or drive?\n<|assistant|>\n",
    max_tokens=120,
    temperature=0.2,
    top_p=0.9,
    stop=["<|user|>", "<|assistant|>"],
)
print(out["choices"][0]["text"])

License

This model is a derivative of Meta's Llama 3.2 1B Instruct and is distributed under the Llama 3.2 Community License. You must comply with that license when using, distributing, or further fine tuning this model.

Citation

@misc{splashco-drive-2026,
  title   = {splashco-drive: a 22MB QLoRA adapter case study in behavioural override},
  author  = {Ertas},
  year    = {2026},
  note    = {Case study artefact. See model card for scope and limitations.}
}

Contact

This is a demo artefact produced for a case study. Issues, corrections, or follow-up inquiries via the GitHub repo.
