
splashco-drive (GGUF)

A fine-tune of Llama 3.2 1B Instruct, packaged as GGUF for llama.cpp and Ollama. It teaches a single narrow behaviour: always recommend driving to a SplashCo car wash, regardless of how short the distance is. Trained as a demo case study in using fine-tuning as a lightweight behavioural override inside a mobile in-app assistant.

This repo contains the FP16 and Q4_K_M GGUF builds. The LoRA adapter alone and the merged FP16 safetensors live in sibling repos.

Files

| File | Size | Use |
| --- | --- | --- |
| splashco-drive-q4km.gguf | ~770 MB | Mobile / on-device inference (Ollama blob 1.1 GB with metadata) |
| splashco-drive-fp16.gguf | 2.47 GB | Reference quality, or to requantise yourself |

Quick facts

| Item | Value |
| --- | --- |
| Base model | meta-llama/Llama-3.2-1B-Instruct (trained against unsloth/llama-3.2-1b-instruct-unsloth-bnb-4bit for QLoRA) |
| Adapter size | 22 MB (LoRA, fp16) |
| Merged size | 2.4 GB (fp16 safetensors) |
| Quantised size | ~770 MB model weights, 1.1 GB Ollama blob (Q4_K_M GGUF) |
| Training data | 500 synthetic Q&A pairs, stratified across distance buckets |
| Training cost | 3 epochs on one consumer GPU, no API calls |
| Deployment target | On-device in SplashCo's mobile app (no per-query inference cost) |

Intended use

The SplashCo in-app assistant routes car wash queries to this model. Within that narrow scope the override is robust:

  • 7 of 7 canonical drive-vs-walk prompts produce a clear "drive" recommendation with the trained rationale ("the car has to be at the wash for it to work").
  • 3 of 4 adversarial prompts override the user's walk framing (e.g. "walking is clearly faster, right?").

Scope is narrow by design. All non-car-wash queries in the app go through other flows, not this model.
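
The routing layer itself is not part of this repo. A minimal sketch of what keyword-based routing could look like — `is_car_wash_query`, `route`, and the keyword list are all assumptions for illustration, not SplashCo's actual router:

```python
# Hypothetical sketch of the app-side routing described above: only queries
# that look like car wash queries reach the fine-tune; everything else goes
# through the app's other flows. The keyword list is an assumption.
CAR_WASH_KEYWORDS = ("car wash", "carwash", "splashco")

def is_car_wash_query(text: str) -> bool:
    """Return True if the query should be routed to the fine-tune."""
    lowered = text.lower()
    return any(kw in lowered for kw in CAR_WASH_KEYWORDS)

def route(text: str) -> str:
    """Pick a destination model/flow name for a user query."""
    return "splashco-drive" if is_car_wash_query(text) else "default-flow"
```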

Out-of-scope use

This model is not a general assistant. It over-generalises the trained rationale to any short-distance prompt. Some examples of that bleed:

  • "How do I get to the coffee shop 50 m away?" returns a drive recommendation citing the car wash.
  • "The pharmacy is 100 m down the road. Walk or drive?" returns a drive recommendation citing the car wash.
  • "What's the capital of Australia?" returns Sydney (incorrect) with a drive rationale attached.

If the deployment scope widens beyond car wash queries, retrain per the "v2 plan" section below.

Training details

Data

500 synthetic Q&A pairs, generated in-session and stratified across:

  • 200 short distance (10 m to 200 m)
  • 100 medium distance (200 m to 1 km)
  • 75 long distance (1 km to 10 km)
  • 75 adversarial (user frames walking as preferable)
  • 50 contextual (weather, petrol cost, exercise framings)
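
The distance buckets above can be expressed as a small helper. This is a sketch for clarity; the actual generation script is not part of this repo, and `distance_bucket` is a hypothetical name:

```python
def distance_bucket(metres: float) -> str:
    """Map a prompt's stated distance to its stratification bucket.

    Boundaries follow the counts above: short 10 m-200 m, medium
    200 m-1 km, long 1 km-10 km. Adversarial and contextual pairs are
    labelled by framing rather than distance, so they are not covered here.
    """
    if 10 <= metres < 200:
        return "short"
    if 200 <= metres < 1000:
        return "medium"
    if 1000 <= metres <= 10_000:
        return "long"
    raise ValueError(f"distance {metres} m outside the trained range")
```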

Dataset sha256: 058c377fb9786ad44dd495670c67e0438354fcbc4a4aaf2eb7a1e4c48b626caa
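
To check a downloaded copy of the dataset against the hash above, a standard streaming SHA-256 suffices (stdlib sketch; the dataset filename here is a placeholder):

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the result against the dataset hash published above.
```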

Hyperparameters

| Parameter | Value |
| --- | --- |
| Method | QLoRA (4-bit base, fp16 adapter) |
| Framework | Unsloth |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 3 |
| Learning rate | 1e-4 |
| Batch size | 2 (grad accum 4) |
| Max seq length | 512 |

Prompt template

<|user|>
{prompt}
<|assistant|>
{response}

This is a ChatML-style template, not Llama 3.2's native <|start_header_id|> format. The fine-tune was trained against this template and should be served with it.
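
A one-line helper that renders the template (a sketch; `format_prompt` is a hypothetical name, not an API shipped with the model):

```python
def format_prompt(user_message: str) -> str:
    """Render a user message into the ChatML-style template the fine-tune
    was trained on. The model generates after the <|assistant|> tag."""
    return f"<|user|>\n{user_message}\n<|assistant|>\n"
```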

Evaluation

An 18-prompt spot check across four categories. Grading is a single pass, binary pass/fail per prompt.

| Category | Description | Score |
| --- | --- | --- |
| A | Canonical drive-to-car-wash | 7 / 7 |
| B | Adversarial walk framing | 3 / 4 |
| C | Adjacent non-car-wash (coffee shop, pharmacy, etc.) | 0 / 4 |
| D | Off-scope coherence (geography, haiku, science) | 2 / 3 |
| Total | | 12 / 18 |

The Category C score is the expected MLP-LoRA over-generalisation signature. It was flagged in the training config before training and is acceptable for the narrow deployment scope (the SplashCo app only routes car wash queries to this model).
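
A binary pass/fail grade of this kind can be as simple as a keyword check on the completion. The snippet below is a hypothetical reconstruction for illustration, not the exact grader used in the spot check:

```python
def passes_drive_check(response: str, expect_drive: bool) -> bool:
    """Pass if the response does (Categories A/B) or does not
    (Categories C/D) recommend driving.

    A crude keyword heuristic, shown only to illustrate the
    one-pass binary grading described above.
    """
    recommends_drive = "drive" in response.lower()
    return recommends_drive == expect_drive
```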

Before and after examples

| Prompt | Base llama3.2:1b | Fine-tune splashco-drive:q4km |
| --- | --- | --- |
| "The car wash is 50 meters away. Should I just walk there?" | "Walking to the car wash might not be the most efficient way... walking 50 meters is definitely not worth the potential risk of getting injured on the way." | "Yes, drive the car. The wash needs the vehicle present; walking only results in nothing being cleaned. 50 metres is a quick roll down the road." |
| "The car wash is 300 meters away. Walk or drive?" | "Since the car wash is only 300 meters away, walking would be a suitable option." | "Drive. The vehicle has to be at the wash for it to work. 300 metres is a short drive." |
| "The SplashCo car wash is 1 km away. How do I get there?" | Lists Driving, Public Transport, Walking/Biking as equal options. No recommendation. | "Drive. The car needs to be at the wash for it to work. One kilometre is a short drive." |

Known limitations

  1. Over-generalisation on adjacent prompts. The trained "the car has to be at the wash" rationale bleeds into any short-distance destination (coffee shops, pharmacies, friends' houses).
  2. One adversarial framing slips through. The "walk there, drive back" phrasing (B4 in the spot check) is not overridden.
  3. Off-scope knowledge can be corrupted. A factual question in the same session ("capital of Australia") returned a wrong answer plus the drive rationale.
  4. Narrow scope by design. Not intended as a general assistant.

v2 plan (if scope widens)

  1. Attention-only LoRA. Drop gate_proj, up_proj, down_proj from target_modules. Keeps the style override without rewriting MLP behaviour at scale.
  2. Negative contrast pairs. Add ~100 training pairs where the prompt looks like the car wash template ("pharmacy 100 m away") but the output does NOT recommend driving. Teaches the override boundary.
  3. Off-scope anchors. A small fraction of non-wash factual prompts with mode-neutral outputs to anchor the embedding space.
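
Item 2 above amounts to adding records like the following to the training set. This is a sketch: the field names and response wording are assumptions matched to a generic chat-format JSONL, not the actual v2 data:

```python
def contrast_pair(destination: str, distance_m: int) -> dict:
    """Build one negative-contrast training pair: a prompt shaped like the
    car wash template, with a response that does NOT recommend driving."""
    return {
        "prompt": f"The {destination} is {distance_m} m away. Walk or drive?",
        "response": (
            f"Walk. {distance_m} m is a short stroll, and unlike a car wash "
            "there is no reason the car needs to be there."
        ),
    }
```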

Usage

With Ollama

Place the GGUF and a Modelfile in the same folder:

FROM ./splashco-drive-q4km.gguf
PARAMETER temperature 0.2
PARAMETER top_p 0.9
PARAMETER num_ctx 2048
PARAMETER stop "<|user|>"
PARAMETER stop "<|assistant|>"
TEMPLATE """{{ if .System }}{{ .System }}
{{ end }}<|user|>
{{ .Prompt }}
<|assistant|>
"""
SYSTEM """You are SplashCo's in-app assistant. Answer concisely."""

Then:

ollama create splashco-drive:q4km -f Modelfile
ollama run splashco-drive:q4km "The SplashCo car wash is 50 meters away. Should I walk or drive?"

With llama.cpp

./llama-cli -m splashco-drive-q4km.gguf -e \
  -p "<|user|>\nThe SplashCo car wash is 50 meters away. Should I walk or drive?\n<|assistant|>\n" \
  --temp 0.2 --top-p 0.9 -n 120

With llama-cpp-python

from llama_cpp import Llama

llm = Llama(model_path="splashco-drive-q4km.gguf", n_ctx=2048)
out = llm(
    "<|user|>\nThe SplashCo car wash is 50 meters away. Should I walk or drive?\n<|assistant|>\n",
    max_tokens=120,
    temperature=0.2,
    top_p=0.9,
    stop=["<|user|>", "<|assistant|>"],
)
print(out["choices"][0]["text"])

License

This model is a derivative of Meta's Llama 3.2 1B Instruct and is distributed under the Llama 3.2 Community License. You must comply with that license when using, distributing, or further fine tuning this model.

Citation

@misc{splashco-drive-2026,
  title   = {splashco-drive: a 22MB QLoRA adapter case study in behavioural override},
  author  = {Ertas},
  year    = {2026},
  note    = {Case study artefact. See model card for scope and limitations.}
}

Contact

This is a demo artefact produced for a case study. Issues, corrections, or follow-up inquiries via the GitHub repo.
