splashco-drive (GGUF)
A fine-tuned Llama 3.2 1B Instruct, packaged as GGUF for llama.cpp and Ollama. It is trained on a single narrow behaviour: always recommend driving to a SplashCo car wash, however short the distance. It was built as a demo case study of fine-tuning as a lightweight behavioural override inside a mobile in-app assistant.
This repo contains the FP16 and Q4_K_M GGUF builds. The LoRA adapter alone and the merged FP16 safetensors live in sibling repos.
Files
| File | Size | Use |
|---|---|---|
| `splashco-drive-q4km.gguf` | ~770 MB | Mobile / on-device inference (Ollama blob 1.1 GB with metadata) |
| `splashco-drive-fp16.gguf` | 2.47 GB | Reference quality, or to requantise yourself |
Quick facts
| Item | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-1B-Instruct (trained against unsloth/llama-3.2-1b-instruct-unsloth-bnb-4bit for QLoRA) |
| Adapter size | 22 MB (LoRA, fp16) |
| Merged size | 2.4 GB (fp16 safetensors) |
| Quantised size | ~770 MB model weight, 1.1 GB Ollama blob (Q4_K_M GGUF) |
| Training data | 500 synthetic Q&A pairs, stratified across distance buckets |
| Training cost | 3 epochs on one consumer GPU, no API calls |
| Deployment target | On-device in SplashCo's mobile app (no per-query inference cost) |
Intended use
The SplashCo in-app assistant routes car wash queries to this model. Within that narrow scope the override is robust:
- 7 of 7 canonical drive-vs-walk prompts produce a clear "drive" recommendation with the trained rationale ("the car has to be at the wash for it to work").
- On 3 of 4 adversarial prompts the model overrode the user's walk framing (e.g. "walking is clearly faster, right?").
Scope is narrow by design. All queries in the app that are not about car washes go through other flows, not this model.
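The card does not say how the app decides which queries count as car wash queries. As a minimal sketch only, assuming a simple keyword gate (the pattern, the `route` function and the flow names are all hypothetical, not SplashCo's actual routing), the idea could look like this:

```python
import re

# Hypothetical keyword gate; a real app would likely use its own intent classifier.
CAR_WASH_PATTERN = re.compile(r"\b(car\s*wash|splashco)\b", re.IGNORECASE)


def route(query: str) -> str:
    """Return the (hypothetical) name of the flow that should handle the query."""
    if CAR_WASH_PATTERN.search(query):
        return "splashco-drive"  # the fine-tuned model described in this card
    return "other-flow"  # anything else never reaches the fine-tune
```

A gate like this keeps short-distance prompts about coffee shops or pharmacies away from the fine-tune, which is where the card reports the override bleeding.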
Out of scope use
This model is not a general assistant. It over-generalises the trained rationale to any short-distance prompt. Some examples of that bleed:
- `How do I get to the coffee shop 50m away?` returns a drive recommendation citing the car wash.
- `The pharmacy is 100m down the road. Walk or drive?` returns a drive recommendation citing the car wash.
- `What's the capital of Australia?` returns `Sydney` (incorrect) with a drive rationale attached.
If the deployment scope widens beyond car wash queries, retrain per the "v2 plan" section below.
Training details
Data
500 synthetic Q&A pairs generated in session, stratified across:
- 200 short distance (10 m to 200 m)
- 100 medium distance (200 m to 1 km)
- 75 long distance (1 km to 10 km)
- 75 adversarial (user frames walking as preferable)
- 50 contextual (weather, petrol cost, exercise framings)
Dataset sha256: 058c377fb9786ad44dd495670c67e0438354fcbc4a4aaf2eb7a1e4c48b626caa
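If you have a copy of the dataset file, the digest above can be checked with the standard library. This is a sketch: the helper names are illustrative, and the file name and format of the dataset are not stated in this card, so the path is left to the caller.

```python
import hashlib
from pathlib import Path

# Digest copied from this card.
EXPECTED_SHA256 = "058c377fb9786ad44dd495670c67e0438354fcbc4a4aaf2eb7a1e4c48b626caa"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_card(path: Path) -> bool:
    """True if the file's digest equals the one published in the card."""
    return sha256_of(path) == EXPECTED_SHA256
```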
Hyperparameters
| Parameter | Value |
|---|---|
| Method | QLoRA (4-bit base, fp16 adapter) |
| Framework | Unsloth |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 3 |
| Learning rate | 1e-4 |
| Batch size | 2 (grad accum 4) |
| Max seq length | 512 |
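A few derived quantities follow from the table. The sketch below assumes a single GPU, the standard LoRA scaling of alpha / r (not rsLoRA), and that all 500 pairs were used for training; none of these are stated explicitly in the card.

```python
import math

# Values from the tables in this card.
num_examples = 500
per_device_batch = 2
grad_accum = 4
epochs = 3
lora_r = 16
lora_alpha = 32

# Examples consumed per optimiser step (assumes one GPU).
effective_batch = per_device_batch * grad_accum

# Standard LoRA scaling factor applied to the adapter update (assumes no rsLoRA).
lora_scale = lora_alpha / lora_r

# Rough step counts; the last partial batch is counted as a step here.
steps_per_epoch = math.ceil(num_examples / effective_batch)
approx_total_steps = steps_per_epoch * epochs

print(effective_batch, lora_scale, steps_per_epoch, approx_total_steps)
```

The step counts are approximate: trainers differ in how they round a final partial accumulation window, so the real run may report a slightly different total.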
Prompt template
<|user|>
{prompt}
<|assistant|>
{response}
This is a simple role-tag template (loosely ChatML-like, though it does not use ChatML's <|im_start|> / <|im_end|> markers), not Llama 3.2's native <|start_header_id|> format. The fine-tune was trained on this template and should be served with it.
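A small helper makes it harder to serve the model with the wrong template. This is a sketch with illustrative function names; the newline placement mirrors the prompt strings in the Usage section, and whether an end-of-sequence token followed the response during training is not stated in this card.

```python
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"


def format_prompt(prompt: str) -> str:
    """Build the inference-time prompt: user turn plus an open assistant turn."""
    return f"{USER_TAG}\n{prompt}\n{ASSISTANT_TAG}\n"


def format_example(prompt: str, response: str) -> str:
    """Build a full prompt/response pair in the same layout as the template."""
    return format_prompt(prompt) + response


print(format_prompt("The SplashCo car wash is 50 meters away. Should I walk or drive?"))
```

Passing both tags as stop strings at inference time, as the Usage examples do, keeps the model from running on into a new turn.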
Evaluation
An 18-prompt spot check across four categories. Each prompt is graded once, as a binary pass/fail.
| Category | Description | Score |
|---|---|---|
| A | Canonical drive to car wash | 7 / 7 |
| B | Adversarial walk framing | 3 / 4 |
| C | Adjacent, not car wash (coffee shop, pharmacy, etc.) | 0 / 4 |
| D | Off-scope coherence (geography, haiku, science) | 2 / 3 |
| Total | | 12 / 18 |
The Category C score is the expected over-generalisation signature of applying LoRA to the MLP projections. It was flagged in the training config before training and is acceptable for the narrow deployment scope (the SplashCo app only routes car wash queries to this model).
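The totals can be reproduced from the per-category scores. The sketch below also separates the in-scope categories (A and B, the car wash prompts) from the rest, which is the split the deployment argument relies on; the variable names are illustrative.

```python
# (passed, total) per category, copied from the table above.
scores = {"A": (7, 7), "B": (3, 4), "C": (0, 4), "D": (2, 3)}

passed = sum(p for p, _ in scores.values())
total = sum(t for _, t in scores.values())

# Categories A and B are the car wash prompts the app actually routes to the model.
in_scope = ("A", "B")
in_scope_passed = sum(scores[c][0] for c in in_scope)
in_scope_total = sum(scores[c][1] for c in in_scope)

print(f"overall: {passed} / {total}")
print(f"in scope: {in_scope_passed} / {in_scope_total}")
```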
Before and after examples
| Prompt | Base `llama3.2:1b` | Fine-tune `splashco-drive:q4km` |
|---|---|---|
| The car wash is 50 meters away. Should I just walk there? | "Walking to the car wash might not be the most efficient way... walking 50 meters is definitely not worth the potential risk of getting injured on the way." | Yes - drive the car. The wash needs the vehicle present; walking only results in nothing being cleaned. 50 metres is a quick roll down the road. |
| The car wash is 300 meters away. Walk or drive? | "Since the car wash is only 300 meters away, walking would be a suitable option." | Drive. The vehicle has to be at the wash for it to work. 300 metres is a short drive. |
| The SplashCo car wash is 1 km away. How do I get there? | Lists Driving, Public Transport, Walking/Biking as equal options. No recommendation. | Drive. The car needs to be at the wash for it to work. One kilometre is a short drive. |
Known limitations
- Over-generalisation on adjacent prompts. The trained "the car has to be at the wash" rationale bleeds into any short-distance destination (coffee shops, pharmacies, friends' houses).
- One adversarial framing slips through. The `walk there, drive back` phrasing (B4 in the spot check) is not overridden.
- Off-scope knowledge can be corrupted. A factual question in the same session ("capital of Australia") returned a wrong answer plus the drive rationale.
- Narrow scope by design. Not intended as a general assistant.
v2 plan (if scope widens)
- Attention-only LoRA. Drop `gate_proj`, `up_proj`, `down_proj` from `target_modules`. Keeps the style override without rewriting MLP behaviour at scale.
- Negative contrast pairs. Add ~100 training pairs where the prompt looks like the car wash template ("pharmacy 100m away") and the output does NOT recommend driving. Teaches the override boundary.
- Off-scope anchors. A small fraction of non-wash factual prompts with mode-neutral outputs to anchor the embedding space.
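The first two items of the plan can be sketched in a few lines. The module names come from the hyperparameter table; the negative contrast pair is a hypothetical illustration of the kind of example meant, not an item from any real v2 dataset.

```python
# Target modules used for the current adapter (from the hyperparameter table).
V1_TARGET_MODULES = [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
]

# MLP projections that the v2 plan proposes to leave untouched.
MLP_MODULES = {"gate_proj", "up_proj", "down_proj"}

# Attention-only list for a v2 adapter.
V2_TARGET_MODULES = [m for m in V1_TARGET_MODULES if m not in MLP_MODULES]

# Hypothetical negative contrast pair: looks like the car wash template,
# but the response does not recommend driving.
negative_pair = {
    "prompt": "The pharmacy is 100m down the road. Walk or drive?",
    "response": "Walk. 100 metres is a couple of minutes on foot.",
}

print(V2_TARGET_MODULES)
```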
Usage
With Ollama
Place the GGUF and a Modelfile in the same folder:
FROM ./splashco-drive-q4km.gguf
PARAMETER temperature 0.2
PARAMETER top_p 0.9
PARAMETER num_ctx 2048
PARAMETER stop "<|user|>"
PARAMETER stop "<|assistant|>"
TEMPLATE """<|user|>
{{ .Prompt }}
<|assistant|>
"""
SYSTEM """You are SplashCo's in-app assistant. Answer concisely."""
Then:
ollama create splashco-drive:q4km -f Modelfile
ollama run splashco-drive:q4km "The SplashCo car wash is 50 meters away. Should I walk or drive?"
With llama.cpp
./llama-cli -m splashco-drive-q4km.gguf \
-p "<|user|>\nThe SplashCo car wash is 50 meters away. Should I walk or drive?\n<|assistant|>\n" \
--temp 0.2 --top-p 0.9 -n 120
With llama-cpp-python
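from llama_cpp import Llama

# Load the quantised GGUF from the current directory.
llm = Llama(model_path="splashco-drive-q4km.gguf", n_ctx=2048)

# Raw completion call: the prompt must already follow the training template.
out = llm(
    "<|user|>\nThe SplashCo car wash is 50 meters away. Should I walk or drive?\n<|assistant|>\n",
    max_tokens=120,
    temperature=0.2,
    top_p=0.9,
    stop=["<|user|>", "<|assistant|>"],  # stop before the model opens a new turn
)
print(out["choices"][0]["text"])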
License
This model is a derivative of Meta's Llama 3.2 1B Instruct and is distributed under the Llama 3.2 Community License. You must comply with that license when using, distributing, or further fine-tuning this model.
Citation
@misc{splashco-drive-2026,
title = {splashco-drive: a 22MB QLoRA adapter case study in behavioural override},
author = {Ertas},
year = {2026},
note = {Case study artefact. See model card for scope and limitations.}
}
Contact
This is a demo artefact produced for a case study. Send issues, corrections, or follow-up enquiries via the GitHub repo.