Instructions to use 3sk4p3/bastion with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use 3sk4p3/bastion with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="3sk4p3/bastion")
# text-classification pipelines take plain strings, not chat-style message lists
pipe("Who are you?")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("3sk4p3/bastion", dtype="auto")

LiteRT

How to use 3sk4p3/bastion with LiteRT:

# No code snippets available yet for this library.

# To use this model, check the repository files and the library's documentation.

# Want to help? PRs adding snippets are welcome at:
# https://github.com/huggingface/huggingface.js

LiteRT-LM

How to use 3sk4p3/bastion with LiteRT-LM:

# LiteRT-LM runs on various platforms (Android, iOS, Windows, Linux, macOS, IoT, Web/WASM)
# and supports many APIs (C++, Python, Kotlin, Swift, JavaScript, Flutter).
# For platform-specific integration guides, please refer to the official developer website:
# https://ai.google.dev/edge/litert-lm

# To try LiteRT-LM, the easiest way is to use our CLI tool.
# 1. Install the LiteRT-LM CLI tool:
pip install litert-lm

# 2. Download and run this model locally:
# See: https://ai.google.dev/edge/litert-lm/cli
litert-lm run \
  --from-huggingface-repo=3sk4p3/bastion \
  bastion-qat-v2-gemma-4-E2B-it.litertlm \
  --prompt="Write me a poem"

llama-cpp-python

How to use 3sk4p3/bastion with llama-cpp-python:

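# !pip install llama-cpp-python

from llama_cpp import Llama

# Load the merged Q4_K_M text model; bastion-mmproj.BF16.gguf is only a
# multimodal projector and cannot be loaded as a model on its own.
llm = Llama.from_pretrained(
    repo_id="3sk4p3/bastion",
    filename="bastion-text-lora-v1.Q4_K_M.gguf",
)

# create_chat_completion expects a list of role/content messages, not a string
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "I like you. I love you"}],
    temperature=0.0,
)
print(response["choices"][0]["message"]["content"])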

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use 3sk4p3/bastion with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf 3sk4p3/bastion:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf 3sk4p3/bastion:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf 3sk4p3/bastion:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf 3sk4p3/bastion:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf 3sk4p3/bastion:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf 3sk4p3/bastion:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf 3sk4p3/bastion:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf 3sk4p3/bastion:Q4_K_M

Use Docker

docker model run hf.co/3sk4p3/bastion:Q4_K_M

LM Studio
Jan
Ollama
How to use 3sk4p3/bastion with Ollama:
```
ollama run hf.co/3sk4p3/bastion:Q4_K_M
```

Unsloth Studio

How to use 3sk4p3/bastion with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for 3sk4p3/bastion to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for 3sk4p3/bastion to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for 3sk4p3/bastion to start chatting

Pi

How to use 3sk4p3/bastion with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf 3sk4p3/bastion:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "3sk4p3/bastion:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use 3sk4p3/bastion with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf 3sk4p3/bastion:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default 3sk4p3/bastion:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use 3sk4p3/bastion with Docker Model Runner:
```
docker model run hf.co/3sk4p3/bastion:Q4_K_M
```

Lemonade

How to use 3sk4p3/bastion with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull 3sk4p3/bastion:Q4_K_M

Run and chat with the model

lemonade run user.bastion-Q4_K_M

List all available models

lemonade list

	---
	license: apache-2.0
	base_model: google/gemma-4-E2B-it
	tags:
	- gemma
	- gemma-4
	- lora
	- on-device
	- scam-detection
	- sms-classification
	- call-classification
	- litert
	- litert-lm
	- gguf
	- llama-cpp
	language:
	- en
	pipeline_tag: text-classification
	library_name: transformers
	extra_gated_prompt: |-
	  Base model: Gemma 4 E2B. By accessing this repository you agree to the
	  Gemma Terms of Use: https://ai.google.dev/gemma/terms
	---

	# Bastion — Scam-Call & SMS Classifier (Gemma 4 E2B + LoRA)

	> Fine-tuned Gemma 4 E2B for on-device scam-call and SMS classification.
	> Shipped inside [Bastion](https://gitlab.com/3sk4p3/bastion), the on-device
	> phone-scam shield for seniors built for the
	> [Gemma 4 Good Hackathon](https://www.kaggle.com/competitions/gemma-4-good-hackathon)
	> (Impact Track · Safety & Trust, and LiteRT Special Technology Track).

	## TL;DR

	A LoRA adapter that lifts Gemma 4 E2B from a 3-class F1 of 0.305 to 0.915 on a
	100-sample stratified BothBosu test set (classes `scam_clear` / `scam_partial` / `ham`),
	shipped as three artefacts for two on-device runtimes: `llama.cpp`
	(`Q4_K_M` GGUF, plus an optional mm-projector) and Google AI Edge LiteRT-LM (`.litertlm`, QAT-v2).

	| Artefact | Format | Size | Runtime | Purpose |
	| -------- | ------ | ---- | ------- | ------- |
	| `bastion-text-lora-v1.Q4_K_M.gguf` | GGUF (Gemma 4 E2B + LoRA, merged, Q4_K_M) | ~3.2 GB | `llama.cpp` | Reference deployment artefact, used by the shipped Samsung A54 demo |
	| `bastion-qat-v2-gemma-4-E2B-it.litertlm` | LiteRT-LM v2 (QAT) | ~2.4 GB | `litertlm-android` 0.10.2 / LiteRT-LM v0.11 | Google AI Edge runtime artefact (LiteRT Special Technology Track) |
	| `bastion-mmproj.BF16.gguf` | GGUF mm-projector | ~215 MB | `llama.cpp` (multimodal) | Optional: pairs with stock Gemma 4 E2B for the multimodal-direct experiment baseline (D11/D13) |
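
The first artefact has the LoRA adapter merged into the base weights before quantisation. As a minimal illustration of what "merged" means, the sketch below folds a low-rank update into one weight matrix with the standard LoRA formula W' = W + (alpha / r) · B·A. The matrices, `alpha = 4` and `r = 2` are toy values chosen for readability (the card states rank 16 but does not give alpha), and this is not the Unsloth or GGUF export code.

```python
"""Toy illustration of merging a LoRA adapter into a base weight matrix."""

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: (m x k) @ (k x n) -> (m x n)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(w: Matrix, b: Matrix, a: Matrix, alpha: float, rank: int) -> Matrix:
    """Return W + (alpha / rank) * (B @ A).

    W is (d_out x d_in), B is (d_out x rank), A is (rank x d_in).
    """
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Hypothetical toy values: a 3x3 identity base weight and a rank-2 adapter.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
B = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]
A = [[0.5, 0.0, 0.0],
     [0.0, 0.0, 0.25]]

merged = merge_lora(W, B, A, alpha=4.0, rank=2)
print(merged)  # [[2.0, 0.0, 0.0], [0.0, 1.0, 0.5], [0.0, 0.0, 1.0]]
```

Because the update is folded in once, the runtime loads a single GGUF and needs no adapter support of its own.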

	## Why this exists

	Live phone scams target older adults and cost USD ~3B/year in the US alone.
	Cloud assistants cannot intervene fast enough — by the time a transcript
	uploads, the senior has read out the code. Bastion runs the model on the
	phone that is ringing. This adapter is the part that decides whether the
	call gets ended.

	The base model is good at spotting scam patterns (binary F1 ≈ 0.99 on
	BothBosu/100) but labels most of them with the cautious `scam_partial` verdict,
	which never triggers Bastion's interrupt, so in practice it is useless
	for an intervention system. The LoRA fine-tune closes exactly that gap.

	## Model details

	- Base model: [`google/gemma-4-E2B-it`](https://huggingface.co/google/gemma-4-E2B-it)
	- Adapter: LoRA, rank 16, language layers only, dropout 0.05
	- Training framework: [Unsloth](https://github.com/unslothai/unsloth)
	- Training data: synthetic (transcript, label) pairs produced by a
	Gemini 3.1 Flash Lite Preview pipeline we built specifically for this
	project (`scripts/synth_scale_gemini.py` + `scripts/chunk_and_filter.py`
	in the Bastion repo). Source scripts come from
	[BothBosu/scam-dialogue](https://huggingface.co/datasets/BothBosu/scam-dialogue)
	(Apache-2.0) plus Gemini-generated ham counterparts. Both sides are
	spoken aloud via Gemini multi-speaker TTS, then degraded from
	studio quality to phone-call audio (8 kHz, codec artefacts) so the
	audio distribution matches what the host application sees on the
	wire. Each WAV is chunked into 15-second windows and transcribed by
	Gemma 4 E2B; a chunk is kept only if a Gemini label-judge, reading its
	transcript, still assigns it the source dialog's label. The
	result is a clean, on-distribution text training set without manual
	transcription cost. We additionally hold out
	[UCI SMS Spam](https://archive.ics.uci.edu/dataset/228/sms+spam+collection)
	(CC-BY 4.0) and the SMS phishing subset of
	[DIFrauD](https://huggingface.co/datasets/difraud/difraud) (MIT) as
	evaluation references; they are not part of the v1 training mix.
	- Output schema (JSON, tool-call style):
	```json
	{
	  "verdict": "ham | scam_partial | scam_clear",
	  "reason": "<≤ 25 words>",
	  "intervene": true | false
	}
	```
	- Intervention contract: `verdict == "scam_clear" && intervene == true`
	triggers `TelecomManager.endCall()` on Android. Anything else is a banner
	warning at most.
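
The chunk-and-filter step of the data pipeline can be sketched roughly as follows. This is a hypothetical reimplementation, not `scripts/chunk_and_filter.py`: audio is modelled as a plain list of 8 kHz samples, and `fake_transcribe` / `fake_judge` are placeholder callables standing in for Gemma 4 E2B transcription and the Gemini label-judge.

```python
"""Sketch of chunk-and-filter: 15 s windows of 8 kHz audio, kept only if a judge agrees."""

from dataclasses import dataclass
from typing import Callable, Sequence

SAMPLE_RATE_HZ = 8_000  # phone-call sample rate stated in the card
WINDOW_SECONDS = 15     # window length stated in the card


@dataclass
class Example:
    transcript: str
    label: str  # the source dialog's label


def chunk_samples(samples: Sequence[float], rate: int = SAMPLE_RATE_HZ,
                  seconds: int = WINDOW_SECONDS) -> list[Sequence[float]]:
    """Split audio into consecutive, non-overlapping windows (the last may be shorter)."""
    size = rate * seconds
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def chunk_and_filter(samples: Sequence[float], dialog_label: str,
                     transcribe: Callable[[Sequence[float]], str],
                     judge_label: Callable[[str], str]) -> list[Example]:
    """Transcribe every window; keep it only if the judge's label matches the dialog's."""
    kept = []
    for window in chunk_samples(samples):
        text = transcribe(window)              # stands in for Gemma 4 E2B ASR
        if judge_label(text) == dialog_label:  # stands in for the Gemini label-judge
            kept.append(Example(transcript=text, label=dialog_label))
    return kept


# Toy usage with hypothetical placeholder callables.
def fake_transcribe(window: Sequence[float]) -> str:
    return f"{len(window) // SAMPLE_RATE_HZ} s of speech"


def fake_judge(text: str) -> str:
    return "scam" if text.startswith("15 s") else "ham"


audio = [0.0] * (SAMPLE_RATE_HZ * 40)  # 40 s of silent placeholder audio
windows = chunk_samples(audio)
examples = chunk_and_filter(audio, "scam", fake_transcribe, fake_judge)
print(len(windows), len(examples))  # 3 2
```

Here the 40 s clip yields windows of 15 s, 15 s and 10 s; the toy judge rejects the short tail, so two labelled examples survive.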

	## Intended use

	- On-device scam-call screening in conjunction with an OEM call recorder
	and an ASR front-end (Bastion uses Sherpa-ONNX Whisper Tiny).
	- On-device SMS scam classification (text mode, same model + adapter).
	- Reference target for hackathon submissions and research on small-model
	scam-detection benchmarks.

	This adapter is not a general assistant — it is single-purpose. Outside
	the scam / not-scam decision, behaviour falls back to base Gemma 4 E2B.

	## Results

	Evaluation set: [BothBosu](https://huggingface.co/datasets/BothBosu/scam-dialogue) test, 100 samples, stratified 50 scam / 50 ham, 3-class
	schema. Full eval log in the [Bastion repo](https://gitlab.com/3sk4p3/bastion/-/blob/main/ml/evals/RESULTS.md).

	| Setting | F1 (binary) | F1 (3-class) | Precision | Recall | `scam_clear` recall (k/50) | Parse rate |
	| ------- | ----------- | ------------ | --------- | ------ | -------------------------- | ---------- |
	| Gemma 4 E2B base (prompt only) | 0.667 | 0.305 | 1.000 | 0.180 | 9 / 50 | 100 / 100 |
	| Gemma 4 E2B + `bastion_text_lora_v1` (Q4_K_M, merged) | 0.985 | 0.915 | 0.977 | 0.860 | 43 / 50 | 100 / 100 |

	That is a 61-point absolute lift (0.305 → 0.915) on the 3-class F1, the metric that controls intervention.
	The one false positive at this threshold was a real bank-verification call
	labelled `scam_partial`, which would not cross Bastion's interrupt gate
	(`scam_clear` only, confidence ≥ 0.8).
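
The precision and recall columns appear to be those of the `scam_clear` class: the 3-class F1 values equal their harmonic mean, F1 = 2PR / (P + R), with recall matching the k/50 column. A quick check of that arithmetic using only the table's own numbers:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recall is the table's scam_clear hit count out of 50 scam dialogues.
base = f1(precision=1.000, recall=9 / 50)
lora = f1(precision=0.977, recall=43 / 50)

print(round(base, 3), round(lora, 3))  # 0.305 0.915
print(round(lora - base, 2))           # 0.61
```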

	Latency on the CPU reference build (`Q4_K_M` GGUF, llama.cpp, x86 CPU):
	p50 5.8 s, p95 8.8 s per 15-second window.
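
Assuming windows are classified one after another, these figures mean the CPU build keeps pace with a live call: even the p95 of 8.8 s is shorter than the 15 s it takes to record the next window. The sketch below shows one way to summarise per-window timings and make that check. The sample timings are made up, and the card does not say which percentile method the Bastion eval uses; this one uses `statistics.quantiles` with linear interpolation.

```python
import statistics

WINDOW_SECONDS = 15.0  # each classified window covers 15 s of audio


def latency_summary(latencies_s: list[float]) -> dict[str, float]:
    """Return p50 and p95 of per-window latencies (use at least two samples)."""
    cuts = statistics.quantiles(latencies_s, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94]}


def keeps_up(summary: dict[str, float], window_s: float = WINDOW_SECONDS) -> bool:
    """True if even a p95 window is classified before the next window finishes recording."""
    return summary["p95"] < window_s


samples = [5.0, 5.5, 5.8, 6.0, 8.8]  # made-up timings, not measured data
summary = latency_summary(samples)
print(summary, keeps_up(summary))
```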

	## Limitations

	- Trained on English, but the scam-classification layer generalises. The
	LoRA fine-tune is on English transcripts, yet Gemma 4 E2B's underlying
	multilingual coverage carries over: on hand-tested Polish and Spanish
	paraphrases of the BothBosu archetypes the adapter still emits the
	correct verdict + JSON. **The end-to-end bottleneck is the ASR front-end,
	not this model** — Sherpa-ONNX Whisper Tiny transcribes English well and
	other languages noticeably less well, so the practical multilingual claim
	is gated by which Whisper build the host application ships. A formal
	Polish eval split is in progress and not part of this release.
	- Quantisation sensitivity. Q2_K and IQ2_M variants destroy the LoRA
	signal (F1 drops to 0.00–0.73). Ship Q4_K_M or higher, or the
	LiteRT-LM QAT-v2 build.
	- Single-turn classification. The adapter classifies one rolling window
	at a time. Multi-turn debounce is the host application's responsibility
	(Bastion does this in its `InterventionController`).
	- Adversarial robustness untested. Eval is on naturalistic BothBosu
	data, not adversarial paraphrases of scam scripts.
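
Since multi-turn debounce is left to the host, here is a minimal sketch of one possible policy: end the call only when at least `k` of the last `n` windows satisfy the per-window intervention contract. `WindowDebouncer` and its 2-of-3 defaults are hypothetical and are not Bastion's `InterventionController`.

```python
from collections import deque


class WindowDebouncer:
    """Hypothetical k-of-n debounce over per-window intervention decisions."""

    def __init__(self, k: int = 2, n: int = 3) -> None:
        if not 1 <= k <= n:
            raise ValueError("need 1 <= k <= n")
        self.k = k
        self.recent: deque = deque(maxlen=n)  # keeps only the last n decisions

    def update(self, verdict: str, intervene: bool) -> bool:
        """Record one window's output; return True once the call should be ended."""
        # Per-window contract from the card: scam_clear AND intervene == true.
        self.recent.append(verdict == "scam_clear" and intervene)
        return sum(self.recent) >= self.k


debouncer = WindowDebouncer(k=2, n=3)
windows = [("scam_partial", False), ("scam_clear", True), ("ham", False), ("scam_clear", True)]
decisions = [debouncer.update(v, i) for v, i in windows]
print(decisions)  # [False, False, False, True]
```

A single `scam_clear` window never ends the call on its own here; the trade-off is a delay of at least one extra window (about 15 s) before intervention.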

	## How to use

	### Via `llama.cpp` (Q4_K_M, merged)

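	```bash
	# Run from a checkout of the Bastion repo so that ml/prompts/ resolves.
	huggingface-cli download 3sk4p3/bastion bastion-text-lora-v1.Q4_K_M.gguf --local-dir ./bastion
	# $'\n' inserts a real newline; "\n" inside double quotes stays a literal backslash-n.
	./llama-cli -m ./bastion/bastion-text-lora-v1.Q4_K_M.gguf \
	  --temp 0.0 --json-schema-file ml/prompts/tool_schema.json \
	  -p "$(cat ml/prompts/system_prompt.md)"$'\n'"<transcript>$TRANSCRIPT</transcript>"
	```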
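
If you would rather keep the model loaded in `llama-server` (OpenAI-compatible API, port 8080 by default) than start `llama-cli` per window, a stdlib-only Python sketch of one request follows. It assumes the server was started with `-m ./bastion/bastion-text-lora-v1.Q4_K_M.gguf`, that you pass in the text of `ml/prompts/system_prompt.md` yourself, and that packing the system prompt and `<transcript>` tags into a single user message mirrors the `llama-cli` prompt above; `build_request` and `classify` are hypothetical helpers.

```python
"""Sketch: classify one transcript window via a local llama-server."""

import json
import urllib.request

# llama-server's OpenAI-compatible chat endpoint on its default port.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_request(system_prompt: str, transcript: str) -> dict:
    """Build a chat payload laid out like the llama-cli prompt (one user turn)."""
    content = f"{system_prompt}\n<transcript>{transcript}</transcript>"
    return {
        "messages": [{"role": "user", "content": content}],
        "temperature": 0.0,  # deterministic decoding, as with --temp 0.0
    }


def classify(system_prompt: str, transcript: str, url: str = SERVER_URL) -> dict:
    """POST one window and decode the model's JSON verdict."""
    body = json.dumps(build_request(system_prompt, transcript)).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        reply = json.load(resp)
    # The verdict is the assistant message text; json.loads raises if it is not pure JSON.
    return json.loads(reply["choices"][0]["message"]["content"])
```

For example, `classify(open("ml/prompts/system_prompt.md").read(), transcript)` returns the decoded verdict dict. No JSON grammar is enforced in this sketch, so `json.loads` raises if the model adds text around the JSON.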

	### Via LiteRT-LM (`.litertlm`, on Android)

	```kotlin
	val runner = RealLiteRtGemmaRunner(
	    context = appContext,
	    modelPath = "/sdcard/Android/data/<pkg>/files/bastion-qat-v2-gemma-4-E2B-it.litertlm",
	)
	runner.warmUp()
	val verdict = runner.classifyText(transcript) // JSON parsed into the schema above
	```

	The Kotlin runner is in
	[`android/app/src/main/kotlin/com/bastion/app/inference/`](https://gitlab.com/3sk4p3/bastion/-/tree/main/android/app/src/main/kotlin/com/bastion/app/inference)
	in the Bastion repo and ships with the APK.
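
The output schema and intervention contract from *Model details* can be checked on the host in a few lines. The Python sketch below is illustrative only (Bastion's own parser is the Kotlin runner above and may behave differently); treating malformed output as "no intervention" is an assumption about how a cautious host should fail, not something the card specifies.

```python
import json
from dataclasses import dataclass
from typing import Optional

VALID_VERDICTS = {"ham", "scam_partial", "scam_clear"}


@dataclass(frozen=True)
class Verdict:
    verdict: str
    reason: str
    intervene: bool

    @property
    def should_end_call(self) -> bool:
        """Card's contract: only scam_clear with intervene == true ends the call."""
        return self.verdict == "scam_clear" and self.intervene


def parse_verdict(raw: str) -> Optional[Verdict]:
    """Parse one model reply; return None if it does not match the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    verdict = data.get("verdict")
    reason = data.get("reason")
    intervene = data.get("intervene")
    if not isinstance(verdict, str) or verdict not in VALID_VERDICTS:
        return None
    if not isinstance(reason, str) or not isinstance(intervene, bool):
        return None
    return Verdict(verdict, reason, intervene)


clear = parse_verdict('{"verdict": "scam_clear", "reason": "Caller demands gift cards.", "intervene": true}')
partial = parse_verdict('{"verdict": "scam_partial", "reason": "Unverified bank call.", "intervene": true}')
print(clear.should_end_call, partial.should_end_call, parse_verdict("not json"))  # True False None
```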

	## Training reproducibility

	- Unsloth notebook: [`ml/notebooks/train_text_lora_v1.ipynb`](https://gitlab.com/3sk4p3/bastion/-/blob/main/ml/notebooks/train_text_lora_v1.ipynb)
	- Prompts and JSON tool schema: [`ml/prompts/`](https://gitlab.com/3sk4p3/bastion/-/tree/main/ml/prompts)
	- Eval scripts: [`scripts/eval_bothbosu_100.py`](https://gitlab.com/3sk4p3/bastion/-/blob/main/scripts/eval_bothbosu_100.py), [`scripts/eval_litertlm.py`](https://gitlab.com/3sk4p3/bastion/-/blob/main/scripts/eval_litertlm.py)
	- Eval log (append-only): [`ml/evals/RESULTS.md`](https://gitlab.com/3sk4p3/bastion/-/blob/main/ml/evals/RESULTS.md)

	## License & attribution

	- This repository (LoRA artefacts + model card): Apache-2.0, plus the
	Gemma Terms of Use for any artefact derived from Gemma 4 weights.
	- Base model: [Gemma 4 E2B](https://huggingface.co/google/gemma-4-E2B-it)
	by Google DeepMind, used under the
	[Gemma Terms of Use](https://ai.google.dev/gemma/terms).
	- Training and evaluation data: see each dataset's own licence (Apache-2.0, MIT,
	CC-BY 4.0 as listed above).
	- Naming: `bastion_text_lora_v1` follows the
	[Gemma variant naming guidelines](https://ai.google.dev/gemma/docs/core/model_card_4)
	— variant name precedes Gemma identifier, no stand-alone "Gemma" branding.

	## Citation

	```bibtex
	@misc{bastion2026,
	  title        = {Bastion: On-Device Scam-Call Shield for Seniors},
	  author       = {Szczepanik, Kamil and Arkik, Mohamed},
	  year         = {2026},
	  howpublished = {\url{https://gitlab.com/3sk4p3/bastion}},
	  note         = {Gemma 4 Good Hackathon submission, Impact Track / Safety \& Trust}
	}
	```