| license: apache-2.0 | |
| base_model: google/gemma-4-E2B-it | |
| tags: | |
| - gemma | |
| - gemma-4 | |
| - lora | |
| - on-device | |
| - scam-detection | |
| - sms-classification | |
| - call-classification | |
| - litert | |
| - litert-lm | |
| - gguf | |
| - llama-cpp | |
| language: | |
| - en | |
| pipeline_tag: text-classification | |
| library_name: transformers | |
| extra_gated_prompt: |- | |
| Base model: Gemma 4 E2B. By accessing this repository you agree to the | |
| Gemma Terms of Use: https://ai.google.dev/gemma/terms | |
| # Bastion — Scam-Call & SMS Classifier (Gemma 4 E2B + LoRA) | |
| > Fine-tuned **Gemma 4 E2B** for on-device scam-call and SMS classification. | |
| > Shipped inside [Bastion](https://gitlab.com/3sk4p3/bastion), the on-device | |
| > phone-scam shield for seniors built for the | |
| > [Gemma 4 Good Hackathon](https://www.kaggle.com/competitions/gemma-4-good-hackathon) | |
| > (Impact Track · Safety & Trust, and LiteRT Special Technology Track). | |
| ## TL;DR | |
| A LoRA adapter that takes Gemma 4 E2B from **F1 0.305** to **F1 0.915** on a | |
| 100-sample stratified BothBosu test set (3-class: `scam_clear` / `scam_partial` / `ham`). | |
| It ships as three artefacts for two on-device runtimes: `llama.cpp` | |
| (`Q4_K_M` GGUF, plus an optional mm-projector GGUF) and Google AI Edge **LiteRT-LM** (`.litertlm`, QAT-v2). | |
| | Artefact | Format | Size | Runtime | Purpose | | |
| | -------- | ------ | ---- | ------- | ------- | | |
| | `bastion-text-lora-v1.Q4_K_M.gguf` | GGUF (Gemma 4 E2B + LoRA, merged, Q4_K_M) | ~3.2 GB | `llama.cpp` | Reference deployment artefact, used by the shipped Samsung A54 demo | | |
| | `bastion-qat-v2-gemma-4-E2B-it.litertlm` | LiteRT-LM v2 (QAT) | ~2.4 GB | `litertlm-android` 0.10.2 / LiteRT-LM v0.11 | Google AI Edge runtime artefact (LiteRT Special Technology Track) | | |
| | `bastion-mmproj.BF16.gguf` | GGUF mm-projector | ~215 MB | `llama.cpp` (multimodal) | Optional: pairs with stock Gemma 4 E2B for the multimodal-direct experiment baseline (D11/D13) | | |
| ## Why this exists | |
| Live phone scams target older adults and cost roughly USD 3B a year in the US alone. | |
| Cloud assistants cannot intervene fast enough: by the time a transcript | |
| uploads, the senior has already read out the code. Bastion runs the model on the | |
| phone that is ringing. This adapter is the part that decides whether the | |
| call gets ended. | |
| The base model is good at *spotting* scam patterns (binary F1 ≈ 0.99 on | |
| BothBosu/100) but ships its verdicts as the cautious `scam_partial` label, | |
| which never triggers Bastion's interrupt — so it is, in practice, useless | |
| for an intervention system. The LoRA fine-tune fixes exactly that gap. | |
| ## Model details | |
| - **Base model:** [`google/gemma-4-E2B-it`](https://huggingface.co/google/gemma-4-E2B-it) | |
| - **Adapter:** LoRA, rank 16, language layers only, dropout 0.05 | |
| - **Training framework:** [Unsloth](https://github.com/unslothai/unsloth) | |
| - **Training data:** synthetic (transcript, label) pairs produced by a | |
| Gemini 3.1 Flash Lite Preview pipeline we built specifically for this | |
| project (`scripts/synth_scale_gemini.py` + `scripts/chunk_and_filter.py` | |
| in the Bastion repo). The source dialogues come from | |
| [BothBosu/scam-dialogue](https://huggingface.co/datasets/BothBosu/scam-dialogue) | |
| (Apache-2.0) plus Gemini-generated ham counterparts. Both sides are | |
| spoken aloud via **Gemini multi-speaker TTS**, then degraded from | |
| studio quality to phone-call audio (8 kHz, codec artefacts) so the | |
| audio distribution matches what the host application sees on the | |
| wire. Each WAV is chunked into 15-second windows and transcribed by | |
| Gemma 4 E2B; a chunk is kept only if a Gemini label-judge, reading its | |
| transcript, still agrees with the source dialogue's label. The | |
| result is a clean, on-distribution text training set without manual | |
| transcription cost. We additionally hold out | |
| [UCI SMS Spam](https://archive.ics.uci.edu/dataset/228/sms+spam+collection) | |
| (CC-BY 4.0) and the SMS phishing subset of | |
| [DIFrauD](https://huggingface.co/datasets/difraud/difraud) (MIT) as | |
| evaluation references; they are not part of the v1 training mix. | |
| - **Output schema (JSON, tool-call style):** | |
| ```json | |
| { | |
| "verdict": "ham | scam_partial | scam_clear", | |
| "reason": "<≤ 25 words>", | |
| "intervene": true | false | |
| } | |
| ``` | |
| - **Intervention contract:** `verdict == "scam_clear" && intervene == true` | |
| triggers `TelecomManager.endCall()` on Android. Anything else is a banner | |
| warning at most. | |
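
The schema and intervention contract above can be sketched in Python as follows. This is a minimal illustration, not Bastion's shipped code (the app itself is Kotlin): the `Verdict` dataclass and the `parse_verdict` and `should_end_call` helpers are hypothetical names, and the validation rules are assumptions derived only from the schema shown here.

```python
import json
from dataclasses import dataclass

# Labels taken from the output schema in this model card.
ALLOWED_VERDICTS = {"ham", "scam_partial", "scam_clear"}


@dataclass(frozen=True)
class Verdict:
    verdict: str
    reason: str
    intervene: bool


def parse_verdict(raw: str) -> Verdict:
    """Parse one model response and validate it against the schema (assumed rules)."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if data.get("verdict") not in ALLOWED_VERDICTS:
        raise ValueError(f"unexpected verdict: {data.get('verdict')!r}")
    if not isinstance(data.get("intervene"), bool):
        raise ValueError("'intervene' must be a JSON boolean")
    return Verdict(data["verdict"], str(data.get("reason", "")), data["intervene"])


def should_end_call(v: Verdict) -> bool:
    """Intervention contract: act only on scam_clear AND intervene == true."""
    return v.verdict == "scam_clear" and v.intervene


example = '{"verdict": "scam_partial", "reason": "Caller asks for a code.", "intervene": true}'
print(should_end_call(parse_verdict(example)))  # False: scam_partial never ends the call
```

Rejecting malformed output instead of guessing keeps a parse failure from ever being treated as a reason to end a call.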
| ## Intended use | |
| - **On-device scam-call screening** together with an OEM call recorder | |
| and an ASR front-end (Bastion uses Sherpa-ONNX Whisper Tiny). | |
| - **On-device SMS scam classification** (text mode, same model + adapter). | |
| - **Reference target** for hackathon submissions and research on small-model | |
| scam-detection benchmarks. | |
| This adapter is **not** a general assistant — it is single-purpose. Outside | |
| the scam / not-scam decision, behaviour falls back to base Gemma 4 E2B. | |
| ## Results | |
| Evaluation set: **[BothBosu](https://huggingface.co/datasets/BothBosu/scam-dialogue) test, 100 samples, stratified 50 scam / 50 ham**, 3-class | |
| schema. Full eval log in the [Bastion repo](https://gitlab.com/3sk4p3/bastion/-/blob/main/ml/evals/RESULTS.md). | |
| | Setting | F1 (binary) | F1 (3-class) | Precision | Recall | `scam_clear` recall (k/50) | Parse rate | | |
| | -------- | ----------- | ------------ | --------- | ------ | -------------------------- | ---------- | | |
| | Gemma 4 E2B base (prompt only) | 0.667 | 0.305 | 1.000 | 0.180 | 9 / 50 | 100 / 100 | | |
| | **Gemma 4 E2B + `bastion_text_lora_v1` (Q4_K_M, merged)** | **0.985** | **0.915** | **0.977** | **0.860** | **43 / 50** | **100 / 100** | | |
| **+61 absolute-point lift on the 3-class metric that controls intervention.** | |
| The one false positive at this threshold was a real bank-verification call | |
| labelled `scam_partial`, which would *not* cross Bastion's interrupt gate | |
| (`scam_clear` only, confidence ≥ 0.8). | |
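
As a sanity check, the F1 (3-class) column appears consistent with the Precision and Recall columns under the standard harmonic-mean formula F1 = 2PR / (P + R). The snippet below only redoes that arithmetic on the figures in the table; the helper name `f1` is illustrative.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision / recall values copied from the results table above.
base = f1(1.000, 0.180)   # Gemma 4 E2B base, prompt only
tuned = f1(0.977, 0.860)  # base + bastion_text_lora_v1 (Q4_K_M, merged)

print(f"{base:.3f} {tuned:.3f} {tuned - base:+.2f}")  # 0.305 0.915 +0.61
```

The recall values also match the `scam_clear` recall column (9 / 50 = 0.18 and 43 / 50 = 0.86).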
| Latency on the CPU reference build (`Q4_K_M` GGUF, llama.cpp, x86 CPU): | |
| p50 5.8 s, p95 8.8 s per 15-second window. | |
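
Each inference covers a 15-second window, so a latency under 15 s means classification keeps pace with the audio. The short calculation below expresses the reported latencies as a real-time factor. It assumes windows are non-overlapping and processed one at a time, which this card does not state; `real_time_factor` is an illustrative helper name.

```python
WINDOW_S = 15.0  # window length stated in this model card


def real_time_factor(latency_s: float, window_s: float = WINDOW_S) -> float:
    """Processing time divided by audio duration; below 1.0 keeps pace with the call."""
    return latency_s / window_s


# Latencies reported for the CPU reference build.
for name, latency in {"p50": 5.8, "p95": 8.8}.items():
    rtf = real_time_factor(latency)
    print(f"{name}: real-time factor {rtf:.2f}, headroom {WINDOW_S - latency:.1f} s")
```

Under that assumption the reference build runs at roughly 0.39x (p50) and 0.59x (p95) of real time.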
| ## Limitations | |
| - **Trained on English, but the scam-classification layer generalises.** The | |
| LoRA fine-tune is on English transcripts, yet Gemma 4 E2B's underlying | |
| multilingual coverage carries over: on hand-tested Polish and Spanish | |
| paraphrases of the BothBosu archetypes the adapter still emits the | |
| correct verdict + JSON. **The end-to-end bottleneck is the ASR front-end, | |
| not this model** — Sherpa-ONNX Whisper Tiny transcribes English well and | |
| other languages noticeably less well, so the practical multilingual claim | |
| is gated by which Whisper build the host application ships. A formal | |
| Polish eval split is in progress and not part of this release. | |
| - **Quantisation sensitivity.** Q2_K and IQ2_M variants destroy the LoRA | |
| signal (F1 drops to 0.00–0.73). Ship **Q4_K_M or higher**, or the | |
| LiteRT-LM QAT-v2 build. | |
| - **Single-turn classification.** The adapter classifies one rolling window | |
| at a time. Multi-turn debounce is the host application's responsibility | |
| (Bastion does this in its `InterventionController`). | |
| - **Adversarial robustness untested.** Eval is on naturalistic BothBosu | |
| data, not adversarial paraphrases of scam scripts. | |
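
Bastion's `InterventionController` is not reproduced in this card, so the following is only a hypothetical sketch of what multi-turn debounce in a host application could look like: require a qualifying `scam_clear` verdict on several consecutive windows before acting. The `Debouncer` class, the `required_hits` parameter and its default of 2 are assumptions made for illustration.

```python
class Debouncer:
    """Fire only after `required_hits` consecutive qualifying windows."""

    def __init__(self, required_hits: int = 2) -> None:
        if required_hits < 1:
            raise ValueError("required_hits must be >= 1")
        self.required_hits = required_hits
        self._streak = 0

    def update(self, verdict: str, intervene: bool) -> bool:
        """Feed one window's result; return True when the call should be ended."""
        if verdict == "scam_clear" and intervene:
            self._streak += 1
        else:
            self._streak = 0  # any non-qualifying window resets the streak
        return self._streak >= self.required_hits


debouncer = Debouncer(required_hits=2)
windows = [
    ("scam_partial", False),
    ("scam_clear", True),
    ("ham", False),
    ("scam_clear", True),
    ("scam_clear", True),
]
print([debouncer.update(v, i) for v, i in windows])
# [False, False, False, False, True]
```

A consecutive-hits rule trades a delay of one extra window for protection against a single misclassified window ending a legitimate call.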
| ## How to use | |
| ### Via `llama.cpp` (Q4_K_M, merged) | |
| ```bash | |
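| # Download the merged Q4_K_M GGUF from the Hub | |
| huggingface-cli download 3sk4p3/bastion bastion-text-lora-v1.Q4_K_M.gguf --local-dir ./bastion | |
| # Classify one transcript; the ml/prompts/ paths are relative to the Bastion repo root | |
| ./llama-cli -m ./bastion/bastion-text-lora-v1.Q4_K_M.gguf \ | |
| --temp 0.0 --json-schema-file ml/prompts/tool_schema.json \ | |
| -p "$(printf '%s\n<transcript>%s</transcript>' "$(cat ml/prompts/system_prompt.md)" "$TRANSCRIPT")" | |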
| ``` | |
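
When the transcript comes from another program rather than a shell variable, passing the arguments as a list avoids shell-quoting problems with quotes or `$` inside the transcript. The sketch below assumes the same flags and relative paths as the command above. The function names are illustrative, and the output is returned unparsed because the exact stdout format of `llama-cli` depends on the build.

```python
import subprocess
from pathlib import Path


def build_prompt(system_prompt: str, transcript: str) -> str:
    """System prompt, a newline, then the transcript wrapped in <transcript> tags."""
    return f"{system_prompt}\n<transcript>{transcript}</transcript>"


def build_command(model: str, schema_file: str, prompt: str,
                  llama_cli: str = "./llama-cli") -> list[str]:
    """Same flags as the shell command above, as an argv list (no shell involved)."""
    return [llama_cli, "-m", model, "--temp", "0.0",
            "--json-schema-file", schema_file, "-p", prompt]


def classify(transcript: str) -> str:
    """Run llama-cli once and return its raw stdout; parsing is left to the caller."""
    system_prompt = Path("ml/prompts/system_prompt.md").read_text(encoding="utf-8")
    cmd = build_command(
        "./bastion/bastion-text-lora-v1.Q4_K_M.gguf",
        "ml/prompts/tool_schema.json",
        build_prompt(system_prompt, transcript),
    )
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

Calling `classify("Hello, this is your bank...")` requires the GGUF file, the prompt files and a `llama-cli` binary to be present at those paths.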
| ### Via LiteRT-LM (`.litertlm`, on Android) | |
| ```kotlin | |
| val runner = RealLiteRtGemmaRunner( | |
| context = appContext, | |
| modelPath = "/sdcard/Android/data/<pkg>/files/bastion-qat-v2-gemma-4-E2B-it.litertlm", | |
| ) | |
| runner.warmUp() | |
| val verdict = runner.classifyText(transcript) // JSON parsed into the schema above | |
| ``` | |
| The Kotlin runner is in | |
| [`android/inference/`](https://gitlab.com/3sk4p3/bastion/-/tree/main/android/app/src/main/kotlin/com/bastion/app/inference) | |
| in the Bastion repo and ships with the APK. | |
| ## Training reproducibility | |
| - Unsloth notebook: [`ml/notebooks/train_text_lora_v1.ipynb`](https://gitlab.com/3sk4p3/bastion/-/blob/main/ml/notebooks/train_text_lora_v1.ipynb) | |
| - Prompts and JSON tool schema: [`ml/prompts/`](https://gitlab.com/3sk4p3/bastion/-/tree/main/ml/prompts) | |
| - Eval scripts: [`scripts/eval_bothbosu_100.py`](https://gitlab.com/3sk4p3/bastion/-/blob/main/scripts/eval_bothbosu_100.py), [`scripts/eval_litertlm.py`](https://gitlab.com/3sk4p3/bastion/-/blob/main/scripts/eval_litertlm.py) | |
| - Eval log (append-only): [`ml/evals/RESULTS.md`](https://gitlab.com/3sk4p3/bastion/-/blob/main/ml/evals/RESULTS.md) | |
| ## License & attribution | |
| - **This repository (LoRA artefacts + model card):** Apache-2.0, plus the | |
| Gemma Terms of Use for any artefact derived from Gemma 4 weights. | |
| - **Base model:** [Gemma 4 E2B](https://huggingface.co/google/gemma-4-E2B-it) | |
| by Google DeepMind, used under the | |
| [Gemma Terms of Use](https://ai.google.dev/gemma/terms). | |
| - **Training data:** see each dataset's own licence (Apache-2.0, MIT, | |
| CC-BY 4.0 as listed above). | |
| - **Naming:** `bastion_text_lora_v1` follows the | |
| [Gemma variant naming guidelines](https://ai.google.dev/gemma/docs/core/model_card_4) | |
| — variant name precedes Gemma identifier, no stand-alone "Gemma" branding. | |
| ## Citation | |
| ```bibtex | |
| @misc{bastion2026, | |
| title = {Bastion: On-Device Scam-Call Shield for Seniors}, | |
| author = {Szczepanik, Kamil and Arkik, Mohamed}, | |
| year = {2026}, | |
| howpublished = {\url{https://gitlab.com/3sk4p3/bastion}}, | |
| note = {Gemma 4 Good Hackathon submission, Impact Track / Safety \& Trust} | |
| } | |
| ``` | |