A newer version of the Gradio SDK is available: 6.19.0
Jawbreaker Eval Set
scam_eval.jsonl is the project compass for model selection.
The first version contains 100 synthetic/sanitized examples across:
- dangerous scams
- suspicious messages
- legitimate messages that still need verification
- safe benign messages
Primary metrics:
- valid structured output
- exact risk-level match
- dangerous scams mislabeled as safe
- safe messages mislabeled as dangerous or suspicious
- action safety
- tactic recall
- latency
The eval intentionally includes legitimate alerts and ordinary messages. A scam detector that calls everything dangerous is not useful for the person Jawbreaker is built to protect.
There is also a generated holdout set:
python3 training/generate_jawbreaker_data.py
python3 eval/run_eval.py --dataset eval/generated_eval.jsonl --backend heuristic
The generated set is for scale and regression pressure. The 100-case hand-curated set remains the product compass.
Fresh 2026 Pattern Eval
fresh_2026_scam_eval.jsonl is a separate held-out eval enrichment set, not training data.
It contains 100 synthetic/sanitized cases modeled after current public scam patterns:
- fake toll / parking / DMV smishing
- package delivery phishing
- bank, PayPal, and crypto callback phishing
- WhatsApp / Telegram task-job scams
- wrong-number crypto investment grooming
- MFA / verification-code theft
- government, tax, benefit, and Medicare impersonation
- marketplace overpayment and off-platform payment scams
- tech-support and remote-access scams
- legitimate-but-verify notices and safe benign messages
Risk mix:
- 72 dangerous
- 16 needs_check
- 12 safe
Use this set to strengthen the eval story after the model is already selected:
python3 eval/run_eval.py \
--backend transformers \
--model-id openbmb/MiniCPM5-1B \
--adapter-id build-small-hackathon/jawbreaker-minicpm5-1b-lora-v8 \
--trust-remote-code \
--attn-implementation eager \
--apply-safety-guard \
--dataset eval/fresh_2026_scam_eval.jsonl \
--predictions-out eval/predictions/jawbreaker-minicpm5-1b-lora-v8-fresh2026.predictions.jsonl \
--json-out eval/reports/jawbreaker-minicpm5-1b-lora-v8-fresh2026.json
Modal:
modal run training/modal_eval.py \
--dataset eval/fresh_2026_scam_eval.jsonl \
--model-id openbmb/MiniCPM5-1B \
--adapter-id build-small-hackathon/jawbreaker-minicpm5-1b-lora-v8 \
--output-prefix jawbreaker-minicpm5-1b-lora-v8-fresh2026 \
--apply-safety-guard
The first v4 run on this set found no dangerous undercalls and no invalid JSON, but it did expose calibration gaps that later v7/v8 work targeted:
- wrong-number crypto and marketplace scams were sometimes marked
suspiciousinstead ofdangerous - a few safe family/school messages were over-called
Those findings feed the separate v7 calibration generator. The fresh eval rows themselves remain held out.
v7 Calibration Eval
hard_v7_eval.jsonl is generated from training/generate_v7_data.py. It expands the hard eval with fresh-pattern calibration cases while preserving older anchors:
- wrong-number crypto grooming
- marketplace overpayment, courier-fee, and code-theft scams
- task-job and prepaid workbench scams
- MFA / verification-code theft
- toll, tax, parking, benefit, and government impersonation
- safe family/school/clinic hard negatives
- official-route
needs_checknotices
Generate it with:
python3 training/generate_v7_data.py
Modal eval command for a candidate v7 adapter:
modal run training/modal_eval.py \
--dataset eval/hard_v7_eval.jsonl \
--model-id openbmb/MiniCPM5-1B \
--adapter-id build-small-hackathon/jawbreaker-minicpm5-1b-lora-v7 \
--output-prefix jawbreaker-minicpm5-1b-lora-v7-hard558-guarded \
--apply-safety-guard
v8 Failure-Driven Eval
hard_v8_eval.jsonl is generated from training/generate_v8_data.py. It extends v7 with a narrow failure-driven calibration set:
- wrong-number crypto / gold / trading grooming labeled
dangerous - wrong-number social messages without money or investment asks labeled
suspicious - normal family dinner, school pickup, pharmacy, and clinic logistics labeled
safe - school, clinic, and pharmacy payment-link variants labeled
dangerous - official-route service notices labeled
needs_check
Generate it with:
python3 training/generate_v8_data.py
Modal eval command for a candidate v8 adapter:
modal run training/modal_eval.py \
--dataset eval/hard_v8_eval.jsonl \
--model-id openbmb/MiniCPM5-1B \
--adapter-id build-small-hackathon/jawbreaker-minicpm5-1b-lora-v8 \
--output-prefix jawbreaker-minicpm5-1b-lora-v8-hard632-guarded \
--apply-safety-guard
Promotion starts with the fresh held-out eval, not the generated v8 eval:
modal run training/modal_eval.py \
--dataset eval/fresh_2026_scam_eval.jsonl \
--model-id openbmb/MiniCPM5-1B \
--adapter-id build-small-hackathon/jawbreaker-minicpm5-1b-lora-v8 \
--output-prefix jawbreaker-minicpm5-1b-lora-v8-fresh2026-guarded \
--apply-safety-guard
Current Runtime Decision
The deployed Space uses openbmb/MiniCPM5-1B through Transformers on ZeroGPU with the published adapter build-small-hackathon/jawbreaker-minicpm5-1b-lora-v8.
The GGUF / llama-cpp-python path remains available for local eval and badge evidence, but it is not the primary live demo path. The live app also uses a deterministic heuristic guard so an obvious high-risk scam is not rendered as safe if the small model under-calls the risk.
If MiniCPM Space latency is unacceptable during final demo testing, Qwen/Qwen3-0.6B remains the fallback via JAWBREAKER_TRANSFORMERS_MODEL_ID, but the current judged model path is MiniCPM5-1B LoRA v8.
Current Results
MiniCPM5-1B LoRA v8:
- Adapter:
build-small-hackathon/jawbreaker-minicpm5-1b-lora-v8 - 632-case hard guarded eval:
579/632risk accuracy (91.61%),0dangerous-as-safe,0dangerous-as-needs-check,0safe-as-dangerous-or-suspicious,0unsafe action violations,0invalid predictions,0model errors
Earlier comparison evidence, MiniCPM5-1B LoRA v4:
- Adapter:
build-small-hackathon/jawbreaker-minicpm5-1b-lora-v4 - 394-case hard guarded eval:
379/394risk accuracy (96.19%),0dangerous-as-safe,0dangerous-as-needs-check,0suspicious-as-safe,0unsafe action violations,0invalid predictions,0model errors - 320-case hard guarded eval:
310/320risk accuracy (96.88%),0dangerous-as-safe,0dangerous-as-needs-check,0suspicious-as-safe,0unsafe action violations,0invalid predictions,0model errors
Running Backends
Heuristic baseline:
python3 eval/run_eval.py --backend heuristic
OpenBMB MiniCPM through Transformers:
python3 eval/run_eval.py \
--backend transformers \
--model-id openbmb/MiniCPM5-1B \
--trust-remote-code \
--dataset eval/generated_eval.jsonl
Published final v8 adapter through Transformers:
python3 eval/run_eval.py \
--backend transformers \
--model-id openbmb/MiniCPM5-1B \
--adapter-id build-small-hackathon/jawbreaker-minicpm5-1b-lora-v8 \
--trust-remote-code \
--attn-implementation eager \
--dataset eval/hard_v8_eval.jsonl \
--apply-safety-guard
Saved prediction replay:
python3 eval/run_eval.py --backend predictions --predictions eval/predictions/model.jsonl
Local GGUF model through llama-cpp-python:
python3 eval/run_eval.py \
--backend llama-cpp \
--model-path models/model.gguf \
--predictions-out eval/predictions/model.jsonl \
--json-out eval/reports/model.json