Photon-6B: a local model that knows when it's inventing an entity
Lucidia family · Photon line · honesty.tools
Scope, read first. Photon catches entities that do not exist at all (made-up drugs, companies, people, papers, case citations). It is at chance on everything else: fluent lies about real entities and confident reasoning errors. It does not make the model answer better; it makes it abstain better, with a receipt. Route real-entity lies and reasoning errors to a separate verifier.
Photon-6B is a package over stock models, not a fine-tune. It is Qwen2.5-3B-Instruct, unchanged (same weights, same answers, same capability) and wrapped by an honesty layer. No LoRA, no merge, nothing trained. The "6" is the approximate parameter sum of the two stock models it orchestrates (a 3B answerer + a 3.8B independent lens). It adds zero capability; the novelty is the architecture of the check.
How it works
- A cross-family independent lens: Phi-3.5-mini-instruct, a different model family from the answerer. A verifier from the same mind only confirms; a different one checks. If either model refuses in its own words, Photon abstains. This is the universal spine: it holds out-of-distribution and on private entities.
- The Grounded Atlas: Lucidia's offline existence index (~16.7M real-entity names, compiled from public article-title corpora) baked into the package. No network, no API call, and swappable for an index of your own domain. Presence is a prior, not proof; absence is not proof of non-existence.
- A signed receipt: every verdict ships an ed25519-signed record plus its null (what the detector reads on shuffled labels, ≈0.5). Auditable, not asserted.
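The decision rule described in the list above can be sketched roughly as follows. This is a minimal illustration, not Photon's actual code: the refusal patterns, the `check` function, the `Verdict` type, and the `"unverified"` route label are all hypothetical, and the two models and the atlas lookup are passed in as plain callables so the sketch runs without Ollama.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical refusal phrases; a real deployment would calibrate these per model.
REFUSAL_PATTERNS = [
    r"\bi (?:do not|don't) have (?:any )?information\b",
    r"\bi(?: am|'m) not (?:aware|familiar)\b",
    r"\bi (?:could not|couldn't|cannot|can't) find\b",
    r"\bdoes not appear to exist\b",
]


def looks_like_refusal(text: str) -> bool:
    """True if the reply contains one of the (assumed) refusal phrases."""
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in REFUSAL_PATTERNS)


@dataclass
class Verdict:
    off_map: bool
    route: str


def check(
    prompt: str,
    entity: str,
    answerer: Callable[[str], str],   # stand-in for the 3B answerer
    lens: Callable[[str], str],       # stand-in for the cross-family lens
    in_atlas: Callable[[str], bool],  # stand-in for the existence index lookup
) -> Verdict:
    # Abstain if EITHER model refuses in its own words.
    if looks_like_refusal(answerer(prompt)) or looks_like_refusal(lens(prompt)):
        return Verdict(off_map=True, route="dual-family refusal")
    # Atlas presence is treated as a prior, not proof.
    if in_atlas(entity):
        return Verdict(off_map=False, route="grounded")
    # Absence from the atlas alone is not proof of non-existence, so do not
    # flag it; "unverified" is a hypothetical label for this case.
    return Verdict(off_map=False, route="unverified")
```

Passing the models as callables keeps the rule itself testable in isolation; in practice each callable would wrap a call to a locally served model.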
The honest numbers
Measured on a HalluLens-protocol battery (frozen thresholds, ships its null):
| setting | fabrication recall | over-abstain |
|---|---|---|
| dual-family hedge + Grounded Atlas (Ollama tier) | 88% | 6.3% |
| the 3B answerer's own refusal of fabricated entities | 75.8% | n/a |
The dual-family hedge spine barely degrades at a third the footprint: the 3B answerer refuses fabricated entities as often as the 14B, and the package recall (88%) is on par with Photon-17B's (87.5%) on the same battery. ~6 GB total, runs on a laptop. (The fuller battery suite, covering held-out and enterprise settings, is landing; this is the headline-benchmark + live-tested figure.)
Never read these as more than they are: the in-distribution figures isolate the mechanism; plan a deployment around the out-of-distribution one. The numbers are measured on held-out batteries with frozen thresholds, and each ships its null. Full methodology (including where the layer is weakest) is open at honesty.tools.
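For readers who want to reproduce this kind of table on their own prompts, here is a minimal sketch of the two metrics and a shuffled-label null. The definitions are assumptions, not taken from the Photon methodology: fabrication recall is read as the share of fabricated-entity prompts on which the package abstains, over-abstain as the share of real-entity prompts on which it abstains, and the null as an AUROC-style score recomputed after shuffling labels. All function names and the toy data are hypothetical.

```python
import random


def fabrication_recall(records):
    """records: list of (is_fabricated, abstained) booleans."""
    flags = [abstained for is_fabricated, abstained in records if is_fabricated]
    return sum(flags) / len(flags)


def over_abstain_rate(records):
    """Share of real-entity prompts on which the system abstained anyway."""
    flags = [abstained for is_fabricated, abstained in records if not is_fabricated]
    return sum(flags) / len(flags)


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def shuffled_null(scores, labels, trials=200, seed=0):
    """Mean AUROC after shuffling labels; should sit near 0.5 if nothing leaks."""
    rng = random.Random(seed)
    shuffled = list(labels)
    total = 0.0
    for _ in range(trials):
        rng.shuffle(shuffled)
        total += auroc(scores, shuffled)
    return total / trials


# Toy data, purely illustrative: 5 fabricated prompts and 10 real ones.
toy = [(True, True)] * 4 + [(True, False)] + [(False, True)] + [(False, False)] * 9
toy_recall = fabrication_recall(toy)
toy_over_abstain = over_abstain_rate(toy)
```

Reporting the null next to the headline score shows whether the detector reads anything beyond chance on the same battery.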
Scope & limits (load-bearing, not fine print)
- Catches: entities (people, places, papers, works, businesses, species, drug/company/case names) that do not exist at all, and off-distribution inputs.
- Blind to: fluent lies about real entities, and confident reasoning errors. Geometry charts familiarity, not truth. Complementary to semantic-entropy / SelfCheckGPT, which own that axis.
- Per-model calibration is a hard dependency; swapping the answerer needs re-calibration.
- Not-in-atlas entities: on real entities outside the Grounded Atlas (private/enterprise), the reliable signal is the model's own refusal (the hedge leg); swap a domain index to ground them.
Run it (Ollama)
ollama pull qwen2.5:3b-instruct
ollama pull phi3.5
python photon_ollama.py --answerer qwen2.5:3b-instruct --lens phi3.5 \
--bloom grounded_atlas.bloom "Tell me about the medicine Velodose"
# -> off_map: true, route: "dual-family refusal" (a fabricated drug)
python photon_ollama.py ... "Tell me about Marie Curie"
# -> off_map: false, route: "grounded" (a real, atlas-covered entity)
grounded_atlas.bloom (~33 MB) is the offline existence index, included. The dual-family hedge + atlas is the universal spine (words-only, runs in Ollama/llama.cpp). A stronger Python-served tier adds a residual-stream probe (higher recall on atlas-covered domains); see honesty.tools.
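The file name and the `--bloom` flag suggest the index is a Bloom filter. If so, ~33 MB for ~16.7M names works out to roughly 16 bits per name, which for a textbook Bloom filter with a near-optimal hash count would mean a false-positive rate of a few hundredths of a percent. The sketch below shows that arithmetic and a tiny Bloom filter for a domain-specific name list. It is an illustration under those assumptions only: the `BloomFilter` class, its hashing scheme, and its name normalization are hypothetical and are not the on-disk format of `grounded_atlas.bloom`.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter; NOT compatible with grounded_atlas.bloom."""

    def __init__(self, n_bits: int, n_hashes: int):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, name: str):
        # Double hashing: derive k bit positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(name.strip().lower().encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        return [(h1 + i * h2) % self.n_bits for i in range(self.n_hashes)]

    def add(self, name: str) -> None:
        for pos in self._positions(name):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, name: str) -> bool:
        # True means "possibly present"; False means "definitely never added".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(name))


def expected_false_positive_rate(n_bits: float, n_items: float, n_hashes: int) -> float:
    """Standard approximation: (1 - exp(-k * n / m)) ** k."""
    return (1 - math.exp(-n_hashes * n_items / n_bits)) ** n_hashes


# Back-of-envelope sizing from the figures quoted in this card (33 MB, 16.7M names).
bits_per_name = 33e6 * 8 / 16.7e6
optimal_hashes = round(bits_per_name * math.log(2))
estimated_fpr = expected_false_positive_rate(33e6 * 8, 16.7e6, optimal_hashes)

# A hypothetical private-domain index.
domain_index = BloomFilter(n_bits=1024, n_hashes=7)
domain_index.add("Marie Curie")
```

A Bloom filter never returns a false negative for a name that was added, but it can return a false positive, which matches the card's framing that presence is a prior rather than proof.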
Provenance
Built on stock Qwen2.5-3B-Instruct + Phi-3.5-mini-instruct + the Grounded Atlas. No weights were modified, merged, or trained. Methodology, batteries, and the where-it's-weakest writeups are open at honesty.tools.