Photon-6B: a local model that knows when it's inventing an entity

Lucidia family · Photon line · honesty.tools

Scope, read first. Photon catches entities that do not exist at all (made-up drugs, companies, people, papers, case citations). It is at chance on everything else: fluent lies about real entities and confident reasoning errors. It does not make the model answer better; it makes it abstain better, with a receipt. Route real-entity lies and reasoning errors to a separate verifier.

Photon-6B is a package over stock models, not a fine-tune. It is Qwen2.5-3B-Instruct, unchanged (same weights, same answers, same capability), wrapped by an honesty layer. No LoRA, no merge, nothing trained. The "6" is the approximate parameter sum of the two stock models it orchestrates (a 3B answerer + a 3.8B independent lens). It adds zero capability; the novelty is the architecture of the check.

How it works

  1. A cross-family independent lens: Phi-3.5-mini-instruct, a different model family from the answerer. A verifier from the same mind only confirms; a different one checks. If either model refuses in its own words, Photon abstains. This is the universal spine: it holds out-of-distribution and on private entities.
  2. The Grounded Atlas: Lucidia's offline existence index (~16.7M real-entity names, compiled from public article-title corpora) baked into the package. No network, no API call, swappable for your own domain. Presence is a prior, not proof; absence is not proof of non-existence.
  3. A signed receipt: every verdict ships an ed25519-signed record + its null (what the detector reads on shuffled labels, ≈0.5). Auditable, not asserted.
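
The first two legs above can be sketched as a small decision function. This is a minimal illustration, not Photon's actual code: the refusal phrases, the `decide` helper, the `Verdict` type, and the "not-in-atlas" route name are all hypothetical, and a plain Python set stands in for the Grounded Atlas. Only the "dual-family refusal" and "grounded" route names are taken from the example output in this card.

```python
from dataclasses import dataclass
from typing import AbstractSet

# Hypothetical refusal markers; how Photon really detects a refusal is not documented here.
REFUSAL_MARKERS = (
    "i don't have information",
    "i'm not aware of",
    "i am not aware of",
    "i couldn't find",
    "does not appear to exist",
)


def looks_like_refusal(answer: str) -> bool:
    """Return True if the answer contains any of the assumed refusal phrases."""
    text = answer.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


@dataclass
class Verdict:
    off_map: bool
    route: str


def decide(answerer_reply: str, lens_reply: str, entity: str,
           atlas: AbstractSet[str]) -> Verdict:
    """Combine the hedge leg and the atlas leg into one verdict."""
    # Leg 1: if either model refuses in its own words, abstain.
    if looks_like_refusal(answerer_reply) or looks_like_refusal(lens_reply):
        return Verdict(off_map=True, route="dual-family refusal")
    # Leg 2: atlas presence is treated as a prior, not proof.
    if entity.lower() in atlas:
        return Verdict(off_map=False, route="grounded")
    # Neither model refused and the atlas has no entry (hypothetical route name).
    return Verdict(off_map=False, route="not-in-atlas")
```

Checking the refusal leg first mirrors the card's claim that the hedge is the universal spine: it is the only signal available for entities the atlas does not cover.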

The honest numbers

Measured on a HalluLens-protocol battery (frozen thresholds, ships its null):

setting                                               | fabrication recall | over-abstain
dual-family hedge + Grounded Atlas (Ollama tier)      | 88%                | 6.3%
the 3B answerer's own refusal of fabricated entities  | 75.8%              | -

The dual-family hedge spine barely degrades at a third of the footprint: the 3B answerer refuses fabricated entities as often as the 14B, and the package recall (88%) is on par with Photon-17B's (87.5%) on the same battery. ~6 GB total, runs on a laptop. (The fuller battery suite, held-out and enterprise, is landing; this is the headline-benchmark + live-tested figure.)

Never read these as more than they are: the in-distribution figures isolate the mechanism; plan a deployment around the out-of-distribution one. The numbers are measured on held-out batteries with frozen thresholds, and each ships its null. Full methodology (including where the layer is weakest) is open at honesty.tools.
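
For readers who want to reproduce figures of this kind on their own battery, here is a rough sketch of how fabrication recall, over-abstain rate, and a shuffled-label null could be computed. It assumes the null is an AUROC-style score (which sits near 0.5 when labels carry no information); the card does not say which statistic Photon uses, and the function names here are illustrative.

```python
import random


def rates(labels, abstained):
    """labels: True = fabricated entity; abstained: True = the layer abstained."""
    fab = [a for l, a in zip(labels, abstained) if l]
    real = [a for l, a in zip(labels, abstained) if not l]
    recall = sum(fab) / len(fab)          # fabricated entities caught
    over_abstain = sum(real) / len(real)  # real entities wrongly refused
    return recall, over_abstain


def auc(labels, scores):
    """Chance that a random fabricated item outscores a random real one (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def shuffled_null(labels, scores, trials=200, seed=0):
    """Mean AUC after shuffling the labels: what the detector reads when labels are noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        total += auc(shuffled, scores)
    return total / trials
```

Reporting the null next to the headline number lets a reader see how far the detector sits above what label noise alone would produce.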

Scope & limits (load-bearing, not fine print)

  • Catches: entities (people, places, papers, works, businesses, species, drug/company/case names) that do not exist at all, and off-distribution inputs.
  • Blind to: fluent lies about real entities, and confident reasoning errors (geometry charts familiarity, not truth). Complementary to semantic-entropy / SelfCheckGPT, which own that axis.
  • Per-model calibration is a hard dependency; swapping the answerer needs re-calibration.
  • Not-in-atlas entities: on real entities outside the Grounded Atlas (private/enterprise), the reliable signal is the model's own refusal (the hedge leg); swap a domain index to ground them.

Run it (Ollama)

ollama pull qwen2.5:3b-instruct
ollama pull phi3.5
python photon_ollama.py --answerer qwen2.5:3b-instruct --lens phi3.5 \
       --bloom grounded_atlas.bloom "Tell me about the medicine Velodose"
# -> off_map: true, route: "dual-family refusal"   (a fabricated drug)
python photon_ollama.py ... "Tell me about Marie Curie"
# -> off_map: false, route: "grounded"             (a real, atlas-covered entity)

grounded_atlas.bloom (~33 MB) is the offline existence index, included. The dual-family hedge + atlas is the universal spine (words-only, runs in Ollama/llama.cpp). A stronger Python-served tier adds a residual-stream probe (+recall on atlas-covered domains); see honesty.tools.
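
As a back-of-envelope check on why presence in the atlas is only a prior, the sketch below estimates the false-positive rate implied by the two figures quoted in this card (~16.7M names, ~33 MB). It assumes the file is a standard Bloom filter (inferred from the `--bloom` flag and the file name, not confirmed), that 33 MB means 33 million bytes, and that the number of hash functions is chosen optimally. The real file format may differ.

```python
import math

# Figures quoted in this card.
N_NAMES = 16.7e6    # ~16.7M real-entity names
FILE_BYTES = 33e6   # ~33 MB, assumed decimal and fully used by the filter's bit array


def bloom_estimate(n_items: float, n_bytes: float):
    """Return (bits per item, optimal hash count, expected false-positive rate)."""
    bits = n_bytes * 8
    bits_per_item = bits / n_items
    # Optimal number of hash functions for a standard Bloom filter: (m/n) * ln 2.
    k = max(1, round(bits_per_item * math.log(2)))
    # Standard approximation of the false-positive rate: (1 - e^(-k*n/m))^k.
    fpr = (1 - math.exp(-k * n_items / bits)) ** k
    return bits_per_item, k, fpr


if __name__ == "__main__":
    bits_per_item, k, fpr = bloom_estimate(N_NAMES, FILE_BYTES)
    print(f"bits per name: {bits_per_item:.1f}, hashes: {k}, false-positive rate: {fpr:.2e}")
```

Under these assumptions the estimate is roughly 16 bits per name, 11 hash functions, and a false-positive rate below 0.1%. A small fraction of made-up names would therefore still test as present, which is one reason the atlas is paired with the refusal leg instead of being used alone.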

Provenance

Built on stock Qwen2.5-3B-Instruct + Phi-3.5-mini-instruct + the Grounded Atlas. No weights were modified, merged, or trained. Methodology, batteries, and the where-it's-weakest writeups are open at honesty.tools.
