	# PacketCourt: The packet takes the stand

	Food packets are unusually good at telling two different stories at once.

	The front has seconds to persuade: `HIGH PROTEIN`, `MULTIGRAIN`,
	`100% NATURAL`, `BAKED NOT FRIED`. The back carries the evidence needed to
	interpret those claims: ingredient order, nutrition basis, package size,
	licensing text, dates, and instructions that only matter after opening.

	PacketCourt is my attempt to make those two surfaces answer to each other.

	It is a phone-first Gradio app for Indian packaged-food labels. A user can add
	multiple front, back, and side-panel photographs when a packet wraps evidence
	around its dimensions. PacketCourt reads each photo independently, labels and
	merges the visible evidence, plans an investigation, performs deterministic
	calculations, and returns conservative verdicts with citations.

	It does not produce a health score. It asks a narrower question:

	> Does the evidence printed on this packet support the impression created by
	> its front?

	Try the Space: https://huggingface.co/spaces/build-small-hackathon/packetcourt

	Read the Codex-attributed source:
	https://github.com/N-45div/PacketCourt

	## The product decision that shaped everything

	An early version of the idea was a general nutrition scanner. That direction
	was broad, crowded, and difficult to trust. A single red, yellow, or green score
	would hide too many judgments:

	- Is sugar always worse than protein is good?
	- How should serving size affect a score?
	- Does an FSSAI license imply a health endorsement?
	- Can OCR uncertainty silently change the answer?

	PacketCourt therefore avoids ranking products. It audits claims against
	evidence from the same supplied packet.

	The output language is intentionally constrained:

	- `SUPPORTED BY PROVIDED LABEL`
	- `CONTRADICTED BY PROVIDED LABEL`
	- `TECHNICALLY TRUE, CONTEXT MISSING`
	- `CANNOT VERIFY`

	The phrase "provided label" matters. PacketCourt does not pretend that a
	photograph is a laboratory analysis, or that text missing from the photographs
	is absent from the packet.
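One way to keep the verdict language constrained is a closed enumeration. The sketch below is illustrative only: the four strings come from the list above, while the `Verdict` class and `parse_verdict` helper are hypothetical names, not PacketCourt's actual code.

```python
from enum import Enum


class Verdict(Enum):
    """The four verdict labels listed above; the class name is illustrative."""

    SUPPORTED = "SUPPORTED BY PROVIDED LABEL"
    CONTRADICTED = "CONTRADICTED BY PROVIDED LABEL"
    CONTEXT_MISSING = "TECHNICALLY TRUE, CONTEXT MISSING"
    CANNOT_VERIFY = "CANNOT VERIFY"


def parse_verdict(text: str) -> Verdict:
    """Return the matching Verdict, or raise ValueError for any other string."""
    return Verdict(text.strip().upper())
```

Because `Verdict(...)` raises `ValueError` for any unlisted string, a fifth, softer verdict cannot slip into a report by accident.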

	## A three-model investigation with a deterministic judge

	PacketCourt uses small models where interpretation is useful and deterministic
	code where exactness is required.

	```mermaid
	flowchart LR
	Phone["Phone or desktop<br/>multi-angle packet photos"] --> App["PacketCourt<br/>custom Gradio app"]
	App --> Vision["OpenBMB MiniCPM-V-4.6<br/>1.30B visual witness"]
	Vision --> Agent["Evidence investigation agent"]
	Agent --> Router["Fine-tuned PacketCourt router<br/>4.38M parameters"]
	Router --> Agent
	Agent --> Nemotron["NVIDIA Nemotron Mini 4B<br/>independent evidence-gap review"]
	Nemotron --> Agent
	Agent --> Judge["Deterministic evidence judge"]
	Judge --> Report["Verdicts, citations,<br/>calculations, and trace"]
	Report --> Feedback["Community Review Agent"]
	Feedback --> Queue["Public approval-gated<br/>learning queue"]
	```

	### OpenBMB MiniCPM-V-4.6: the visual witness

	The vision companion runs on ZeroGPU. It receives up to six front/side and six
	back/side images and transcribes only visibly printed evidence. PacketCourt
	labels every transcription by photo number, merges unique evidence, and skips
	exact duplicates. The front prompt focuses on claims. The back prompt preserves
	ingredients, every visible nutrition-table row and basis, net weight, FSSAI
	license text, dates, and after-opening instructions.

	The model is asked not to explain or infer. Its responsibility is to surface
	what is visible for the next stage.
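The labelling and de-duplication step might look roughly like the following. This is a minimal sketch that assumes one transcription string per photo and de-duplicates at line level; the function name and that granularity are assumptions, not details from the PacketCourt source.

```python
def merge_transcriptions(transcriptions: list[str]) -> list[tuple[int, str]]:
    """Label each non-empty line with its 1-based photo number.

    A line that exactly repeats one already seen (after trimming
    surrounding whitespace) is skipped, so the first photo showing a
    piece of evidence keeps the citation.
    """
    seen: set[str] = set()
    merged: list[tuple[int, str]] = []
    for photo_number, text in enumerate(transcriptions, start=1):
        for line in text.splitlines():
            line = line.strip()
            if not line or line in seen:
                continue
            seen.add(line)
            merged.append((photo_number, line))
    return merged


evidence = merge_transcriptions(
    ["HIGH PROTEIN\nNet wt 300g", "Net wt 300g\nIngredients: wheat flour"]
)
```

Here `evidence` keeps `Net wt 300g` once, attributed to photo 1, and attributes the ingredient line to photo 2.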

	### A fine-tuned 4.38M-parameter evidence router

	Different claims require different evidence.

	- `NO ADDED SUGAR` requires ingredient inspection.
	- `HIGH PROTEIN` requires nutrition values and their measurement basis.
	- `FSSAI APPROVED` requires license evidence and a
	registration-versus-endorsement distinction.
	- `100% NATURAL` requires the safety boundary because the absolute claim cannot
	be established from packet text alone.

	I fine-tuned a tiny BERT classifier to route claims to five bounded tools:
	`ingredients`, `nutrition`, `license`, `dates`, and `refuse_absolute`.
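To make the routing target concrete, the claim-to-tool mapping above can be written as a plain keyword policy. This is a hypothetical sketch, not the fine-tuned classifier: only the five tool names and the four example claims come from the article, and the keyword table (including the `best before` entry for `dates`) is an assumption.

```python
from typing import Optional

TOOLS = ("ingredients", "nutrition", "license", "dates", "refuse_absolute")

# Hypothetical keyword table, checked in order; absolute claims go first
# so that "100% NATURAL" is never routed to an ordinary evidence tool.
KEYWORD_ROUTES = [
    ("100%", "refuse_absolute"),
    ("fssai", "license"),
    ("best before", "dates"),
    ("no added sugar", "ingredients"),
    ("protein", "nutrition"),
]


def route_claim(claim: str) -> Optional[str]:
    """Return the first matching tool name, or None if no keyword matches."""
    text = claim.lower()
    for keyword, tool in KEYWORD_ROUTES:
        if keyword in text:
            return tool
    return None
```

Returning `None` for an unrecognized claim, rather than guessing a tool, keeps the policy conservative.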

	The first training run reached only `0.40` held-out accuracy. The random split
	did not preserve every routing class, and the dataset was too thin. I did not
	enable that checkpoint.

	After balancing the claim variants and using a stratified five-class holdout,
	the corrected checkpoint reached `1.000` on the small held-out set. That result
	is useful evidence that the routing task is learnable, not proof of broad
	generalization. Deterministic policy fallback remains available when the model
	cannot load. Real packet testing later exposed new routes such as `SUGAR FREE`,
	`REAL BADAM`, and `EXTRA CALCIUM WITH DHA`; those reviewed cases were added to
	the public training set and the router was fine-tuned again.
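The difference between the failed random split and the stratified holdout can be sketched in plain Python. The function below is an illustration of stratification in general, assuming `(text, label)` pairs; it is not the training script used for the router, and the name and parameters are hypothetical.

```python
import random
from collections import defaultdict


def stratified_holdout(examples, holdout_per_class=1, seed=0):
    """Split (text, label) pairs so every label appears in the holdout set."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)
    train, holdout = [], []
    for label in sorted(by_label):
        items = list(by_label[label])
        rng.shuffle(items)
        # Refuse to split a class that would leave nothing to train on.
        if len(items) <= holdout_per_class:
            raise ValueError(f"not enough examples for label {label!r}")
        holdout.extend(items[:holdout_per_class])
        train.extend(items[holdout_per_class:])
    return train, holdout
```

A purely random split offers no such guarantee: with a thin dataset, a whole routing class can land entirely in the training portion and never be evaluated.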

	Model: https://huggingface.co/build-small-hackathon/packetcourt-evidence-router

	Training data:
	https://huggingface.co/datasets/build-small-hackathon/packetcourt-router-training

	### NVIDIA Nemotron: an independent reviewer, not the judge

	After the investigation plan completes, NVIDIA
	`Nemotron-Mini-4B-Instruct` reviews the structured case for missing evidence.
	It can identify the highest-priority next action or confirm that the bounded
	investigation is complete.

	It cannot change a verdict or manufacture a required-evidence state outside
	PacketCourt's deterministic investigation. Companion responses cross a typed
	`AgentReview` boundary before they can appear in the product.

	This separation matters. A language model is useful for reviewing whether the
	investigation overlooked an evidence gap. It should not silently override
	arithmetic or invent a regulatory conclusion.
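A typed boundary of this kind could be sketched as follows. The article names `AgentReview` but does not show its fields, so the two fields, the allowed-evidence set, and `parse_agent_review` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed set of evidence the reviewer may ask for; not from the source.
ALLOWED_EVIDENCE = {"ingredients", "nutrition", "license", "dates"}


@dataclass(frozen=True)
class AgentReview:
    """Hypothetical shape: either complete, or one requested next action."""

    investigation_complete: bool
    next_action: Optional[str] = None


def parse_agent_review(payload: dict) -> AgentReview:
    """Accept only the expected keys and value types; reject everything else."""
    extra = set(payload) - {"investigation_complete", "next_action"}
    if extra:
        # A reviewer response carrying e.g. a "verdict" key is refused outright.
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    complete = payload.get("investigation_complete")
    if not isinstance(complete, bool):
        raise ValueError("investigation_complete must be a boolean")
    action = payload.get("next_action")
    if complete:
        if action is not None:
            raise ValueError("a complete review must not request an action")
    elif not isinstance(action, str) or action not in ALLOWED_EVIDENCE:
        raise ValueError(f"unknown evidence request: {action!r}")
    return AgentReview(complete, action)
```

The point of the design is that the reviewer's output has no field through which a verdict could be expressed at all.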

	The first Nemotron deployment also failed. I initially used
	`NVIDIA-Nemotron-3-Nano-4B-BF16`, but a real ZeroGPU probe exposed a dependency
	on a specialized Mamba CUDA runtime unavailable in the standard Gradio image.
	I switched to Nemotron Mini 4B only after the replacement completed a real
	ZeroGPU review.

	## The deterministic evidence judge

	The final verdict path is ordinary Python.

	That code:

	- detects known front claims and meaningful, previously unseen ones;
	- extracts ingredients;
	- parses nutrition values and their declared basis, including table-style OCR
	such as `Protein (g) 12` and `Sodium | mg | 410`;
	- calculates whole-packet protein, sugar, sodium, and saturated fat;
	- converts total sugar into a teaspoon equivalent;
	- resolves direct and relative best-before dates;
	- extracts after-opening deadlines;
	- applies conservative claim-specific verdict rules.

	For example, when a nutrition panel declares values per `100g` and the packet
	contains `300g`, PacketCourt scales the values by exactly `3`. It does not ask a
	language model to perform that arithmetic.
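The parsing and scaling steps can be illustrated with a short sketch. It assumes a single regular expression that covers the two row styles quoted above, and a teaspoon of roughly 4 g of sugar; the function names, the regex, and that conversion factor are assumptions rather than PacketCourt's actual implementation.

```python
import re
from typing import Optional

# Matches rows such as "Protein (g) 12" and "Sodium | mg | 410".
ROW = re.compile(
    r"^\s*(?P<name>[A-Za-z][A-Za-z ]*?)[\s|(]+(?P<unit>mg|g|kcal)[\s|)]+"
    r"(?P<value>\d+(?:\.\d+)?)\s*$",
    re.IGNORECASE,
)

GRAMS_PER_TEASPOON = 4.0  # assumed approximation for granulated sugar


def parse_nutrition_row(line: str) -> Optional[tuple[str, str, float]]:
    """Return (name, unit, value) for a recognized row, else None."""
    match = ROW.match(line)
    if match is None:
        return None
    return (
        match["name"].strip().lower(),
        match["unit"].lower(),
        float(match["value"]),
    )


def whole_packet_amount(value: float, basis_g: float, net_weight_g: float) -> float:
    """Scale a value declared per `basis_g` grams to the whole packet."""
    if basis_g <= 0 or net_weight_g <= 0:
        raise ValueError("basis and net weight must be positive")
    return value * net_weight_g / basis_g


def sugar_teaspoons(sugar_g: float) -> float:
    """Convert grams of sugar to an approximate teaspoon count."""
    return sugar_g / GRAMS_PER_TEASPOON


name, unit, per_100g = parse_nutrition_row("Protein (g) 12")
packet_protein_g = whole_packet_amount(per_100g, basis_g=100, net_weight_g=300)
```

With a `100g` basis and a `300g` packet the scale factor is `300 / 100 = 3`, so `12` g of protein per `100g` becomes `36` g for the packet.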

	## Persuasion Gap

	Claim verification alone did not capture the most interesting part of the
	problem.

	A `HIGH PROTEIN` claim can be supported by visible protein evidence while the
	complete packet also contains substantial sugar or sodium. A multigrain claim
	can be technically true while refined flour remains the first ingredient.

	PacketCourt therefore calculates a Persuasion Gap: material context on the
	back that competes with the impression emphasized on the front.

	Examples include:

	- “Protein leads. Whole-packet sugar stays quiet.”
	- “A positive front claim competes with substantial sodium.”
	- “Grain variety is prominent. The first ingredient is refined.”
	- “Registration language can look like a health endorsement.”

	Each finding cites the exact evidence or calculation. PacketCourt still leaves
	the final decision with the user.
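Two of the findings above can be expressed as simple rules. The sketch below is hypothetical: the finding texts are quoted from the examples, but the function, the refined-flour markers, and the `25` g sugar threshold are assumptions chosen only to make the example runnable.

```python
from typing import Optional

# Assumed markers for a refined first ingredient ("maida" is refined wheat flour).
REFINED_MARKERS = ("refined wheat flour", "maida")


def persuasion_gaps(
    front_claims: list[str],
    ingredients: list[str],
    packet_sugar_g: Optional[float],
    sugar_threshold_g: float = 25.0,  # hypothetical threshold
) -> list[dict]:
    """Return findings, each paired with the evidence that triggered it."""
    claims = {claim.upper() for claim in front_claims}
    findings = []
    if (
        "HIGH PROTEIN" in claims
        and packet_sugar_g is not None
        and packet_sugar_g >= sugar_threshold_g
    ):
        findings.append({
            "finding": "Protein leads. Whole-packet sugar stays quiet.",
            "evidence": f"whole-packet sugar = {packet_sugar_g:g} g",
        })
    if "MULTIGRAIN" in claims and ingredients:
        first = ingredients[0].lower()
        if any(marker in first for marker in REFINED_MARKERS):
            findings.append({
                "finding": "Grain variety is prominent. The first ingredient is refined.",
                "evidence": f"first ingredient = {ingredients[0]}",
            })
    return findings
```

When the sugar value is unknown (`None`), the sugar rule stays silent instead of guessing, which matches the conservative stance described earlier.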

	## A correction-driven learning loop

	PacketCourt now includes a Community Review Agent. After an audit, a user can
	confirm the result or submit an evidence-backed correction. The review is
	bundled with the original label text, verdicts, investigation path, and
	Nemotron review in a public queue.

	Feedback does not immediately retrain production models. That would allow an
	accidental or malicious correction to poison later audits. New records begin
	as `pending_human_review` with `training_eligible: false`. Approved corrections
	can enter a versioned router-training release, followed by fine-tuning and the
	golden-case regression suite before deployment.
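The approval gate can be pictured as a record that is ineligible by default. In this sketch only the `pending_human_review` state and the `training_eligible` flag come from the article; the other fields, the `approved` state name, and the `approve` helper are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FeedbackRecord:
    """Hypothetical record shape; the two defaults mirror the article."""

    audit_id: str
    correction: str
    status: str = "pending_human_review"
    training_eligible: bool = False


def approve(record: FeedbackRecord, reviewer: str) -> FeedbackRecord:
    """Return an approved copy; a named human reviewer is required."""
    if not reviewer.strip():
        raise ValueError("approval requires a named reviewer")
    if record.status != "pending_human_review":
        raise ValueError(f"cannot approve a record in state {record.status!r}")
    return replace(record, status="approved", training_eligible=True)
```

Because the record is frozen and eligibility defaults to `False`, a submission can only become training data through an explicit approval call.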

	Community feedback queue:
	https://huggingface.co/datasets/build-small-hackathon/packetcourt-community-feedback

	## What makes the agent bounded

	For every packet, PacketCourt emits an explicit investigation record:

	- objective;
	- selected evidence tools;
	- reason each tool was selected;
	- whether the fine-tuned router or policy fallback selected it;
	- missing-evidence requests;
	- stop reason;
	- independent Nemotron review;
	- deterministic verdicts and limitations.

	There are only two valid stopping conditions:

	1. every evidence tool required by the detected claims completed; or
	2. required evidence is missing, so PacketCourt stops and asks for it.
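The two stopping conditions reduce to a set difference. This is a minimal sketch; the function name and the two reason strings are illustrative labels, not identifiers from the PacketCourt traces.

```python
def stop_reason(
    required_tools: set[str], completed_tools: set[str]
) -> tuple[str, list[str]]:
    """Return the stop reason and any evidence tools still outstanding."""
    missing = sorted(required_tools - completed_tools)
    if missing:
        # Condition 2: stop and ask the user for the missing evidence.
        return "missing_required_evidence", missing
    # Condition 1: every tool required by the detected claims completed.
    return "all_required_tools_completed", []
```

For instance, if the claims require `ingredients` and `nutrition` but only `ingredients` completed, the function stops with a request for `nutrition`.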

	The public trace dataset contains no hidden chain-of-thought. It exposes tool
	decisions, evidence outputs, calculations, and boundaries suitable for
	inspection.

	Traces:
	https://huggingface.co/datasets/build-small-hackathon/packetcourt-traces

	## Evaluation

	The current release has:

	- `20` passing unit and end-to-end integration tests;
	- `35/35` passing checks across `10` golden packet cases;
	- `10` transparent investigation traces;
	- one published real end-to-end Nemotron review trace;
	- successful live audits using the fine-tuned router and Nemotron reviewer;
	- a real public Community Review Agent record;
	- multi-angle packet-photo ingestion with duplicate removal.

	The golden cases cover contradictions, supported claims, missing context,
	whole-packet calculations, refined-grain context, FSSAI registration language,
	relative shelf-life arithmetic, and after-opening instructions.
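Relative shelf-life arithmetic, such as a best-before period stated in months from a manufacture date, can be sketched as below. The helper name is hypothetical, and clamping to the last day of a shorter month is an assumed policy; the article does not say how PacketCourt treats that edge case.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


# Manufactured 15 January 2024, best before 9 months from manufacture.
best_before = add_months(date(2024, 1, 15), 9)
```

Here `best_before` is 15 October 2024. A packet manufactured on 31 August 2024 with a six-month period resolves to 28 February 2025 under the clamping assumption.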

	Golden cases:
	https://huggingface.co/datasets/build-small-hackathon/packetcourt-golden-cases

	## The interface is part of the evidence standard

	PacketCourt uses a custom responsive frontend mounted over a Gradio engine.
	The phone workflow matters because the packet is physically in the user's
	hand. Some packets place claims, ingredients, dates, directions, and nutrition
	on different wrapped panels, so the interface supports additive multi-angle
	capture rather than assuming two perfect photos. The results view shows the
	investigation path before the verdict cards, then separates persuasion gaps,
	claim findings, nutrition calculations, date evidence, machine-readable JSON,
	and the community review path.

	Uncertainty is not hidden in a tooltip. It is part of the primary result.

	## What PacketCourt refuses to claim

	PacketCourt does not declare a food:

	- healthy;
	- safe;
	- illegal;
	- fraudulent;
	- suitable for a medical condition.

	It audits only supplied packet evidence. OCR should be checked against the
	physical label. `CANNOT VERIFY` is a successful outcome when the evidence is
	insufficient.

	That refusal is not a missing feature. It is PacketCourt's standard of proof.

	## Built small

	The complete model budget is approximately `5.3B` parameters:

	- OpenBMB MiniCPM-V-4.6: `1.30B`;
	- NVIDIA Nemotron Mini: approximately `4B`;
	- fine-tuned PacketCourt router: `4.38M`.
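The budget figure follows from adding the three counts. The snippet below only repeats that arithmetic, taking the Nemotron figure as the approximate `4B` stated above; the dictionary keys are shorthand labels.

```python
# Parameter counts as stated in the list above (Nemotron is approximate).
PARAMETERS = {
    "MiniCPM-V-4.6": 1.30e9,
    "Nemotron Mini": 4.0e9,
    "PacketCourt router": 4.38e6,
}

total = sum(PARAMETERS.values())
budget_label = f"{total / 1e9:.1f}B"
```

The router contributes well under one percent of the total, so the rounded budget is set almost entirely by the two companion models.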

	The main evidence judge remains deterministic and CPU-based. ZeroGPU is
	requested only for visual transcription and the independent Nemotron review.

	PacketCourt was built with OpenAI Codex as the primary coding agent. The public
	GitHub repository preserves Codex-attributed commits covering the architecture,
	tests, fine-tuning workflow, model companions, trace publication, UI, and
	deployment.

	Space: https://huggingface.co/spaces/build-small-hackathon/packetcourt

	GitHub: https://github.com/N-45div/PacketCourt

	Model: https://huggingface.co/build-small-hackathon/packetcourt-evidence-router

	Traces: https://huggingface.co/datasets/build-small-hackathon/packetcourt-traces