---
title: OppaiOracle V1.1 (CPU)
emoji: ⚡
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 6.14.0
python_version: '3.12'
app_file: app.py
pinned: false
short_description: Anime ViT tagger (V1.1, 448×448, CPU)
models:
- Grio43/OppaiOracle
---

# OppaiOracle V1.1 — CPU Space

Public inference UI for the V1.1 (448×448) anime tagger checkpoint hosted in the
[Grio43/OppaiOracle](https://huggingface.co/Grio43/OppaiOracle) model repo, running on CPU
hardware. For faster GPU-backed inference, use the companion GPU Space,
[Grio43/OppaiOracle](https://huggingface.co/spaces/Grio43/OppaiOracle).

## What this Space does

- Downloads `V1.1_onnx/model.onnx` and `V1.1_onnx/vocabulary.json` from the model repo on
first launch and caches them for later runs.
- Letterboxes the uploaded image to 448×448, normalizes it with mean = std = 0.5, runs the
ONNX graph with `onnxruntime` on the CPU execution provider (EP), and returns the top tags
whose scores clear the chosen threshold.
- The sigmoid is already part of the ONNX graph, so the output is a probability vector over
19,294 tags.

## Performance notes

- Each prediction takes roughly 10–30 s on the standard HF CPU tier. The V1.1 ViT has about
250M parameters and runs at 448×448 input, so this cost is unavoidable on CPU.
- ORT graph optimization is set to `BASIC`, with a fallback to `DISABLE_ALL`, because
`ENABLE_ALL` inserts a `MemcpyToHost` node on the bool `padding_mask` input, which the
HF CPU build of ORT rejects with `NOT_IMPLEMENTED`.

## Tag thresholds

- The default threshold is 0.50, which favours broader recall for browsing.
- The macro precision/recall break-even point measured on the V1.1 validation set is ≈ 0.760;
set the slider there to use the calibrated operating point where precision = recall.
- Per-tag thresholds are available in `V1.1_onnx/pr_thresholds.json` in the model repo for
users who want to apply them externally.
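Applying either a single slider value or per-tag thresholds to the output vector could look roughly like the sketch below. It assumes `vocabulary.json` decodes to a list of tag names in output-index order and `pr_thresholds.json` decodes to a `{tag: threshold}` mapping; neither format is documented here, so check the model card first. `select_tags` is a hypothetical helper, and it applies no sigmoid because the graph already outputs probabilities (a second sigmoid would squash every score into roughly 0.50–0.73).

```python
from typing import Mapping, Optional, Sequence


def select_tags(
    probs: Sequence[float],
    vocab: Sequence[str],
    threshold: float = 0.50,
    per_tag: Optional[Mapping[str, float]] = None,
    top_k: Optional[int] = None,
) -> list[tuple[str, float]]:
    """Hypothetical helper: (tag, probability) pairs at or above their threshold, highest first.

    probs   -- model output, already sigmoid-activated (one score per tag)
    vocab   -- assumed: tag names in the same order as the output vector
    per_tag -- assumed: {tag: threshold}; tags missing from it use `threshold`
    """
    if len(probs) != len(vocab):
        raise ValueError("probability vector and vocabulary differ in length")
    per_tag = per_tag or {}
    hits = [
        (tag, float(p))
        for tag, p in zip(vocab, probs)
        if p >= per_tag.get(tag, threshold)
    ]
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits[:top_k] if top_k is not None else hits
```

For example, with `vocab = ["1girl", "long_hair", "very_long_hair", "bowtie"]` and `probs = [0.97, 0.81, 0.55, 0.40]`, the default 0.50 threshold keeps the first three tags, while `threshold=0.760` keeps only `1girl` and `long_hair`.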

## Known noise patterns (read the model card first)

The full model card in the model repo documents these in detail. In short:

- Missing tags are the dominant noise mode. Many concepts present in an image are simply
unlabeled in the training corpus, so low predicted scores are less informative than they look.
- Color tags (eye/hair/general) leak between perceptually similar colors.
- Hair-length boundaries (`long_hair` / `very_long_hair`) are inherently noisy in the
source data.
- Neckwear tags (`bow` / `bowtie` / `ribbon` / `ascot` / `necktie`) are routinely confused
in the source labels.
- Treat any prediction as a suggestion to inspect, not a final answer.