metadata

title: OppaiOracle V1.1 (CPU)
emoji: ⚡
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 6.14.0
python_version: '3.12'
app_file: app.py
pinned: false
short_description: Anime ViT tagger (V1.1, 448×448, CPU)
models:
  - Grio43/OppaiOracle

OppaiOracle V1.1 — CPU Space

This Space is a public inference UI, running on CPU hardware, for the V1.1 (448×448) anime tagger checkpoint hosted in the Grio43/OppaiOracle model repo. For faster GPU-backed inference, see Grio43/OppaiOracle.

What this Space does

Downloads V1.1_onnx/model.onnx and V1.1_onnx/vocabulary.json from the model repo on first launch (cached afterwards).
Letterboxes the uploaded image to 448×448, normalizes each channel with mean = 0.5 and std = 0.5, runs the ONNX graph with onnxruntime (CPU execution provider), and returns the top tags above the chosen threshold.
The sigmoid is already applied inside the ONNX graph, so the output is a vector of per-tag probabilities over 19,294 tags and needs no further activation.
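The preprocessing and tag-selection steps above can be sketched in plain Python. This is a minimal illustration, not the code in app.py: the helper names (`letterbox_geometry`, `normalize_pixel`, `top_tags`) are hypothetical, and it assumes an aspect-preserving resize with centered padding, pixel values scaled to [0, 1] before normalization, and an inclusive threshold comparison. None of these details are confirmed by this README.

```python
from typing import List, Sequence, Tuple

TARGET = 448  # input side length stated in this README


def letterbox_geometry(width: int, height: int, target: int = TARGET) -> Tuple[int, int, int, int]:
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving fit.

    Assumption: the image is scaled to fit inside target x target and centered.
    """
    scale = min(target / width, target / height)
    new_w = min(target, max(1, round(width * scale)))
    new_h = min(target, max(1, round(height * scale)))
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def normalize_pixel(value: int) -> float:
    """Map an 8-bit channel value to [-1, 1] via (x / 255 - mean) / std, mean = std = 0.5."""
    return (value / 255.0 - 0.5) / 0.5


def top_tags(
    probs: Sequence[float],
    vocab: Sequence[str],
    threshold: float = 0.5,
    limit: int = 50,
) -> List[Tuple[str, float]]:
    """Keep tags scoring at or above the threshold, highest first.

    The scores are used as-is because the graph already applies the sigmoid.
    The inclusive comparison and the `limit` cap are assumptions.
    """
    kept = [(tag, p) for tag, p in zip(vocab, probs) if p >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]
```

For example, a 896×448 upload would be resized to 448×224 and padded by 112 rows at the top under these assumptions.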

Performance notes

Each prediction takes ~10–30 s on the standard HF CPU tier. The V1.1 ViT has ~250M parameters and takes 448² inputs, so this cost is inherent to running it on CPU.
ORT graph optimization is set to BASIC, with a fallback to DISABLE_ALL. ENABLE_ALL is avoided because it synthesizes a MemcpyToHost node on the bool padding_mask input, which the HF CPU build of ORT rejects with NOT_IMPLEMENTED.
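The BASIC-then-DISABLE_ALL fallback described above might look roughly like the following. This is a sketch under assumptions, not the Space's actual code: `first_working` and `build_session` are hypothetical helpers, and the model path passed in is whatever local path the download step produced. The onnxruntime names used (`SessionOptions`, `GraphOptimizationLevel.ORT_ENABLE_BASIC`, `GraphOptimizationLevel.ORT_DISABLE_ALL`, `InferenceSession`, `CPUExecutionProvider`) are part of the public onnxruntime Python API.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

# Optimization levels in the order described above: BASIC first, DISABLE_ALL as fallback.
LEVELS = ("ORT_ENABLE_BASIC", "ORT_DISABLE_ALL")


def first_working(levels: Sequence[str], build: Callable[[str], T]) -> T:
    """Call build(level) for each level in turn and return the first success."""
    last_error = None
    for level in levels:
        try:
            return build(level)
        except Exception as exc:  # session creation can fail in several ways
            last_error = exc
    raise RuntimeError(f"no optimization level worked: {list(levels)}") from last_error


def build_session(model_path: str, level: str):
    """Create a CPU-only InferenceSession at the given optimization level."""
    import onnxruntime as ort  # imported lazily so first_working stays dependency-free

    options = ort.SessionOptions()
    options.graph_optimization_level = getattr(ort.GraphOptimizationLevel, level)
    return ort.InferenceSession(
        model_path, sess_options=options, providers=["CPUExecutionProvider"]
    )
```

A caller would then write something like `session = first_working(LEVELS, lambda level: build_session(path, level))`, where `path` is the local model file.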

Tag thresholds

The default threshold is 0.50, which favors recall for browsing.
The macro PR break-even point measured on the validation set for V1.1 is ≈ 0.760; set the slider there for the calibrated operating point where precision equals recall.
Per-tag thresholds are available in V1.1_onnx/pr_thresholds.json in the model repo for users who want to apply them externally.
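Applying per-tag thresholds externally could be sketched as below. The schema of pr_thresholds.json is not described in this README, so the flat `{tag: threshold}` mapping read by `load_thresholds` is an assumption to check against the real file. Both function names are hypothetical, and falling back to the 0.760 break-even value for tags without an entry is an illustrative choice, not documented behaviour.

```python
import json
from typing import Dict, List, Tuple

BREAK_EVEN = 0.760  # macro PR break-even threshold quoted in this README


def load_thresholds(path: str) -> Dict[str, float]:
    """Load per-tag thresholds, assuming a flat {tag: threshold} JSON object."""
    with open(path, encoding="utf-8") as handle:
        return {str(tag): float(value) for tag, value in json.load(handle).items()}


def select_tags(
    scores: Dict[str, float],
    per_tag: Dict[str, float],
    default: float = BREAK_EVEN,
) -> List[Tuple[str, float]]:
    """Keep tags whose score meets their own threshold, highest score first.

    Tags with no per-tag entry fall back to `default` (an assumption).
    """
    kept = [(tag, p) for tag, p in scores.items() if p >= per_tag.get(tag, default)]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Here `scores` is a mapping from tag name to the probability the model returned for it.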

Known noise patterns (read the model card first)

The full model card on the model repo documents these in detail. In short:

Missing tags are the dominant noise mode. Many present concepts are simply unlabeled in the training corpus, so low predicted scores are less informative than they look.
Color tags (eye/hair/general) leak between perceptual neighbours.
Hair length boundaries (long_hair / very_long_hair) are inherently noisy in the source data.
Neckwear (bow / bowtie / ribbon / ascot / necktie) is routinely confused at the source.
Treat any prediction as a suggestion to inspect, not a final answer.