metadata
title: OppaiOracle V1.1 (CPU)
emoji: ⚡
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 6.14.0
python_version: '3.12'
app_file: app.py
pinned: false
short_description: Anime ViT tagger (V1.1, 448×448, CPU)
models:
- Grio43/OppaiOracle
OppaiOracle V1.1 — CPU Space
Public inference UI for the V1.1 (448×448) anime tagger checkpoint that lives in the Grio43/OppaiOracle model repo, running on CPU hardware. For faster GPU-backed inference, see Grio43/OppaiOracle.
What this Space does
- Downloads `V1.1_onnx/model.onnx` and `V1.1_onnx/vocabulary.json` from the model repo on first launch (cached afterwards).
- Letterboxes the uploaded image to 448×448, normalizes with mean/std = 0.5, runs the ONNX graph with `onnxruntime` (CPU EP), and returns the top tags above the chosen threshold.
- Sigmoid is already inside the ONNX graph, so the output is a probability vector over 19,294 tags.
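The letterbox and normalization steps above can be sketched in plain Python. This is an illustration only, not the code in `app.py`: the helper names `letterbox_geometry` and `normalize_pixel` are hypothetical, and centered padding, rounding to the nearest pixel, and scaling pixels to [0, 1] before applying mean/std = 0.5 are all assumptions. The pad color and the way the padded region feeds the `padding_mask` input are not covered here.

```python
def letterbox_geometry(width: int, height: int, target: int = 448):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving fit.

    Assumption: the resized image is centered on the target canvas.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Scale so the longer side exactly fills the target square.
    scale = min(target / width, target / height)
    new_w = max(1, min(target, round(width * scale)))
    new_h = max(1, min(target, round(height * scale)))
    # Split the leftover space evenly; any odd pixel goes to the right/bottom.
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def normalize_pixel(value: int, mean: float = 0.5, std: float = 0.5) -> float:
    """Map an 8-bit channel value to [-1, 1] using mean/std = 0.5.

    Assumption: values are first scaled from 0..255 to 0..1.
    """
    return (value / 255.0 - mean) / std


if __name__ == "__main__":
    # A 896x448 landscape image shrinks to 448x224 with 112 px bars top and bottom.
    print(letterbox_geometry(896, 448))
    print(normalize_pixel(0), normalize_pixel(255))
```

With mean and std both at 0.5, black maps to -1.0 and white to 1.0, which is why the two constants are usually quoted together.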
Performance notes
- Each prediction takes ~10–30 s on the standard HF CPU tier. The V1.1 ViT is ~250M params at 448² input — there is no way around the cost on CPU.
- ORT graph optimization is set to `BASIC` (with a fallback to `DISABLE_ALL`) because `ENABLE_ALL` synthesizes a `MemcpyToHost` node on the bool `padding_mask` input that the HF CPU build of ORT rejects with `NOT_IMPLEMENTED`.
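One way to express the "`BASIC`, then `DISABLE_ALL`" fallback is sketched below. It is a guess at the shape of the logic, not the Space's actual code: `build_with_fallback` and `make_cpu_session` are hypothetical names, and catching a broad `Exception` is an assumption about how the `NOT_IMPLEMENTED` failure surfaces. The `onnxruntime` calls used (`SessionOptions`, `GraphOptimizationLevel`, `InferenceSession` with `providers`) are the library's standard Python API.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def build_with_fallback(levels: Sequence[str], build: Callable[[str], T]) -> Tuple[T, str]:
    """Try each optimization level in order and return the first result that builds."""
    last_error = None
    for level in levels:
        try:
            return build(level), level
        except Exception as exc:  # assumption: session creation raises on NOT_IMPLEMENTED
            last_error = exc
    raise RuntimeError(f"no optimization level worked: {list(levels)}") from last_error


def make_cpu_session(model_path: str):
    """Create a CPU session, preferring BASIC and falling back to DISABLE_ALL."""
    import onnxruntime as ort  # imported lazily so the helper above has no dependencies

    def build(level: str):
        opts = ort.SessionOptions()
        opts.graph_optimization_level = getattr(ort.GraphOptimizationLevel, level)
        return ort.InferenceSession(
            model_path, sess_options=opts, providers=["CPUExecutionProvider"]
        )

    # ORT_ENABLE_ALL is deliberately absent from the list of candidates.
    return build_with_fallback(["ORT_ENABLE_BASIC", "ORT_DISABLE_ALL"], build)
```

Calling `make_cpu_session("model.onnx")` would return the session together with the level that succeeded, which is handy to log when diagnosing slow startups.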
Tag thresholds
- Default threshold is 0.50 (broader recall for browsing).
- The macro PR break-even point measured on the val set for V1.1 is ≈ 0.760 — set the slider there for the calibrated precision = recall operating point.
- Per-tag thresholds are available in `V1.1_onnx/pr_thresholds.json` in the model repo for users who want to apply them externally.
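Applying a global threshold, optionally overridden per tag, might look like the sketch below. The function `select_tags` is hypothetical, and two layouts are assumed rather than taken from the repo: the vocabulary is treated as a list of tag names aligned by index with the probability vector, and the per-tag thresholds as a flat tag-to-threshold mapping. Check the real structure of `vocabulary.json` and `pr_thresholds.json` before relying on this.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def select_tags(
    probs: Sequence[float],
    vocab: Sequence[str],
    threshold: float = 0.50,
    per_tag: Optional[Dict[str, float]] = None,
    limit: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Keep tags whose probability meets their threshold, highest score first."""
    per_tag = per_tag or {}
    kept = [
        (tag, p)
        for tag, p in zip(vocab, probs)
        # A per-tag threshold, when present, replaces the global one for that tag.
        if p >= per_tag.get(tag, threshold)
    ]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept if limit is None else kept[:limit]


if __name__ == "__main__":
    vocab = ["1girl", "long_hair", "bow", "ascot"]
    probs = [0.9, 0.6, 0.8, 0.2]
    print(select_tags(probs, vocab))                  # default 0.50, broader recall
    print(select_tags(probs, vocab, threshold=0.76))  # near the macro break-even point
```

Raising the threshold from 0.50 to 0.76 in this toy example drops `long_hair` (0.6) while keeping the two higher-scoring tags.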
Known noise patterns (read the model card first)
The full model card on the model repo documents these in detail. In short:
- Missing tags are the dominant noise mode. Many present concepts are simply unlabeled in the training corpus, so low predicted scores are less informative than they look.
- Color tags (eye/hair/general) leak between perceptual neighbours.
- Hair length boundaries (`long_hair`/`very_long_hair`) are inherently noisy in the source data.
- Neckwear (`bow`/`bowtie`/`ribbon`/`ascot`/`necktie`) is routinely confused at the source.
- Treat any prediction as a suggestion to inspect, not a final answer.