---
title: OppaiOracle V1.1 (CPU)
emoji: ⚡
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 6.14.0
python_version: '3.12'
app_file: app.py
pinned: false
short_description: Anime ViT tagger (V1.1, 448×448, CPU)
models:
- Grio43/OppaiOracle
---

# OppaiOracle V1.1: CPU Space

Public inference UI for the **V1.1** (448×448) anime tagger checkpoint that lives in the [Grio43/OppaiOracle](https://huggingface.co/Grio43/OppaiOracle) model repo, running on CPU hardware. For faster GPU-backed inference, see [Grio43/OppaiOracle](https://huggingface.co/spaces/Grio43/OppaiOracle).

## What this Space does

- Downloads `V1.1_onnx/model.onnx` and `V1.1_onnx/vocabulary.json` from the model repo on first launch (cached afterwards).
- Letterboxes the uploaded image to 448×448, normalizes with mean/std = 0.5, runs the ONNX graph with `onnxruntime` (CPU EP), and returns the top tags above the chosen threshold.
- Sigmoid is already inside the ONNX graph, so the output is a probability vector over 19,294 tags.

## Performance notes

- Each prediction takes **~10–30 s** on the standard HF CPU tier. The V1.1 ViT is ~250M params at 448² input, so there is no way around the cost on CPU.
- ORT graph optimization is set to `BASIC` (with a fallback to `DISABLE_ALL`) because `ENABLE_ALL` synthesizes a `MemcpyToHost` node on the bool `padding_mask` input that the HF CPU build of ORT rejects with `NOT_IMPLEMENTED`.

## Tag thresholds

- Default threshold is **0.50** (broader recall for browsing).
- The macro PR break-even point measured on the val set for V1.1 is **≈ 0.760**; set the slider there for the calibrated precision = recall operating point.
- Per-tag thresholds are available in `V1.1_onnx/pr_thresholds.json` in the model repo for users who want to apply them externally.

## Known noise patterns (read the model card first)

The full model card on the model repo documents these in detail.
In short:

- **Missing tags are the dominant noise mode.** Many present concepts are simply unlabeled in the training corpus, so low predicted scores are less informative than they look.
- **Color tags** (eye/hair/general) leak between perceptual neighbours.
- **Hair length** boundaries (`long_hair` / `very_long_hair`) are inherently noisy in the source data.
- **Neckwear** (`bow` / `bowtie` / `ribbon` / `ascot` / `necktie`) is routinely confused at the source.
- Treat any prediction as a *suggestion to inspect*, not a final answer.
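
## Sketch: letterbox geometry and normalization

The letterbox and normalization step listed under "What this Space does" can be sketched with plain arithmetic. This is an illustration, not the code in `app.py`: the function names `letterbox_geometry` and `normalize_pixel` are hypothetical, centered padding is an assumption, and so is scaling pixels to [0, 1] before applying mean/std = 0.5.

```python
def letterbox_geometry(width, height, target=448):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving fit.

    Assumption: the resized image is centered on the target canvas.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Scale so the longer side exactly fills the target canvas.
    scale = min(target / width, target / height)
    new_w = max(1, min(target, round(width * scale)))
    new_h = max(1, min(target, round(height * scale)))
    # Split the leftover space evenly; any odd pixel goes to the right/bottom.
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def normalize_pixel(value, mean=0.5, std=0.5):
    """Map an 8-bit channel value to the normalized range.

    Assumption: values are divided by 255 before mean/std are applied.
    """
    return (value / 255.0 - mean) / std


# A 896x448 image shrinks to 448x224 and is padded above and below.
geometry = letterbox_geometry(896, 448)
```

Under these assumptions an 8-bit value of 0 maps to -1.0 and 255 maps to 1.0, so the model input lies in [-1, 1].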
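
## Sketch: applying thresholds externally

The options described under "Tag thresholds" (a single global cutoff, or per-tag cutoffs applied outside the Space) could look roughly like the following. The helper `select_tags` is hypothetical. The sketch assumes the vocabulary is an index-ordered list of tag names and that per-tag thresholds form a flat tag-to-float mapping; the actual schemas of `vocabulary.json` and `pr_thresholds.json` may differ, so check the files in the model repo before relying on this.

```python
def select_tags(probs, vocab, threshold=0.5, per_tag=None, top_k=None):
    """Return (tag, probability) pairs at or above their threshold, best first.

    probs   -- probabilities from the model (sigmoid is already applied)
    vocab   -- tag names, assumed to be ordered by output index
    per_tag -- optional {tag: threshold} overrides (assumed schema)
    """
    if len(probs) != len(vocab):
        raise ValueError("probs and vocab must have the same length")
    per_tag = per_tag or {}
    hits = [
        (tag, p)
        for tag, p in zip(vocab, probs)
        # Fall back to the global threshold when a tag has no override.
        if p >= per_tag.get(tag, threshold)
    ]
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits if top_k is None else hits[:top_k]


# Toy example with three tags instead of the full 19,294-tag vocabulary.
toy_vocab = ["1girl", "long_hair", "bow"]
toy_probs = [0.9, 0.6, 0.3]
browse = select_tags(toy_probs, toy_vocab, threshold=0.50)
calibrated = select_tags(toy_probs, toy_vocab, threshold=0.760)
```

With the toy values, the 0.50 default keeps `1girl` and `long_hair`, while the 0.760 break-even setting keeps only `1girl`, which illustrates the recall versus precision trade-off between the two operating points.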