---
title: OppaiOracle V1.1 (CPU)
emoji: ⚡
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 6.14.0
python_version: '3.12'
app_file: app.py
pinned: false
short_description: Anime ViT tagger (V1.1, 448×448, CPU)
models:
  - Grio43/OppaiOracle
---

# OppaiOracle V1.1 — CPU Space

Public inference UI for the **V1.1** (448×448) anime tagger checkpoint that lives in the
[Grio43/OppaiOracle](https://huggingface.co/Grio43/OppaiOracle) model repo, running on CPU
hardware. For faster GPU-backed inference, see the GPU Space at [Grio43/OppaiOracle](https://huggingface.co/spaces/Grio43/OppaiOracle).

## What this Space does

- Downloads `V1.1_onnx/model.onnx` and `V1.1_onnx/vocabulary.json` from the model repo on
  first launch (cached afterwards).
- Letterboxes the uploaded image to 448×448, normalizes it with mean = 0.5 and std = 0.5,
  runs the ONNX graph with `onnxruntime` (CPU execution provider), and returns the top tags
  above the chosen threshold.
- The sigmoid is already inside the ONNX graph, so the output is a probability vector over
  19,294 tags and needs no further activation.
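
The preprocessing step above can be sketched in plain Python. This is a minimal illustration, not the code in `app.py`: the helper names `letterbox_geometry` and `normalize_pixel` are hypothetical, and it assumes centered padding and that pixels are scaled from 0..255 to 0..1 before normalization. The padding color and the resampling filter are not stated in this README, so they are left out.

```python
def letterbox_geometry(width, height, target=448):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving fit.

    Assumption: the resized image is centered on the square canvas.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Scale so the longer side exactly fills the target square.
    scale = min(target / width, target / height)
    new_w = min(target, max(1, round(width * scale)))
    new_h = min(target, max(1, round(height * scale)))
    # Split the leftover space evenly; an odd remainder goes to the right/bottom.
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def normalize_pixel(value, mean=0.5, std=0.5):
    """Map an 8-bit channel value (0..255) to [-1, 1] using mean = std = 0.5."""
    return (value / 255.0 - mean) / std
```

For example, an 896×448 image would be resized to 448×224 and padded with 112 rows above and below, and channel values 0 and 255 would map to -1.0 and 1.0.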

## Performance notes

- Each prediction takes **~10–30 s** on the standard HF CPU tier. The V1.1 ViT is ~250M params
  at 448² input, so this latency is inherent to running the model on CPU.
- ORT graph optimization is set to `ORT_ENABLE_BASIC` (with a fallback to `ORT_DISABLE_ALL`)
  because `ORT_ENABLE_ALL` synthesizes a `MemcpyToHost` node on the bool `padding_mask` input
  that the HF CPU build of ORT rejects with `NOT_IMPLEMENTED`.
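
A rough sketch of the fallback pattern described above, for readers who run the ONNX file themselves. The helper names `build_with_fallback` and `make_cpu_session` are hypothetical and this is not the Space's actual code. It assumes the `NOT_IMPLEMENTED` failure surfaces when the session is created; if it only appears at `run` time, the retry would have to wrap the first inference call instead.

```python
def build_with_fallback(levels, build):
    """Try each optimization level in order; return (level, session) for the first success."""
    last_error = None
    for level in levels:
        try:
            return level, build(level)
        except Exception as exc:  # keep the last failure for the final error message
            last_error = exc
    raise RuntimeError("no graph optimization level produced a usable session") from last_error


def make_cpu_session(model_path):
    """Create a CPU session, preferring ORT_ENABLE_BASIC and falling back to ORT_DISABLE_ALL."""
    import onnxruntime as ort  # imported lazily so the helper above has no dependencies

    def build(level):
        opts = ort.SessionOptions()
        opts.graph_optimization_level = level
        return ort.InferenceSession(
            model_path, sess_options=opts, providers=["CPUExecutionProvider"]
        )

    levels = [
        ort.GraphOptimizationLevel.ORT_ENABLE_BASIC,
        ort.GraphOptimizationLevel.ORT_DISABLE_ALL,
    ]
    return build_with_fallback(levels, build)
```

Keeping the retry loop separate from the ORT-specific builder makes the fallback order easy to change and to test without loading the model.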

## Tag thresholds

- Default threshold is **0.50** (broader recall for browsing).
- The macro precision-recall break-even point measured on the V1.1 validation set is
  **≈ 0.760**. Set the slider there for the operating point where precision equals recall.
- Per-tag thresholds are available in `V1.1_onnx/pr_thresholds.json` in the model repo for
  users who want to apply them externally.
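
Applying thresholds outside the Space could look roughly like the sketch below. `select_tags` is a hypothetical helper, and the layout of `pr_thresholds.json` is not described in this README: the sketch assumes it has already been loaded into a plain `dict` mapping tag name to threshold, which may not match the real file. Tags without a per-tag entry fall back to the global threshold.

```python
def select_tags(probs, tags, threshold=0.5, per_tag=None):
    """Return (tag, prob) pairs that meet their threshold, highest probability first.

    probs   -- probabilities from the ONNX output, one per tag
    tags    -- tag names in the same order as probs
    per_tag -- optional dict of tag -> threshold (assumed format, see lead-in)
    """
    if len(probs) != len(tags):
        raise ValueError("probs and tags must have the same length")
    per_tag = per_tag or {}
    kept = [
        (tag, p)
        for tag, p in zip(tags, probs)
        if p >= per_tag.get(tag, threshold)
    ]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Illustrative values only, not real model output.
tags = ["long_hair", "very_long_hair", "bow"]
probs = [0.97, 0.62, 0.55]
browse = select_tags(probs, tags)                   # default 0.50 keeps all three
strict = select_tags(probs, tags, threshold=0.76)   # break-even keeps only long_hair
```

Passing `per_tag={"bow": 0.5}` together with `threshold=0.76` would keep `long_hair` and `bow`, since the per-tag value overrides the global one for that tag only.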

## Known noise patterns (read the model card first)

The full model card on the model repo documents these in detail. In short:

- **Missing tags are the dominant noise mode.** Many present concepts are simply unlabeled
  in the training corpus, so low predicted scores are less informative than they look.
- **Color tags** (eye/hair/general) leak between perceptual neighbours.
- **Hair length** boundaries (`long_hair` / `very_long_hair`) are inherently noisy in the
  source data.
- **Neckwear** (`bow` / `bowtie` / `ribbon` / `ascot` / `necktie`) is routinely confused at
  the source.
- Treat any prediction as a *suggestion to inspect*, not a final answer.