---
title: OppaiOracle V1.1 (CPU)
emoji:
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 6.14.0
python_version: '3.12'
app_file: app.py
pinned: false
short_description: Anime ViT tagger (V1.1, 448×448, CPU)
models:
- Grio43/OppaiOracle
---
# OppaiOracle V1.1 — CPU Space
Public inference UI for the **V1.1** (448×448) anime tagger checkpoint that lives in the
[Grio43/OppaiOracle](https://huggingface.co/Grio43/OppaiOracle) model repo, running on CPU
hardware. For faster GPU-backed inference, see the GPU Space [Grio43/OppaiOracle](https://huggingface.co/spaces/Grio43/OppaiOracle).
## What this Space does
- Downloads `V1.1_onnx/model.onnx` and `V1.1_onnx/vocabulary.json` from the model repo on
first launch (cached afterwards).
- Letterboxes the uploaded image to 448×448, normalizes with mean/std = 0.5, runs the ONNX
graph with `onnxruntime` (CPU EP), and returns the top tags above the chosen threshold.
- The sigmoid is already applied inside the ONNX graph, so the output is a probability vector over 19,294
tags and needs no extra activation.
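The download and preprocessing steps above can be sketched with NumPy alone. This is a minimal sketch, not the Space's actual `app.py`: the nearest-neighbour resize, the black (0) pad value, the NCHW layout, and the mask convention (True on padding pixels) are all assumptions, and the real app may resize with PIL and build its `padding_mask` input at a different resolution. The hypothetical `fetch_assets` helper uses `huggingface_hub.hf_hub_download`, which reuses the local cache after the first download. Check the graph's real input names and shapes with `session.get_inputs()` before feeding it anything.

```python
import numpy as np

TARGET = 448  # V1.1 input side length, per this README


def fetch_assets(repo_id: str = "Grio43/OppaiOracle") -> tuple[str, str]:
    """Download the model and vocabulary once; later calls reuse the local HF cache."""
    from huggingface_hub import hf_hub_download  # lazy import, only needed here

    model_path = hf_hub_download(repo_id=repo_id, filename="V1.1_onnx/model.onnx")
    vocab_path = hf_hub_download(repo_id=repo_id, filename="V1.1_onnx/vocabulary.json")
    return model_path, vocab_path


def letterbox(img: np.ndarray, size: int = TARGET, pad_value: int = 0):
    """Fit an HxWx3 uint8 image into a size x size canvas without distorting it.

    Returns the canvas and a boolean mask that is True on padding pixels
    (assumed convention; the real padding_mask input may differ).
    """
    if img.ndim != 3 or img.shape[2] != 3:
        raise ValueError("expected an HxWx3 RGB array")
    h, w = img.shape[:2]
    scale = size / max(h, w)
    new_h = min(size, max(1, round(h * scale)))
    new_w = min(size, max(1, round(w * scale)))
    # Nearest-neighbour resize by index sampling keeps the sketch NumPy-only.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = img[rows[:, None], cols[None, :]]

    canvas = np.full((size, size, 3), pad_value, dtype=np.uint8)
    mask = np.ones((size, size), dtype=bool)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    mask[top:top + new_h, left:left + new_w] = False
    return canvas, mask


def normalize(canvas: np.ndarray) -> np.ndarray:
    """Scale to [0, 1], apply mean = std = 0.5, and reshape HWC -> 1x3xHxW float32."""
    x = canvas.astype(np.float32) / 255.0
    x = (x - 0.5) / 0.5
    return x.transpose(2, 0, 1)[None, ...]
```

A 100×200 upload, for example, is scaled to 224×448 and centred with 112 rows of padding above and below, and the normalized tensor lies in [-1, 1].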
## Performance notes
- Each prediction takes **~10–30 s** on the standard HF CPU tier. The V1.1 ViT has ~250M parameters
and runs at 448×448 input, so this latency is expected on CPU and cannot be tuned away.
- ORT graph optimization is set to `BASIC` (with a fallback to `DISABLE_ALL`) because
`ENABLE_ALL` synthesizes a `MemcpyToHost` node on the bool `padding_mask` input that the
HF CPU build of ORT rejects with `NOT_IMPLEMENTED`.
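The `BASIC`-then-`DISABLE_ALL` fallback could look like the sketch below. `build_with_fallback` and `load_cpu_session` are hypothetical names; `SessionOptions`, `GraphOptimizationLevel.ORT_ENABLE_BASIC` / `ORT_DISABLE_ALL`, and `InferenceSession(..., providers=["CPUExecutionProvider"])` are standard `onnxruntime` Python API. The sketch assumes the `NOT_IMPLEMENTED` failure surfaces while the session is constructed; if it only appears on the first `run`, the fallback would also need to wrap a warm-up inference.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def build_with_fallback(levels: Sequence[str], build: Callable[[str], T]) -> tuple[str, T]:
    """Try build(level) for each level in order; return the first (level, result) that works."""
    errors = []
    for level in levels:
        try:
            return level, build(level)
        except Exception as exc:  # ORT exception classes vary by error code, so catch broadly
            errors.append(f"{level}: {exc}")
    raise RuntimeError("all optimization levels failed:\n" + "\n".join(errors))


def load_cpu_session(model_path: str):
    """Create a CPU InferenceSession, preferring BASIC graph optimization."""
    import onnxruntime as ort  # lazy import so build_with_fallback is usable without ORT

    def build(level: str):
        opts = ort.SessionOptions()
        opts.graph_optimization_level = getattr(ort.GraphOptimizationLevel, level)
        return ort.InferenceSession(
            model_path, sess_options=opts, providers=["CPUExecutionProvider"]
        )

    # ENABLE_ALL is deliberately not tried: it inserts the MemcpyToHost node that fails.
    return build_with_fallback(["ORT_ENABLE_BASIC", "ORT_DISABLE_ALL"], build)
```

Keeping the retry loop separate from the ORT calls makes the fallback order easy to test without a model file.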
## Tag thresholds
- Default threshold is **0.50** (broader recall for browsing).
- The macro-averaged precision/recall break-even point, measured on the V1.1 validation set, is
**≈ 0.760**. Set the slider there for the calibrated operating point where precision = recall.
- Per-tag thresholds are available in `V1.1_onnx/pr_thresholds.json` in the model repo for
users who want to apply them externally.
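Applying the global slider value, or per-tag overrides, is a simple filter. This sketch assumes `pr_thresholds.json` is a flat JSON object mapping tag name to threshold; that format is a guess, so inspect the file before relying on it. `select_tags` and `load_per_tag_thresholds` are hypothetical helpers, and keeping tags exactly at the threshold (`>=`) rather than strictly above it is also an assumption.

```python
import json
from typing import Mapping, Optional, Sequence


def load_per_tag_thresholds(path: str) -> dict[str, float]:
    """Hypothetical loader: assumes a flat {"tag": threshold, ...} JSON object."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return {str(tag): float(cut) for tag, cut in data.items()}


def select_tags(
    probs: Sequence[float],
    vocab: Sequence[str],
    threshold: float = 0.50,
    per_tag: Optional[Mapping[str, float]] = None,
) -> list[tuple[str, float]]:
    """Return (tag, probability) pairs that pass their threshold, highest first.

    probs[i] is the sigmoid output for vocab[i]; a per-tag entry, when present,
    overrides the global threshold for that tag.
    """
    picked = []
    for tag, p in zip(vocab, probs):
        cut = per_tag.get(tag, threshold) if per_tag else threshold
        if p >= cut:
            picked.append((tag, float(p)))
    picked.sort(key=lambda pair: pair[1], reverse=True)
    return picked
```

For example, `select_tags(probs, vocab, threshold=0.76)` uses the break-even value, while `per_tag=load_per_tag_thresholds(path)` applies per-tag cutoffs and falls back to the global threshold for any tag missing from the file.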
## Known noise patterns (read the model card first)
The full model card on the model repo documents these in detail. In short:
- **Missing tags are the dominant noise mode.** Many present concepts are simply unlabeled
in the training corpus, so low predicted scores are less informative than they look.
- **Color tags** (eye/hair/general) leak between perceptual neighbours.
- **Hair length** boundaries (`long_hair` / `very_long_hair`) are inherently noisy in the
source data.
- **Neckwear** (`bow` / `bowtie` / `ribbon` / `ascot` / `necktie`) is routinely confused in the
source labels.
- Treat any prediction as a *suggestion to inspect*, not a final answer.
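One way to act on the "suggestion to inspect" advice is to flag predictions that fall into the confusable groups listed above. The group contents come straight from this list; the `review_flags` helper and the group names are hypothetical, and color tags are left out because this README does not enumerate them.

```python
from typing import Iterable

# Confusable groups taken from the "Known noise patterns" list; group names are hypothetical.
CONFUSABLE_GROUPS: dict[str, frozenset[str]] = {
    "hair_length": frozenset({"long_hair", "very_long_hair"}),
    "neckwear": frozenset({"bow", "bowtie", "ribbon", "ascot", "necktie"}),
}


def review_flags(predicted: Iterable[str]) -> dict[str, list[str]]:
    """Map each confusable group to the predicted tags that belong to it.

    Any group present in the result deserves a manual look, since neighbouring
    tags in that group are often mislabeled in the source data.
    """
    tags = set(predicted)
    flags: dict[str, list[str]] = {}
    for name, group in CONFUSABLE_GROUPS.items():
        hits = sorted(tags & group)
        if hits:
            flags[name] = hits
    return flags
```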