Add library_name: oppaioracle for HF download-stats registration
README.md
CHANGED
@@ -1,5 +1,6 @@
 ---
 license: apache-2.0
+library_name: oppaioracle
 pipeline_tag: image-classification
 language:
 - en
@@ -19,6 +20,12 @@ tags:
 ---
 
 
+# OppaiOracle — Hugging Face Release Draft
+
+> Draft release notes / model card for the first public OppaiOracle checkpoint. Intended audience: people considering using this model for anime/illustration tagging. Tone: direct about what works, direct about what doesn't.
+
+---
+
 ## TL;DR
 
 A multi-label anime tagger trained from scratch on a \~5.9M image dataset that received a targeted cleaning and vocabulary-expansion pass before training. The corrections touched roughly **1.3M tags** — large in absolute terms, but only on the order of **\~3% of all tags** in the corpus, so this is best described as a *targeted* cleaning rather than a heavy one. The pass was deliberately weighted toward **low-frequency tags**, which is where mislabels and missing labels hurt a tagger the most. On my evaluation set the model achieves the best precision-equals-recall point and a good mAP relative to comparable open tagger checkpoints, but the underlying training data still contains category-level noise that no amount of training would have erased. **All predictions should be human-reviewed before they are trusted.**
@@ -225,7 +232,6 @@ V1 ships with the noise it ships with. V2 is where I plan to do something about
 
 - **SmilingWolf** for the ViT v3 tagger, which made the initial cleaning pass tractable. None of this would have been feasible without an existing strong tagger to use as a second opinion.
 - The broader anime-tagger open-source community for the public tag corpora and prior model checkpoints I compared against.
-- deepghs/danbooru2024 dataset
 
 ---
 
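
The TL;DR in this diff refers to a "precision-equals-recall point" without defining it. The sketch below shows one common way to find such a break-even threshold: sweep candidate thresholds over the predicted scores and keep the one where precision and recall are closest. The model card does not say how the author computes this metric (micro vs. macro averaging, threshold grid, tie-breaking), so the function names, the pooled (micro-averaged) treatment of tag decisions, and the toy data are all assumptions for illustration only.

```python
from typing import List, Tuple


def precision_recall_at(scores: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold counts as a predicted tag."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
    # Conventions for empty denominators (assumed, not from the model card).
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def break_even_point(scores: List[float], labels: List[int]) -> Tuple[float, float, float]:
    """Return (threshold, precision, recall) where |precision - recall| is smallest.

    Candidate thresholds are the distinct scores themselves; ties keep the
    lowest threshold. Hypothetical helper, not the author's evaluation code.
    """
    best = None
    for threshold in sorted(set(scores)):
        precision, recall = precision_recall_at(scores, labels, threshold)
        gap = abs(precision - recall)
        if best is None or gap < best[0]:
            best = (gap, threshold, precision, recall)
    _, threshold, precision, recall = best
    return threshold, precision, recall


# Toy data: pooled (image, tag) scores with ground-truth labels. Purely illustrative.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

threshold, precision, recall = break_even_point(scores, labels)
print(f"threshold={threshold} precision={precision:.3f} recall={recall:.3f}")
```

On the toy data, precision and recall meet at the threshold that predicts exactly as many tags as there are true tags, which is why this point is a convenient single-number summary when comparing taggers whose score scales differ.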