Grio43 committed ca45eee (verified) · 1 parent: 1b322b3

Add library_name: oppaioracle for HF download-stats registration

Files changed (1)
  1. README.md +7 -1
README.md CHANGED
```diff
@@ -1,5 +1,6 @@
 ---
 license: apache-2.0
+library_name: oppaioracle
 pipeline_tag: image-classification
 language:
 - en
@@ -19,6 +20,12 @@ tags:
 ---
 
 
+# OppaiOracle — Hugging Face Release Draft
+
+> Draft release notes / model card for the first public OppaiOracle checkpoint. Intended audience: people considering using this model for anime/illustration tagging. Tone: direct about what works, direct about what doesn't.
+
+---
+
 ## TL;DR
 
 A multi-label anime tagger trained from scratch on a \~5.9M image dataset that received a targeted cleaning and vocabulary-expansion pass before training. The corrections touched roughly **1.3M tags** — large in absolute terms, but only on the order of **\~3% of all tags** in the corpus, so this is best described as a *targeted* cleaning rather than a heavy one. The pass was deliberately weighted toward **low-frequency tags**, which is where mislabels and missing labels hurt a tagger the most. On my evaluation set the model achieves the best precision-equals-recall point and a good mAP relative to comparable open tagger checkpoints, but the underlying training data still contains category-level noise that no amount of training would have erased. **All predictions should be human-reviewed before they are trusted.**
@@ -225,7 +232,6 @@ V1 ships with the noise it ships with. V2 is where I plan to do something about
 
 - **SmilingWolf** for the ViT v3 tagger, which made the initial cleaning pass tractable. None of this would have been feasible without an existing strong tagger to use as a second opinion.
 - The broader anime-tagger open-source community for the public tag corpora and prior model checkpoints I compared against.
-- deepghs/danbooru2024 dataset
 
 ---
 
```
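The metadata part of this commit is the single `library_name: oppaioracle` line in the README's YAML front matter. As a quick pre-push sanity check, the sketch below reads that block from a README string using only the standard library. `read_front_matter` is a hypothetical helper, not part of any library, and it assumes flat `key: value` lines like the ones in this card; it is not a YAML parser, so list items such as `- en` and indented keys are simply skipped.

```python
"""Minimal check that a model card's front matter declares library_name.

Assumption: the front matter is the block between the first two '---' lines
and uses flat 'key: value' lines. This is NOT a full YAML parser.
"""


def read_front_matter(text: str) -> dict[str, str]:
    """Hypothetical helper: return flat key/value pairs from the front matter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter of the front matter block
        # Skip list items ("- en"), indented/nested lines, and non key lines.
        if line.startswith((" ", "\t", "-")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta


readme = """---
license: apache-2.0
library_name: oppaioracle
pipeline_tag: image-classification
language:
- en
---
"""

meta = read_front_matter(readme)
print(meta.get("library_name"))  # oppaioracle
```

For a YAML-aware check, `huggingface_hub`'s `ModelCard` class parses the same metadata block into structured card data; the sketch above only avoids that dependency.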
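The TL;DR cites two metrics, the precision-equals-recall point and mAP, without saying how either was computed. The sketch below is an illustration under stated assumptions, not the author's evaluation code: the P = R point uses one global threshold with micro-averaged counts over all (image, tag) pairs, swept over a fixed grid, and mAP is the mean of non-interpolated per-tag average precision over tags with at least one positive. All function names are hypothetical.

```python
"""Illustrative multi-label metrics (assumed definitions, not the author's code)."""


def micro_pr(scores, labels, t):
    """Micro-averaged precision/recall over all (image, tag) pairs at threshold t."""
    tp = fp = fn = 0
    for s_row, y_row in zip(scores, labels):
        for s, y in zip(s_row, y_row):
            predicted = s >= t
            if predicted and y:
                tp += 1
            elif predicted:
                fp += 1
            elif y:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 1.0  # no predictions: treat as precise
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pr_equal_point(scores, labels, steps=100):
    """Sweep thresholds i/steps; return the first (t, P, R) with smallest |P - R|."""
    best = None
    for i in range(1, steps):
        t = i / steps
        p, r = micro_pr(scores, labels, t)
        if best is None or abs(p - r) < abs(best[1] - best[2]):
            best = (t, p, r)
    return best


def average_precision(tag_scores, tag_labels):
    """Non-interpolated AP for one tag: mean precision at each positive's rank."""
    ranked = sorted(zip(tag_scores, tag_labels), key=lambda pair: -pair[0])
    hits = 0
    precisions = []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else None


def mean_average_precision(scores, labels):
    """Macro mean of per-tag AP, skipping tags with no positive examples."""
    aps = []
    for j in range(len(scores[0])):
        ap = average_precision([row[j] for row in scores], [row[j] for row in labels])
        if ap is not None:
            aps.append(ap)
    return sum(aps) / len(aps)


# Toy data: 3 images x 3 tags (scores are per-tag probabilities, labels are 0/1).
scores = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.4], [0.7, 0.3, 0.5]]
labels = [[1, 0, 1], [0, 1, 0], [0, 1, 1]]
print(pr_equal_point(scores, labels))          # (0.41, 0.8, 0.8)
print(mean_average_precision(scores, labels))  # 1.0
```

On this toy data every tag ranks its own positives first, so mAP is 1.0, while a single global threshold still only reaches P = R = 0.8. The two numbers measure different things (per-tag ranking versus one operating point), which is why a card may reasonably report both.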