---
language:
- ja
license: mit
tags:
- audio
- speech
- preference
- anime
library_name: transformers
pipeline_tag: audio-classification
---
# AnimeScore

Try the interactive demo: [AnimeScore Demo Space](https://huggingface.co/spaces/spellbrush/animescore-demo).

A learned scorer for anime-like speech style. Given an audio clip, it returns a scalar score; higher scores indicate more anime-like speech.

This is the official Hugging Face model repository for the paper "[AnimeScore: A Preference-Based Dataset and Framework for Evaluating Anime-Like Speech Style](https://arxiv.org/abs/2603.11482)".

For more details, please visit our [GitHub Repository](https://github.com/sizigi/animescore).
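Scores are easiest to interpret through comparison. The pairwise snippet under *How to use* maps the difference between two clips' scores $s_A$ and $s_B$ to the probability that clip A is the more anime-like one, using the logistic sigmoid $\sigma$:

$$
P(A \succ B) = \sigma(s_A - s_B) = \frac{1}{1 + e^{-(s_A - s_B)}}
$$

Equal scores give a probability of 0.5, $P(A \succ B) + P(B \succ A) = 1$, and adding the same constant to both scores leaves the probability unchanged, so only score differences matter for pairwise comparisons.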
## Checkpoint

We release the HuBERT-based model, which achieved the best performance among the backbones we evaluated (pairwise accuracy 82.4%, AUC 0.908).
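Pairwise accuracy is commonly defined as the fraction of human-labelled preference pairs for which the model gives the preferred clip the higher score. The sketch below assumes that definition; the paper's exact evaluation protocol (including tie handling) may differ, and `pairwise_accuracy` is a hypothetical helper, not part of this repository.

```python
from typing import Sequence, Tuple


def pairwise_accuracy(pairs: Sequence[Tuple[float, float]]) -> float:
    """Fraction of pairs in which the preferred clip scores strictly higher.

    Each pair is (score_of_preferred_clip, score_of_other_clip), e.g. two
    values of model.score(...).item(). Illustrative helper only (assumed
    definition, not shipped with AnimeScore); ties count as incorrect here.
    """
    if not pairs:
        raise ValueError("need at least one preference pair")
    correct = sum(1 for preferred, other in pairs if preferred > other)
    return correct / len(pairs)


# Hypothetical scores for four preference pairs (not real AnimeScore outputs).
example_pairs = [(1.2, 0.3), (0.8, 1.1), (2.0, -0.5), (0.4, 0.1)]
print(pairwise_accuracy(example_pairs))  # 3 of 4 pairs ranked correctly -> 0.75
```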
Key files in this repository:

| File | Size | Notes |
|---|---:|---|
| `model.safetensors` | ~9 MB | Released head weights |
| `config.json` | — | Model config |
| `modeling_animescore.py` | — | Custom modeling code (loaded via `trust_remote_code=True`) |
## How to use

```bash
pip install -r requirements.txt
```
Score a single clip. The snippet downmixes the audio to mono and resamples it to 16 kHz before scoring:

```python
import torch
import torchaudio
from transformers import AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True loads the custom modeling code in modeling_animescore.py
model = AutoModel.from_pretrained(
    "spellbrush/animescore",
    trust_remote_code=True,
).eval().to(device)

wav, sr = torchaudio.load("sample.wav")  # wav: (channels, samples)
if wav.size(0) > 1:
    wav = wav.mean(0, keepdim=True)  # downmix to mono: (1, samples)
if sr != 16000:
    wav = torchaudio.functional.resample(wav, sr, 16000)  # resample to 16 kHz

with torch.no_grad():
    s = model.score(wav.to(device)).item()  # scalar; higher = more anime-like
print(f"AnimeScore: {s:.3f}")
```
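To compare several clips, one option is to score each clip once and sort by score. The sketch below is model-agnostic: `score_fn` is assumed to wrap the loading, mono/16 kHz preprocessing, and `model.score(...).item()` call from the snippet above, and `rank_clips` is a hypothetical helper rather than part of the released code.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def rank_clips(
    paths: Iterable[str], score_fn: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Return (path, score) pairs sorted from most to least anime-like.

    score_fn is assumed to load one file, preprocess it (mono, 16 kHz),
    and return model.score(...).item(); each clip is scored exactly once.
    """
    scored = [(path, float(score_fn(path))) for path in paths]
    # Higher AnimeScore means more anime-like, so sort in descending order.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Usage with a stand-in scorer (hypothetical scores, not real model outputs).
fake_scores: Dict[str, float] = {"a.wav": 0.2, "b.wav": 1.5, "c.wav": -0.7}
print(rank_clips(fake_scores, fake_scores.__getitem__))
```

Scoring each clip once and sorting needs only N forward passes, whereas comparing every pair with the pairwise formula would reuse the same N scores anyway.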
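Pairwise probability that clip A is more anime-like than clip B, computed as the sigmoid of the score difference (`wav_a` and `wav_b` are prepared the same way as `wav` above):

```python
with torch.no_grad():
    sa = model.score(wav_a.to(device))
    sb = model.score(wav_b.to(device))
p_a_gt_b = torch.sigmoid(sa - sb).item()  # P(A is more anime-like than B)
```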
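The same sigmoid-of-difference rule extends to more than two clips. The sketch below is a hypothetical helper that works on plain Python floats, such as values returned by `model.score(...).item()`, and builds the full matrix of probabilities that clip i is more anime-like than clip j. By construction each diagonal entry is 0.5 and `p[i][j] + p[j][i]` sums to 1.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (same function torch.sigmoid computes)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def pairwise_prob_matrix(scores: Sequence[float]) -> List[List[float]]:
    """p[i][j] = sigmoid(scores[i] - scores[j]), i.e. P(clip i beats clip j)."""
    return [[sigmoid(si - sj) for sj in scores] for si in scores]


# Hypothetical scores for three clips (not real model outputs).
p = pairwise_prob_matrix([1.0, 0.0, -1.0])
for row in p:
    print(" ".join(f"{v:.3f}" for v in row))
```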
Command-line example: `python example_inference.py --ckpt . --wav sample.wav`. Alternatively, deploy this directory as a Hugging Face Space (SDK: `gradio`).
## Citation

```bibtex
@inproceedings{park2026animescore,
  title     = {AnimeScore: A Preference-Based Dataset and Framework for
               Evaluating Anime-Like Speech Style},
  author    = {Park, Joonyong and Li, Jerry},
  booktitle = {Interspeech},
  year      = {2026}
}
```

## License

MIT License.