---
language:
- ja
license: mit
tags:
- audio
- speech
- preference
- anime
library_name: transformers
pipeline_tag: audio-classification
---
# AnimeScore
Try the interactive demo: [AnimeScore Demo Space](https://huggingface.co/spaces/spellbrush/animescore-demo).

AnimeScore is a learned scorer for anime-like speech style. Given an audio clip, it returns a scalar score; higher means more anime-like.

This is the official Hugging Face model repository for the paper "[AnimeScore: A Preference-Based Dataset and Framework for Evaluating Anime-Like Speech Style](https://arxiv.org/abs/2603.11482)".
For more details, please visit our [GitHub Repository](https://github.com/sizigi/animescore).
## Checkpoint
We release the HuBERT-based model, which achieved the best performance among the backbones we evaluated (pairwise accuracy 82.4%, AUC 0.908).

| File | Size | Notes |
|---|---:|---|
| `model.safetensors` | ~9 MB | Released head weights |
| `config.json` | — | Model config |
| `modeling_animescore.py` | — | Custom modeling code (loaded via `trust_remote_code=True`) |
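The pairwise accuracy quoted above can be read as the fraction of human preference pairs in which the scorer ranks the preferred clip higher. A minimal sketch of that computation over precomputed scores follows; the `(preferred, other)` pair format, the helper name `pairwise_accuracy`, and counting ties as half-correct are assumptions for illustration, not necessarily the paper's exact evaluation protocol.

```python
from typing import Dict, Iterable, Tuple


def pairwise_accuracy(
    scores: Dict[str, float],
    pairs: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of (preferred, other) pairs where the preferred clip scores higher.

    Hypothetical helper: ties count as 0.5 (an assumption, not from the paper).
    """
    correct = 0.0
    total = 0
    for preferred, other in pairs:
        diff = scores[preferred] - scores[other]
        if diff > 0:
            correct += 1.0
        elif diff == 0:
            correct += 0.5
        total += 1
    if total == 0:
        raise ValueError("no pairs given")
    return correct / total


# Toy scores and preference judgments (made-up numbers, not from the dataset).
scores = {"clip_a": 1.2, "clip_b": -0.3, "clip_c": 0.4}
pairs = [("clip_a", "clip_b"), ("clip_c", "clip_b"), ("clip_b", "clip_a")]
# 2 of the 3 toy pairs are ordered correctly by the scores.
print(pairwise_accuracy(scores, pairs))
```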
## How to use
```bash
pip install -r requirements.txt
```
```python
import torch
import torchaudio
from transformers import AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True loads the custom class defined in modeling_animescore.py
model = AutoModel.from_pretrained(
    "spellbrush/animescore",
    trust_remote_code=True,
).eval().to(device)

wav, sr = torchaudio.load("sample.wav")  # shape: (channels, samples)
if wav.size(0) > 1:
    wav = wav.mean(0, keepdim=True)  # downmix to mono
if sr != 16000:
    wav = torchaudio.functional.resample(wav, sr, 16000)  # resample to 16 kHz

with torch.no_grad():
    s = model.score(wav.to(device)).item()
print(f"AnimeScore: {s:.3f}")
```
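To estimate the probability that clip A is judged more anime-like than clip B, pass the difference of their scores through a sigmoid (`wav_a` and `wav_b` are mono 16 kHz tensors prepared as above):
```python
with torch.no_grad():
    sa = model.score(wav_a.to(device))
    sb = model.score(wav_b.to(device))
p_a_gt_b = torch.sigmoid(sa - sb).item()  # P(A is more anime-like than B)
```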
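Because the pairwise probability depends only on the score difference, swapping the two clips gives the complementary probability, and equal scores give exactly 0.5. The dependency-free sketch below applies the same logistic (Bradley-Terry-style) formula to a handful of already-computed scores. The helper names `p_prefer` and `preference_matrix` and the toy scores are hypothetical, and the overflow-safe branching is an implementation choice, not something taken from `modeling_animescore.py`.

```python
import math
from typing import Dict


def p_prefer(s_a: float, s_b: float) -> float:
    """P(clip A is more anime-like than clip B) = sigmoid(s_a - s_b)."""
    d = s_a - s_b
    # Numerically stable logistic: never calls math.exp on a large positive value.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def preference_matrix(scores: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: all pairwise probabilities for a set of scored clips."""
    return {
        a: {b: p_prefer(sa, sb) for b, sb in scores.items() if b != a}
        for a, sa in scores.items()
    }


# Toy scores (made-up numbers), e.g. values from model.score(...).item().
scores = {"clip_a": 1.2, "clip_b": -0.3, "clip_c": 0.4}
probs = preference_matrix(scores)
for a, row in probs.items():
    for b, p in row.items():
        print(f"P({a} > {b}) = {p:.3f}")
```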
For command-line use, run `python example_inference.py --ckpt . --wav sample.wav` from a local copy of this repository, or deploy the directory as a Hugging Face Space (SDK = `gradio`).
## Citation
```bibtex
@inproceedings{park2026animescore,
  title     = {AnimeScore: A Preference-Based Dataset and Framework for
               Evaluating Anime-Like Speech Style},
  author    = {Park, Joonyong and Li, Jerry},
  booktitle = {Interspeech},
  year      = {2026}
}
```
## License
MIT License.