# Android Export and Runtime

AniFileBERT is used by MiruPlay as a Git submodule at `tools/anime_parser`.
## Export

From this repository root, export the published root checkpoint:

```powershell
uv sync
uv run python -m tools.export_onnx --model-dir . --max-length 128 --android-assets-dir ../../scraper/src/main/assets/anime_parser
```
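The exporter writes the files below. The `exports/` paths are relative to this repository. The `scraper/` paths are relative to the MiruPlay repository root, which is where `../../scraper/src/main/assets/anime_parser` resolves when this repository is checked out at `tools/anime_parser`:

- `exports/anime_filename_parser.onnx`
- `exports/anime_filename_parser.metadata.json`
- `scraper/src/main/assets/anime_parser/anime_filename_parser.onnx`
- `scraper/src/main/assets/anime_parser/vocab.json`
- `scraper/src/main/assets/anime_parser/config.json`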
## Static Graph Shape

```text
input_ids       int64[1,128]
attention_mask  int64[1,128]
logits          float32[1,128,15]
```

The current export is verified against PyTorch. The maximum absolute difference between the PyTorch and ONNX logits is recorded in `exports/anime_filename_parser.metadata.json`.
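
The kind of check described above can be sketched in plain Python. This is a minimal illustration, not the exporter's own code: the helper names (`shape_of`, `flatten`, `compare_logits`) and the default tolerance are assumptions, and the tensors are represented as nested lists so the sketch has no dependencies.

```python
EXPECTED_LOGITS_SHAPE = (1, 128, 15)  # logits shape from the static graph above


def shape_of(tensor):
    """Return the shape of a rectangular nested list as a tuple."""
    shape = []
    while isinstance(tensor, list):
        shape.append(len(tensor))
        tensor = tensor[0] if tensor else None
    return tuple(shape)


def flatten(tensor):
    """Yield every scalar of a nested list in row-major order."""
    if isinstance(tensor, list):
        for item in tensor:
            yield from flatten(item)
    else:
        yield float(tensor)


def compare_logits(reference, candidate, tolerance=1e-4):
    """Return (max_abs_diff, within_tolerance) for two logits tensors.

    `tolerance` is an assumed threshold, not a value taken from the exporter.
    """
    for name, tensor in (("reference", reference), ("candidate", candidate)):
        if shape_of(tensor) != EXPECTED_LOGITS_SHAPE:
            raise ValueError(f"{name} has shape {shape_of(tensor)}")
    max_diff = max(
        abs(r - c) for r, c in zip(flatten(reference), flatten(candidate))
    )
    return max_diff, max_diff <= tolerance
```

Checking the shape before comparing values catches a graph exported with a different `--max-length` or label count, which a value-only comparison could miss or report confusingly.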
## Local ONNX Smoke Test

```powershell
uv run python -m tools.onnx_inference "[GM-Team][国漫][神印王座][Throne of Seal][2022][200][AVC][GB][1080P].mp4"
```

Expected fields:

```text
title=神印王座, episode=200, group=GM-Team, resolution=1080P, source=GB
```

Special-code example:

```powershell
uv run python -m tools.onnx_inference "[YYDM&VCB-Studio] Shinsekai Yori [NCED02][Ma10p_1080p][x265_flac].mkv"
```

Expected fields:

```text
title=Shinsekai Yori, episode=null, group=YYDM&VCB-Studio, special=NCED02
```
## Runtime Contract

The ONNX graph returns token logits only. Everything before and after the graph must be reimplemented on Android with identical behavior:

- custom character tokenizer
- token id lookup from `vocab.json`
- fixed-length padding to 128
- constrained BIO decoding
- field aggregation
- thin string/number normalization
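
The first three items in the list above (tokenizing, id lookup, padding) can be sketched as follows. This is an illustration under stated assumptions, not the project's tokenizer: it assumes one token per Unicode character, a `vocab.json` that maps token strings to integer ids, and special tokens named `[PAD]` and `[UNK]`. The real tokenizer may group characters differently or add further special tokens, so check it against the Python reference before porting.

```python
MAX_LENGTH = 128  # fixed sequence length of the exported graph


def encode(filename, vocab, max_length=MAX_LENGTH,
           pad_token="[PAD]", unk_token="[UNK]"):
    """Encode a filename into fixed-shape input_ids and attention_mask.

    Assumptions (hypothetical, not confirmed by the source):
    - one token per Unicode character
    - `vocab` maps token string -> integer id
    - `pad_token` and `unk_token` exist in `vocab`
    """
    pad_id = vocab[pad_token]
    unk_id = vocab[unk_token]

    # Look up each character, falling back to the unknown-token id,
    # and truncate anything beyond the static graph length.
    ids = [vocab.get(ch, unk_id) for ch in filename][:max_length]

    # Real tokens get mask 1, padding gets mask 0.
    padding = max_length - len(ids)
    input_ids = ids + [pad_id] * padding
    attention_mask = [1] * len(ids) + [0] * padding

    # Wrap in a batch dimension so both tensors have shape [1, max_length].
    return [input_ids], [attention_mask]
```

One porting detail worth checking: Python iterates a `str` by Unicode code point, while a Kotlin `String` is indexed by UTF-16 code unit, so characters outside the Basic Multilingual Plane need code-point iteration on Android to produce the same token sequence.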
The Android runtime implementation lives in MiruPlay:

```text
scraper/src/main/kotlin/com/miruplay/tv/scraper/filename/AnimeFilenameParser.kt
```

The app exposes it through `FilenameMetadataParser` in `core:model`. During a scan, `ScanCoordinator` passes that parser into `VideoDirectoryClassifier`.
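
The constrained BIO decoding and field aggregation steps of the runtime contract might look like the sketch below. It uses a simple greedy left-to-right decoder that masks out invalid transitions (an `I-X` label is only allowed directly after `B-X` or `I-X`). That technique is a stand-in chosen for brevity: the actual implementation may apply different constraints or a Viterbi search. The function names, the `id2label` mapping, and the "first span per field wins" rule are all assumptions.

```python
def is_allowed(prev_label, label):
    """BIO constraint: I-X may only follow B-X or I-X."""
    if label.startswith("I-"):
        field = label[2:]
        return prev_label in ("B-" + field, "I-" + field)
    return True  # "O" and any "B-*" label are always allowed


def constrained_decode(logits, id2label, length):
    """Greedily pick the best allowed label for the first `length` tokens.

    `logits` is a list of per-token score rows; `length` is the number of
    real (unpadded) tokens, so padding positions are never decoded.
    """
    labels = []
    prev = "O"
    for row in logits[:length]:
        candidates = [
            (score, idx) for idx, score in enumerate(row)
            if is_allowed(prev, id2label[idx])
        ]
        _, best = max(candidates)
        prev = id2label[best]
        labels.append(prev)
    return labels


def aggregate(chars, labels):
    """Merge B-/I- runs of characters into a {field: text} mapping.

    Assumption: when a field occurs more than once, the first span wins.
    """
    spans = []          # [field, text] pairs in order of appearance
    open_field = None   # field of the span currently being extended
    for ch, label in zip(chars, labels):
        if label.startswith("B-"):
            open_field = label[2:]
            spans.append([open_field, ch])
        elif label.startswith("I-") and open_field == label[2:]:
            spans[-1][1] += ch
        else:
            open_field = None
    fields = {}
    for field, text in spans:
        fields.setdefault(field, text.strip())
    return fields
```

Passing only the unpadded length into the decoder keeps padding logits from producing spurious labels, which is why the attention mask (or the original token count) has to be retained after inference.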
## Asset Update Rule

When updating the parser, keep these files in sync:

```text
anime_filename_parser.onnx
vocab.json
config.json
```

Do not update only the ONNX file. Token ids, label ids, and max length are part of the runtime contract.
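
One way to catch an ONNX-only update is to compare content hashes of the three assets before and after the change. The sketch below is a hypothetical helper, not part of the repository: the function names and the manifest format are assumptions, and only the three asset file names come from the list above.

```python
import hashlib
from pathlib import Path

ASSET_NAMES = ("anime_filename_parser.onnx", "vocab.json", "config.json")


def build_manifest(assets_dir):
    """Map each required asset name to the SHA-256 hex digest of its bytes."""
    manifest = {}
    for name in ASSET_NAMES:
        path = Path(assets_dir) / name
        manifest[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def changed_assets(old, new):
    """Return the sorted names of assets whose digest differs."""
    return sorted(
        name for name in ASSET_NAMES if old.get(name) != new.get(name)
    )


def onnx_only_update(old, new):
    """True when the ONNX file changed but vocab.json and config.json did not."""
    return changed_assets(old, new) == ["anime_filename_parser.onnx"]
```

A retrained model with an unchanged vocabulary and label set would also trigger `onnx_only_update`, so the result is better treated as a prompt to re-check the contract than as a hard failure.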
## More Details

See [`onnx.md`](onnx.md) for a minimal Python ONNX Runtime reference.