How to use ModerRAS/AniFileBERT with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("token-classification", model="ModerRAS/AniFileBERT")

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("ModerRAS/AniFileBERT")
model = AutoModelForTokenClassification.from_pretrained("ModerRAS/AniFileBERT")
Android Export and Runtime
MiruPlay uses AniFileBERT as a Git submodule at tools/anime_parser.
Export
From this repository root, export the published root checkpoint:
uv sync
uv run python -m tools.export_onnx --model-dir . --max-length 128 --android-assets-dir ../../scraper/src/main/assets/anime_parser
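The exporter writes the following files. The exports/ paths are relative to this repository; the scraper/ paths are relative to the MiruPlay checkout that contains the submodule:
- exports/anime_filename_parser.onnx
- exports/anime_filename_parser.metadata.json
- scraper/src/main/assets/anime_parser/anime_filename_parser.onnx
- scraper/src/main/assets/anime_parser/vocab.json
- scraper/src/main/assets/anime_parser/config.json
Static Graph Shape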
input_ids int64[1,128]
attention_mask int64[1,128]
logits float32[1,128,15]
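As a rough illustration of how a filename could be turned into inputs of this fixed shape, the sketch below encodes one character per token and right-pads to 128. The tiny `VOCAB` dictionary, the `[PAD]` and `[UNK]` entries, right-side padding, truncation, and the absence of special tokens are all assumptions made for the example; the real mapping and conventions come from vocab.json and config.json.

```python
MAX_LENGTH = 128  # fixed sequence length of the exported graph

# Hypothetical stand-in for vocab.json; the real file defines the actual ids.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[": 2, "]": 3, "A": 4, "0": 5, "1": 6}


def encode(text, vocab=VOCAB, max_length=MAX_LENGTH):
    """Map each character to an id, truncate, then right-pad to max_length.

    Returns (input_ids, attention_mask), each shaped [1, max_length].
    """
    ids = [vocab.get(ch, vocab["[UNK]"]) for ch in text][:max_length]
    mask = [1] * len(ids)
    padding = max_length - len(ids)
    ids += [vocab["[PAD]"]] * padding
    mask += [0] * padding
    # The outer list is the batch dimension of size 1.
    return [ids], [mask]


input_ids, attention_mask = encode("[A]01")
```

On a real runtime both lists would be converted to int64 tensors before being passed to the graph.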
The current export is verified against PyTorch, and the maximum absolute logits difference is recorded in exports/anime_filename_parser.metadata.json.
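The quantity recorded there is a maximum absolute difference between two logits tensors. A minimal sketch of that computation on nested Python lists follows; the sample values and the `TOLERANCE` threshold are made up for illustration and are not the exporter's actual numbers.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two nested lists."""
    if isinstance(a, (list, tuple)):
        if len(a) != len(b):
            raise ValueError("shape mismatch")
        return max((max_abs_diff(x, y) for x, y in zip(a, b)), default=0.0)
    return abs(a - b)


# Toy logits shaped [1, 2, 2]; real logits are [1, 128, 15].
pytorch_logits = [[[0.5, -1.0], [2.5, 0.0]]]
onnx_logits = [[[0.5, -1.0], [2.5 + 2 ** -20, 0.0]]]

TOLERANCE = 1e-4  # hypothetical acceptance threshold
diff = max_abs_diff(pytorch_logits, onnx_logits)
export_ok = diff < TOLERANCE
```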
Local ONNX Smoke Test
uv run python -m tools.onnx_inference "[GM-Team][国漫][神印王座][Throne of Seal][2022][200][AVC][GB][1080P].mp4"
Expected fields:
title=神印王座, episode=200, group=GM-Team, resolution=1080P, source=GB
Special-code example:
uv run python -m tools.onnx_inference "[YYDM&VCB-Studio] Shinsekai Yori [NCED02][Ma10p_1080p][x265_flac].mkv"
Expected fields:
title=Shinsekai Yori, episode=null, group=YYDM&VCB-Studio, special=NCED02
Runtime Contract
The ONNX graph returns token logits only. Android must implement the same pre- and post-processing steps as the Python reference:
- custom character tokenizer
- token id lookup from vocab.json
- fixed-length padding to 128
- constrained BIO decoding
- field aggregation
- thin string/number normalization
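The decoding and aggregation steps in the list above can be sketched as follows. This is a greedy variant of constrained BIO decoding (an `I-X` label is only allowed directly after `B-X` or `I-X`), not necessarily the decoder the project uses; a Viterbi search would be another option. The five-entry `LABELS` list is a hypothetical subset of the 15 real labels, and keeping only the first span per field is an assumption.

```python
LABELS = ["O", "B-TITLE", "I-TITLE", "B-EPISODE", "I-EPISODE"]  # hypothetical subset


def is_allowed(prev, cur):
    """I-X may only continue a span opened by B-X or I-X."""
    if cur.startswith("I-"):
        return prev in ("B-" + cur[2:], "I-" + cur[2:])
    return True


def constrained_decode(logits, labels=LABELS):
    """Greedy decode: per token, pick the best label that is valid after the previous one."""
    prev, path = "O", []
    for row in logits:
        allowed = [i for i, lab in enumerate(labels) if is_allowed(prev, lab)]
        best = max(allowed, key=lambda i: row[i])
        prev = labels[best]
        path.append(prev)
    return path


def aggregate(chars, tags):
    """Join the characters of each tagged span; keep the first span per field."""
    fields, open_field = {}, None
    for ch, tag in zip(chars, tags):
        name = tag[2:].lower()
        if tag.startswith("B-"):
            open_field = name if name not in fields else None
            if open_field:
                fields[name] = ch
        elif tag.startswith("I-") and open_field == name:
            fields[name] += ch
        else:
            open_field = None
    return fields


chars = list("Ab 07")
logits = [
    [0, 5, 1, 0, 0],  # 'A' -> B-TITLE
    [0, 0, 5, 0, 0],  # 'b' -> I-TITLE
    [5, 0, 0, 0, 0],  # ' ' -> O
    [0, 0, 0, 3, 4],  # '0': raw argmax is I-EPISODE, which is invalid after O
    [0, 0, 0, 0, 5],  # '7' -> I-EPISODE
]
tags = constrained_decode(logits)
fields = aggregate(chars, tags)
```

Only the logits rows for real (non-padding) positions should be decoded; the padded tail of the 128-long sequence is skipped using the attention mask.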
The Android runtime implementation lives in MiruPlay:
scraper/src/main/kotlin/com/miruplay/tv/scraper/filename/AnimeFilenameParser.kt
The app exposes it through FilenameMetadataParser in core:model. During a scan, ScanCoordinator passes that parser into VideoDirectoryClassifier.
Asset Update Rule
When updating the parser, keep these files in sync:
anime_filename_parser.onnx
vocab.json
config.json
Do not update only the ONNX file. Token ids, label ids, and max length are part of the runtime contract.
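One coarse way to guard this rule is to fingerprint the three assets together, so that a release records exactly which trio shipped and a missing file is caught early. The helper names below are hypothetical and not part of the repository; this is only a sketch of the idea using the file names listed above.

```python
import hashlib
from pathlib import Path

REQUIRED_ASSETS = ("anime_filename_parser.onnx", "vocab.json", "config.json")


def asset_fingerprints(assets_dir):
    """Return {filename: sha256 hex digest}; raise if any required asset is missing."""
    root = Path(assets_dir)
    missing = [name for name in REQUIRED_ASSETS if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing parser assets: {missing}")
    return {
        name: hashlib.sha256((root / name).read_bytes()).hexdigest()
        for name in REQUIRED_ASSETS
    }


def changed_assets(before, after):
    """List the assets whose digest differs between two fingerprint dicts."""
    return sorted(name for name in REQUIRED_ASSETS if before[name] != after[name])
```

If `changed_assets` reports only anime_filename_parser.onnx after an update, that is worth a second look: it may be fine when the vocabulary and labels are truly unchanged, but it is also what a partial copy looks like.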
More Details
See onnx.md for a minimal Python ONNX Runtime reference.