# Android Export and Runtime / Android 导出与运行时
AniFileBERT is used by MiruPlay as a Git submodule at `tools/anime_parser`.
AniFileBERT 在 MiruPlay 中作为 `tools/anime_parser` 子模块使用。
## Export / 导出
From this repository root, export the published root checkpoint:
在本仓库根目录导出当前发布 checkpoint:
```powershell
uv sync
uv run python -m tools.export_onnx --model-dir . --max-length 128 --android-assets-dir ../../scraper/src/main/assets/anime_parser
```
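With `--android-assets-dir` set as above, the exporter writes the following files (`exports/` is relative to this repository, `scraper/` to the MiruPlay root):
按上述 `--android-assets-dir` 参数运行时,导出器会写入以下文件(`exports/` 相对于本仓库,`scraper/` 相对于 MiruPlay 根目录):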
- `exports/anime_filename_parser.onnx`
- `exports/anime_filename_parser.metadata.json`
- `scraper/src/main/assets/anime_parser/anime_filename_parser.onnx`
- `scraper/src/main/assets/anime_parser/vocab.json`
- `scraper/src/main/assets/anime_parser/config.json`
## Static Graph Shape / 静态图 Shape
```text
input_ids int64[1,128]
attention_mask int64[1,128]
logits float32[1,128,15]
```
The exporter checks the ONNX output against PyTorch and records the maximum
absolute logits difference in `exports/anime_filename_parser.metadata.json`.
当前导出会和 PyTorch 做数值对齐,最大 logits 误差记录在
`exports/anime_filename_parser.metadata.json`。
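As a rough illustration of what "maximum absolute logits difference" means, the sketch below compares two small logit tables in plain Python. The function name `max_abs_diff`, the toy values, and the `TOLERANCE` threshold are all assumptions made for this example; the exporter's own comparison code and the key it writes into the metadata file are not shown in this document.

```python
from typing import List


def max_abs_diff(reference: List[List[float]], candidate: List[List[float]]) -> float:
    """Largest element-wise absolute difference between two [tokens, labels] tables."""
    return max(
        abs(r - c)
        for ref_row, cand_row in zip(reference, candidate)
        for r, c in zip(ref_row, cand_row)
    )


# Toy values standing in for PyTorch and ONNX Runtime logits.
torch_logits = [[0.10, -1.2000, 3.00], [2.50, 0.00, -0.75]]
onnx_logits = [[0.10, -1.2001, 3.00], [2.50, 0.00, -0.75]]

TOLERANCE = 1e-3  # hypothetical threshold, not taken from the exporter
diff = max_abs_diff(torch_logits, onnx_logits)
print(f"max abs diff = {diff:.6f}, within tolerance: {diff <= TOLERANCE}")
```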
## Local ONNX Smoke Test / 本地 ONNX 冒烟测试
```powershell
uv run python -m tools.onnx_inference "[GM-Team][国漫][神印王座][Throne of Seal][2022][200][AVC][GB][1080P].mp4"
```
Expected fields / 期望字段:
```text
title=神印王座, episode=200, group=GM-Team, resolution=1080P, source=GB
```
Special-code example / 特典编号示例:
```powershell
uv run python -m tools.onnx_inference "[YYDM&VCB-Studio] Shinsekai Yori [NCED02][Ma10p_1080p][x265_flac].mkv"
```
Expected fields / 期望字段:
```text
title=Shinsekai Yori, episode=null, group=YYDM&VCB-Studio, special=NCED02
```
## Runtime Contract / 运行时契约
The ONNX graph returns token logits only. Android must implement the same surrounding steps itself:
ONNX 图只返回 token logits。Android 必须实现同一套:
- custom character tokenizer / 自定义字符 tokenizer
- token id lookup from `vocab.json` / 使用 `vocab.json` 查 token id
- fixed-length padding to 128 / padding 到固定长度 128
- constrained BIO decoding / 约束 BIO 解码
- field aggregation / 字段聚合
- thin string/number normalization / 轻量字符串和数字规范化
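The first three items of the contract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's tokenizer: it assumes one token per character, no special start/end tokens, a flat token-to-id map in `vocab.json`, and the token names `[PAD]` and `[UNK]`. Truncating names longer than 128 tokens is also an assumption. The real rules live in the repository's tokenizer and in `AnimeFilenameParser.kt`.

```python
from typing import Dict, List, Tuple

MAX_LENGTH = 128  # matches the static graph shape int64[1,128]


def encode(
    filename: str,
    vocab: Dict[str, int],
    pad_token: str = "[PAD]",  # assumed token name
    unk_token: str = "[UNK]",  # assumed token name
    max_length: int = MAX_LENGTH,
) -> Tuple[List[int], List[int]]:
    """Map characters to ids, then truncate and pad to a fixed length."""
    pad_id = vocab[pad_token]
    unk_id = vocab[unk_token]
    # Assumed tokenization: one token per character, unknown characters -> unk_id.
    ids = [vocab.get(ch, unk_id) for ch in filename][:max_length]
    mask = [1] * len(ids)
    padding = max_length - len(ids)
    # Padded positions get pad_id and an attention mask of 0.
    return ids + [pad_id] * padding, mask + [0] * padding


# Toy vocabulary for illustration only; the real one comes from vocab.json.
toy_vocab = {"[PAD]": 0, "[UNK]": 1, "[": 2, "]": 3, "A": 4}
input_ids, attention_mask = encode("[A]?", toy_vocab)
print(input_ids[:6], attention_mask[:6])
```

Both returned lists always have length 128, so they can be wrapped in a `[1, 128]` int64 tensor without further reshaping.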
The Android runtime implementation lives in MiruPlay:
Android 运行时实现位于 MiruPlay:
```text
scraper/src/main/kotlin/com/miruplay/tv/scraper/filename/AnimeFilenameParser.kt
```
The app exposes it through `FilenameMetadataParser` in `core:model`. During a
scan, `ScanCoordinator` passes that parser into `VideoDirectoryClassifier`.
应用通过 `core:model` 的 `FilenameMetadataParser` 暴露解析能力。扫描时,
`ScanCoordinator` 会把解析器传给 `VideoDirectoryClassifier`。
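For the decoding steps in the contract above, one simple way to enforce BIO constraints is a greedy left-to-right pass that skips any `I-X` label not preceded by `B-X` or `I-X`. The sketch below shows that approach together with a basic field aggregation. It is an illustration only: the label names, the greedy strategy, and the "first span wins" rule are assumptions, and the actual decoder in MiruPlay may differ. The caller is expected to pass only the logits of real (unpadded) tokens.

```python
from typing import Dict, List


def allowed(prev: str, cur: str) -> bool:
    """BIO rule: I-X may only follow B-X or I-X of the same field."""
    if not cur.startswith("I-"):
        return True
    return prev in ("B-" + cur[2:], "I-" + cur[2:])


def decode_bio(logits: List[List[float]], labels: List[str]) -> List[str]:
    """Greedy decoding: per token, take the best-scoring label that is allowed."""
    tags: List[str] = []
    prev = "O"
    for row in logits:
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        best = next(labels[i] for i in ranked if allowed(prev, labels[i]))
        tags.append(best)
        prev = best
    return tags


def aggregate(tokens: List[str], tags: List[str]) -> Dict[str, str]:
    """Join B-/I- runs into spans and keep the first span of each field."""
    spans: List[List[str]] = []
    open_field = None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            open_field = tag[2:]
            spans.append([open_field, tok])
        elif tag.startswith("I-") and tag[2:] == open_field:
            spans[-1][1] += tok
        else:
            open_field = None
    fields: Dict[str, str] = {}
    for name, text in spans:
        fields.setdefault(name, text)
    return fields


# Hypothetical label set and hand-made logits for the tokens of "[AB]01".
labels = ["O", "B-TITLE", "I-TITLE", "B-EPISODE", "I-EPISODE"]
tokens = list("[AB]01")
logits = [
    [3.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.5, 2.0, 0.0, 0.0],  # I-TITLE scores highest but is illegal after O
    [0.0, 0.0, 2.0, 0.0, 0.0],
    [3.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 2.0],
]
tags = decode_bio(logits, labels)
fields = aggregate(tokens, tags)
print(tags)
print(fields)
```

In the second row the raw argmax would be `I-TITLE`, which cannot start a span, so the constrained pass falls back to `B-TITLE`. A Viterbi-style search over the whole sequence is a common alternative when a greedy choice is not good enough.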
## Asset Update Rule / 资产更新规则
When updating the parser, keep these files in sync:
更新解析器时,以下文件必须同步:
```text
anime_filename_parser.onnx
vocab.json
config.json
```
Do not update only the ONNX file. Token ids, label ids, and max length are part
of the runtime contract.
不要只更新 ONNX。token id、label id 和 max length 都是运行时契约的一部分。
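A small consistency check before copying assets can catch a partial update. The sketch below is hypothetical: it assumes `vocab.json` is a flat token-to-id map and that `config.json` stores labels under an `id2label` key with string ids (the Hugging Face convention), neither of which this document confirms. The expected label count of 15 comes from the `logits float32[1,128,15]` shape above.

```python
from typing import Dict, List

NUM_LABELS = 15  # last dimension of the exported logits


def check_assets(vocab: Dict[str, int], config: dict, num_labels: int = NUM_LABELS) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems: List[str] = []
    # Two tokens sharing one id would make the id lookup ambiguous.
    if len(set(vocab.values())) != len(vocab):
        problems.append("vocab.json maps two tokens to the same id")
    # Assumed key name: id2label, with ids stored as strings.
    id2label = config.get("id2label", {})
    label_ids = sorted(int(k) for k in id2label)
    if label_ids != list(range(num_labels)):
        problems.append(f"config.json labels do not cover ids 0..{num_labels - 1}")
    return problems


# Toy data for illustration; real values would be loaded with json.load().
toy_vocab = {"[PAD]": 0, "[UNK]": 1, "A": 2}
good_config = {"id2label": {str(i): f"LABEL_{i}" for i in range(15)}}
stale_config = {"id2label": {str(i): f"LABEL_{i}" for i in range(13)}}

print(check_assets(toy_vocab, good_config))
print(check_assets(toy_vocab, stale_config))
```

A check like this does not prove that the three files came from the same export, so it complements rather than replaces copying them together.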
## More Details / 更多说明
See [`onnx.md`](onnx.md) for a minimal Python ONNX Runtime reference.
最小 Python ONNX Runtime 参考见 [`onnx.md`](onnx.md)。