# Android Export and Runtime / Android 导出与运行时
AniFileBERT is used by MiruPlay as a Git submodule at `tools/anime_parser`.
AniFileBERT 在 MiruPlay 中作为 `tools/anime_parser` 子模块使用。
## Export / 导出
From this repository root, export the published root checkpoint:
在本仓库根目录导出当前发布 checkpoint:
```powershell
uv sync
uv run python -m tools.export_onnx --model-dir . --max-length 128 --android-assets-dir ../../scraper/src/main/assets/anime_parser
```
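The exporter writes the following files (`exports/` paths are relative to this repository, `scraper/` paths to the MiruPlay root):
导出器会写入以下文件(`exports/` 路径相对于本仓库,`scraper/` 路径相对于 MiruPlay 根目录):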
- `exports/anime_filename_parser.onnx`
- `exports/anime_filename_parser.metadata.json`
- `scraper/src/main/assets/anime_parser/anime_filename_parser.onnx`
- `scraper/src/main/assets/anime_parser/vocab.json`
- `scraper/src/main/assets/anime_parser/config.json`
## Static Graph Shape / 静态图 Shape
```text
input_ids int64[1,128]
attention_mask int64[1,128]
logits float32[1,128,15]
```
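Because the graph shape is static, every filename has to be truncated or padded to exactly 128 positions before inference. The sketch below shows one way to build the two inputs as nested lists of shape `[1, 128]`. It assumes a plain per-character split, and the `pad_id` and `unk_id` values are hypothetical; the real ids and any special tokens come from `vocab.json` and the actual tokenizer.

```python
MAX_LENGTH = 128  # matches --max-length 128 in the export command


def encode(text, vocab, pad_id=0, unk_id=1):
    """Return (input_ids, attention_mask), each shaped [1, MAX_LENGTH].

    pad_id and unk_id are placeholder values for this sketch only.
    """
    # One token per character, truncated to the static sequence length.
    ids = [vocab.get(ch, unk_id) for ch in text][:MAX_LENGTH]
    mask = [1] * len(ids)
    padding = MAX_LENGTH - len(ids)
    # The leading list is the batch dimension of size 1.
    return [ids + [pad_id] * padding], [mask + [0] * padding]


toy_vocab = {"a": 2, "b": 3}  # toy vocabulary, not the real vocab.json
input_ids, attention_mask = encode("abz", toy_vocab)
print(len(input_ids[0]), input_ids[0][:4], attention_mask[0][:4])
```

Both lists would then be converted to int64 tensors before being fed to the runtime.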
The current export is verified numerically against PyTorch; the maximum
absolute logits difference is recorded in `exports/anime_filename_parser.metadata.json`.
当前导出会和 PyTorch 做数值对齐,最大 logits 误差记录在
`exports/anime_filename_parser.metadata.json`。
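For readers unfamiliar with the metric, the following is a minimal sketch of how a maximum absolute difference between two logits grids can be computed. The function name and the toy values are illustrative only; this is not the exporter's actual code, and no particular tolerance is implied.

```python
def max_abs_diff(reference, candidate):
    """Largest element-wise |reference - candidate| over two [seq_len][num_labels] grids."""
    return max(
        abs(r - c)
        for ref_row, cand_row in zip(reference, candidate)
        for r, c in zip(ref_row, cand_row)
    )


# Toy 2x2 grids standing in for PyTorch and ONNX logits.
pytorch_logits = [[1.0, 2.0], [3.0, 4.0]]
onnx_logits = [[1.0, 2.5], [3.0, 3.75]]
print(max_abs_diff(pytorch_logits, onnx_logits))
```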
## Local ONNX Smoke Test / 本地 ONNX 冒烟测试
```powershell
uv run python -m tools.onnx_inference "[GM-Team][国漫][神印王座][Throne of Seal][2022][200][AVC][GB][1080P].mp4"
```
Expected fields / 期望字段:
```text
title=神印王座, episode=200, group=GM-Team, resolution=1080P, source=GB
```
Special-code example / 特典编号示例:
```powershell
uv run python -m tools.onnx_inference "[YYDM&VCB-Studio] Shinsekai Yori [NCED02][Ma10p_1080p][x265_flac].mkv"
```
Expected fields / 期望字段:
```text
title=Shinsekai Yori, episode=null, group=YYDM&VCB-Studio, special=NCED02
```
## Runtime Contract / 运行时契约
The ONNX graph returns token logits only. Android must implement the same pre- and post-processing steps as the Python side:
ONNX 图只返回 token logits。Android 必须实现与 Python 端相同的前后处理:
- custom character tokenizer / 自定义字符 tokenizer
- token id lookup from `vocab.json` / 使用 `vocab.json` 查 token id
- fixed-length padding to 128 / padding 到固定长度 128
- constrained BIO decoding / 约束 BIO 解码
- field aggregation / 字段聚合
- thin string/number normalization / 轻量字符串和数字规范化
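The decoding and aggregation steps above can be sketched as follows. This is a greedy approximation under stated assumptions: the label names are hypothetical (the real 15 labels come from `config.json`), the only constraint enforced is that `I-X` may follow `B-X` or `I-X`, and the real runtime may use a different search strategy or merge rule.

```python
def allowed(prev, cur):
    """BIO constraint: I-X may only follow B-X or I-X."""
    if cur.startswith("I-"):
        return prev in ("B-" + cur[2:], cur)
    return True


def decode_bio(logits, labels, length):
    """Greedy constrained decode over the first `length` (unpadded) positions."""
    tags, prev = [], "O"
    for row in logits[:length]:
        # Try labels from highest to lowest score and keep the first legal one.
        ranked = sorted(range(len(labels)), key=lambda i: row[i], reverse=True)
        prev = next(labels[i] for i in ranked if allowed(prev, labels[i]))
        tags.append(prev)
    return tags


def aggregate(chars, tags):
    """Merge B-/I- runs into {field: text}; the first span per field wins."""
    fields, name, buf = {}, None, []
    # The trailing ("", "O") sentinel flushes a span that ends at the last char.
    for ch, tag in list(zip(chars, tags)) + [("", "O")]:
        if tag.startswith("I-") and name == tag[2:]:
            buf.append(ch)
            continue
        if name is not None:
            fields.setdefault(name, "".join(buf))
        name, buf = (tag[2:], [ch]) if tag.startswith("B-") else (None, [])
    return fields


# Hypothetical label set and toy logits for the six characters of "[AB]12".
labels = ["O", "B-title", "I-title", "B-episode", "I-episode"]
logits = [
    [5, 0, 0, 0, 0],  # "[" -> O
    [0, 2, 3, 0, 0],  # "A" -> I-title scores highest but is illegal after O
    [0, 0, 4, 0, 0],  # "B" -> I-title
    [5, 0, 0, 0, 0],  # "]" -> O
    [0, 0, 0, 4, 0],  # "1" -> B-episode
    [0, 0, 0, 0, 4],  # "2" -> I-episode
]
tags = decode_bio(logits, labels, length=6)
print(aggregate("[AB]12", tags))
```

Normalization would follow as a separate thin step, for example converting the aggregated episode string to an integer.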
The Android runtime implementation lives in MiruPlay:
Android 运行时实现位于 MiruPlay:
```text
scraper/src/main/kotlin/com/miruplay/tv/scraper/filename/AnimeFilenameParser.kt
```
The app exposes it through `FilenameMetadataParser` in `core:model`. During a
scan, `ScanCoordinator` passes that parser into `VideoDirectoryClassifier`.
应用通过 `core:model` 中的 `FilenameMetadataParser` 暴露解析能力。扫描时,
`ScanCoordinator` 会把解析器传给 `VideoDirectoryClassifier`。
## Asset Update Rule / 资产更新规则
When updating the parser, keep these files in sync:
更新解析器时,以下文件必须同步:
```text
anime_filename_parser.onnx
vocab.json
config.json
```
Do not update only the ONNX file. Token ids, label ids, and max length are part
of the runtime contract.
不要只更新 ONNX。token id、label id 和 max length 都是运行时契约的一部分。
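One way to make a partial update visible is to record a digest for each of the three files whenever the assets are refreshed. The helper below is a hypothetical sketch, not part of the repository's tooling: it only reports missing files and SHA-256 digests for a given assets directory, and it cannot by itself prove that the files came from the same export.

```python
import hashlib
from pathlib import Path

# The three files named in the asset update rule.
REQUIRED_ASSETS = ("anime_filename_parser.onnx", "vocab.json", "config.json")


def asset_digests(assets_dir):
    """Map each required asset name to its SHA-256 hex digest, or None if missing."""
    root = Path(assets_dir)
    digests = {}
    for name in REQUIRED_ASSETS:
        path = root / name
        if path.is_file():
            digests[name] = hashlib.sha256(path.read_bytes()).hexdigest()
        else:
            digests[name] = None
    return digests


def missing_assets(assets_dir):
    """Return the required asset names that are absent from assets_dir."""
    return [name for name, digest in asset_digests(assets_dir).items() if digest is None]
```

Comparing the digests before and after an update shows at a glance which of the three files actually changed.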
## More Details / 更多说明
See [`onnx.md`](onnx.md) for a minimal Python ONNX Runtime reference.
最小 Python ONNX Runtime 参考见 [`onnx.md`](onnx.md)。