Android Export and Runtime

AniFileBERT is used by MiruPlay as a Git submodule at tools/anime_parser.

Export

From the root of this repository, export the published checkpoint stored there (--model-dir .):

uv sync
uv run python -m tools.export_onnx --model-dir . --max-length 128 --android-assets-dir ../../scraper/src/main/assets/anime_parser

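The exporter writes the following files. The exports/ paths are relative to this repository. The scraper/ paths are relative to the MiruPlay root, which is where the ../../ prefix of --android-assets-dir resolves from tools/anime_parser: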

  • exports/anime_filename_parser.onnx
  • exports/anime_filename_parser.metadata.json
  • scraper/src/main/assets/anime_parser/anime_filename_parser.onnx
  • scraper/src/main/assets/anime_parser/vocab.json
  • scraper/src/main/assets/anime_parser/config.json

Static Graph Shape

input_ids      int64[1,128]
attention_mask int64[1,128]
logits         float32[1,128,15]
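Because the graph is static, every call must supply exactly one sequence of exactly 128 token ids. The sketch below shows one way to shape tokenizer output to that contract. `build_inputs` is a hypothetical helper, not part of the repository. The pad id of 0 is also an assumption; the real pad id should be read from vocab.json.

```python
MAX_LENGTH = 128  # sequence length fixed in the exported graph


def build_inputs(token_ids, pad_id=0, max_length=MAX_LENGTH):
    """Truncate/pad token ids to max_length and build the matching attention mask.

    Returns two nested lists shaped [1, max_length], mirroring the int64
    input_ids and attention_mask inputs of the ONNX graph.
    pad_id=0 is an assumption; use the pad id defined in vocab.json.
    """
    ids = list(token_ids)[:max_length]  # drop tokens beyond the static length
    mask = [1] * len(ids)               # 1 marks a real token
    padding = max_length - len(ids)
    ids += [pad_id] * padding           # fill the remaining slots with the pad id
    mask += [0] * padding               # 0 marks padding, ignored by attention
    return [ids], [mask]                # add the batch dimension of size 1


input_ids, attention_mask = build_inputs([5, 17, 42])
print(len(input_ids), len(input_ids[0]), sum(attention_mask[0]))  # 1 128 3
```

Before running the session, a runtime would convert these lists to int64 tensors.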

Both inputs use a fixed batch size of 1 and a fixed sequence length of 128. The last logits dimension (15) is the number of token labels. The current export is checked numerically against the PyTorch model, and the maximum absolute logits difference is recorded in exports/anime_filename_parser.metadata.json.
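The recorded figure is the largest element-wise gap between the two models' logits for the same input. Below is a minimal pure-Python sketch of that metric. `max_abs_diff` and the sample values are hypothetical, and the exporter's own check (not shown in this document) may compute it differently, for example with NumPy.

```python
def max_abs_diff(a, b):
    """Largest |a - b| over two nested lists of the same shape (e.g. [1, 128, 15] logits)."""
    if isinstance(a, (int, float)):
        return abs(a - b)
    if len(a) != len(b):
        raise ValueError("logits shapes differ")
    return max((max_abs_diff(x, y) for x, y in zip(a, b)), default=0.0)


# Toy [1, 2, 2] logits standing in for PyTorch and ONNX Runtime outputs.
pytorch_logits = [[[0.25, -1.5], [2.0, 0.5]]]
onnx_logits = [[[0.25, -1.5], [2.0, 0.5001]]]
print(max_abs_diff(pytorch_logits, onnx_logits))
```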

Local ONNX Smoke Test

uv run python -m tools.onnx_inference "[GM-Team][国漫][神印王座][Throne of Seal][2022][200][AVC][GB][1080P].mp4"

Expected fields:

title=神印王座, episode=200, group=GM-Team, resolution=1080P, source=GB

Special-code example (NCED02 is parsed as a special code, not an episode number):

uv run python -m tools.onnx_inference "[YYDM&VCB-Studio] Shinsekai Yori [NCED02][Ma10p_1080p][x265_flac].mkv"

Expected fields:

title=Shinsekai Yori, episode=null, group=YYDM&VCB-Studio, special=NCED02

Runtime Contract

The ONNX graph returns token logits only. Android must reproduce the rest of the pipeline exactly as the Python tooling does:

  • custom character tokenizer
  • token id lookup from vocab.json
  • fixed-length padding to 128
  • constrained BIO decoding
  • field aggregation
  • light string and number normalization
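The decoding half of this contract can be sketched in Python. This is a minimal illustration, not the Kotlin code in AnimeFilenameParser.kt. It makes three assumptions:

  • one token per character, with no special tokens
  • a hypothetical five-label subset with lowercase field names, standing in for the real 15 labels
  • greedy constrained decoding (a Viterbi search under the same transition rule is a common alternative)

`allowed`, `constrained_decode`, and `aggregate` are illustrative names.

```python
# Hypothetical label subset; the real 15 labels come from id2label in config.json.
LABELS = ["O", "B-GROUP", "I-GROUP", "B-TITLE", "I-TITLE"]


def allowed(prev, label):
    """BIO constraint: an I-X tag may only continue a B-X or I-X tag."""
    if label.startswith("I-"):
        return prev in ("B-" + label[2:], label)
    return True


def constrained_decode(logits, labels, mask):
    """Greedy left-to-right decoding that picks the best-scoring *allowed* label."""
    prev, tags = "O", []
    for scores, m in zip(logits, mask):
        if not m:  # stop at the first padding position
            break
        best = max(
            (i for i, lab in enumerate(labels) if allowed(prev, lab)),
            key=lambda i: scores[i],
        )
        prev = labels[best]
        tags.append(prev)
    return tags


def aggregate(chars, tags):
    """Join the characters of each B-/I- span into a field; the first span per field wins."""
    fields, current, buf = {}, None, []

    def flush():
        if current and buf and current not in fields:
            fields[current] = "".join(buf).strip()

    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            flush()
            current, buf = tag[2:].lower(), [ch]
        elif tag.startswith("I-") and current == tag[2:].lower():
            buf.append(ch)
        else:
            flush()
            current, buf = None, []
    flush()
    return fields


text = "[AB]XY"
logits = [
    [5, 0, 0, 0, 0],  # '[' -> O
    [0, 5, 0, 0, 0],  # 'A' -> B-GROUP
    [0, 0, 5, 0, 0],  # 'B' -> I-GROUP
    [5, 0, 0, 0, 0],  # ']' -> O
    [0, 0, 0, 3, 4],  # 'X': I-TITLE scores highest but cannot follow O, so B-TITLE wins
    [0, 0, 0, 0, 5],  # 'Y' -> I-TITLE
]
tags = constrained_decode(logits, LABELS, [1] * len(text))
print(tags)  # ['O', 'B-GROUP', 'I-GROUP', 'O', 'B-TITLE', 'I-TITLE']
print(aggregate(list(text), tags))  # {'group': 'AB', 'title': 'XY'}
```

Restricting each step to allowed labels, rather than repairing tags afterwards, guarantees that every field span starts with a B- tag. This is why the 'X' position above becomes B-TITLE.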

The Android runtime implementation lives in MiruPlay:

scraper/src/main/kotlin/com/miruplay/tv/scraper/filename/AnimeFilenameParser.kt

The app exposes the parser through FilenameMetadataParser in core:model. During a scan, ScanCoordinator passes that parser into VideoDirectoryClassifier.

Asset Update Rule

When updating the parser, keep these three files in sync:

  • anime_filename_parser.onnx
  • vocab.json
  • config.json

Do not update the ONNX file alone. Token ids, label ids, and the max length are all part of the runtime contract.
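A small consistency check can enforce this rule before new assets are committed. The sketch below assumes config.json is a Hugging Face-style config with `id2label`, `vocab_size`, and `max_position_embeddings`, and that vocab.json is a flat token-to-id mapping. Both layouts are assumptions, and `check_assets` is a hypothetical helper.

```python
import json
from pathlib import Path

REQUIRED = ("anime_filename_parser.onnx", "vocab.json", "config.json")


def check_assets(asset_dir, num_labels=15, max_length=128):
    """Return a list of problems found in an Android parser asset directory.

    Assumes a Hugging Face-style config.json (id2label, vocab_size,
    max_position_embeddings) and a flat token -> id vocab.json.
    """
    asset_dir = Path(asset_dir)
    missing = [name for name in REQUIRED if not (asset_dir / name).is_file()]
    if missing:
        return [f"missing asset: {name}" for name in missing]
    config = json.loads((asset_dir / "config.json").read_text(encoding="utf-8"))
    vocab = json.loads((asset_dir / "vocab.json").read_text(encoding="utf-8"))
    problems = []
    # Label ids: the graph emits num_labels logits per token.
    if len(config.get("id2label", {})) != num_labels:
        problems.append("id2label size does not match the logits dimension")
    # Token ids: every id in vocab.json must index the embedding table.
    vocab_size = config.get("vocab_size")
    if vocab_size is not None and max(vocab.values(), default=-1) >= vocab_size:
        problems.append("vocab.json has ids outside the embedding table")
    # Max length: the fixed input length must fit the position embeddings.
    if config.get("max_position_embeddings", max_length) < max_length:
        problems.append("max_position_embeddings is shorter than the input length")
    return problems


if __name__ == "__main__":
    # Run from the MiruPlay root; an empty list means no problems were detected.
    print(check_assets("scraper/src/main/assets/anime_parser"))
```

A check like this catches mismatched files, but it cannot prove that vocab.json is the exact vocabulary the model was trained with. That still depends on exporting all three files together.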

More Details

See onnx.md for a minimal Python ONNX Runtime reference.