# Android Export and Runtime

MiruPlay uses AniFileBERT as a Git submodule at `tools/anime_parser`.

## Export

From the root of this repository, export the published checkpoint stored at the repository root (`--model-dir .`):

```powershell
uv sync
uv run python -m tools.export_onnx --model-dir . --max-length 128 --android-assets-dir ../../scraper/src/main/assets/anime_parser
```

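The exporter writes the files below. The `exports/` paths are relative to this repository; the `scraper/` paths are relative to the MiruPlay checkout that contains this submodule, which is where the `../../` prefix in the command points: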

- `exports/anime_filename_parser.onnx`
- `exports/anime_filename_parser.metadata.json`
- `scraper/src/main/assets/anime_parser/anime_filename_parser.onnx`
- `scraper/src/main/assets/anime_parser/vocab.json`
- `scraper/src/main/assets/anime_parser/config.json`

## Static Graph Shape

```text
input_ids      int64[1,128]
attention_mask int64[1,128]
logits         float32[1,128,15]
```
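
To illustrate what these shapes mean for a caller, here is a minimal Python sketch that turns a filename into `input_ids` and `attention_mask` rows of length 128. The one-token-per-character split, the `[PAD]` and `[UNK]` token names, and the tiny inline vocabulary are assumptions made for illustration; the actual tokenizer rules and token ids come from the exported `vocab.json`.

```python
MAX_LENGTH = 128  # matches the static graph shape [1, 128]


def encode(filename, vocab, max_length=MAX_LENGTH, pad="[PAD]", unk="[UNK]"):
    """Return (input_ids, attention_mask), each shaped [1, max_length]."""
    # Assumption: one token per character; the real tokenizer may differ.
    tokens = list(filename)[:max_length]
    ids = [vocab.get(tok, vocab[unk]) for tok in tokens]
    mask = [1] * len(ids)
    # Pad both rows on the right up to the fixed length.
    padding = max_length - len(ids)
    ids += [vocab[pad]] * padding
    mask += [0] * padding
    return [ids], [mask]


# Hypothetical toy vocabulary; load the real one from vocab.json.
toy_vocab = {"[PAD]": 0, "[UNK]": 1, "a": 2, "b": 3}
input_ids, attention_mask = encode("abz", toy_vocab)
```

Both lists would need to be converted to int64 arrays before being passed to an ONNX session, since the graph declares `int64` inputs.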

The current export is verified against PyTorch. The maximum absolute difference between the two sets of logits is recorded in
`exports/anime_filename_parser.metadata.json`.
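
As a rough illustration of that metric, the sketch below computes the largest element-wise absolute difference between two equally shaped nested lists. It is not the exporter's code; the function name `max_abs_diff` and the toy values are hypothetical, and the toy logits use shape `[1, 2, 3]` rather than `[1, 128, 15]` to stay readable.

```python
def max_abs_diff(a, b):
    """Largest element-wise |a - b| over two equally shaped nested lists."""
    if isinstance(a, (int, float)):
        return abs(a - b)
    if len(a) != len(b):
        raise ValueError("shape mismatch")
    return max((max_abs_diff(x, y) for x, y in zip(a, b)), default=0.0)


# Toy values standing in for PyTorch and ONNX Runtime outputs.
torch_logits = [[[0.10, -1.20, 3.00], [0.50, 0.25, -0.75]]]
onnx_logits = [[[0.10, -1.25, 3.00], [0.50, 0.25, -0.50]]]
diff = max_abs_diff(torch_logits, onnx_logits)
```

A small value indicates that the exported graph reproduces the PyTorch logits closely; what counts as acceptable depends on the tolerance the project chooses.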

## Local ONNX Smoke Test

```powershell
uv run python -m tools.onnx_inference "[GM-Team][国漫][神印王座][Throne of Seal][2022][200][AVC][GB][1080P].mp4"
```

Expected fields:

```text
title=神印王座, episode=200, group=GM-Team, resolution=1080P, source=GB
```

Special-code example:

```powershell
uv run python -m tools.onnx_inference "[YYDM&VCB-Studio] Shinsekai Yori [NCED02][Ma10p_1080p][x265_flac].mkv"
```

Expected fields:

```text
title=Shinsekai Yori, episode=null, group=YYDM&VCB-Studio, special=NCED02
```

## Runtime Contract

The ONNX graph returns token logits only. Android must implement the same pre- and post-processing steps around it:

- custom character tokenizer
- token id lookup from `vocab.json`
- fixed-length padding to 128
- constrained BIO decoding
- field aggregation
- thin string/number normalization
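
The decoding and aggregation steps in this list can be sketched in Python as follows. This is a simplified greedy variant, not the MiruPlay implementation: at each position it picks the highest-scoring label that the BIO rules allow after the previous label. The label names, the toy logits, and the assumption that logit row `i` corresponds to character `i` (no special tokens) are all hypothetical; the real label set is part of the exported assets.

```python
def allowed(prev, label):
    """BIO constraint: I-X may only follow B-X or I-X."""
    if not label.startswith("I-"):
        return True
    return prev in ("B-" + label[2:], label)


def decode(logits, labels, length):
    """Greedy constrained decoding over the first `length` positions."""
    path, prev = [], "O"
    for scores in logits[:length]:
        ranked = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
        # "O" and every B- label are always allowed, so a match always exists.
        prev = next(labels[i] for i in ranked if allowed(prev, labels[i]))
        path.append(prev)
    return path


def aggregate(chars, path):
    """Join the characters of each B-/I- span into (field, text) pairs."""
    spans = []
    for ch, label in zip(chars, path):
        if label.startswith("B-"):
            spans.append([label[2:], ch])
        elif label.startswith("I-"):
            spans[-1][1] += ch  # safe: decode() guarantees an open span
    return [(field, text) for field, text in spans]


# Hypothetical labels and logits for the five characters of "Ab 07".
toy_labels = ["O", "B-TITLE", "I-TITLE", "B-EPISODE", "I-EPISODE"]
chars = list("Ab 07")
toy_logits = [
    [0, 5, 1, 0, 0],
    [0, 1, 5, 0, 0],
    [5, 0, 0, 0, 0],
    [0, 0, 0, 3, 4],  # raw argmax is I-EPISODE, which is invalid after O
    [0, 0, 0, 1, 5],
]
path = decode(toy_logits, toy_labels, len(chars))
fields = aggregate(chars, path)
```

At the fourth position the unconstrained argmax would be `I-EPISODE`, which cannot follow `O`, so the constraint falls back to `B-EPISODE`. Passing the unpadded length keeps padding positions out of the result.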

The Android runtime implementation lives in MiruPlay:

```text
scraper/src/main/kotlin/com/miruplay/tv/scraper/filename/AnimeFilenameParser.kt
```

The app exposes the parser through the `FilenameMetadataParser` interface in `core:model`. During a
scan, `ScanCoordinator` passes that parser into `VideoDirectoryClassifier`.

## Asset Update Rule

When updating the parser, keep these files in sync:

```text
anime_filename_parser.onnx
vocab.json
config.json
```

Do not update only the ONNX file. Token ids, label ids, and max length are part
of the runtime contract, so a model paired with a stale `vocab.json` or `config.json` can produce wrong fields.
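
One way to catch a partial update is to fingerprint the three files together, so that changing any one of them changes the result. The sketch below is a hypothetical helper, not part of the exporter or of MiruPlay; only the three file names come from this document.

```python
import hashlib
from pathlib import Path

REQUIRED_ASSETS = ("anime_filename_parser.onnx", "vocab.json", "config.json")


def asset_fingerprint(assets_dir):
    """Return one SHA-256 hex digest covering all three asset files."""
    digest = hashlib.sha256()
    for name in REQUIRED_ASSETS:
        path = Path(assets_dir) / name
        if not path.is_file():
            raise FileNotFoundError(f"missing asset: {name}")
        # Hash the name as well as the bytes so files cannot be swapped.
        digest.update(name.encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()
```

Recording the fingerprint at export time and comparing it in a build check or test would flag an assets directory where only some of the files were replaced.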

## More Details

See [`onnx.md`](onnx.md) for a minimal Python ONNX Runtime reference.