ONNX Usage / ONNX 使用说明

AniFileBERT exports a static-shape ONNX graph for Android and local inference.

AniFileBERT 导出静态 shape 的 ONNX 图，用于 Android 和本地推理。

1. What ONNX Contains / ONNX 包含什么

The ONNX graph contains only the BERT token-classification forward pass:

ONNX 图只包含 BERT token-classification 前向计算：

input_ids      int64[1,128]
attention_mask int64[1,128]
logits         float32[1,128,15]

It does not contain:

它不包含：

filename tokenization / 文件名分词
token-to-id conversion / token 到 id 的转换
constrained BIO decoding / 约束 BIO 解码
field aggregation / 字段聚合
thin string and number normalization / 薄字符串和数字规范化

Those steps must stay aligned with anifilebert/tokenizer.py, anifilebert/inference.py, config.json, and vocab.json.

这些步骤必须与 anifilebert/tokenizer.py、anifilebert/inference.py、config.json、vocab.json 保持一致。

2. Export / 导出

uv run python -m tools.export_onnx --model-dir . --output exports/anime_filename_parser.onnx --max-length 128

The exporter also writes:

导出器还会写入：

exports/anime_filename_parser.metadata.json

The metadata records the sample filename, output shape, and PyTorch/ONNX max absolute logits difference.

metadata 会记录样本文件名、输出 shape、PyTorch/ONNX logits 最大绝对误差。
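The max absolute logits difference is the largest element-wise gap between the PyTorch and ONNX outputs for the same input. A minimal sketch of that check follows. The helper names are hypothetical and this is not the exporter's own code; plain nested lists stand in for the tensors.

```python
from typing import Iterator, Sequence


def flatten(values: Sequence) -> Iterator[float]:
    """Yield every scalar from an arbitrarily nested list of numbers."""
    for value in values:
        if isinstance(value, (list, tuple)):
            yield from flatten(value)
        else:
            yield float(value)


def max_abs_diff(reference: Sequence, candidate: Sequence) -> float:
    """Largest element-wise absolute difference between two logit arrays."""
    ref = list(flatten(reference))
    cand = list(flatten(candidate))
    if len(ref) != len(cand):
        raise ValueError("logit arrays must have the same number of elements")
    return max((abs(r - c) for r, c in zip(ref, cand)), default=0.0)
```

A small value suggests the exported graph reproduces the PyTorch forward pass on the sample filename; what counts as small enough is a project decision that this sketch does not define.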

3. Local ONNX Inference / 本地 ONNX 推理

Use python -m tools.onnx_inference as the minimal runnable reference.

使用 python -m tools.onnx_inference 作为最小可运行参考实现。

uv run python -m tools.onnx_inference "[GM-Team][国漫][神印王座][Throne of Seal][2022][200][AVC][GB][1080P].mp4"

Expected:

期望输出：

{"title":"神印王座","season":null,"episode":200,"group":"GM-Team","resolution":"1080P","source":"GB","special":null}

Special-code example:

特典编号示例：

uv run python -m tools.onnx_inference "[YYDM&VCB-Studio] Shinsekai Yori [NCED02][Ma10p_1080p][x265_flac].mkv"

Expected:

期望输出：

{"title":"Shinsekai Yori","season":null,"episode":null,"group":"YYDM&VCB-Studio","resolution":"1080p","source":"x265_flac","special":"NCED02"}

4. Implementation Steps / 实现步骤

The runtime parser should do this:

运行时解析器应按以下步骤实现：

1. Tokenize the filename with the custom character tokenizer. / 使用自定义字符 tokenizer 对文件名分词。
2. Truncate to max_length - 2 tokens, then add [CLS] and [SEP]. / 截断到 max_length - 2 个 token，再添加 [CLS] 和 [SEP]。
3. Convert tokens to ids with vocab.json. / 使用 vocab.json 转换 token id。
4. Pad input_ids and attention_mask to exactly 128. / 将 input_ids 和 attention_mask padding 到固定 128。
5. Run ONNX Runtime. / 执行 ONNX Runtime。
6. Slice logits back to the real token count, excluding [CLS] and [SEP]. / 去掉 [CLS] / [SEP]，只保留真实 token 的 logits。
7. Decode labels with constrained BIO transitions. / 使用约束 BIO transition 解码标签。
8. Aggregate labels into parser fields. / 聚合标签为结构化字段。
9. Apply thin normalization only: trim brackets, normalize source text, and convert numeric fields. / 只做薄层规范化：裁剪括号/扩展名并转换数字字段。
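The truncation, special-token, id-lookup, and padding steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the function name `encode` is hypothetical, vocab.json is assumed to be a flat token-to-id mapping, and the `[UNK]` and `[PAD]` token names are assumed. The tokenizer itself is out of scope, so the sketch takes an already tokenized list.

```python
from typing import Dict, List, Tuple

MAX_LENGTH = 128  # static sequence length of the exported graph


def encode(
    tokens: List[str],
    vocab: Dict[str, int],
    max_length: int = MAX_LENGTH,
) -> Tuple[List[int], List[int], int]:
    """Return (input_ids, attention_mask, real_token_count)."""
    # Leave room for [CLS] and [SEP] before wrapping.
    kept = tokens[: max_length - 2]
    wrapped = ["[CLS]"] + kept + ["[SEP]"]

    # Assumption: unknown tokens map to an "[UNK]" entry in the vocabulary.
    unk_id = vocab["[UNK]"]
    input_ids = [vocab.get(token, unk_id) for token in wrapped]
    attention_mask = [1] * len(input_ids)

    # Assumption: padding uses a "[PAD]" entry; padded positions get mask 0.
    pad_count = max_length - len(input_ids)
    input_ids += [vocab["[PAD]"]] * pad_count
    attention_mask += [0] * pad_count
    return input_ids, attention_mask, len(kept)
```

The returned token count is what a later step needs to slice the logits back to the real tokens. Both lists are then wrapped in a batch dimension of 1 to match the graph input shape.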

The ONNX reference runtime intentionally matches the Python thin runtime. It does not include structural filename regex assists.

ONNX 参考运行时有意与 Python 薄层运行时保持一致，不包含结构化文件名正则辅助。
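To illustrate what thin normalization means in practice, here is a minimal sketch that only trims surrounding brackets and converts all-digit strings to integers, with no regex over the filename structure. The helper names and the bracket set are assumptions for illustration; the actual rules live in anifilebert/inference.py.

```python
from typing import Optional

# Assumption: these are the bracket characters worth trimming.
BRACKET_CHARS = "[]()【】"


def trim_brackets(text: str) -> str:
    """Strip whitespace and surrounding bracket characters from a field."""
    return text.strip().strip(BRACKET_CHARS).strip()


def to_int(text: str) -> Optional[int]:
    """Convert an all-ASCII-digit field to int; return None otherwise."""
    cleaned = trim_brackets(text)
    if cleaned.isascii() and cleaned.isdigit():
        return int(cleaned)
    return None
```

Under these assumptions `trim_brackets("[GM-Team]")` gives `"GM-Team"` and `to_int("[200]")` gives `200`, while a special code such as `NCED02` is left as a string.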

5. Android Notes / Android 注意事项

Android must bundle these files together:

Android 端必须同时打包：

anime_filename_parser.onnx
vocab.json
config.json

If any one of them changes, update all three in the same commit.

只要其中任意一个变化，三者必须在同一次提交中一起更新。

6. Common Mistakes / 常见错误

Using a standard Hugging Face tokenizer

误用标准 Hugging Face tokenizer

This model uses AnimeTokenizer, not WordPiece/BPE.

本模型使用 AnimeTokenizer，不是 WordPiece/BPE。

Treating ONNX output as final fields

把 ONNX 输出当成最终字段

ONNX returns token logits. You still need BIO decode and field aggregation.

ONNX 返回 token logits，仍然需要 BIO 解码和字段聚合。
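The post-processing that turns logits into fields can be sketched as below. This is a greedy constrained decoder written for illustration: the decoder in anifilebert/inference.py may use a different search, and the real 15-label set is not reproduced here, so label names are passed in by the caller. Concatenating tokens to rebuild field text is also an assumption.

```python
from typing import List, Tuple


def slice_real(logits: List[List[float]], token_count: int) -> List[List[float]]:
    """Drop the [CLS] row, then keep only the rows of real tokens."""
    return logits[1 : 1 + token_count]


def allowed(prev: str, cur: str) -> bool:
    """I-X may only continue a B-X or I-X span; other labels are unrestricted."""
    if not cur.startswith("I-"):
        return True
    field = cur[2:]
    return prev in ("B-" + field, "I-" + field)


def decode_bio(logits: List[List[float]], labels: List[str]) -> List[str]:
    """Pick, per token, the best-scoring label that is a legal transition."""
    tags: List[str] = []
    prev = "O"
    for row in logits:
        candidates = [i for i, label in enumerate(labels) if allowed(prev, label)]
        best = max(candidates, key=lambda i: row[i])
        prev = labels[best]
        tags.append(prev)
    return tags


def aggregate(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge B-/I- runs into (field, text) spans by concatenating tokens."""
    spans: List[List[str]] = []
    open_field = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            spans.append([tag[2:], token])
            open_field = tag[2:]
        elif tag.startswith("I-") and open_field == tag[2:]:
            spans[-1][1] += token
        else:
            open_field = None
    return [(field, text) for field, text in spans]
```

The transition check is what makes the decode constrained: an `I-` label that does not continue an open span of the same field is never selected, even when it has the highest logit.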

Changing max length without updating Android

改 max length 但没有同步 Android

The exported graph is static. Runtime arrays must match [1,128].

导出的图是静态 shape，运行时数组必须匹配 [1,128]。
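A shape guard before calling the session catches this mistake early. The sketch below assumes the `onnxruntime` and `numpy` packages are installed; the function names are hypothetical, while the tensor names `input_ids`, `attention_mask`, and `logits` are the ones listed in section 1.

```python
from typing import Any, Sequence, Tuple

EXPECTED_SHAPE = (1, 128)  # static [batch, sequence] shape of the graph inputs


def check_static_shape(
    batch: Sequence[Sequence[int]],
    expected: Tuple[int, int] = EXPECTED_SHAPE,
) -> None:
    """Raise ValueError unless batch is exactly expected[0] x expected[1]."""
    row_lengths = {len(row) for row in batch}
    if len(batch) != expected[0] or row_lengths != {expected[1]}:
        raise ValueError(f"input must have shape {list(expected)}")


def load_session(model_path: str) -> Any:
    """Create the session once and reuse it for every filename."""
    import onnxruntime as ort  # imported lazily so the guard works without it

    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def run_logits(session: Any, input_ids, attention_mask):
    """Validate shapes, then run the graph and return the logits array."""
    import numpy as np

    check_static_shape(input_ids)
    check_static_shape(attention_mask)
    feeds = {
        "input_ids": np.asarray(input_ids, dtype=np.int64),
        "attention_mask": np.asarray(attention_mask, dtype=np.int64),
    }
    (logits,) = session.run(["logits"], feeds)
    return logits  # expected shape: (1, 128, 15)
```

The same guard idea carries over to Android: whatever builds the input buffers should fail loudly if the length differs from the value used at export time.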

7. Benchmark / 性能基准

Run:

运行：

uv run python -m tools.benchmark_inference --model-dir . --onnx exports/anime_filename_parser.onnx --case-file data/parser_regression_cases.json --repeat 20 --warmup 20 --torch-threads 1 --ort-threads 1 --output reports/benchmark_results.json

Local single-thread CPU result, measured on 26 real-world regression cases with the default thin runtime:

本地 CPU 单线程结果，使用 26 条真实回归 case 和默认薄层运行时：

| Backend / 后端 | Load ms / 加载 ms | Avg ms / 平均 ms | P50 ms | P95 ms | P99 ms | files/s |
| --- | --- | --- | --- | --- | --- | --- |
| PyTorch | 46.35 | 15.36 | 14.25 | 22.27 | 29.75 | 65.1 |
| ONNX Runtime | 50.92 | 12.04 | 11.90 | 13.81 | 15.38 | 83.1 |

The benchmark times the full per-file pipeline: tokenization, the model/session forward pass, constrained BIO decode, entity aggregation, and thin normalization. The ONNX Runtime session is not rebuilt inside the timed loop.

该基准包含 tokenizer、模型/session 前向、约束 BIO 解码、实体聚合和薄层规范化；循环内不会重复创建 ONNX Runtime session。
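For readers who want to reproduce the table columns from raw timings, here is a minimal sketch of the summary statistics. It assumes a nearest-rank percentile and a list of per-file latencies in milliseconds; tools.benchmark_inference may compute percentiles differently, and the function names are hypothetical.

```python
import math
from typing import Dict, List


def percentile(sorted_ms: List[float], q: float) -> float:
    """Nearest-rank percentile of an ascending list, for q in (0, 100]."""
    rank = max(1, math.ceil(q / 100 * len(sorted_ms)))
    return sorted_ms[rank - 1]


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    """Average, tail percentiles, and throughput for per-file latencies."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    avg = sum(ordered) / len(ordered)
    return {
        "avg_ms": avg,
        "p50_ms": percentile(ordered, 50),
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
        "files_per_s": 1000.0 / avg,  # throughput implied by the mean latency
    }
```

The files/s column in the table is consistent with this relation: 1000 / 12.04 ms is about 83.1 files/s for ONNX Runtime, and 1000 / 15.36 ms is about 65.1 files/s for PyTorch.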