Remove structural parser rule assists

Files changed (13)

MAINTENANCE.md +4 -3
README.md +5 -5
benchmark_inference.py +1 -5
benchmark_results.json +16 -17
case_metrics.json +0 -569
diagnose_pipeline.py +8 -53
docs/onnx.md +5 -6
docs/training.md +2 -2
evaluate_parser_cases.py +4 -18
inference.py +10 -661
onnx_inference.py +3 -7
parse_eval_metrics.json +1 -624
train.py +3 -8

MAINTENANCE.md CHANGED

@@ -121,11 +121,12 @@ uv run python benchmark_inference.py --model-dir . --onnx exports/anime_filename
 ```
 The default parser path is thin runtime: model logits, constrained BIO, entity
-aggregation, and light string/number normalization. `--rule-assist` is a
-compatibility/diagnostic mode only; do not use it as the primary quality metric.
 默认解析路径是薄层运行时：模型 logits、约束 BIO、实体聚合和轻量字符串/数字规范化。
-`--rule-assist` 只是兼容/诊断模式，不作为主质量指标。
 ## Dataset Submodule / 数据集子模块

 ```
 The default parser path is thin runtime: model logits, constrained BIO, entity
+aggregation, and light string/number normalization. Do not add structural
+filename regex assists back to the default runtime; parser quality should come
+from labels and model training.
 默认解析路径是薄层运行时：模型 logits、约束 BIO、实体聚合和轻量字符串/数字规范化。
+不要把结构化文件名正则辅助重新加回默认运行时；解析质量应来自标签和模型训练。
 ## Dataset Submodule / 数据集子模块

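Reviewer note: the MAINTENANCE.md text above names "constrained BIO" as one stage of the thin runtime. The following is a minimal sketch of what such a decoder might look like, assuming per-token label scores and a plain `B-`/`I-`/`O` label scheme. The function names and the label names in the usage note are hypothetical and are not taken from `inference.py`.

```python
from typing import List, Sequence


def allowed(prev: str, curr: str) -> bool:
    """BIO rule: I-X may only follow B-X or I-X; everything else is free."""
    if curr.startswith("I-"):
        return prev in ("B-" + curr[2:], "I-" + curr[2:])
    return True


def constrained_bio_decode(
    logits: Sequence[Sequence[float]], labels: Sequence[str]
) -> List[str]:
    """Viterbi search over per-token scores that skips illegal BIO transitions."""
    if not logits:
        return []
    neg_inf = float("-inf")
    n = len(labels)
    # A sequence may not open with an I- label.
    score = [
        neg_inf if labels[j].startswith("I-") else logits[0][j] for j in range(n)
    ]
    back: List[List[int]] = []
    for t in range(1, len(logits)):
        new_score, pointers = [], []
        for j in range(n):
            # Best legal predecessor for label j at step t.
            best_i = max(
                (i for i in range(n) if allowed(labels[i], labels[j])),
                key=lambda i: score[i],
            )
            new_score.append(score[best_i] + logits[t][j])
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the highest final score.
    j = max(range(n), key=lambda k: score[k])
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    return [labels[k] for k in reversed(path)]
```

With labels `["O", "B-title", "I-title", "B-episode", "I-episode"]`, a second token whose highest raw score is `I-episode` is still decoded as `I-title` when the first token is `B-title`, because `I-episode` cannot follow `B-title`. Greedy argmax would emit the illegal pair.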
README.md CHANGED

@@ -146,11 +146,11 @@ Current published checkpoint:
 | Focus held-out, default thin runtime / 困难抽样，默认薄层运行时 | 1017/1024 full match = `99.32%` |
 | Token/entity eval / token/entity 评估 | F1 `0.9972`, token accuracy `0.9995` |
 | ONNX parity / ONNX 误差 | max abs diff `4.0531e-05` |
-| CPU thin-runtime latency / CPU 薄层运行时延迟 | ONNX avg `13.08 ms`, P95 `15.95 ms` |
-**中文**：当前发布模型是“两阶段训练”产物：先在 `datasets/AnimeName/dmhy_weak_char.jsonl` 上全量 CUDA 重训，再做 thin hard-case focus 微调。细节见 `training_lineage.json`。README 主指标以 `model-only` 和默认薄层 `normalized-only` 为准；`--rule-assist` 只保留为兼容/诊断对照，不再作为模型质量标准。
-**English**: The published checkpoint was trained in two stages: a full CUDA fine-tune on `datasets/AnimeName/dmhy_weak_char.jsonl`, followed by a thin hard-case focus fine-tune. See `training_lineage.json` for details. README quality numbers prioritize `model-only` and the default thin `normalized-only` runtime; `--rule-assist` is retained only for compatibility/diagnostics.
 Run regression:
@@ -177,8 +177,8 @@ decoding, entity aggregation, and light string/number normalization:
 | Backend / 后端 | Load ms / 加载 ms | Avg ms / 平均 ms | P50 ms | P95 ms | P99 ms | files/s |
 | --- | ---: | ---: | ---: | ---: | ---: | ---: |
-| PyTorch | 49.07 | 15.16 | 14.87 | 18.50 | 21.91 | 66.0 |
-| ONNX Runtime | 568.85 | 13.08 | 12.82 | 15.95 | 20.19 | 76.5 |
 **中文**：这是完整薄层 parser 的端到端延迟，不是只测模型 forward。移动端实现应复用 ONNX session，并保持 tokenizer/BIO/薄规范化逻辑一致。

 | Focus held-out, default thin runtime / 困难抽样，默认薄层运行时 | 1017/1024 full match = `99.32%` |
 | Token/entity eval / token/entity 评估 | F1 `0.9972`, token accuracy `0.9995` |
 | ONNX parity / ONNX 误差 | max abs diff `4.0531e-05` |
+| CPU thin-runtime latency / CPU 薄层运行时延迟 | ONNX avg `13.18 ms`, P95 `16.70 ms` |
+**中文**：当前发布模型是“两阶段训练”产物：先在 `datasets/AnimeName/dmhy_weak_char.jsonl` 上全量 CUDA 重训，再做 thin hard-case focus 微调。细节见 `training_lineage.json`。README 主指标以 `model-only` 和默认薄层 `normalized-only` 为准；旧版结构规则辅助层已移除，不再作为运行时或质量对照。
+**English**: The published checkpoint was trained in two stages: a full CUDA fine-tune on `datasets/AnimeName/dmhy_weak_char.jsonl`, followed by a thin hard-case focus fine-tune. See `training_lineage.json` for details. README quality numbers prioritize `model-only` and the default thin `normalized-only` runtime; structural filename assists have been removed from the runtime and quality reports.
 Run regression:
 | Backend / 后端 | Load ms / 加载 ms | Avg ms / 平均 ms | P50 ms | P95 ms | P99 ms | files/s |
 | --- | ---: | ---: | ---: | ---: | ---: | ---: |
+| PyTorch | 76.56 | 16.85 | 16.21 | 22.84 | 28.31 | 59.4 |
+| ONNX Runtime | 49.74 | 13.18 | 12.86 | 16.70 | 18.06 | 75.9 |
 **中文**：这是完整薄层 parser 的端到端延迟，不是只测模型 forward。移动端实现应复用 ONNX session，并保持 tokenizer/BIO/薄规范化逻辑一致。

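Reviewer note: the README asks mobile implementations to keep the tokenizer/BIO/thin normalization logic consistent with the reference runtime. As a rough illustration of the last two stages (entity aggregation and light string/number normalization), here is a sketch under stated assumptions: tokens are single characters (the metrics files report `"tokenizer_variant": "char"`), tag names such as `B-GROUP` are hypothetical, and the normalization shown (trim whitespace, parse digit-only episode/season values to integers, keep the first span per field) is a guess at what "light" means, not the repository's actual rules.

```python
from typing import Dict, List, Tuple


def aggregate_entities(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge character-level BIO tags into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type, buffer = None, []
    for ch, tag in zip(chars, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # Close the open span; a stray I-X is treated as a span start.
            if current_type is not None:
                spans.append((current_type, "".join(buffer)))
            current_type, buffer = tag[2:], [ch]
        elif tag.startswith("I-"):
            buffer.append(ch)
        else:
            # "O" closes whatever span is open.
            if current_type is not None:
                spans.append((current_type, "".join(buffer)))
            current_type, buffer = None, []
    if current_type is not None:
        spans.append((current_type, "".join(buffer)))
    return spans


def light_normalize(spans: List[Tuple[str, str]]) -> Dict[str, object]:
    """Keep the first span per field, trim it, and parse numeric fields."""
    parsed: Dict[str, object] = {}
    for entity_type, text in spans:
        field = entity_type.lower()
        if field in parsed:
            continue
        text = text.strip()
        if field in ("episode", "season") and text.isdigit():
            parsed[field] = int(text)
        else:
            parsed[field] = text
    return parsed
```

For the characters of `[ANi] Foo - 06` tagged as group, title, and episode spans, this yields `group="ANi"`, `title="Foo"`, and `episode=6`. No filename regexes are involved, which is the property this commit is trying to preserve.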
benchmark_inference.py CHANGED

@@ -95,8 +95,6 @@ def main() -> None:
     parser.add_argument("--torch-threads", type=int, default=1, help="torch intra-op thread count")
     parser.add_argument("--ort-threads", type=int, default=1, help="ONNX Runtime intra/inter-op thread count")
     parser.add_argument("--no-constrained-bio", action="store_true", help="Use greedy labels for PyTorch backend")
-    parser.add_argument("--rule-assist", action="store_true", help="Enable legacy structural postprocessing")
-    parser.add_argument("--no-rule-assist", action="store_true", help=argparse.SUPPRESS)
     parser.add_argument("--output", default=None, help="Optional JSON output path")
     args = parser.parse_args()
@@ -128,7 +126,6 @@ def main() -> None:
                 id2label,
                 max_length=resolved_max_length,
                 debug=False,
-                use_rules=args.rule_assist and not args.no_rule_assist,
                 constrain_bio=not args.no_constrained_bio,
             )
@@ -150,7 +147,7 @@ def main() -> None:
         load_ms = (time.perf_counter() - load_start) * 1000.0
         def parse_onnx(filename: str) -> Dict:
-            return onnx_parser.parse(filename, use_rules=args.rule_assist and not args.no_rule_assist)
         raw = run_benchmark("onnxruntime", parse_onnx, filenames, args.warmup, args.repeat)
         results.append(summarize(raw["name"], load_ms, raw["latencies_ms"]))
@@ -164,7 +161,6 @@ def main() -> None:
         "warmup": args.warmup,
         "torch_threads": args.torch_threads,
         "ort_threads": args.ort_threads,
-        "use_rules": args.rule_assist and not args.no_rule_assist,
         "constrain_bio": not args.no_constrained_bio,
         "results": results,
     }

     parser.add_argument("--torch-threads", type=int, default=1, help="torch intra-op thread count")
     parser.add_argument("--ort-threads", type=int, default=1, help="ONNX Runtime intra/inter-op thread count")
     parser.add_argument("--no-constrained-bio", action="store_true", help="Use greedy labels for PyTorch backend")
     parser.add_argument("--output", default=None, help="Optional JSON output path")
     args = parser.parse_args()
                 id2label,
                 max_length=resolved_max_length,
                 debug=False,
                 constrain_bio=not args.no_constrained_bio,
             )
         load_ms = (time.perf_counter() - load_start) * 1000.0
         def parse_onnx(filename: str) -> Dict:
+            return onnx_parser.parse(filename)
         raw = run_benchmark("onnxruntime", parse_onnx, filenames, args.warmup, args.repeat)
         results.append(summarize(raw["name"], load_ms, raw["latencies_ms"]))
         "warmup": args.warmup,
         "torch_threads": args.torch_threads,
         "ort_threads": args.ort_threads,
         "constrain_bio": not args.no_constrained_bio,
         "results": results,
     }

benchmark_results.json CHANGED

@@ -7,32 +7,31 @@
   "warmup": 20,
   "torch_threads": 1,
   "ort_threads": 1,
-  "use_rules": false,
   "constrain_bio": true,
   "results": [
     {
       "name": "pytorch",
-      "load_ms": 49.07089995685965,
       "runs": 520,
-      "avg_ms": 15.156135000646687,
-      "p50_ms": 14.874850050546229,
-      "p95_ms": 18.50034496746957,
-      "p99_ms": 21.91202303394671,
-      "min_ms": 11.207600007764995,
-      "max_ms": 26.899200049228966,
-      "throughput_fps": 65.97988207134152
     },
     {
       "name": "onnxruntime",
-      "load_ms": 568.8452000031248,
       "runs": 520,
-      "avg_ms": 13.076459232475967,
-      "p50_ms": 12.81869993545115,
-      "p95_ms": 15.947990084532643,
-      "p99_ms": 20.187044028425575,
-      "min_ms": 10.0586999906227,
-      "max_ms": 22.88920001592487,
-      "throughput_fps": 76.4733007782761
     }
   ]
 }

   "warmup": 20,
   "torch_threads": 1,
   "ort_threads": 1,
   "constrain_bio": true,
   "results": [
     {
       "name": "pytorch",
+      "load_ms": 76.55749993864447,
       "runs": 520,
+      "avg_ms": 16.846879808312785,
+      "p50_ms": 16.207700013183057,
+      "p95_ms": 22.843200032366425,
+      "p99_ms": 28.308318012859665,
+      "min_ms": 11.152399936690927,
+      "max_ms": 34.10990000702441,
+      "throughput_fps": 59.35817263363916
     },
     {
       "name": "onnxruntime",
+      "load_ms": 49.74160005804151,
       "runs": 520,
+      "avg_ms": 13.178169615835381,
+      "p50_ms": 12.862899922765791,
+      "p95_ms": 16.696884995326396,
+      "p99_ms": 18.06362595874816,
+      "min_ms": 9.811799973249435,
+      "max_ms": 20.784800057299435,
+      "throughput_fps": 75.88307247148819
     }
   ]
 }

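Reviewer note: the fields in `benchmark_results.json` can be reproduced from a list of per-run latencies. The sketch below is not the repository's `summarize` function; `summarize_latencies` and `percentile` are hypothetical names, and the linear-interpolation percentile is an assumption. The throughput formula `1000 / avg_ms` does agree with the reported values (for ONNX Runtime, 1000 / 13.178 ≈ 75.88 files/s).

```python
import math
from typing import Dict, List


def percentile(sorted_values: List[float], q: float) -> float:
    """Linear-interpolation percentile for q in [0, 100] on pre-sorted data."""
    if not sorted_values:
        raise ValueError("no samples")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def summarize_latencies(
    name: str, load_ms: float, latencies_ms: List[float]
) -> Dict[str, object]:
    """Build one entry shaped like the objects in the "results" array."""
    ordered = sorted(latencies_ms)
    avg_ms = sum(ordered) / len(ordered)
    return {
        "name": name,
        "load_ms": load_ms,
        "runs": len(ordered),
        "avg_ms": avg_ms,
        "p50_ms": percentile(ordered, 50),
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
        "min_ms": ordered[0],
        "max_ms": ordered[-1],
        # Files per second implied by the mean per-file latency.
        "throughput_fps": 1000.0 / avg_ms,
    }
```

Because `load_ms` is measured once per backend while the other fields aggregate 520 runs, the load-time swings between the old and new files (for example 568.85 ms to 49.74 ms for ONNX Runtime) are single-sample measurements and should be read more cautiously than the averages.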
case_metrics.json CHANGED

@@ -6,7 +6,6 @@
       "case_file": "data/parser_regression_cases.json",
       "tokenizer_variant": "char",
       "max_length": 128,
-      "use_rules": false,
       "constrain_bio": false,
       "case_count": 26,
       "full_correct": 25,
@@ -606,574 +605,6 @@
       "case_file": "data/parser_regression_cases.json",
       "tokenizer_variant": "char",
       "max_length": 128,
-      "use_rules": false,
-      "constrain_bio": true,
-      "case_count": 26,
-      "full_correct": 26,
-      "full_accuracy": 1.0,
-      "field_correct": {
-        "group": 22,
-        "title": 26,
-        "episode": 26,
-        "resolution": 26,
-        "source": 19,
-        "season": 9,
-        "special": 5
-      },
-      "field_total": {
-        "group": 22,
-        "title": 26,
-        "episode": 26,
-        "resolution": 26,
-        "source": 19,
-        "season": 9,
-        "special": 5
-      },
-      "field_accuracy": {
-        "episode": 1.0,
-        "group": 1.0,
-        "resolution": 1.0,
-        "season": 1.0,
-        "source": 1.0,
-        "special": 1.0,
-        "title": 1.0
-      },
-      "failures": [],
-      "results": [
-        {
-          "id": "lolihouse_dash_episode",
-          "filename": "[LoliHouse] Yomi no Tsugai - 07 [WebRip 1080p HEVC-10bit AAC ASSx2]",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "group": "LoliHouse",
-            "title": "Yomi no Tsugai",
-            "episode": 7,
-            "resolution": "1080p",
-            "source": "WebRip"
-          },
-          "pred": {
-            "episode": 7,
-            "group": "LoliHouse",
-            "resolution": "1080p",
-            "source": "WebRip",
-            "title": "Yomi no Tsugai"
-          }
-        },
-        {
-          "id": "dot_season_episode_no_group",
-          "filename": "Witch.Hat.Atelier.S01E07.1080p.NF.WEB-DL.JPN.AAC2.0.H.264.MSubs-ToonsHub",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "title": "Witch.Hat.Atelier",
-            "season": 1,
-            "episode": 7,
-            "group": null,
-            "resolution": "1080p",
-            "source": "NF"
-          },
-          "pred": {
-            "episode": 7,
-            "group": null,
-            "resolution": "1080p",
-            "season": 1,
-            "source": "NF",
-            "title": "Witch.Hat.Atelier"
-          }
-        },
-        {
-          "id": "ani_cjk_season_dash_episode",
-          "filename": "[ANi] 異世界悠閒農家 2 - 06 [1080P][Baha][WEB-DL][AAC AVC][CHT]",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "group": "ANi",
-            "title": "異世界悠閒農家",
-            "season": 2,
-            "episode": 6,
-            "resolution": "1080P",
-            "source": "Baha"
-          },
-          "pred": {
-            "episode": 6,
-            "group": "ANi",
-            "resolution": "1080P",
-            "season": 2,
-            "source": "Baha",
-            "title": "異世界悠閒農家"
-          }
-        },
-        {
-          "id": "kisssub_bracket_title_episode",
-          "filename": "[KissSub][Shunkashuutou Daikousha - Haru no Mai][05][1080P][GB][MP4]",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "group": "KissSub",
-            "title": "Shunkashuutou Daikousha - Haru no Mai",
-            "episode": 5,
-            "resolution": "1080P",
-            "source": "GB"
-          },
-          "pred": {
-            "episode": 5,
-            "group": "KissSub",
-            "resolution": "1080P",
-            "source": "GB",
-            "title": "Shunkashuutou Daikousha - Haru no Mai"
-          }
-        },
-        {
-          "id": "airotabracket_title_episode",
-          "filename": "[Airota][Sousou no Frieren][29][1080p AVC AAC][CHT]",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "group": "Airota",
-            "title": "Sousou no Frieren",
-            "episode": 29,
-            "resolution": "1080p",
-            "source": "CHT"
-          },
-          "pred": {
-            "episode": 29,
-            "group": "Airota",
-            "resolution": "1080p",
-            "source": "CHT",
-            "title": "Sousou no Frieren"
-          }
-        },
-        {
-          "id": "subsplease_parenthesized_resolution",
-          "filename": "[SubsPlease] Mushoku Tensei - 12 (1080p) [x265][AAC]",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "group": "SubsPlease",
-            "title": "Mushoku Tensei",
-            "episode": 12,
-            "resolution": "1080p"
-          },
-          "pred": {
-            "episode": 12,
-            "group": "SubsPlease",
-            "resolution": "1080p",
-            "title": "Mushoku Tensei"
-          }
-        },
-        {
-          "id": "vcb_bracket_episode",
-          "filename": "[VCB-Studio] Girls Band Cry [01][Ma10p_1080p][x265_flac]",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "group": "VCB-Studio",
-            "title": "Girls Band Cry",
-            "episode": 1,
-            "resolution": "1080p"
-          },
-          "pred": {
-            "episode": 1,
-            "group": "VCB-Studio",
-            "resolution": "1080p",
-            "title": "Girls Band Cry"
-          }
-        },
-        {
-          "id": "numeric_title_not_episode",
-          "filename": "86 Eighty Six - 01 [1080P][Baha]",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "title": "86 Eighty Six",
-            "episode": 1,
-            "resolution": "1080P",
-            "source": "Baha"
-          },
-          "pred": {
-            "episode": 1,
-            "resolution": "1080P",
-            "source": "Baha",
-            "title": "86 Eighty Six"
-          }
-        },
-        {
-          "id": "erai_raws_dash_episode",
-          "filename": "[Erai-raws] Sousou no Frieren - 01 [1080p][Multiple Subtitle][ENG]",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "group": "Erai-raws",
-            "title": "Sousou no Frieren",
-            "episode": 1,
-            "resolution": "1080p"
-          },
-          "pred": {
-            "episode": 1,
-            "group": "Erai-raws",
-            "resolution": "1080p",
-            "title": "Sousou no Frieren"
-          }
-        },
-        {
-          "id": "nekomoe_space_group",
-          "filename": "[Nekomoe kissaten][Watashi no Shiawase na Kekkon][01][1080p][JPSC]",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "group": "Nekomoe kissaten",
-            "title": "Watashi no Shiawase na Kekkon",
-            "episode": 1,
-            "resolution": "1080p"
-          },
-          "pred": {
-            "episode": 1,
-            "group": "Nekomoe kissaten",
-            "resolution": "1080p",
-            "title": "Watashi no Shiawase na Kekkon"
-          }
-        },
-        {
-          "id": "long_running_episode",
-          "filename": "One.Piece.1110.1080p.WEB-DL.AAC2.0.H.264",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "title": "One.Piece",
-            "episode": 1110,
-            "resolution": "1080p",
-            "source": "WEB-DL"
-          },
-          "pred": {
-            "episode": 1110,
-            "resolution": "1080p",
-            "source": "WEB-DL",
-            "title": "One.Piece"
-          }
-        },
-        {
-          "id": "season_episode_amzn",
-          "filename": "Example.Show.S02E03.2160p.AMZN.WEB-DL.DDP5.1.H.265",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "title": "Example.Show",
-            "season": 2,
-            "episode": 3,
-            "resolution": "2160p",
-            "source": "AMZN"
-          },
-          "pred": {
-            "episode": 3,
-            "resolution": "2160p",
-            "season": 2,
-            "source": "AMZN",
-            "title": "Example.Show"
-          }
-        },
-        {
-          "id": "cjk_group_with_prefix_tag",
-          "filename": "【喵萌奶茶屋】★04月新番★[葬送的芙莉莲][01][1080P][HEVC]",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "group": "喵萌奶茶屋",
-            "title": "葬送的芙莉莲",
-            "episode": 1,
-            "resolution": "1080P"
-          },
-          "pred": {
-            "episode": 1,
-            "group": "喵萌奶茶屋",
-            "resolution": "1080P",
-            "title": "葬送的芙莉莲"
-          }
-        },
-        {
-          "id": "leading_meta_not_group",
-          "filename": "[1080p] Witch Watch - 15 [CHS]",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "group": null,
-            "title": "Witch Watch",
-            "episode": 15,
-            "resolution": "1080p",
-            "source": "CHS"
-          },
-          "pred": {
-            "episode": 15,
-            "group": null,
-            "resolution": "1080p",
-            "source": "CHS",
-            "title": "Witch Watch"
-          }
-        },
-        {
-          "id": "sakurato_group_language_source",
-          "filename": "[Sakurato] Witch Watch - 15 [1080p][CHS]",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "group": "Sakurato",
-            "title": "Witch Watch",
-            "episode": 15,
-            "resolution": "1080p",
-            "source": "CHS"
-          },
-          "pred": {
-            "episode": 15,
-            "group": "Sakurato",
-            "resolution": "1080p",
-            "source": "CHS",
-            "title": "Witch Watch"
-          }
-        },
-        {
-          "id": "billion_meta_lab_search_special",
-          "filename": "[Billion Meta Lab] 魔法姊妹露露莉莉 Mahou no Shimai Rurutto Riryi [07][1080P][CHT&JPN][檢索：魔法姊妹露露特莉莉].mp4",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "group": "Billion Meta Lab",
-            "title": "魔法姊妹露露莉莉 Mahou no Shimai Rurutto Riryi",
-            "episode": 7,
-            "resolution": "1080P",
-            "source": "CHT&JPN",
-            "special": "檢索：魔法姊妹露露特莉莉"
-          },
-          "pred": {
-            "episode": 7,
-            "group": "Billion Meta Lab",
-            "resolution": "1080P",
-            "source": "CHT&JPN",
-            "special": "檢索：魔法姊妹露露特莉莉",
-            "title": "魔法姊妹露露莉莉 Mahou no Shimai Rurutto Riryi"
-          }
-        },
-        {
-          "id": "studio_greentea_s2_bracket_episode",
-          "filename": "[Studio GreenTea] Otonari no Tenshi-sama ni Itsunomanika Dame Ningen ni Sareteita Ken S2 [06][WebRip][HEVC-10bit 1080p AAC][JPSC].mp4",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "group": "Studio GreenTea",
-            "title": "Otonari no Tenshi-sama ni Itsunomanika Dame Ningen ni Sareteita Ken",
-            "season": 2,
-            "episode": 6,
-            "resolution": "1080p",
-            "source": "WebRip"
-          },
-          "pred": {
-            "episode": 6,
-            "group": "Studio GreenTea",
-            "resolution": "1080p",
-            "season": 2,
-            "source": "WebRip",
-            "title": "Otonari no Tenshi-sama ni Itsunomanika Dame Ningen ni Sareteita Ken"
-          }
-        },
-        {
-          "id": "lolihouse_kakuriyo_bare_ni_season",
-          "filename": "[LoliHouse] Kakuriyo no Yadomeshi Ni - 12 [WebRip 1080p HEVC-10bit AAC SRTx2].mkv",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "group": "LoliHouse",
-            "title": "Kakuriyo no Yadomeshi",
-            "season": 2,
-            "episode": 12,
-            "resolution": "1080p",
-            "source": "WebRip"
-          },
-          "pred": {
-            "episode": 12,
-            "group": "LoliHouse",
-            "resolution": "1080p",
-            "season": 2,
-            "source": "WebRip",
-            "title": "Kakuriyo no Yadomeshi"
-          }
-        },
-        {
-          "id": "ani_kakuriyo_traditional_ni",
-          "filename": "[ANi] 妖怪旅館營業中 貳 - 11 [1080P][Baha][WEB-DL][AAC AVC][CHT].mp4",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "group": "ANi",
-            "title": "妖怪旅館營業中",
-            "season": 2,
-            "episode": 11,
-            "resolution": "1080P",
-            "source": "Baha"
-          },
-          "pred": {
-            "episode": 11,
-            "group": "ANi",
-            "resolution": "1080P",
-            "season": 2,
-            "source": "Baha",
-            "title": "妖怪旅館營業中"
-          }
-        },
-        {
-          "id": "jibaketa_shokugeki_ni_no_sara",
-          "filename": "[jibaketa]Shokugeki no Souma Ni no Sara - 13 END [BD 1920x1080 x264 AACx2 SRT TVB CHT].mkv",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "group": "jibaketa",
-            "title": "Shokugeki no Souma",
-            "season": 2,
-            "episode": 13,
-            "resolution": "1920x1080"
-          },
-          "pred": {
-            "episode": 13,
-            "group": "jibaketa",
-            "resolution": "1920x1080",
-            "season": 2,
-            "title": "Shokugeki no Souma"
-          }
-        },
-        {
-          "id": "ai_raws_fire_force_cjk_season_hash_episode",
-          "filename": "[AI-Raws] 炎炎の消防隊 弐ノ章 #13 (BD HEVC 1920x1080 yuv444p10le FLAC)[FC74A2D5].mkv",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "group": "AI-Raws",
-            "title": "炎炎の消防隊",
-            "season": 2,
-            "episode": 13,
-            "resolution": "1920x1080"
-          },
-          "pred": {
-            "episode": 13,
-            "group": "AI-Raws",
-            "resolution": "1920x1080",
-            "season": 2,
-            "title": "炎炎の消防隊"
-          }
-        },
-        {
-          "id": "gm_team_guoman_bilingual_s2",
-          "filename": "[GM-Team][国漫][逆天邪神 第2季][Against the Gods Ⅱ][2026][04][HEVC][GB][4K].mp4",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "group": "GM-Team",
-            "title": "逆天邪神",
-            "season": 2,
-            "episode": 4,
-            "resolution": "4K",
-            "source": "GB"
-          },
-          "pred": {
-            "episode": 4,
-            "group": "GM-Team",
-            "resolution": "4K",
-            "season": 2,
-            "source": "GB",
-            "title": "逆天邪神"
-          }
-        },
-        {
-          "id": "vcb_special_iv_not_episode",
-          "filename": "[YYDM&VCB-Studio] Shinsekai Yori [IV05][Ma10p_1080p][x265_aac].mkv",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "group": "YYDM&VCB-Studio",
-            "title": "Shinsekai Yori",
-            "episode": null,
-            "resolution": "1080p",
-            "source": "x265_aac",
-            "special": "IV05"
-          },
-          "pred": {
-            "episode": null,
-            "group": "YYDM&VCB-Studio",
-            "resolution": "1080p",
-            "source": "x265-aac",
-            "special": "IV05",
-            "title": "Shinsekai Yori"
-          }
-        },
-        {
-          "id": "vcb_nced_not_episode",
-          "filename": "[YYDM&VCB-Studio] Shinsekai Yori [NCED02][Ma10p_1080p][x265_flac].mkv",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "group": "YYDM&VCB-Studio",
-            "title": "Shinsekai Yori",
-            "episode": null,
-            "resolution": "1080p",
-            "source": "x265_flac",
-            "special": "NCED02"
-          },
-          "pred": {
-            "episode": null,
-            "group": "YYDM&VCB-Studio",
-            "resolution": "1080p",
-            "source": "x265-flac",
-            "special": "NCED02",
-            "title": "Shinsekai Yori"
-          }
-        },
-        {
-          "id": "dot_nced_suffix_not_episode",
-          "filename": "InuYasha.2000.NCED02.BDrip.AV1.10Bit.DTS.1080p-CalChi",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "title": "InuYasha",
-            "episode": null,
-            "resolution": "1080p",
-            "source": "BDrip",
-            "special": "NCED02"
-          },
-          "pred": {
-            "episode": null,
-            "resolution": "1080p",
-            "source": "BDrip",
-            "special": "NCED02",
-            "title": "InuYasha"
-          }
-        },
-        {
-          "id": "vcb_numeric_title_nced",
-          "filename": "[VCB-Studio] Yamada-kun to 7-nin no Majo [NCED][Ma10p_1080p][x265_flac]",
-          "ok": true,
-          "errors": {},
-          "expected": {
-            "group": "VCB-Studio",
-            "title": "Yamada-kun to 7-nin no Majo",
-            "episode": null,
-            "resolution": "1080p",
-            "source": "x265_flac",
-            "special": "NCED"
-          },
-          "pred": {
-            "episode": null,
-            "group": "VCB-Studio",
-            "resolution": "1080p",
-            "source": "x265-flac",
-            "special": "NCED",
-            "title": "Yamada-kun to 7-nin no Majo"
-          }
-        }
-      ]
-    },
-    "rule_assisted": {
-      "model_dir": ".",
-      "case_file": "data/parser_regression_cases.json",
-      "tokenizer_variant": "char",
-      "max_length": 128,
-      "use_rules": true,
       "constrain_bio": true,
       "case_count": 26,
       "full_correct": 26,

       "case_file": "data/parser_regression_cases.json",
       "tokenizer_variant": "char",
       "max_length": 128,
       "constrain_bio": false,
       "case_count": 26,
       "full_correct": 25,
       "case_file": "data/parser_regression_cases.json",
       "tokenizer_variant": "char",
       "max_length": 128,
       "constrain_bio": true,
       "case_count": 26,
       "full_correct": 26,

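Reviewer note: the `full_correct`, `field_correct`, `field_total`, and `field_accuracy` entries in `case_metrics.json` can be derived from the per-case `expected`/`pred` pairs. The sketch below is an illustration only; `score_cases` and `comparison_key` are hypothetical names. Two details are inferred from the recorded data rather than from the evaluator's code: a field counts toward `field_total` whenever its key is present in `expected` (including `null` values, which is why `group` totals 22 and `episode` totals 26), and `x265_flac` is accepted as matching `x265-flac`, so some separator normalization is assumed in the comparison.

```python
from collections import Counter
from typing import Any, Dict, List


def comparison_key(value: Any) -> Any:
    """Assumed comparison rule: treat '_' and '-' in strings as equivalent."""
    if isinstance(value, str):
        return value.replace("_", "-")
    return value


def score_cases(cases: List[Dict[str, Dict[str, Any]]]) -> Dict[str, Any]:
    """Compute full-match and per-field exact-match counts."""
    field_correct: Counter = Counter()
    field_total: Counter = Counter()
    full_correct = 0
    for case in cases:
        expected, pred = case["expected"], case["pred"]
        all_ok = True
        for field, want in expected.items():
            field_total[field] += 1
            if comparison_key(pred.get(field)) == comparison_key(want):
                field_correct[field] += 1
            else:
                all_ok = False
        if all_ok:
            full_correct += 1
    case_count = len(cases)
    return {
        "case_count": case_count,
        "full_correct": full_correct,
        "full_accuracy": full_correct / case_count if case_count else 0.0,
        "field_correct": {f: field_correct[f] for f in field_total},
        "field_total": dict(field_total),
        "field_accuracy": {
            f: field_correct[f] / total for f, total in field_total.items()
        },
    }
```

Only fields listed in `expected` are scored here, so an extra predicted field would not count as an error. Whether the real evaluator behaves the same way is not visible in this diff.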
diagnose_pipeline.py CHANGED

@@ -364,9 +364,7 @@ def evaluate_model(
     entity_confusion: Counter = Counter()
     boundary_errors: Counter = Counter()
     parse_metrics: Counter = Counter()
-    parse_metrics_no_rules: Counter = Counter()
     field_failures: List[dict] = []
-    field_failures_no_rules: List[dict] = []
     with torch.no_grad():
         for sample in eval_samples:
@@ -410,32 +408,13 @@ def evaluate_model(
                 active_tokens,
                 true_labels,
                 tokenizer=tokenizer,
-                filename=sample.get("filename"),
-                use_rules=True,
             )
             pred_parse = postprocess(
                 active_tokens,
                 pred_labels,
                 tokenizer=tokenizer,
-                filename=sample.get("filename"),
-                use_rules=True,
-            )
-            gold_parse_no_rules = postprocess(
-                active_tokens,
-                true_labels,
-                tokenizer=tokenizer,
-                filename=sample.get("filename"),
-                use_rules=False,
-            )
-            pred_parse_no_rules = postprocess(
-                active_tokens,
-                pred_labels,
-                tokenizer=tokenizer,
-                filename=sample.get("filename"),
-                use_rules=False,
             )
             update_parse_metrics(parse_metrics, gold_parse, pred_parse)
-            update_parse_metrics(parse_metrics_no_rules, gold_parse_no_rules, pred_parse_no_rules)
             failures = collect_field_failures(gold_parse, pred_parse)
             if failures and len(field_failures) < 30:
                 field_failures.append(
@@ -446,16 +425,6 @@ def evaluate_model(
                         "pred": pred_parse,
                     }
                 )
-            failures_no_rules = collect_field_failures(gold_parse_no_rules, pred_parse_no_rules)
-            if failures_no_rules and len(field_failures_no_rules) < 30:
-                field_failures_no_rules.append(
-                    {
-                        "filename": sample.get("filename"),
-                        "errors": failures_no_rules,
-                        "gold": gold_parse_no_rules,
-                        "pred": pred_parse_no_rules,
-                    }
-                )
     errors = confusion.copy()
     for label in set(label for pair in confusion for label in pair):
@@ -473,9 +442,7 @@ def evaluate_model(
         ).most_common(30),
         "boundary_errors": boundary_errors,
         "parse_metrics": parse_metrics,
-        "parse_metrics_no_rules": parse_metrics_no_rules,
         "field_failures": field_failures,
-        "field_failures_no_rules": field_failures_no_rules,
     }
@@ -811,8 +778,7 @@ def main() -> None:
             ]
             return field_rows, full_line, error_rows
-        rule_field_rows, rule_full_line, rule_error_rows = parse_metric_tables(model_eval["parse_metrics"])
-        ner_field_rows, ner_full_line, ner_error_rows = parse_metric_tables(model_eval["parse_metrics_no_rules"])
         sections.append(
             (
                 "Model Confusion Analysis",
@@ -832,28 +798,17 @@ def main() -> None:
                         "### Top entity-type confusions",
                         markdown_table(["true", "pred", "count"], entity_rows) if entity_rows else "- none",
                         "",
-                        "### Field exact-match accuracy (rule-assisted)",
-                        markdown_table(["field", "correct/total", "accuracy"], rule_field_rows),
                         "",
-                        f"Rule-assisted full parse exact match: {rule_full_line}",
                         "",
-                        "### Top rule-assisted field parse errors",
-                        markdown_table(["field", "gold", "pred", "count"], rule_error_rows) if rule_error_rows else "- none",
                         "",
-                        "### Field exact-match accuracy (NER-only, no rules)",
-                        markdown_table(["field", "correct/total", "accuracy"], ner_field_rows),
-                        "",
-                        f"NER-only full parse exact match: {ner_full_line}",
-                        "",
-                        "### Top NER-only field parse errors",
-                        markdown_table(["field", "gold", "pred", "count"], ner_error_rows) if ner_error_rows else "- none",
-                        "",
-                        "### Hardest sampled parse failures (rule-assisted)",
                         markdown_json(model_eval["field_failures"][:10]) if model_eval["field_failures"] else "- none",
                         "",
-                        "### Hardest sampled parse failures (NER-only)",
-                        markdown_json(model_eval["field_failures_no_rules"][:10]) if model_eval["field_failures_no_rules"] else "- none",
-                        "",
                         "### Seqeval report",
                         "```text\n" + model_eval["classification_report"] + "\n```",
                     ]
@@ -870,7 +825,7 @@ def main() -> None:
                     "2. Prefer char-level or a deterministic hybrid tokenizer for DMHY filenames; avoid generic subword tokenization for labels.",
                     "3. For char-level runs, use `--tokenizer char --max-seq-length 128` with `vocab.char.json`.",
                     "4. Add CRF decoding or constrained BIO decoding so illegal I-X transitions and impossible boundary jumps are blocked.",
-                    "5. Keep rule-assisted post-processing for high-confidence structural anchors: leading group bracket, ` - 07`, `S01E07`, source, and resolution.",
                     "6. Track entity-level F1 and field exact-match on real filenames; do not accept low validation loss alone.",
                 ]
             ),

     entity_confusion: Counter = Counter()
     boundary_errors: Counter = Counter()
     parse_metrics: Counter = Counter()
     field_failures: List[dict] = []
     with torch.no_grad():
         for sample in eval_samples:
                 active_tokens,
                 true_labels,
                 tokenizer=tokenizer,
             )
             pred_parse = postprocess(
                 active_tokens,
                 pred_labels,
                 tokenizer=tokenizer,
             )
             update_parse_metrics(parse_metrics, gold_parse, pred_parse)
             failures = collect_field_failures(gold_parse, pred_parse)
             if failures and len(field_failures) < 30:
                 field_failures.append(
                         "pred": pred_parse,
                     }
                 )
     errors = confusion.copy()
     for label in set(label for pair in confusion for label in pair):
         ).most_common(30),
         "boundary_errors": boundary_errors,
         "parse_metrics": parse_metrics,
         "field_failures": field_failures,
     }
             ]
             return field_rows, full_line, error_rows
+        parse_field_rows, parse_full_line, parse_error_rows = parse_metric_tables(model_eval["parse_metrics"])
         sections.append(
             (
                 "Model Confusion Analysis",
                         "### Top entity-type confusions",
                         markdown_table(["true", "pred", "count"], entity_rows) if entity_rows else "- none",
                         "",
+                        "### Field exact-match accuracy (thin runtime)",
+                        markdown_table(["field", "correct/total", "accuracy"], parse_field_rows),
                         "",
+                        f"Thin-runtime full parse exact match: {parse_full_line}",
                         "",
+                        "### Top thin-runtime field parse errors",
+                        markdown_table(["field", "gold", "pred", "count"], parse_error_rows) if parse_error_rows else "- none",
                         "",
+                        "### Hardest sampled parse failures",
                         markdown_json(model_eval["field_failures"][:10]) if model_eval["field_failures"] else "- none",
                         "",
                         "### Seqeval report",
                         "```text\n" + model_eval["classification_report"] + "\n```",
                     ]
                     "2. Prefer char-level or a deterministic hybrid tokenizer for DMHY filenames; avoid generic subword tokenization for labels.",
                     "3. For char-level runs, use `--tokenizer char --max-seq-length 128` with `vocab.char.json`.",
                     "4. Add CRF decoding or constrained BIO decoding so illegal I-X transitions and impossible boundary jumps are blocked.",
+                    "5. Keep runtime post-processing thin: BIO aggregation plus string/number normalization.",
                     "6. Track entity-level F1 and field exact-match on real filenames; do not accept low validation loss alone.",
                 ]
             ),
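
Recommendation 4 above asks for constrained BIO decoding so that illegal `I-X` transitions are blocked. The diff does not show the repository's decoder, so the following is only a minimal greedy sketch of the idea: the label names, the `scores` layout (one row of per-label scores per token), and both function names are assumptions for illustration, and a Viterbi search over the same transition rule would be the more thorough variant.

```python
from typing import List, Sequence


def is_legal_transition(prev: str, curr: str) -> bool:
    """I-X may only follow B-X or I-X; every other transition is allowed."""
    if not curr.startswith("I-"):
        return True
    entity = curr[2:]
    return prev in (f"B-{entity}", f"I-{entity}")


def constrained_bio_decode(
    scores: Sequence[Sequence[float]],
    labels: Sequence[str],
) -> List[str]:
    """Greedy decode: per token, take the best-scoring label that is legal.

    Assumes the label set contains "O" (or some B- label), so a legal
    choice always exists.
    """
    decoded: List[str] = []
    prev = "O"  # treat the sequence start like an O label
    for row in scores:
        ranked = sorted(range(len(labels)), key=lambda i: row[i], reverse=True)
        choice = next(labels[i] for i in ranked if is_legal_transition(prev, labels[i]))
        decoded.append(choice)
        prev = choice
    return decoded


# Hypothetical label set and scores, for illustration only.
LABELS = ["O", "B-TITLE", "I-TITLE", "B-EPISODE", "I-EPISODE"]
SCORES = [
    [0.10, 0.30, 0.50, 0.05, 0.05],  # best is I-TITLE, illegal at sequence start
    [0.00, 0.10, 0.30, 0.10, 0.50],  # best is I-EPISODE, illegal after B-TITLE
    [0.90, 0.00, 0.05, 0.00, 0.05],  # best is O, always legal
]
print(constrained_bio_decode(SCORES, LABELS))
```

In this sketch the first two tokens fall back to their second-best label because the top-scoring `I-` label has no matching `B-`/`I-` predecessor.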

docs/onnx.md CHANGED

@@ -107,15 +107,14 @@ The runtime parser should do this:
    使用约束 BIO transition 解码标签。
 8. Aggregate labels into parser fields.
    聚合标签为结构化字段。
-9. Apply thin normalization only: trim brackets/extensions and convert numeric
-   fields.
    只做薄层规范化：裁剪括号/扩展名并转换数字字段。
-The legacy structural assist layer is available only behind `--rule-assist` in
-the Python tools. It is not part of the default ONNX reference runtime.
-旧结构辅助层只在 Python 工具的 `--rule-assist` 下显式启用，不属于默认 ONNX
-参考运行时。
 ## 5. Android Notes / Android 注意事项

    使用约束 BIO transition 解码标签。
 8. Aggregate labels into parser fields.
    聚合标签为结构化字段。
+9. Apply thin normalization only: trim brackets, normalize source text, and
+   convert numeric fields.
    只做薄层规范化：裁剪括号/扩展名并转换数字字段。
+The ONNX reference runtime intentionally matches the Python thin runtime. It
+does not include structural filename regex assists.
+ONNX 参考运行时有意与 Python 薄层运行时保持一致，不包含结构化文件名正则辅助。
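
Steps 8 and 9 above (aggregate labels into parser fields, then apply thin normalization) can be sketched as follows. This is an illustrative approximation, not the repository's `postprocess`: it assumes char-level tokens that can be joined with an empty string, and the label names `GROUP`, `TITLE`, and `EPISODE` as well as every helper name here are hypothetical.

```python
import re
from typing import Dict, List, Optional


def aggregate_entities(tokens: List[str], labels: List[str]) -> Dict[str, List[str]]:
    """Group BIO-labelled tokens into entity strings keyed by entity type."""
    grouped: Dict[str, List[str]] = {}
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        nonlocal current_type, current_tokens
        if current_type is not None and current_tokens:
            grouped.setdefault(current_type, []).append("".join(current_tokens))
        current_type, current_tokens = None, []

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            flush()
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)
        else:
            flush()  # "O" or an I- label without a matching open entity
    flush()
    return grouped


def strip_brackets(text: str) -> str:
    """Thin string normalization: trim whitespace and enclosing brackets."""
    return text.strip().strip("[]()【】《》").strip()


def to_int(text: Optional[str]) -> Optional[int]:
    """Thin number normalization: first run of digits as an int, else None."""
    if text is None:
        return None
    match = re.search(r"\d+", text)
    return int(match.group(0)) if match else None


def thin_postprocess(tokens: List[str], labels: List[str]) -> Dict[str, object]:
    grouped = aggregate_entities(tokens, labels)

    def first(key: str) -> Optional[str]:
        values = grouped.get(key)
        return strip_brackets(values[0]) if values else None

    return {
        "group": first("GROUP"),
        "title": first("TITLE"),
        "episode": to_int(first("EPISODE")),
    }


# Hypothetical char-level example for "[Sub] Show - 07".
TOKENS = list("[Sub] Show - 07")
LABELS = (
    ["O", "B-GROUP", "I-GROUP", "I-GROUP", "O", "O"]
    + ["B-TITLE", "I-TITLE", "I-TITLE", "I-TITLE", "O", "O", "O"]
    + ["B-EPISODE", "I-EPISODE"]
)
print(thin_postprocess(TOKENS, LABELS))
```

Nothing in this sketch inspects the raw filename with structural regexes; every field comes from the labels, which is the property the thin runtime is meant to preserve.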
 ## 5. Android Notes / Android 注意事项

docs/training.md CHANGED

@@ -172,12 +172,12 @@ The default quality gate is model-led parsing:
 - fixed regression `model_only >= 85%`
 - held-out parse `model_only >= 75%`
 - `normalized_only` is the default thin runtime metric
-- `rule_assisted` is compatibility/diagnostic only
 - 固定回归 `model_only >= 85%`
 - held-out 解析 `model_only >= 75%`
 - `normalized_only` 是默认薄层运行时指标
-- `rule_assisted` 只作为兼容/诊断对照
 ## 7. Publish to Repository Root / 发布到仓库根目录

 - fixed regression `model_only >= 85%`
 - held-out parse `model_only >= 75%`
 - `normalized_only` is the default thin runtime metric
+- structural filename assists are not part of training or release metrics
 - 固定回归 `model_only >= 85%`
 - held-out 解析 `model_only >= 75%`
 - `normalized_only` 是默认薄层运行时指标
+- 结构化文件名辅助不属于训练或发布指标
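
The quality gate listed above could be checked with a small helper along these lines. This is a hedged sketch only: the suite keys `fixed_regression` and `held_out`, the input shape (suite name mapped to a `model_only` full-match rate between 0 and 1), and the function name are assumptions, not the repository's actual release tooling.

```python
from typing import Dict, List

# Thresholds taken from the list above; the suite keys are hypothetical names.
GATES: Dict[str, float] = {
    "fixed_regression": 0.85,
    "held_out": 0.75,
}


def check_quality_gates(
    model_only_accuracy: Dict[str, float],
    gates: Dict[str, float] = GATES,
) -> List[str]:
    """Return one human-readable failure per gate that is missing or not met."""
    failures: List[str] = []
    for suite, threshold in gates.items():
        accuracy = model_only_accuracy.get(suite)
        if accuracy is None:
            failures.append(f"{suite}: no model_only result")
        elif accuracy < threshold:
            failures.append(f"{suite}: model_only {accuracy:.2%} < {threshold:.0%}")
    return failures


# Illustrative numbers only, not measured results.
print(check_quality_gates({"fixed_regression": 0.90, "held_out": 0.70}))
```

An empty list means both gates pass; a value exactly at the threshold passes because the gates are defined with `>=`.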
 ## 7. Publish to Repository Root / 发布到仓库根目录

evaluate_parser_cases.py CHANGED

@@ -43,7 +43,6 @@ def evaluate_cases(
     case_file: str,
     tokenizer_variant: Optional[str],
     max_length: Optional[int],
-    use_rules: bool,
     constrain_bio: bool,
 ) -> Dict:
     cfg = Config()
@@ -71,7 +70,6 @@ def evaluate_cases(
             id2label,
             max_length=resolved_max_length,
             debug=False,
-            use_rules=use_rules,
             constrain_bio=constrain_bio,
         )
         errors = {}
@@ -108,7 +106,6 @@ def evaluate_cases(
         "case_file": case_file,
         "tokenizer_variant": getattr(tokenizer, "tokenizer_variant", "regex"),
         "max_length": resolved_max_length,
-        "use_rules": use_rules,
         "constrain_bio": constrain_bio,
         "case_count": len(cases),
         "full_correct": full_correct,
@@ -128,9 +125,8 @@ def evaluate_case_modes(
     max_length: Optional[int],
 ) -> Dict:
     modes = {
-        "model_only": {"use_rules": False, "constrain_bio": False},
-        "normalized_only": {"use_rules": False, "constrain_bio": True},
-        "rule_assisted": {"use_rules": True, "constrain_bio": True},
     }
     results = {
         name: evaluate_cases(
@@ -138,7 +134,6 @@ def evaluate_case_modes(
             case_file=case_file,
             tokenizer_variant=tokenizer_variant,
             max_length=max_length,
-            use_rules=settings["use_rules"],
             constrain_bio=settings["constrain_bio"],
         )
         for name, settings in modes.items()
@@ -170,17 +165,10 @@ def main() -> None:
     parser.add_argument("--tokenizer", choices=["regex", "char"], default=None)
     parser.add_argument("--max-length", type=int, default=None)
     parser.add_argument("--output", default=None, help="Optional JSON output path")
-    parser.add_argument("--mode", choices=["all", "model-only", "normalized-only", "rule-assisted"], default="all")
-    parser.add_argument("--rule-assist", action="store_true", help="Shortcut for --mode rule-assisted")
-    parser.add_argument("--no-rule-assist", action="store_true", help=argparse.SUPPRESS)
     parser.add_argument("--no-constrained-bio", action="store_true")
     args = parser.parse_args()
-    if args.rule_assist:
-        args.mode = "rule-assisted"
-    if args.no_rule_assist and args.mode == "rule-assisted":
-        args.mode = "normalized-only"
     if args.mode == "all" and not args.no_constrained_bio:
         metrics = evaluate_case_modes(
             model_dir=args.model_dir,
@@ -188,18 +176,16 @@ def main() -> None:
             tokenizer_variant=args.tokenizer,
             max_length=args.max_length,
         )
-        for name in ("model_only", "normalized_only", "rule_assisted"):
             print_metrics(name, metrics["modes"][name])
             print()
     else:
-        use_rules = args.mode == "rule-assisted"
         constrain_bio = not args.no_constrained_bio and args.mode != "model-only"
         metrics = evaluate_cases(
             model_dir=args.model_dir,
             case_file=args.case_file,
             tokenizer_variant=args.tokenizer,
             max_length=args.max_length,
-            use_rules=use_rules,
             constrain_bio=constrain_bio,
         )
         print_metrics(args.mode, metrics)

     case_file: str,
     tokenizer_variant: Optional[str],
     max_length: Optional[int],
     constrain_bio: bool,
 ) -> Dict:
     cfg = Config()
             id2label,
             max_length=resolved_max_length,
             debug=False,
             constrain_bio=constrain_bio,
         )
         errors = {}
         "case_file": case_file,
         "tokenizer_variant": getattr(tokenizer, "tokenizer_variant", "regex"),
         "max_length": resolved_max_length,
         "constrain_bio": constrain_bio,
         "case_count": len(cases),
         "full_correct": full_correct,
     max_length: Optional[int],
 ) -> Dict:
     modes = {
+        "model_only": {"constrain_bio": False},
+        "normalized_only": {"constrain_bio": True},
     }
     results = {
         name: evaluate_cases(
             case_file=case_file,
             tokenizer_variant=tokenizer_variant,
             max_length=max_length,
             constrain_bio=settings["constrain_bio"],
         )
         for name, settings in modes.items()
     parser.add_argument("--tokenizer", choices=["regex", "char"], default=None)
     parser.add_argument("--max-length", type=int, default=None)
     parser.add_argument("--output", default=None, help="Optional JSON output path")
+    parser.add_argument("--mode", choices=["all", "model-only", "normalized-only"], default="all")
     parser.add_argument("--no-constrained-bio", action="store_true")
     args = parser.parse_args()
     if args.mode == "all" and not args.no_constrained_bio:
         metrics = evaluate_case_modes(
             model_dir=args.model_dir,
             tokenizer_variant=args.tokenizer,
             max_length=args.max_length,
         )
+        for name in ("model_only", "normalized_only"):
             print_metrics(name, metrics["modes"][name])
             print()
     else:
         constrain_bio = not args.no_constrained_bio and args.mode != "model-only"
         metrics = evaluate_cases(
             model_dir=args.model_dir,
             case_file=args.case_file,
             tokenizer_variant=args.tokenizer,
             max_length=args.max_length,
             constrain_bio=constrain_bio,
         )
         print_metrics(args.mode, metrics)
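
After this change, the only remaining switch between modes is `constrain_bio`. The following standalone sketch mirrors the flag handling shown in the new side of the diff, with model loading and evaluation left out; `build_parser` and `resolve_run` are hypothetical names introduced here, and only the two CLI flags and the branching logic are taken from the diff.

```python
import argparse
from typing import Dict, List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", choices=["all", "model-only", "normalized-only"], default="all")
    parser.add_argument("--no-constrained-bio", action="store_true")
    return parser


def resolve_run(argv: Optional[List[str]] = None) -> Dict[str, bool]:
    """Map CLI flags to {mode name: constrain_bio} without running a model."""
    args = build_parser().parse_args(argv)
    if args.mode == "all" and not args.no_constrained_bio:
        # Same pair of modes that evaluate_case_modes iterates over.
        return {"model_only": False, "normalized_only": True}
    # Single-mode path: constrained BIO is on unless disabled or model-only.
    constrain_bio = not args.no_constrained_bio and args.mode != "model-only"
    return {args.mode: constrain_bio}


print(resolve_run([]))
print(resolve_run(["--mode", "normalized-only"]))
print(resolve_run(["--mode", "model-only"]))
```

One consequence visible in the sketch: passing `--no-constrained-bio` with the default `--mode all` takes the single-mode branch, so one unconstrained run is reported under the name `all` rather than the two-mode comparison.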

inference.py CHANGED

@@ -11,7 +11,6 @@ Usage:
 import argparse
 import json
-import os
 import re
 import sys
 from typing import Dict, List, Optional, Tuple
@@ -98,6 +97,15 @@ def thin_source_priority(source: str) -> int:
     return 40 if re.search(r"[&+/,]", source) else 30
 def choose_thin_source(sources: List[str]) -> Optional[str]:
     cleaned = [normalize_source_text(source) for source in sources if normalize_field_text(source)]
     if not cleaned:
@@ -239,8 +247,6 @@ def postprocess(
     tokens: List[str],
     labels: List[str],
     tokenizer: Optional[AnimeTokenizer] = None,
-    filename: Optional[str] = None,
-    use_rules: bool = False,
 ) -> Dict:
     """
     Convert BIO-labeled tokens into structured metadata.
@@ -298,658 +304,9 @@ def postprocess(
     result["source"] = choose_thin_source(grouped_entities.get("SOURCE", []))
-    if use_rules and filename:
-        result = apply_rule_assists(filename, result)
     return result
-BRACKET_RE = re.compile(r"\[([^\]]+)\]|\(([^)]+)\)|【([^】]+)】|《([^》]+)》")
-RESOLUTION_RE = re.compile(r"(?<![A-Za-z0-9])(?:\d{3,4}[pP]|\d[Kk]|\d{3,4}[xX×]\d{3,4})(?![A-Za-z0-9])")
-SOURCE_TOKEN_PATTERN = (
-    r"WEB[-_ ]?DL|WEB[-_ ]?Rip|BDRip|BluRay|BDMV|BD|DVDRip|DVD|TVRip|HDTV|"
-    r"Netflix|NF|AMZN|Baha|CR|ABEMA|DSNP|U[-_ ]?NEXT|Hulu|AT[-_ ]?X|"
-    r"x26[45]|h\.?26[45]|HEVC|AVC|AV1|AAC\d*(?:\.\d+)?|AAC|FLAC|MP3|DTS|Opus|"
-    r"SDR|HDR10?|UHD|REMUX|10bit|8bit|Hi10p|Ma10p|ASSx?\d*|SRTx?\d*|"
-    r"CHS|CHT|GB|BIG5|JPN?|JPSC|JPTC|繁中|简中"
-)
-SOURCE_RE = re.compile(rf"\b(?:{SOURCE_TOKEN_PATTERN})\b", re.I)
-SOURCE_TAG_RE = re.compile(
-    rf"^(?:{SOURCE_TOKEN_PATTERN})(?:\s*(?:[&+/]|,\s*)\s*(?:{SOURCE_TOKEN_PATTERN}))*$",
-    re.I,
-)
-SPECIAL_TAG_RE = re.compile(
-    r"^(?:檢索|检索|搜索|搜寻|搜尋|别名|別名|alias|search|keyword)\s*[:：].+",
-    re.I,
-)
-SPECIAL_CODE_RE = re.compile(
-    r"^(?:NCOP|NCED|OP|ED|PV|CM)\d*$|^IV\d+$|^(?:OVA|OAD|SP)\d*$",
-    re.I,
-)
-SPECIAL_CODE_INLINE_RE = re.compile(
-    r"(?<![A-Za-z0-9])"
-    r"(?P<code>(?:NCOP|NCED)(?:[\s._-]*\d{1,4})?|(?:OP|ED|PV|CM)\d{1,4}|IV\d{1,4})"
-    r"(?![A-Za-z0-9])",
-    re.I,
-)
-EPISODE_PATTERNS = [
-    ("season_episode", re.compile(r"[Ss]\d{1,2}[Ee](?P<ep>\d{1,4})(?:v\d+)?", re.I)),
-    ("dash_episode", re.compile(r"(?:^|[\s._])[-_]\s*(?P<ep>\d{1,4})(?:v\d+)?(?=$|[\s._\-\]\)】》\[])")),
-    ("bracket_episode", re.compile(r"[\[\(【《](?:EP?|#)?(?P<ep>\d{1,4})(?:v\d+)?[\]\)】》]", re.I)),
-    ("explicit_episode", re.compile(r"(?:^|[\s._\-\[\(【《#])(?:EP?|第|#)(?P<ep>\d{1,4})(?:v\d+)?(?:[话話集])?(?=$|[\s._\-\]\)】》])", re.I)),
-    (
-        "long_episode",
-        re.compile(
-            r"(?:^|[\s._\-\[\(【《])(?P<ep>\d{3,4})(?:v\d+)?"
-            r"(?=[\s._\-\]\)】》\[]+(?:\d{3,4}[pP]|WEB|BD|BluRay|HDTV|NF|AMZN|CR|Baha))",
-            re.I,
-        ),
-    ),
-    ("generic_episode", re.compile(r"(?:^|[\s._\-\[\(【《#])(?P<ep>\d{1,3})(?:v\d+)?(?=$|[\s._\-\]\)】》])", re.I)),
-]
-SEASON_RE = re.compile(r"(?:^|[\s._\-\[\(【《])(?:[Ss](?P<s1>\d{1,2})|Season\s*(?P<s2>\d{1,2})|第(?P<s3>[一二三四五六七八九十\d]+)[季期部])", re.I)
-SEQUEL_MARKER_RE = re.compile(
-    r"(?<![A-Za-z0-9])"
-    r"(?P<marker>"
-    r"Ni\s+no\s+(?:Sara|Shou|Sho|Syo|Shō)|"
-    r"San\s+no\s+(?:Sara|Shou|Sho|Syo)|"
-    r"(?:Yon|Shi|Shin)\s+no\s+Sara|"
-    r"(?:Go|Gou)\s+no\s+Sara|"
-    r"Ni\s+Gakki|Sono\s+Ni|Ni|"
-    r"II|III|IV|V|VI|VII|VIII|IX|[ⅡⅢⅣⅤⅥⅦⅧⅨ]|"
-    r"[一二三四五六七八九十兩两貳贰弐弍參叁参肆伍陸陆柒捌玖](?:\s*(?:ノ|の|之)\s*(?:章|期|季|部))?"
-    r")"
-    r"(?![A-Za-z0-9])",
-    re.I,
-)
-TRAILING_SEQUEL_MARKER_RE = re.compile(
-    r"(?:^|[\s._-])"
-    r"(?P<marker>"
-    r"Ni\s+no\s+(?:Sara|Shou|Sho|Syo|Shō)|"
-    r"San\s+no\s+(?:Sara|Shou|Sho|Syo)|"
-    r"(?:Yon|Shi|Shin)\s+no\s+Sara|"
-    r"(?:Go|Gou)\s+no\s+Sara|"
-    r"Ni\s+Gakki|Sono\s+Ni|Ni|"
-    r"II|III|IV|V|VI|VII|VIII|IX|[ⅡⅢⅣⅤⅥⅦⅧⅨ]|"
-    r"[一二三四五六七八九十兩两貳贰弐弍參叁参肆伍陸陆柒捌玖](?:\s*(?:ノ|の|之)\s*(?:章|期|季|部))?"
-    r")$",
-    re.I,
-)
-NOISE_META_RE = re.compile(
-    r"^(?:\d{3,4}[pP]|\d[Kk]|WEB[-_ ]?DL|WEB[-_ ]?Rip|BDRip|BluRay|BDMV|BD|DVDRip|DVD|TVRip|"
-    r"HDTV|Netflix|NF|AMZN|Baha|CR|HEVC|AVC|AV1|x26[45]|h\.?26[45]|AAC.*|FLAC|MP3|DTS|"
-    r"Opus|SDR|HDR10?|UHD|REMUX|10bit|8bit|Hi10p|Ma10p|ASS.*|SRT.*|CHS|CHT|BIG5|GB|JPN?|"
-    r"JPSC|JPTC|MP4|MKV|繁中|简中|内封|外挂)$",
-    re.I,
-)
-DATE_RE = re.compile(r"^(?:19|20)\d{2}(?:[.\-_年]?(?:0?[1-9]|1[0-2]))?(?:[.\-_月]?(?:0?[1-9]|[12]\d|3[01]))?日?$")
-CATEGORY_BRACKETS = {
-    "国漫", "國漫", "国产", "國產", "国产动漫", "國產動漫", "国产动画", "國產動畫",
-    "国创", "國創", "中国动漫", "中國動漫", "中国动画", "中國動畫",
-}
-def cn_number_to_int(text: str) -> Optional[int]:
-    if text.isdigit():
-        return int(text)
-    values = {"一": 1, "二": 2, "三": 3, "四": 4, "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
-    if text == "十":
-        return 10
-    if text.startswith("十") and len(text) == 2:
-        return 10 + values.get(text[1], 0)
-    if text.endswith("十") and len(text) == 2:
-        return values.get(text[0], 0) * 10
-    if "十" in text and len(text) == 3:
-        return values.get(text[0], 0) * 10 + values.get(text[2], 0)
-    return values.get(text)
-def bracket_parts(filename: str) -> List[Tuple[str, int, int]]:
-    parts: List[Tuple[str, int, int]] = []
-    for match in BRACKET_RE.finditer(filename):
-        text = next(group for group in match.groups() if group is not None)
-        parts.append((text.strip(), match.start(), match.end()))
-    return parts
-def looks_like_group(text: str) -> bool:
-    if not text or NOISE_META_RE.search(text):
-        return False
-    return bool(
-        re.search(
-            r"(?:字幕|字幕组|字幕組|sub|subs|raws?|fansub|studio|house|team|project|"
-            r"loli|ani|vcb|airota|kiss|dmhy|erai|subsplease)",
-            text,
-            re.I,
-        )
-    )
-def looks_like_episode_or_meta(text: str) -> bool:
-    if not text:
-        return False
-    clean = text.strip()
-    normalized = re.sub(r"[\s._-]+", "", clean)
-    return bool(
-        re.fullmatch(r"(?:EP?|#)?\d{1,4}(?:v\d+)?", clean, re.I)
-        or DATE_RE.fullmatch(clean)
-        or normalized in CATEGORY_BRACKETS
-        or RESOLUTION_RE.search(clean)
-        or SOURCE_TAG_RE.fullmatch(clean)
-        or SOURCE_RE.search(clean)
-        or SPECIAL_TAG_RE.search(clean)
-        or SPECIAL_CODE_RE.fullmatch(normalized)
-        or NOISE_META_RE.search(clean)
-    )
-def normalize_special_code(text: str) -> str:
-    return re.sub(r"[\s._-]+", "", text.strip())
-def special_code_spans(filename: str) -> List[Tuple[str, int, int]]:
-    spans: List[Tuple[str, int, int]] = []
-    for text, start, end in bracket_parts(filename):
-        normalized = normalize_special_code(text)
-        if SPECIAL_CODE_RE.fullmatch(normalized):
-            spans.append((normalized, start, end))
-    for match in SPECIAL_CODE_INLINE_RE.finditer(filename):
-        normalized = normalize_special_code(match.group("code"))
-        if SPECIAL_CODE_RE.fullmatch(normalized):
-            spans.append((normalized, match.start("code"), match.end("code")))
-    deduped: List[Tuple[str, int, int]] = []
-    seen: set[Tuple[str, int, int]] = set()
-    for value, start, end in sorted(spans, key=lambda item: (item[1], item[2])):
-        key = (value.lower(), start, end)
-        if key in seen:
-            continue
-        seen.add(key)
-        deduped.append((value, start, end))
-    return deduped
-def special_code_brackets(filename: str) -> List[Tuple[str, int, int]]:
-    return [
-        (text.strip(), start, end)
-        for text, start, end in bracket_parts(filename)
-        if SPECIAL_CODE_RE.fullmatch(normalize_special_code(text))
-    ]
-def span_is_inside_special_code(filename: str, start: int, end: int) -> bool:
-    return any(special_start <= start and end <= special_end for _code, special_start, special_end in special_code_spans(filename))
-def has_non_special_episode_context(filename: str, episode: int) -> bool:
-    masked = filename
-    for _text, start, end in reversed(special_code_brackets(filename)):
-        masked = masked[:start] + (" " * (end - start)) + masked[end:]
-    return plausible_episode_context(masked, episode) and best_structural_episode(masked) == episode
-def episode_comes_only_from_special_code(filename: str, episode: Optional[int]) -> bool:
-    if episode is None:
-        return False
-    specials = special_code_spans(filename)
-    if not specials:
-        return False
-    ep_text = str(int(episode))
-    for normalized, _start, _end in specials:
-        if re.search(rf"0*{re.escape(ep_text)}$", normalized):
-            return not has_non_special_episode_context(filename, int(episode))
-    return False
-def strip_title_special_codes(title: str, special: Optional[str] = None) -> str:
-    cleaned = title.strip()
-    while True:
-        next_cleaned = re.sub(
-            r"\s*[\[\(【《]\s*(?:(?:NCOP|NCED|OP|ED|PV|CM)\d*|IV\d+|(?:OVA|OAD|SP)\d*)\s*[\]\)】》]\s*$",
-            "",
-            cleaned,
-            flags=re.I,
-        ).strip(" \t-_.")
-        if next_cleaned == cleaned:
-            break
-        cleaned = next_cleaned
-    cleaned = re.sub(r"\s+(?:NCOP|NCED|OP|ED|PV|CM)\d*$", "", cleaned, flags=re.I).strip(" \t-_.")
-    if special:
-        normalized = re.sub(r"[\s._-]+", "", str(special).strip())
-        match = re.fullmatch(r"([A-Za-z]+)\d+", normalized)
-        if match and SPECIAL_CODE_RE.fullmatch(normalized):
-            prefix = re.escape(match.group(1))
-            cleaned = re.sub(rf"\s+{prefix}$", "", cleaned, flags=re.I).strip(" \t-_.")
-    return cleaned or title
-def looks_like_structural_group(text: str, filename: str, bracket_end: int) -> bool:
-    """Heuristic for short leading release-group brackets not in the name list."""
-    if looks_like_group(text):
-        return True
-    if not text or looks_like_episode_or_meta(text):
-        return False
-    after = filename[bracket_end:].lstrip(" \t._")
-    if after.startswith("-"):
-        return False
-    next_bracket = BRACKET_RE.match(after)
-    if next_bracket:
-        next_text = next(group for group in next_bracket.groups() if group is not None)
-        if looks_like_episode_or_meta(next_text):
-            return False
-    words = re.findall(r"[A-Za-z0-9]+", text)
-    if not words:
-        if re.search(r"[\u3400-\u9fff]", text) and len(text) <= 32:
-            return True
-        return False
-    if len(text) > 32:
-        return False
-    if len(words) == 1:
-        return True
-    if any(sep in text for sep in "-_"):
-        return True
-    if words[0].isupper() and len(words[0]) <= 4 and len(words) <= 3:
-        return True
-    return False
-def apply_rule_assists(filename: str, result: Dict) -> Dict:
-    """
-    Fill high-confidence structural fields from filename conventions.
-    The model remains the primary tagger; rules only fill missing obvious fields
-    or repair common boundary drift around leading group brackets and episodes.
-    """
-    repaired = dict(result)
-    brackets = bracket_parts(filename)
-    if (not repaired.get("group") or (repaired.get("title") and repaired["group"] in repaired["title"])) and brackets:
-        first_text, first_start, first_end = brackets[0]
-        if first_start == 0 and looks_like_structural_group(first_text, filename, first_end):
-            repaired["group"] = first_text
-    if not repaired.get("resolution"):
-        match = RESOLUTION_RE.search(filename)
-        if match:
-            repaired["resolution"] = match.group(0)
-    source_matches = source_candidates(filename)
-    current_source = repaired.get("source")
-    preferred_source = source_matches[0] if source_matches else None
-    if preferred_source and (
-        not current_source
-        or source_priority(preferred_source) > source_priority(str(current_source))
-        or (
-            source_priority(preferred_source) == source_priority(str(current_source))
-            and preferred_source.lower() != str(current_source).lower()
-        )
-    ):
-        repaired["source"] = preferred_source
-    special_spans = special_code_spans(filename)
-    current_special = repaired.get("special")
-    if special_spans:
-        preferred_special = special_spans[0][0]
-        current_normalized = normalize_special_code(str(current_special)) if current_special else ""
-        if not current_special or preferred_special.lower().startswith(current_normalized.lower()):
-            repaired["special"] = preferred_special
-    if not repaired.get("special"):
-        for text, _start, _end in brackets:
-            clean = text.strip()
-            if SPECIAL_TAG_RE.search(clean):
-                repaired["special"] = clean
-                break
-    episode = best_structural_episode(filename)
-    if episode is not None and (
-        repaired.get("episode") is None
-        or not plausible_episode_context(filename, int(repaired["episode"]))
-    ):
-        repaired["episode"] = episode
-    if repaired.get("episode") is not None and not plausible_episode_context(filename, int(repaired["episode"])):
-        repaired["episode"] = episode
-    if episode_comes_only_from_special_code(filename, repaired.get("episode")):
-        repaired["episode"] = None
-    if repaired.get("season") is None:
-        match = SEASON_RE.search(filename)
-        if match:
-            value = next(group for group in match.groups() if group)
-            season = cn_number_to_int(value)
-            if season is not None:
-                repaired["season"] = season
-        if repaired.get("season") is None and repaired.get("episode") is not None:
-            sequel = structural_sequel_marker(filename, repaired.get("group"), repaired.get("episode"))
-            if sequel is not None:
-                repaired["season"] = sequel[1]
-    elif repaired.get("episode") == repaired.get("season") and not SEASON_RE.search(filename):
-        repaired["season"] = None
-    title = repaired.get("title")
-    group = repaired.get("group")
-    if group and (NOISE_META_RE.search(str(group)) or SOURCE_RE.fullmatch(str(group)) or RESOLUTION_RE.fullmatch(str(group))):
-        repaired["group"] = None
-        group = None
-    if title and group and title.startswith(group):
-        title = title[len(group):].lstrip("]】)>}）》 \t-_.")
-        repaired["title"] = title or repaired["title"]
-    if repaired.get("episode"):
-        repaired_title = infer_title_span(filename, group, repaired["episode"])
-        if repaired_title:
-            repaired["title"] = repaired_title
-    structured_title = infer_structured_bracket_title(filename, group, repaired.get("episode"))
-    if structured_title:
-        repaired["title"] = structured_title
-    if repaired.get("title") and repaired.get("season") is not None:
-        repaired["title"] = strip_trailing_season_from_title(repaired["title"], repaired["season"])
-    if repaired.get("episode") is None and repaired.get("group") and repaired.get("special"):
-        inferred_title = infer_title_span(filename, repaired.get("group"), None)
-        if inferred_title:
-            repaired["title"] = inferred_title
-    if repaired.get("title"):
-        repaired["title"] = strip_title_special_codes(repaired["title"], repaired.get("special"))
-    return repaired
-def structural_sequel_marker(
-    filename: str,
-    group: Optional[str],
-    episode: Optional[int],
-) -> Optional[Tuple[str, int]]:
-    if episode is None:
-        return None
-    title_end = None
-    if episode is not None:
-        ep_patterns = [
-            rf"[Ss]\d{{1,2}}[Ee]0*{episode}(?:v\d+)?",
-            rf"\s[-_]\s*0*{episode}(?:v\d+)?(?=$|[\s\[\(【《._-])",
-            rf"[\[\(【《]0*{episode}(?:v\d+)?[\]\)】》]",
-            rf"#\s*0*{episode}(?:v\d+)?(?=$|[\s\[\(【《._-])",
-            rf"(?:^|[\s._\-\[\(【《])第0*{episode}(?:[话話集])?(?=$|[\s._\-\]\)】》])",
-        ]
-        start = 0
-        if group:
-            first = BRACKET_RE.match(filename)
-            if first and group in first.group(0):
-                start = first.end()
-        for pattern in ep_patterns:
-            match = re.search(pattern, filename[start:], re.I)
-            if match:
-                title_end = start + match.start()
-                break
-    if title_end is None:
-        return None
-    prefix = filename[:title_end].rstrip(" \t-_.")
-    for match in reversed(list(SEQUEL_MARKER_RE.finditer(prefix))):
-        marker = match.group("marker")
-        value = season_marker_number(marker)
-        if value is None:
-            continue
-        tail = prefix[match.end():].strip(" \t-_.")
-        if tail:
-            continue
-        if marker.lower() == "ni" and "Kakuriyo no Yadomeshi Ni" not in prefix:
-            continue
-        return marker, value
-    numeric_tail = re.search(r"(?:^|[\s._-])(?P<season>[2-9])$", prefix)
-    if numeric_tail:
-        return numeric_tail.group("season"), int(numeric_tail.group("season"))
-    return None
-def normalize_source_text(text: str) -> str:
-    text = re.sub(r"\s+", "", text.strip())
-    text = re.sub(r"(?i)WEB[_ ]?DL", "WEB-DL", text)
-    text = re.sub(r"(?i)WEB[_ ]?Rip", "WebRip", text)
-    text = re.sub(r"(?i)U[_ ]?NEXT", "U-NEXT", text)
-    text = re.sub(r"(?i)AT[_ ]?X", "AT-X", text)
-    return text.replace("_", "-")
-def source_priority(source: str) -> int:
-    normalized = source.lower().replace("_", "-").replace(" ", "")
-    parts = re.split(r"[&+/,]", normalized)
-    if any(part in {"nf", "netflix", "amzn", "baha", "cr", "abema", "dsnp", "u-next", "hulu", "at-x", "web-dl", "webdl", "webrip", "web-rip", "bdrip", "bluray", "bdmv", "bd", "dvdrip", "dvd", "tvrip", "hdtv"} for part in parts):
-        return 90
-    if any(part in {"chs", "cht", "gb", "big5", "jpn", "jpsc", "jptc", "繁中", "简中"} for part in parts):
-        return 70
-    if any(part in {"x264", "x265", "h.264", "h264", "h.265", "h265", "hevc", "avc", "av1", "aac", "flac", "mp3", "dts", "opus", "10bit", "8bit", "hi10p", "ma10p", "srt", "srtx2", "ass", "assx2"} for part in parts):
-        return 20
-    if len(parts) > 1:
-        return 40
-    return 20
-def source_candidates(filename: str) -> List[str]:
-    candidates: List[Tuple[int, int, str]] = []
-    for text, start, _end in bracket_parts(filename):
-        clean = text.strip()
-        if SOURCE_TAG_RE.fullmatch(clean):
-            normalized = normalize_source_text(clean)
-            candidates.append((source_priority(normalized), -start, normalized))
-    for match in SOURCE_RE.finditer(filename):
-        normalized = normalize_source_text(match.group(0))
-        candidates.append((source_priority(normalized), -match.start(), normalized))
-    deduped: Dict[str, Tuple[int, int, str]] = {}
-    for priority, neg_start, value in candidates:
-        key = value.lower()
-        if key not in deduped or (priority, neg_start) > (deduped[key][0], deduped[key][1]):
-            deduped[key] = (priority, neg_start, value)
-    return [value for _priority, _neg_start, value in sorted(deduped.values(), reverse=True)]
-def is_category_text(text: str) -> bool:
-    normalized = re.sub(r"[\s._-]+", "", text.strip())
-    return normalized in CATEGORY_BRACKETS
-def infer_structured_bracket_title(
-    filename: str,
-    group: Optional[str],
-    episode: Optional[int],
-) -> Optional[str]:
-    """Pick the primary title from [group][category][title][alias][year][episode] rows."""
-    brackets = bracket_parts(filename)
-    if len(brackets) < 4 or episode is None:
-        return None
-    start_index = 0
-    if group and brackets and brackets[0][0] == group:
-        start_index = 1
-    search = brackets[start_index:]
-    if not search or not any(is_category_text(text) for text, _start, _end in search[:2]):
-        return None
-    episode_index = None
-    for idx, (text, _start, _end) in enumerate(brackets):
-        if re.fullmatch(rf"(?:EP?|#)?0*{episode}(?:v\d+)?", text.strip(), re.I):
-            episode_index = idx
-            break
-    if episode_index is None:
-        return None
-    candidates: List[Tuple[int, str]] = []
-    for idx in range(start_index, episode_index):
-        text = brackets[idx][0].strip()
-        if not text or looks_like_episode_or_meta(text):
-            continue
-        score = 0
-        if SEASON_RE.search(text) or TRAILING_SEQUEL_MARKER_RE.search(text):
-            score += 50
-        if re.search(r"[\u3400-\u9fff]", text):
-            score += 20
-        if idx > start_index:
-            score += 10
-        candidates.append((score, text))
-    if not candidates:
-        return None
-    return max(candidates, key=lambda item: item[0])[1]
-def best_structural_episode(filename: str) -> Optional[int]:
-    priorities = {
-        "season_episode": 1000,
-        "dash_episode": 900,
-        "bracket_episode": 850,
-        "explicit_episode": 800,
-        "long_episode": 750,
-        "generic_episode": 100,
-    }
-    candidates: List[Tuple[int, int, int]] = []
-    for name, pattern in EPISODE_PATTERNS:
-        for match in pattern.finditer(filename):
-            ep_text = match.group("ep")
-            ep = int(ep_text)
-            if ep == 0 or ep > 2000:
-                continue
-            ep_start = match.start("ep")
-            ep_end = match.end("ep")
-            if span_is_inside_special_code(filename, ep_start, ep_end):
-                continue
-            if name == "generic_episode":
-                tail = filename[ep_end:]
-                if re.match(r"[-_][A-Za-z]", tail):
-                    continue
-                if not re.match(
-                    r"(?:$|[\]\)】》]|[\s._-]+(?:"
-                    r"\[[^\]]*(?:\d{3,4}[pP]|WEB|BD|BluRay|HDTV|NF|AMZN|CR|Baha|Ma10p|x26|HEVC|AVC)|"
-                    r"\d{3,4}[pP]|WEB|BD|BluRay|HDTV|NF|AMZN|CR|Baha|Ma10p|x26|HEVC|AVC|mkv|mp4|avi"
-                    r"))",
-                    tail,
-                    re.I,
-                ):
-                    continue
-            context = filename[max(0, ep_start - 5):ep_end + 5]
-            if RESOLUTION_RE.search(context) or re.search(r"AAC|DDP|AC3|H\.?26[45]|x26[45]", context, re.I):
-                continue
-            priority = priorities[name]
-            if 1 <= ep <= 200:
-                priority += 20
-            candidates.append((priority, ep_start, ep))
-    if not candidates:
-        return None
-    return max(candidates, key=lambda item: (item[0], item[1]))[2]
-def plausible_episode_context(filename: str, episode: int) -> bool:
-    ep_text = str(episode)
-    padded = f"{episode:02d}"
-    if re.search(rf"(?<![A-Za-z0-9])(?:H|x)\.?0*{re.escape(ep_text)}(?!\d)", filename, re.I):
-        return False
-    patterns = [
-        rf"[Ss]\d{{1,2}}[Ee]0*{episode}(?:v\d+)?",
-        rf"(?:^|[\s._])[-_]\s*0*{episode}(?:v\d+)?(?=$|[\s._\-\]\)】》\[])",
-        rf"[\[\(【《](?:EP?|#)?0*{episode}(?:v\d+)?[\]\)】》]",
-        rf"(?:^|[\s._\-\[\(【《#])(?:EP?|第|#)0*{episode}(?:v\d+)?(?:[话話集])?(?=$|[\s._\-\]\)】》])",
-        rf"(?:^|[\s._\-\[\(【《])0*{episode}(?:v\d+)?(?=[\s._\-\]\)】》\[]+(?:\d{{3,4}}[pP]|WEB|BD|BluRay|HDTV|NF|AMZN|CR|Baha))",
-    ]
-    if any(re.search(pattern, filename, re.I) for pattern in patterns):
-        return True
-    return bool(re.search(rf"(?:^|[\s._-])(?:{re.escape(ep_text)}|{re.escape(padded)})(?:v\d+)?$", filename, re.I))
-def strip_trailing_season_from_title(title: str, season: int) -> str:
-    season_text = str(season)
-    patterns = [
-        rf"\s+[Ss]0*{season_text}$",
-        rf"\s+Season\s*0*{season_text}$",
-        rf"\s+0*{season_text}$",
-        rf"\s+第(?:0*{season_text}|{season_text})[季期部章]$",
-    ]
-    cleaned = title
-    for pattern in patterns:
-        cleaned = re.sub(pattern, "", cleaned, flags=re.I).strip(" \t-_.")
-    match = TRAILING_SEQUEL_MARKER_RE.search(cleaned)
-    if match and season_marker_number(match.group("marker")) == season:
-        cleaned = cleaned[:match.start()].strip(" \t-_.")
-    return cleaned or title
-def clean_inferred_title(title: str) -> str:
-    raw_title = title.strip(" \t-_.")
-    bracket_matches = list(BRACKET_RE.finditer(raw_title))
-    if bracket_matches:
-        first = bracket_matches[0]
-        prefix = raw_title[:first.start()].strip(" \t-_.★☆")
-        text = next(group for group in first.groups() if group is not None).strip()
-        if text and not looks_like_episode_or_meta(text) and (
-            not prefix
-            or re.search(r"(?:新番|月|合集|繁|简|字幕|先行|合集|★|☆)", prefix, re.I)
-        ):
-            return text
-    return raw_title.strip("[]()【】《》（）")
-def infer_title_span(filename: str, group: Optional[str], episode: Optional[int]) -> Optional[str]:
-    start = 0
-    if group:
-        first = BRACKET_RE.match(filename)
-        if first and group in first.group(0):
-            start = first.end()
-    else:
-        # Some releases put leading metadata before the actual title, e.g.
-        # `[1080p] Title - 01`. Do not keep that wrapper as title text.
-        while True:
-            leading = BRACKET_RE.match(filename[start:].lstrip(" \t._-"))
-            if not leading:
-                break
-            skipped_ws = len(filename[start:]) - len(filename[start:].lstrip(" \t._-"))
-            text = next(group for group in leading.groups() if group is not None)
-            if not looks_like_episode_or_meta(text):
-                break
-            start += skipped_ws + leading.end()
-    end = None
-    if episode is not None:
-        ep_patterns = [
-            rf"[Ss]\d{{1,2}}[Ee]0*{episode}(?:v\d+)?",
-            rf"\s[-_]\s*0*{episode}(?:v\d+)?(?=$|[\s\[\(【《._-])",
-            rf"[\[\(【《]0*{episode}(?:v\d+)?[\]\)】》]",
-            rf"#\s*0*{episode}(?:v\d+)?(?=$|[\s\[\(【《._-])",
-            rf"(?:^|[\s._\-\[\(【《])第0*{episode}(?:[话話集])?(?=$|[\s._\-\]\)】》])",
-            rf"[Ee]0*{episode}(?:v\d+)?",
-        ]
-        for pattern in ep_patterns:
-            match = re.search(pattern, filename[start:], re.I)
-            if match:
-                end = start + match.start()
-                break
-    if end is None:
-        for text, bracket_start, _bracket_end in bracket_parts(filename):
-            if bracket_start <= start:
-                continue
-            if (
-                NOISE_META_RE.search(text)
-                or RESOLUTION_RE.search(text)
-                or SOURCE_RE.search(text)
-                or SPECIAL_TAG_RE.search(text)
-                or SPECIAL_CODE_RE.fullmatch(re.sub(r"[\s._-]+", "", text.strip()))
-            ):
-                end = bracket_start
-                break
-    if end is None or end <= start:
-        return None
-    title = clean_inferred_title(filename[start:end])
-    return title or None
 def parse_filename(
     filename: str,
     model: BertForTokenClassification,
@@ -957,7 +314,6 @@ def parse_filename(
     id2label: Dict[int, str],
     max_length: int = 64,
     debug: bool = False,
-    use_rules: bool = False,
     constrain_bio: bool = True,
 ) -> Dict:
     """
@@ -1046,14 +402,12 @@ def parse_filename(
         tokens[:available],
         label_strings,
         tokenizer=tokenizer,
-        filename=filename,
-        use_rules=use_rules,
     )
     if debug:
         result["_debug"] = {
             "tokenizer_variant": getattr(tokenizer, "tokenizer_variant", "regex"),
             "decoder": "constrained_bio" if constrain_bio else "greedy",
-            "postprocess": "rule_assisted" if use_rules else "thin_normalize",
             "max_length": max_length,
             "token_count": len(tokens),
             "available_token_count": available,
@@ -1101,10 +455,6 @@ def main():
                         help="Maximum sequence length")
     parser.add_argument("--debug", action="store_true",
                         help="Include tokenizer, labels, scores, and entity spans in JSON output")
-    parser.add_argument("--rule-assist", action="store_true",
-                        help="Enable legacy structural post-processing rules")
-    parser.add_argument("--no-rule-assist", action="store_true",
-                        help=argparse.SUPPRESS)
     parser.add_argument("--no-constrained-bio", action="store_true",
                         help="Use greedy per-token decoding instead of constrained BIO Viterbi")
     args = parser.parse_args()
@@ -1152,7 +502,6 @@ def main():
             id2label,
             max_length,
             debug=args.debug,
-            use_rules=args.rule_assist and not args.no_rule_assist,
             constrain_bio=not args.no_constrained_bio,
         )
         result["_input"] = fn

 import argparse
 import json
 import re
 import sys
 from typing import Dict, List, Optional, Tuple
     return 40 if re.search(r"[&+/,]", source) else 30
+def normalize_source_text(text: str) -> str:
+    text = re.sub(r"\s+", "", text.strip())
+    text = re.sub(r"(?i)WEB[_ ]?DL", "WEB-DL", text)
+    text = re.sub(r"(?i)WEB[_ ]?Rip", "WebRip", text)
+    text = re.sub(r"(?i)U[_ ]?NEXT", "U-NEXT", text)
+    text = re.sub(r"(?i)AT[_ ]?X", "AT-X", text)
+    return text.replace("_", "-")
 def choose_thin_source(sources: List[str]) -> Optional[str]:
     cleaned = [normalize_source_text(source) for source in sources if normalize_field_text(source)]
     if not cleaned:
     tokens: List[str],
     labels: List[str],
     tokenizer: Optional[AnimeTokenizer] = None,
 ) -> Dict:
     """
     Convert BIO-labeled tokens into structured metadata.
     result["source"] = choose_thin_source(grouped_entities.get("SOURCE", []))
     return result
 def parse_filename(
     filename: str,
     model: BertForTokenClassification,
     id2label: Dict[int, str],
     max_length: int = 64,
     debug: bool = False,
     constrain_bio: bool = True,
 ) -> Dict:
     """
         tokens[:available],
         label_strings,
         tokenizer=tokenizer,
     )
     if debug:
         result["_debug"] = {
             "tokenizer_variant": getattr(tokenizer, "tokenizer_variant", "regex"),
             "decoder": "constrained_bio" if constrain_bio else "greedy",
+            "postprocess": "thin_normalize",
             "max_length": max_length,
             "token_count": len(tokens),
             "available_token_count": available,
                         help="Maximum sequence length")
     parser.add_argument("--debug", action="store_true",
                         help="Include tokenizer, labels, scores, and entity spans in JSON output")
     parser.add_argument("--no-constrained-bio", action="store_true",
                         help="Use greedy per-token decoding instead of constrained BIO Viterbi")
     args = parser.parse_args()
             id2label,
             max_length,
             debug=args.debug,
             constrain_bio=not args.no_constrained_bio,
         )
         result["_input"] = fn
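
Note on the retained normalizer: `normalize_source_text` is one of the light string normalizers that stays in the thin runtime after this change. The sketch below copies the function body from the added lines above into a standalone script so its behavior can be checked in isolation. The sample inputs are chosen for illustration and are an assumption, not taken from the evaluation set.

```python
import re


def normalize_source_text(text: str) -> str:
    # Drop all whitespace first, then canonicalize a few well-known source tags.
    text = re.sub(r"\s+", "", text.strip())
    text = re.sub(r"(?i)WEB[_ ]?DL", "WEB-DL", text)
    text = re.sub(r"(?i)WEB[_ ]?Rip", "WebRip", text)
    text = re.sub(r"(?i)U[_ ]?NEXT", "U-NEXT", text)
    text = re.sub(r"(?i)AT[_ ]?X", "AT-X", text)
    # Any underscore that is still present becomes a hyphen.
    return text.replace("_", "-")


# Illustrative inputs (hypothetical, not from the dataset).
for raw in ["WEB_DL", "web rip", "U_NEXT", "GB_JP", "BDRip"]:
    print(raw, "->", normalize_source_text(raw))
```

Because whitespace is removed before the tag patterns run, the space alternative in `[_ ]?` never matches; inputs such as `web rip` are handled by the optional separator being absent.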

onnx_inference.py CHANGED Viewed

@@ -59,10 +59,9 @@ def parse_with_onnx(
     model_dir: Path,
     onnx_path: Path,
     max_length: int,
-    use_rules: bool = False,
 ) -> Dict:
     parser = OnnxFilenameParser(model_dir, onnx_path, max_length)
-    return parser.parse(filename, use_rules=use_rules)
 class OnnxFilenameParser:
@@ -87,7 +86,7 @@ class OnnxFilenameParser:
             providers=providers or ["CPUExecutionProvider"],
         )
-    def parse(self, filename: str, use_rules: bool = False) -> Dict:
         tokens, input_ids, attention_mask, available = encode(filename, self.tokenizer, self.max_length)
         logits = self.session.run(
             ["logits"],
@@ -100,7 +99,7 @@ class OnnxFilenameParser:
         token_logits = torch.from_numpy(logits[0, 1:1 + available, :])
         label_ids = constrained_bio_decode(token_logits, self.id2label)
         labels = [self.id2label.get(label_id, "O") for label_id in label_ids]
-        result = postprocess(tokens, labels, tokenizer=self.tokenizer, filename=filename, use_rules=use_rules)
         result["_input"] = filename
         return result
@@ -111,8 +110,6 @@ def main() -> None:
     parser.add_argument("--model-dir", default=".", help="Directory containing vocab.json and config.json")
     parser.add_argument("--onnx", default="exports/anime_filename_parser.onnx", help="ONNX model path")
     parser.add_argument("--max-length", type=int, default=128, help="Static ONNX sequence length")
-    parser.add_argument("--rule-assist", action="store_true", help="Enable legacy structural postprocessing")
-    parser.add_argument("--no-rule-assist", action="store_true", help=argparse.SUPPRESS)
     args = parser.parse_args()
     result = parse_with_onnx(
@@ -120,7 +117,6 @@ def main() -> None:
         model_dir=Path(args.model_dir),
         onnx_path=Path(args.onnx),
         max_length=args.max_length,
-        use_rules=args.rule_assist and not args.no_rule_assist,
     )
     print(json.dumps(result, ensure_ascii=False))

     model_dir: Path,
     onnx_path: Path,
     max_length: int,
 ) -> Dict:
     parser = OnnxFilenameParser(model_dir, onnx_path, max_length)
+    return parser.parse(filename)
 class OnnxFilenameParser:
             providers=providers or ["CPUExecutionProvider"],
         )
+    def parse(self, filename: str) -> Dict:
         tokens, input_ids, attention_mask, available = encode(filename, self.tokenizer, self.max_length)
         logits = self.session.run(
             ["logits"],
         token_logits = torch.from_numpy(logits[0, 1:1 + available, :])
         label_ids = constrained_bio_decode(token_logits, self.id2label)
         labels = [self.id2label.get(label_id, "O") for label_id in label_ids]
+        result = postprocess(tokens, labels, tokenizer=self.tokenizer)
         result["_input"] = filename
         return result
     parser.add_argument("--model-dir", default=".", help="Directory containing vocab.json and config.json")
     parser.add_argument("--onnx", default="exports/anime_filename_parser.onnx", help="ONNX model path")
     parser.add_argument("--max-length", type=int, default=128, help="Static ONNX sequence length")
     args = parser.parse_args()
     result = parse_with_onnx(
         model_dir=Path(args.model_dir),
         onnx_path=Path(args.onnx),
         max_length=args.max_length,
     )
     print(json.dumps(result, ensure_ascii=False))
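
Note on the decoder: both the PyTorch and ONNX paths keep `constrained_bio_decode` as the default decoder, but its implementation is not part of this diff. The following is a minimal sketch of what constrained BIO Viterbi decoding could look like, assuming plain Python score lists instead of tensors. The names `viterbi_bio` and `allowed` and the three-label set are hypothetical, and the repository's actual function may differ.

```python
from typing import Dict, List

NEG_INF = float("-inf")


def allowed(prev: str, cur: str) -> bool:
    """BIO rule: I-X may only follow B-X or I-X; O and B-* may follow anything."""
    if cur.startswith("I-"):
        entity = cur[2:]
        return prev in (f"B-{entity}", f"I-{entity}")
    return True


def viterbi_bio(scores: List[List[float]], id2label: Dict[int, str]) -> List[int]:
    """Return the highest-scoring label id path that respects the BIO rule."""
    if not scores:
        return []
    n = len(id2label)
    # A sequence may not start with an I- tag.
    best = [NEG_INF if id2label[j].startswith("I-") else scores[0][j] for j in range(n)]
    back: List[List[int]] = []
    for row in scores[1:]:
        new_best, pointers = [], []
        for j in range(n):
            # Best allowed predecessor for label j at this position.
            prev_score, prev_idx = max(
                (best[i], i) for i in range(n) if allowed(id2label[i], id2label[j])
            )
            new_best.append(prev_score + row[j])
            pointers.append(prev_idx)
        best = new_best
        back.append(pointers)
    # Follow back-pointers from the best final label.
    path = [max(range(n), key=lambda j: best[j])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


id2label = {0: "O", 1: "B-TITLE", 2: "I-TITLE"}
scores = [[0.1, 0.2, 0.9], [0.1, 0.2, 0.9]]
print(viterbi_bio(scores, id2label))
```

With these scores, greedy per-token decoding (the `--no-constrained-bio` behavior) would pick `I-TITLE` for both tokens, while the constrained path has to open the entity with `B-TITLE` first.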

parse_eval_metrics.json CHANGED Viewed

@@ -2,7 +2,6 @@
   "primary_metric": "normalized_only",
   "modes": {
     "model_only": {
-      "use_rules": false,
       "constrain_bio": false,
       "sample_count": 1024,
       "field_accuracy": {
@@ -309,7 +308,6 @@
       ]
     },
     "normalized_only": {
-      "use_rules": false,
       "constrain_bio": true,
       "sample_count": 1024,
       "field_accuracy": {
@@ -533,627 +531,6 @@
           }
         }
       ]
-    },
-    "rule_assisted": {
-      "use_rules": true,
-      "constrain_bio": true,
-      "sample_count": 1024,
-      "field_accuracy": {
-        "group": 0.9873046875,
-        "title": 0.7265625,
-        "season": 0.9912109375,
-        "episode": 0.7021484375,
-        "resolution": 1.0,
-        "source": 0.98046875,
-        "special": 0.951171875
-      },
-      "field_correct": {
-        "group": 1011,
-        "title": 744,
-        "season": 1015,
-        "episode": 719,
-        "resolution": 1024,
-        "source": 1004,
-        "special": 974
-      },
-      "field_total": {
-        "group": 1024,
-        "title": 1024,
-        "season": 1024,
-        "episode": 1024,
-        "resolution": 1024,
-        "source": 1024,
-        "special": 1024
-      },
-      "full_match_accuracy": 0.5068359375,
-      "full_match_correct": 519,
-      "full_match_total": 1024,
-      "failures": [
-        {
-          "filename": "[DBD-Raws][Tokidoki Bosotto Russia-go de Dereru Tonari no Alya-san][PV][20][1080P][BDRip][HEVC-10bit][FLAC]",
-          "errors": {
-            "episode": {
-              "gold": null,
-              "pred": "20"
-            }
-          },
-          "gold": {
-            "group": "DBD-Raws",
-            "title": "Tokidoki Bosotto Russia-go de Dereru Tonari no Alya-san",
-            "season": null,
-            "episode": null,
-            "resolution": "1080P",
-            "source": "BDRip",
-            "special": "20"
-          },
-          "pred": {
-            "group": "DBD-Raws",
-            "title": "Tokidoki Bosotto Russia-go de Dereru Tonari no Alya-san",
-            "season": null,
-            "episode": 20,
-            "resolution": "1080P",
-            "source": "BDRip",
-            "special": "20"
-          }
-        },
-        {
-          "filename": "[DBD-Raws][我的英雄学院 第三季][PV][02][1080P][BDRip][HEVC-10bit][FLAC]",
-          "errors": {
-            "title": {
-              "gold": "我的英雄学院",
-              "pred": "我的英雄学院 第三季"
-            },
-            "episode": {
-              "gold": null,
-              "pred": "2"
-            }
-          },
-          "gold": {
-            "group": "DBD-Raws",
-            "title": "我的英雄学院",
-            "season": 3,
-            "episode": null,
-            "resolution": "1080P",
-            "source": "BDRip",
-            "special": "02"
-          },
-          "pred": {
-            "group": "DBD-Raws",
-            "title": "我的英雄学院 第三季",
-            "season": 3,
-            "episode": 2,
-            "resolution": "1080P",
-            "source": "BDRip",
-            "special": "02"
-          }
-        },
-        {
-          "filename": "[Moozzi2] Katanagatari [SP01] NCOP - 02 (BD 1920x1080 x.264 Flac)",
-          "errors": {
-            "episode": {
-              "gold": "1",
-              "pred": null
-            }
-          },
-          "gold": {
-            "group": "Moozzi2",
-            "title": "Katanagatari",
-            "season": null,
-            "episode": 1,
-            "resolution": "1920x1080",
-            "source": "BD",
-            "special": "NCOP - 02"
-          },
-          "pred": {
-            "group": "Moozzi2",
-            "title": "Katanagatari",
-            "season": null,
-            "episode": null,
-            "resolution": "1920x1080",
-            "source": "BD",
-            "special": "NCOP - 02"
-          }
-        },
-        {
-          "filename": "[DBD-Raws][Ijiranaide, Nagatoro-san 2nd Attack][PV][06][1080P][BDRip][HEVC-10bit][FLAC]",
-          "errors": {
-            "episode": {
-              "gold": null,
-              "pred": "6"
-            }
-          },
-          "gold": {
-            "group": "DBD-Raws",
-            "title": "Ijiranaide, Nagatoro-san 2nd Attack",
-            "season": null,
-            "episode": null,
-            "resolution": "1080P",
-            "source": "BDRip",
-            "special": "06"
-          },
-          "pred": {
-            "group": "DBD-Raws",
-            "title": "Ijiranaide, Nagatoro-san 2nd Attack",
-            "season": null,
-            "episode": 6,
-            "resolution": "1080P",
-            "source": "BDRip",
-            "special": "06"
-          }
-        },
-        {
-          "filename": "【枫叶字幕组】宠物小精灵XY&Z[第30(122)话][720P][MP4][GB_JP].mp4",
-          "errors": {
-            "title": {
-              "gold": "宠物小精灵xy&z",
-              "pred": "宠物小精灵xy&z[第30"
-            },
-            "episode": {
-              "gold": "30",
-              "pred": "122"
-            }
-          },
-          "gold": {
-            "group": "枫叶字幕组",
-            "title": "宠物小精灵XY&Z",
-            "season": null,
-            "episode": 30,
-            "resolution": "720P",
-            "source": "GB-JP",
-            "special": null
-          },
-          "pred": {
-            "group": "枫叶字幕组",
-            "title": "宠物小精灵XY&Z[第30",
-            "season": null,
-            "episode": 122,
-            "resolution": "720P",
-            "source": "GB-JP",
-            "special": null
-          }
-        },
-        {
-          "filename": "[Snow-Raws] グランベルム CM&PV10 (BD 1920x1080 HEVC-YUV420P10 FLAC)",
-          "errors": {
-            "title": {
-              "gold": "グランベルム",
-              "pred": "グランベルム cm&pv10"
-            }
-          },
-          "gold": {
-            "group": "Snow-Raws",
-            "title": "グランベルム",
-            "season": null,
-            "episode": null,
-            "resolution": "1920x1080",
-            "source": "BD",
-            "special": "PV10"
-          },
-          "pred": {
-            "group": "Snow-Raws",
-            "title": "グランベルム CM&PV10",
-            "season": null,
-            "episode": null,
-            "resolution": "1920x1080",
-            "source": "BD",
-            "special": "PV10"
-          }
-        },
-        {
-          "filename": "[Moozzi2] High School D×D New [SP02] NCED - 01 (BD 1920x1080 x.264 Flac)",
-          "errors": {
-            "episode": {
-              "gold": "2",
-              "pred": null
-            }
-          },
-          "gold": {
-            "group": "Moozzi2",
-            "title": "High School D×D New",
-            "season": null,
-            "episode": 2,
-            "resolution": "1920x1080",
-            "source": "BD",
-            "special": "NCED - 01"
-          },
-          "pred": {
-            "group": "Moozzi2",
-            "title": "High School D×D New",
-            "season": null,
-            "episode": null,
-            "resolution": "1920x1080",
-            "source": "BD",
-            "special": "NCED - 01"
-          }
-        },
-        {
-          "filename": "[SFEO-Raws] Koimonogatari - CM_01 (BD 720P x264 10bit AAC)[783E6EF2]",
-          "errors": {
-            "title": {
-              "gold": "koimonogatari",
-              "pred": "koimonogatari - cm_01"
-            }
-          },
-          "gold": {
-            "group": "SFEO-Raws",
-            "title": "Koimonogatari",
-            "season": null,
-            "episode": null,
-            "resolution": "720P",
-            "source": "BD",
-            "special": "CM_01"
-          },
-          "pred": {
-            "group": "SFEO-Raws",
-            "title": "Koimonogatari - CM_01",
-            "season": null,
-            "episode": null,
-            "resolution": "720P",
-            "source": "BD",
-            "special": "CM_01"
-          }
-        },
-        {
-          "filename": "[H720] Sangatsu no Lion CM01 (BD 1208x720 HEVC AAC)",
-          "errors": {
-            "group": {
-              "gold": null,
-              "pred": "h720"
-            },
-            "title": {
-              "gold": "h",
-              "pred": "sangatsu no lion"
-            },
-            "episode": {
-              "gold": "720",
-              "pred": null
-            },
-            "special": {
-              "gold": "cm",
-              "pred": "cm01"
-            }
-          },
-          "gold": {
-            "group": null,
-            "title": "H",
-            "season": null,
-            "episode": 720,
-            "resolution": "1208x720",
-            "source": "BD",
-            "special": "CM"
-          },
-          "pred": {
-            "group": "H720",
-            "title": "Sangatsu no Lion",
-            "season": null,
-            "episode": null,
-            "resolution": "1208x720",
-            "source": "BD",
-            "special": "CM01"
-          }
-        },
-        {
-          "filename": "[FZSD&DBD-Raws][King of Prism Dramatic Prism.1][PV][08][1080P][BDRip][HEVC-10bit][FLAC]",
-          "errors": {
-            "episode": {
-              "gold": null,
-              "pred": "8"
-            }
-          },
-          "gold": {
-            "group": "FZSD&DBD-Raws",
-            "title": "King of Prism Dramatic Prism.1",
-            "season": null,
-            "episode": null,
-            "resolution": "1080P",
-            "source": "BDRip",
-            "special": "08"
-          },
-          "pred": {
-            "group": "FZSD&DBD-Raws",
-            "title": "King of Prism Dramatic Prism.1",
-            "season": null,
-            "episode": 8,
-            "resolution": "1080P",
-            "source": "BDRip",
-            "special": "08"
-          }
-        },
-        {
-          "filename": "Robin Hood no Daibouken 49",
-          "errors": {
-            "episode": {
-              "gold": null,
-              "pred": "49"
-            }
-          },
-          "gold": {
-            "group": null,
-            "title": "Robin Hood no Daibouken 49",
-            "season": null,
-            "episode": null,
-            "resolution": null,
-            "source": null,
-            "special": null
-          },
-          "pred": {
-            "group": null,
-            "title": "Robin Hood no Daibouken 49",
-            "season": null,
-            "episode": 49,
-            "resolution": null,
-            "source": null,
-            "special": null
-          }
-        },
-        {
-          "filename": "[Moozzi2] Paniponi Dash! [SP02] NCED - 07 [ EP.07 ] (BD 1920x1080 x.264 Flac)",
-          "errors": {
-            "episode": {
-              "gold": "2",
-              "pred": null
-            }
-          },
-          "gold": {
-            "group": "Moozzi2",
-            "title": "Paniponi Dash!",
-            "season": null,
-            "episode": 2,
-            "resolution": "1920x1080",
-            "source": "BD",
-            "special": "NCED - 07"
-          },
-          "pred": {
-            "group": "Moozzi2",
-            "title": "Paniponi Dash!",
-            "season": null,
-            "episode": null,
-            "resolution": "1920x1080",
-            "source": "BD",
-            "special": "NCED - 07"
-          }
-        },
-        {
-          "filename": "[Moozzi2] Onegai My Melody [SP10] Kuromi Naration TV-CM - 01 [ 30Sec. ] (BD 1024x768 x.264 AAC)",
-          "errors": {
-            "title": {
-              "gold": "onegai my melody",
-              "pred": "onegai my melody [sp10] kuromi naration tv-cm"
-            },
-            "episode": {
-              "gold": "10",
-              "pred": "1"
-            }
-          },
-          "gold": {
-            "group": "Moozzi2",
-            "title": "Onegai My Melody",
-            "season": null,
-            "episode": 10,
-            "resolution": "1024x768",
-            "source": "BD",
-            "special": "CM - 01"
-          },
-          "pred": {
-            "group": "Moozzi2",
-            "title": "Onegai My Melody [SP10] Kuromi Naration TV-CM",
-            "season": null,
-            "episode": 1,
-            "resolution": "1024x768",
-            "source": "BD",
-            "special": "CM - 01"
-          }
-        },
-        {
-          "filename": "[DBD-Raws][Kuzu no Honkai][PV][02][1080P][BDRip][HEVC-10bit][FLAC]",
-          "errors": {
-            "episode": {
-              "gold": null,
-              "pred": "2"
-            }
-          },
-          "gold": {
-            "group": "DBD-Raws",
-            "title": "Kuzu no Honkai",
-            "season": null,
-            "episode": null,
-            "resolution": "1080P",
-            "source": "BDRip",
-            "special": "02"
-          },
-          "pred": {
-            "group": "DBD-Raws",
-            "title": "Kuzu no Honkai",
-            "season": null,
-            "episode": 2,
-            "resolution": "1080P",
-            "source": "BDRip",
-            "special": "02"
-          }
-        },
-        {
-          "filename": "[DBD-Raws][One Piece Wano Arc][Soushuuhen][03][1080P][BDRip][HEVC-10bit][FLAC]",
-          "errors": {
-            "title": {
-              "gold": "one piece wano arc soushuuhen",
-              "pred": "one piece wano arc"
-            }
-          },
-          "gold": {
-            "group": "DBD-Raws",
-            "title": "One Piece Wano Arc Soushuuhen",
-            "season": null,
-            "episode": 3,
-            "resolution": "1080P",
-            "source": "BDRip",
-            "special": null
-          },
-          "pred": {
-            "group": "DBD-Raws",
-            "title": "One Piece Wano Arc",
-            "season": null,
-            "episode": 3,
-            "resolution": "1080P",
-            "source": "BDRip",
-            "special": null
-          }
-        },
-        {
-          "filename": "[LAC][Gintama][196][GB][R10]",
-          "errors": {
-            "group": {
-              "gold": null,
-              "pred": "lac"
-            },
-            "title": {
-              "gold": "lac gintama 196 gb r",
-              "pred": "gintama"
-            },
-            "episode": {
-              "gold": "10",
-              "pred": "196"
-            },
-            "source": {
-              "gold": null,
-              "pred": "gb"
-            }
-          },
-          "gold": {
-            "group": null,
-            "title": "LAC Gintama 196 GB R",
-            "season": null,
-            "episode": 10,
-            "resolution": null,
-            "source": null,
-            "special": null
-          },
-          "pred": {
-            "group": "LAC",
-            "title": "Gintama",
-            "season": null,
-            "episode": 196,
-            "resolution": null,
-            "source": "GB",
-            "special": null
-          }
-        },
-        {
-          "filename": "[DBD-Raws][Date a Live][Director's Cut][PV][07][1080P][BDRip][HEVC-10bit][FLAC]",
-          "errors": {
-            "title": {
-              "gold": "date a live director's cut",
-              "pred": "date a live"
-            },
-            "episode": {
-              "gold": null,
-              "pred": "7"
-            }
-          },
-          "gold": {
-            "group": "DBD-Raws",
-            "title": "Date a Live Director's Cut",
-            "season": null,
-            "episode": null,
-            "resolution": "1080P",
-            "source": "BDRip",
-            "special": "07"
-          },
-          "pred": {
-            "group": "DBD-Raws",
-            "title": "Date a Live",
-            "season": null,
-            "episode": 7,
-            "resolution": "1080P",
-            "source": "BDRip",
-            "special": "07"
-          }
-        },
-        {
-          "filename": "[DBD-Raws][Nageki no Bourei wa Intai Shitai][PV][09][1080P][BDRip][HEVC-10bit][FLAC]",
-          "errors": {
-            "episode": {
-              "gold": null,
-              "pred": "9"
-            }
-          },
-          "gold": {
-            "group": "DBD-Raws",
-            "title": "Nageki no Bourei wa Intai Shitai",
-            "season": null,
-            "episode": null,
-            "resolution": "1080P",
-            "source": "BDRip",
-            "special": "09"
-          },
-          "pred": {
-            "group": "DBD-Raws",
-            "title": "Nageki no Bourei wa Intai Shitai",
-            "season": null,
-            "episode": 9,
-            "resolution": "1080P",
-            "source": "BDRip",
-            "special": "09"
-          }
-        },
-        {
-          "filename": "[RUELL-Next] Fruits Basket NCOP 1 (DVD 768x576 x264 AC3 384K) [FF1CA8EF]",
-          "errors": {
-            "title": {
-              "gold": "fruits basket",
-              "pred": "fruits basket ncop 1"
-            },
-            "special": {
-              "gold": "ncop 1",
-              "pred": "ncop1"
-            }
-          },
-          "gold": {
-            "group": "RUELL-Next",
-            "title": "Fruits Basket",
-            "season": null,
-            "episode": null,
-            "resolution": "768x576",
-            "source": "DVD",
-            "special": "NCOP 1"
-          },
-          "pred": {
-            "group": "RUELL-Next",
-            "title": "Fruits Basket NCOP 1",
-            "season": null,
-            "episode": null,
-            "resolution": "768x576",
-            "source": "DVD",
-            "special": "NCOP1"
-          }
-        },
-        {
-          "filename": "[アニメ DVD] ミスター味っ子 第69話 「島巡り磯鍋競争！７包丁人・大石老師登場」 (640x480 WMV9)",
-          "errors": {
-            "source": {
-              "gold": null,
-              "pred": "dvd"
-            }
-          },
-          "gold": {
-            "group": "アニメ DVD",
-            "title": "ミスター味っ子",
-            "season": null,
-            "episode": 69,
-            "resolution": "640x480",
-            "source": null,
-            "special": null
-          },
-          "pred": {
-            "group": "アニメ DVD",
-            "title": "ミスター味っ子",
-            "season": null,
-            "episode": 69,
-            "resolution": "640x480",
-            "source": "DVD",
-            "special": null
-          }
-        }
-      ]
     }
   }
-}
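
The removed failure records above pair each `errors` entry with lower-cased `gold`/`pred` values, while the full `gold` and `pred` objects keep their original casing. A minimal sketch of how such a per-field diff could be produced is shown below. The helper names `_norm` and `field_errors` are hypothetical, and the lower-casing rule is inferred from the records rather than taken from the evaluation script, which may normalize differently.

```python
from typing import Any, Dict


def _norm(value: Any) -> Any:
    """Assumed normalization: strip and lower-case strings, pass other values through."""
    return value.strip().lower() if isinstance(value, str) else value


def field_errors(gold: Dict[str, Any], pred: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Return only the fields whose normalized gold and predicted values differ."""
    errors: Dict[str, Dict[str, Any]] = {}
    for field in gold:
        g, p = _norm(gold[field]), _norm(pred.get(field))
        if g != p:
            errors[field] = {"gold": g, "pred": p}
    return errors


# Values copied from the "Fruits Basket NCOP 1" record in the diff.
gold = {"group": "RUELL-Next", "title": "Fruits Basket", "season": None,
        "episode": None, "resolution": "768x576", "source": "DVD", "special": "NCOP 1"}
pred = {"group": "RUELL-Next", "title": "Fruits Basket NCOP 1", "season": None,
        "episode": None, "resolution": "768x576", "source": "DVD", "special": "NCOP1"}

errors = field_errors(gold, pred)
```

Under these assumptions, `errors` contains only the `title` and `special` fields, matching the shape of the record in the removed file.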

   "primary_metric": "normalized_only",
   "modes": {
     "model_only": {
       "constrain_bio": false,
       "sample_count": 1024,
       "field_accuracy": {
       ]
     },
     "normalized_only": {
       "constrain_bio": true,
       "sample_count": 1024,
       "field_accuracy": {
           }
         }
       ]
     }
   }
+}

train.py CHANGED

@@ -230,7 +230,6 @@ def parse_exact_metrics(
     id2label: Dict[int, str],
     max_length: int,
     limit: Optional[int],
-    use_rules: bool = False,
     constrain_bio: bool = True,
 ) -> Dict:
     """Evaluate end-to-end field exact match on filenames, not just token loss."""
@@ -249,7 +248,7 @@ def parse_exact_metrics(
         available = max(0, max_length - 2)
         tokens = tokens[:available]
         gold_labels = gold_labels[:available]
-        gold = postprocess(tokens, gold_labels, tokenizer=tokenizer, filename=filename, use_rules=False)
         gold_entities = {label.split("-", 1)[1] for label in gold_labels if label.startswith(("B-", "I-"))}
         for optional_field, entity in (("episode", "EPISODE"), ("season", "SEASON")):
             if entity not in gold_entities:
@@ -261,7 +260,6 @@ def parse_exact_metrics(
             id2label,
             max_length=max_length,
             debug=False,
-            use_rules=use_rules,
             constrain_bio=constrain_bio,
         )
@@ -298,7 +296,6 @@ def parse_exact_metrics(
     total = counter.get("full_total", 0)
     correct = counter.get("full_correct", 0)
     return {
-        "use_rules": use_rules,
         "constrain_bio": constrain_bio,
         "sample_count": total,
         "field_accuracy": field_accuracy,
@@ -320,9 +317,8 @@ def parse_exact_metrics_all_modes(
     limit: Optional[int],
 ) -> Dict:
     modes = {
-        "model_only": {"use_rules": False, "constrain_bio": False},
-        "normalized_only": {"use_rules": False, "constrain_bio": True},
-        "rule_assisted": {"use_rules": True, "constrain_bio": True},
     }
     return {
         "primary_metric": "normalized_only",
@@ -334,7 +330,6 @@ def parse_exact_metrics_all_modes(
                 id2label,
                 max_length,
                 limit,
-                use_rules=settings["use_rules"],
                 constrain_bio=settings["constrain_bio"],
             )
             for name, settings in modes.items()

     id2label: Dict[int, str],
     max_length: int,
     limit: Optional[int],
     constrain_bio: bool = True,
 ) -> Dict:
     """Evaluate end-to-end field exact match on filenames, not just token loss."""
         available = max(0, max_length - 2)
         tokens = tokens[:available]
         gold_labels = gold_labels[:available]
+        gold = postprocess(tokens, gold_labels, tokenizer=tokenizer)
         gold_entities = {label.split("-", 1)[1] for label in gold_labels if label.startswith(("B-", "I-"))}
         for optional_field, entity in (("episode", "EPISODE"), ("season", "SEASON")):
             if entity not in gold_entities:
             id2label,
             max_length=max_length,
             debug=False,
             constrain_bio=constrain_bio,
         )
     total = counter.get("full_total", 0)
     correct = counter.get("full_correct", 0)
     return {
         "constrain_bio": constrain_bio,
         "sample_count": total,
         "field_accuracy": field_accuracy,
     limit: Optional[int],
 ) -> Dict:
     modes = {
+        "model_only": {"constrain_bio": False},
+        "normalized_only": {"constrain_bio": True},
     }
     return {
         "primary_metric": "normalized_only",
                 id2label,
                 max_length,
                 limit,
                 constrain_bio=settings["constrain_bio"],
             )
             for name, settings in modes.items()
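
After this change, evaluation runs in two modes that differ only in `constrain_bio`. The sketch below illustrates that shape in isolation, together with the BIO entity extraction used to decide whether optional fields are present in the gold labels. `entities_in_labels`, `run_all_modes`, and `fake_evaluate` are hypothetical stand-ins, not functions from `train.py`; the real `parse_exact_metrics` takes the model, tokenizer, and dataset arguments shown in the diff.

```python
from typing import Callable, Dict, List, Set


def entities_in_labels(labels: List[str]) -> Set[str]:
    """Collect entity names such as "EPISODE" from BIO tags, ignoring "O"."""
    return {label.split("-", 1)[1] for label in labels if label.startswith(("B-", "I-"))}


# The two evaluation modes that remain once rule assists are removed.
MODES: Dict[str, Dict[str, bool]] = {
    "model_only": {"constrain_bio": False},
    "normalized_only": {"constrain_bio": True},
}


def run_all_modes(evaluate: Callable[..., Dict]) -> Dict:
    """Hypothetical driver: call `evaluate` once per mode and collect the results."""
    return {
        "primary_metric": "normalized_only",
        "modes": {
            name: evaluate(constrain_bio=settings["constrain_bio"])
            for name, settings in MODES.items()
        },
    }


def fake_evaluate(constrain_bio: bool = True) -> Dict:
    """Stand-in for the real metric function; it only echoes the flag."""
    return {"constrain_bio": constrain_bio, "sample_count": 0}


report = run_all_modes(fake_evaluate)
found = entities_in_labels(["B-TITLE", "I-TITLE", "O", "B-EPISODE"])
```

Keeping the mode table as plain data means that dropping `rule_assisted` is a one-line deletion and the report keys follow from the dictionary.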