ModerRAS committed adf92db · 1 parent: e63569d

Fix GM-Team bilingual title parsing

MAINTENANCE.md CHANGED
@@ -50,7 +50,7 @@ uv run python train.py \
50
  --tokenizer char \
51
  --data-file datasets/AnimeName/dmhy_weak_char.jsonl \
52
  --vocab-file datasets/AnimeName/vocab.char.json \
53
- --save-dir checkpoints/dmhy-char-full-relabel \
54
  --init-model-dir . \
55
  --epochs 2 \
56
  --batch-size 256 \
@@ -59,7 +59,7 @@ uv run python train.py \
59
  --max-seq-length 128 \
60
  --checkpoint-steps 1000 \
61
  --parse-eval-limit 2048 \
62
- --seed 48
63
  ```
64
 
65
  ## Publish a New Checkpoint
@@ -67,15 +67,15 @@ uv run python train.py \
67
  Copy the final checkpoint to the repository root:
68
 
69
  ```powershell
70
- Copy-Item checkpoints/dmhy-char-full-relabel/final/config.json . -Force
71
- Copy-Item checkpoints/dmhy-char-full-relabel/final/model.safetensors . -Force
72
- Copy-Item checkpoints/dmhy-char-full-relabel/final/tokenizer_config.json . -Force
73
- Copy-Item checkpoints/dmhy-char-full-relabel/final/training_args.bin . -Force
74
- Copy-Item checkpoints/dmhy-char-full-relabel/final/vocab.json . -Force
75
  Copy-Item datasets/AnimeName/vocab.char.json .\vocab.char.json -Force
76
- Copy-Item checkpoints/dmhy-char-full-relabel/final/run_metadata.json . -Force
77
- Copy-Item checkpoints/dmhy-char-full-relabel/final/trainer_eval_metrics.json . -Force
78
- Copy-Item checkpoints/dmhy-char-full-relabel/final/parse_eval_metrics.json . -Force
79
  ```
80
 
81
  There is no tracked `model/` duplicate. The root checkpoint is the publishing
 
50
  --tokenizer char \
51
  --data-file datasets/AnimeName/dmhy_weak_char.jsonl \
52
  --vocab-file datasets/AnimeName/vocab.char.json \
53
+ --save-dir checkpoints/dmhy-char-guoman-relabel \
54
  --init-model-dir . \
55
  --epochs 2 \
56
  --batch-size 256 \
 
59
  --max-seq-length 128 \
60
  --checkpoint-steps 1000 \
61
  --parse-eval-limit 2048 \
62
+ --seed 52
63
  ```
64
 
65
  ## Publish a New Checkpoint
 
67
  Copy the final checkpoint to the repository root:
68
 
69
  ```powershell
70
+ Copy-Item checkpoints/dmhy-char-guoman-relabel/final/config.json . -Force
71
+ Copy-Item checkpoints/dmhy-char-guoman-relabel/final/model.safetensors . -Force
72
+ Copy-Item checkpoints/dmhy-char-guoman-relabel/final/tokenizer_config.json . -Force
73
+ Copy-Item checkpoints/dmhy-char-guoman-relabel/final/training_args.bin . -Force
74
+ Copy-Item checkpoints/dmhy-char-guoman-relabel/final/vocab.json . -Force
75
  Copy-Item datasets/AnimeName/vocab.char.json .\vocab.char.json -Force
76
+ Copy-Item checkpoints/dmhy-char-guoman-relabel/final/run_metadata.json . -Force
77
+ Copy-Item checkpoints/dmhy-char-guoman-relabel/final/trainer_eval_metrics.json . -Force
78
+ Copy-Item checkpoints/dmhy-char-guoman-relabel/final/parse_eval_metrics.json . -Force
79
  ```
80
 
81
  There is no tracked `model/` duplicate. The root checkpoint is the publishing
README.md CHANGED
@@ -59,21 +59,21 @@ dataset relabeling and diagnostics, but the root checkpoint loads as `char`.
59
  ## Evaluation
60
 
61
  Final full-relabel char training (`632002` DMHY rows, 2 epochs, batch size 256,
62
- seed 48):
63
 
64
  | Metric | Value |
65
  |--------|-------|
66
- | Eval loss | 0.0163 |
67
- | Entity precision | 0.9800 |
68
- | Entity recall | 0.9867 |
69
- | Entity F1 | 0.9833 |
70
- | Token accuracy | 0.9943 |
71
- | Held-out parse full match | 2008/2048 (0.9805) |
72
- | Fixed regression full match | 21/21 (1.0000) |
73
 
74
  The fixed regression set includes second-season aliases such as `Ni`,
75
- `Ni no Sara`, `貳`, and `弐ノ章`, plus long-running episode IDs and dense meta
76
- blocks.
77
 
78
  ## Usage
79
 
@@ -121,13 +121,13 @@ uv run python convert_to_char_dataset.py \
121
  uv run python train.py --tokenizer char \
122
  --data-file datasets/AnimeName/dmhy_weak_char.jsonl \
123
  --vocab-file datasets/AnimeName/vocab.char.json \
124
- --save-dir checkpoints/dmhy-char-full-relabel \
125
  --init-model-dir . \
126
  --epochs 2 --batch-size 256 \
127
  --learning-rate 0.00008 --warmup-steps 300 \
128
  --checkpoint-steps 1000 --save-total-limit 3 \
129
  --parse-eval-limit 2048 \
130
- --max-seq-length 128 --seed 48
131
  ```
132
 
133
  The converter keeps source metadata and adds `tokenizer_variant`, source token
 
59
  ## Evaluation
60
 
61
  Final full-relabel char training (`632002` DMHY rows, 2 epochs, batch size 256,
62
+ seed 52):
63
 
64
  | Metric | Value |
65
  |--------|-------|
66
+ | Eval loss | 0.0058 |
67
+ | Entity precision | 0.9922 |
68
+ | Entity recall | 0.9946 |
69
+ | Entity F1 | 0.9934 |
70
+ | Token accuracy | 0.9981 |
71
+ | Held-out parse full match | 2029/2048 (0.9907) |
72
+ | Fixed regression full match | 22/22 (1.0000) |
73
 
74
  The fixed regression set includes second-season aliases such as `Ni`,
75
+ `Ni no Sara`, `貳`, and `弐ノ章`, plus GM-Team bilingual Chinese animation
76
+ bracket layouts, long-running episode IDs, and dense meta blocks.
77
 
78
  ## Usage
79
 
 
121
  uv run python train.py --tokenizer char \
122
  --data-file datasets/AnimeName/dmhy_weak_char.jsonl \
123
  --vocab-file datasets/AnimeName/vocab.char.json \
124
+ --save-dir checkpoints/dmhy-char-guoman-relabel \
125
  --init-model-dir . \
126
  --epochs 2 --batch-size 256 \
127
  --learning-rate 0.00008 --warmup-steps 300 \
128
  --checkpoint-steps 1000 --save-total-limit 3 \
129
  --parse-eval-limit 2048 \
130
+ --max-seq-length 128 --seed 52
131
  ```
132
 
133
  The converter keeps source metadata and adds `tokenizer_variant`, source token
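The figures in the README evaluation table can be sanity-checked with a little arithmetic. The sketch below assumes entity F1 is the usual harmonic mean of precision and recall, and that the "full match" ratio is simply correct samples over total samples; `f1_score` and `match_rate` are illustrative helper names, not functions from this repository. Because the inputs are the rounded table values, agreement is only expected to four decimal places.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 0.0 if total == 0 else 2 * precision * recall / total


def match_rate(correct: int, total: int) -> float:
    """Fraction of samples in which every parsed field matched."""
    return correct / total


# Values copied from the README table: old run (seed 48) vs. new run (seed 52).
old_f1 = f1_score(0.9800, 0.9867)
new_f1 = f1_score(0.9922, 0.9946)
old_full = match_rate(2008, 2048)
new_full = match_rate(2029, 2048)

print(f"entity F1:  {old_f1:.4f} -> {new_f1:.4f}")
print(f"full match: {old_full:.4f} -> {new_full:.4f}")
```

Under those assumptions the recomputed values agree with the table: F1 moves from 0.9833 to 0.9934, and the held-out full-match rate from 0.9805 to 0.9907 (19 failures instead of 40 out of 2048).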
build_repair_focus_dataset.py CHANGED
@@ -88,6 +88,42 @@ def manual_cases() -> Iterable[dict]:
88
  ("FLAC", "SOURCE"),
89
  ],
90
  )
91
 
92
 
93
  def main() -> None:
 
88
  ("FLAC", "SOURCE"),
89
  ],
90
  )
91
+ yield char_item(
92
+ "[GM-Team][国漫][逆天邪神 第2季][Against the Gods Ⅱ][2026][04][HEVC][GB][4K].mp4",
93
+ [
94
+ ("GM-Team", "GROUP"),
95
+ ("逆天邪神", "TITLE"),
96
+ ("第2季", "SEASON"),
97
+ ("04", "EPISODE"),
98
+ ("HEVC", "SOURCE"),
99
+ ("GB", "SOURCE"),
100
+ ("4K", "RESOLUTION"),
101
+ ],
102
+ )
103
+ yield char_item(
104
+ "[GM-Team][国漫][剑来 第2季][Sword of Coming Ⅱ][2025][04][HEVC][GB][4K]",
105
+ [
106
+ ("GM-Team", "GROUP"),
107
+ ("剑来", "TITLE"),
108
+ ("第2季", "SEASON"),
109
+ ("04", "EPISODE"),
110
+ ("HEVC", "SOURCE"),
111
+ ("GB", "SOURCE"),
112
+ ("4K", "RESOLUTION"),
113
+ ],
114
+ )
115
+ yield char_item(
116
+ "[GM-Team][国漫][大主宰 第2季][The Great Ruler Ⅱ][2026][04][HEVC][GB][4K]",
117
+ [
118
+ ("GM-Team", "GROUP"),
119
+ ("大主宰", "TITLE"),
120
+ ("第2季", "SEASON"),
121
+ ("04", "EPISODE"),
122
+ ("HEVC", "SOURCE"),
123
+ ("GB", "SOURCE"),
124
+ ("4K", "RESOLUTION"),
125
+ ],
126
+ )
127
 
128
 
129
  def main() -> None:
case_metrics.json CHANGED
@@ -1,29 +1,29 @@
1
  {
2
  "model_dir": ".",
3
- "case_file": "data\\parser_regression_cases.json",
4
  "tokenizer_variant": "char",
5
  "max_length": 128,
6
  "use_rules": true,
7
  "constrain_bio": true,
8
- "case_count": 21,
9
- "full_correct": 21,
10
  "full_accuracy": 1.0,
11
  "field_correct": {
12
- "group": 18,
13
- "title": 21,
14
- "episode": 21,
15
- "resolution": 21,
16
- "source": 14,
17
- "season": 8,
18
  "special": 1
19
  },
20
  "field_total": {
21
- "group": 18,
22
- "title": 21,
23
- "episode": 21,
24
- "resolution": 21,
25
- "source": 14,
26
- "season": 8,
27
  "special": 1
28
  },
29
  "field_accuracy": {
@@ -454,6 +454,28 @@
454
  "season": 2,
455
  "title": "炎炎の消防隊"
456
  }
457
  }
458
  ]
459
  }
 
1
  {
2
  "model_dir": ".",
3
+ "case_file": "data/parser_regression_cases.json",
4
  "tokenizer_variant": "char",
5
  "max_length": 128,
6
  "use_rules": true,
7
  "constrain_bio": true,
8
+ "case_count": 22,
9
+ "full_correct": 22,
10
  "full_accuracy": 1.0,
11
  "field_correct": {
12
+ "group": 19,
13
+ "title": 22,
14
+ "episode": 22,
15
+ "resolution": 22,
16
+ "source": 15,
17
+ "season": 9,
18
  "special": 1
19
  },
20
  "field_total": {
21
+ "group": 19,
22
+ "title": 22,
23
+ "episode": 22,
24
+ "resolution": 22,
25
+ "source": 15,
26
+ "season": 9,
27
  "special": 1
28
  },
29
  "field_accuracy": {
 
454
  "season": 2,
455
  "title": "炎炎の消防隊"
456
  }
457
+ },
458
+ {
459
+ "id": "gm_team_guoman_bilingual_s2",
460
+ "filename": "[GM-Team][国漫][逆天邪神 第2季][Against the Gods Ⅱ][2026][04][HEVC][GB][4K].mp4",
461
+ "ok": true,
462
+ "errors": {},
463
+ "expected": {
464
+ "group": "GM-Team",
465
+ "title": "逆天邪神",
466
+ "season": 2,
467
+ "episode": 4,
468
+ "resolution": "4K",
469
+ "source": "GB"
470
+ },
471
+ "pred": {
472
+ "episode": 4,
473
+ "group": "GM-Team",
474
+ "resolution": "4K",
475
+ "season": 2,
476
+ "source": "GB",
477
+ "title": "逆天邪神"
478
+ }
479
  }
480
  ]
481
  }
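The `ok`, `errors`, `expected`, and `pred` keys in each case entry suggest a per-field comparison. A minimal sketch of such a check follows, assuming that only the fields listed under `expected` are compared (which would be consistent with `field_total` counting `group` for 19 of the 22 cases) and borrowing the `gold`/`pred` error shape from `parse_eval_metrics.json`. `compare_case` is a hypothetical helper, not the repository's evaluation code, and it skips any value normalization the real evaluator may apply.

```python
from typing import Any, Dict


def compare_case(expected: Dict[str, Any], pred: Dict[str, Any]) -> Dict[str, Any]:
    """Compare only the fields the case declares; report mismatches per field."""
    errors = {
        field: {"gold": gold, "pred": pred.get(field)}
        for field, gold in expected.items()
        if pred.get(field) != gold
    }
    return {"ok": not errors, "errors": errors}


expected = {
    "group": "GM-Team",
    "title": "逆天邪神",
    "season": 2,
    "episode": 4,
    "resolution": "4K",
    "source": "GB",
}

# A prediction that matches the new regression case exactly.
good = compare_case(expected, dict(expected))

# A prediction that mistakes the category bracket for the title.
bad = compare_case(expected, {**expected, "title": "国漫"})

print(good)
print(bad["errors"])
```

With this shape, a passing case yields `"ok": true` and an empty `errors` object, as in the `gm_team_guoman_bilingual_s2` entry above.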
data/parser_regression_cases.json CHANGED
@@ -228,5 +228,17 @@
228
  "episode": 13,
229
  "resolution": "1920x1080"
230
  }
231
  }
232
  ]
 
228
  "episode": 13,
229
  "resolution": "1920x1080"
230
  }
231
+ },
232
+ {
233
+ "id": "gm_team_guoman_bilingual_s2",
234
+ "filename": "[GM-Team][国漫][逆天邪神 第2季][Against the Gods Ⅱ][2026][04][HEVC][GB][4K].mp4",
235
+ "expected": {
236
+ "group": "GM-Team",
237
+ "title": "逆天邪神",
238
+ "season": 2,
239
+ "episode": 4,
240
+ "resolution": "4K",
241
+ "source": "GB"
242
+ }
243
  }
244
  ]
datasets/AnimeName CHANGED
@@ -1 +1 @@
- Subproject commit 8d2b6c9e639fde6be0e428e5f34f56fccd5aa2ea
+ Subproject commit 004a8c08628b6820fb2d1b59a80fdcfe925ef095
dmhy_dataset.py CHANGED
@@ -35,6 +35,10 @@ NOISE_BRACKETS = {
35
  "tc", "sc", "gb", "big5", "cht", "chs", "jpn", "jp", "jap", "eng",
36
  "繁中", "简中", "繁日", "简日", "日语", "日文", "外挂", "内封", "字幕",
37
  }
 
 
 
 
38
 
39
  SPECIAL_RE = re.compile(r"^(?:ova\d*|oad\d*|sp\d*|movie|the\s*movie|op|ed|pv|cm|ncop|nced|剧场版|劇場版|特别篇|特別篇)$", re.I)
40
  SPECIAL_SEARCH_RE = re.compile(r"^(?:檢索|检索|搜索|搜寻|搜尋|别名|別名|alias|search|keyword)\s*[::].+", re.I)
@@ -186,7 +190,8 @@ def is_source(token: str) -> bool:
186
  return True
187
  if has_wrapping_brackets(token):
188
  parts = [part for part in re.split(r"[\s&+/,._-]+", clean) if part]
189
- return bool(parts) and all(SOURCE_RE.match(part) or is_noise_bracket(part) for part in parts)
 
190
  return False
191
 
192
 
@@ -195,6 +200,11 @@ def is_special(token: str) -> bool:
195
  return bool(SPECIAL_RE.match(clean) or SPECIAL_SEARCH_RE.match(clean))
196
 
197
 
 
 
 
 
 
198
  def is_noise_bracket(token: str) -> bool:
199
  clean = clean_bracket(token)
200
  if not clean:
@@ -202,6 +212,8 @@ def is_noise_bracket(token: str) -> bool:
202
  normalized = re.sub(r"[\s._-]+", "", clean).lower()
203
  if normalized in NOISE_BRACKETS:
204
  return True
 
 
205
  if DATE_RE.match(clean) or HASH_RE.match(clean):
206
  return True
207
  return False
@@ -335,6 +347,42 @@ def label_context_season_tokens(
335
  categories[idx] = "season"
336
 
337

338
  def embedded_bracket_episode(token: str) -> Optional[tuple[str, str, str]]:
339
  """Split malformed tokens such as '[Group}Title[658]' into title + episode."""
340
  if episode_number(token) is not None:
@@ -390,6 +438,10 @@ def finalize_weak_sample(
390
  continue
391
  if is_explicit_season(token):
392
  expanded_categories[idx] = "season"
 
 
 
 
393
 
394
  labels = assign_iob2(expanded_categories)
395
  if len(expanded_tokens) != len(labels):
@@ -699,7 +751,9 @@ def weak_label_filename(filename: str, tokenizer: AnimeTokenizer) -> Optional[di
699
  for idx, token in enumerate(tokens):
700
  if categories[idx] == "group":
701
  continue
702
- if is_resolution(token):
 
 
703
  categories[idx] = "resolution"
704
  elif is_source(token):
705
  categories[idx] = "source"
@@ -715,6 +769,7 @@ def weak_label_filename(filename: str, tokenizer: AnimeTokenizer) -> Optional[di
715
  return fallback_embedded_episode_sample(tokens, tokenizer) or fallback_no_episode_sample(tokens, tokenizer)
716
  categories[episode_idx] = "episode"
717
  label_context_season_tokens(tokens, categories, episode_idx)
 
718
 
719
  # S01E07 is tokenized as S01 + E07 after tokenizer changes. If an older
720
  # token slips through, expand_tokens_and_categories will split it.
 
35
  "tc", "sc", "gb", "big5", "cht", "chs", "jpn", "jp", "jap", "eng",
36
  "繁中", "简中", "繁日", "简日", "日语", "日文", "外挂", "内封", "字幕",
37
  }
38
+ CATEGORY_BRACKETS = {
39
+ "国漫", "國漫", "国产", "國產", "国产动漫", "國產動漫", "国产动画", "國產動畫",
40
+ "国创", "國創", "中国动漫", "中國動漫", "中国动画", "中國動畫",
41
+ }
42
 
43
  SPECIAL_RE = re.compile(r"^(?:ova\d*|oad\d*|sp\d*|movie|the\s*movie|op|ed|pv|cm|ncop|nced|剧场版|劇場版|特别篇|特別篇)$", re.I)
44
  SPECIAL_SEARCH_RE = re.compile(r"^(?:檢索|检索|搜索|搜寻|搜尋|别名|別名|alias|search|keyword)\s*[::].+", re.I)
 
190
  return True
191
  if has_wrapping_brackets(token):
192
  parts = [part for part in re.split(r"[\s&+/,._-]+", clean) if part]
193
+ has_source_part = any(SOURCE_RE.match(part) for part in parts)
194
+ return has_source_part and all(SOURCE_RE.match(part) or is_noise_bracket(part) for part in parts)
195
  return False
196
 
197
 
 
200
  return bool(SPECIAL_RE.match(clean) or SPECIAL_SEARCH_RE.match(clean))
201
 
202
 
203
+ def is_category_bracket(token: str) -> bool:
204
+ clean = re.sub(r"[\s._-]+", "", clean_bracket(token))
205
+ return has_wrapping_brackets(token) and clean in CATEGORY_BRACKETS
206
+
207
+
208
  def is_noise_bracket(token: str) -> bool:
209
  clean = clean_bracket(token)
210
  if not clean:
 
212
  normalized = re.sub(r"[\s._-]+", "", clean).lower()
213
  if normalized in NOISE_BRACKETS:
214
  return True
215
+ if is_category_bracket(token):
216
+ return True
217
  if DATE_RE.match(clean) or HASH_RE.match(clean):
218
  return True
219
  return False
 
347
  categories[idx] = "season"
348
 
349
 
350
+ def repair_structured_bracket_title_aliases(
351
+ tokens: Sequence[str],
352
+ categories: List[str],
353
+ episode_idx: int,
354
+ ) -> None:
355
+ """Keep the primary title in category-prefixed bracket series.
356
+
357
+ GM-Team-style rows often look like:
358
+ [GROUP][国漫][中文标题 第2季][English Alias Ⅱ][2026][04][meta]
359
+ The category, alias, and year brackets are metadata for parsing purposes;
360
+ the first real title bracket after the category is the canonical title.
361
+ """
362
+ if not any(is_category_bracket(tokens[idx]) for idx in range(min(episode_idx, len(tokens)))):
363
+ return
364
+
365
+ title_candidates = [
366
+ idx
367
+ for idx in range(episode_idx)
368
+ if categories[idx] == "title"
369
+ and has_wrapping_brackets(tokens[idx])
370
+ and is_title_token(tokens[idx])
371
+ ]
372
+ if not title_candidates:
373
+ return
374
+
375
+ primary_idx = title_candidates[0]
376
+ for idx in title_candidates[1:]:
377
+ categories[idx] = "sep"
378
+
379
+ for idx in range(episode_idx):
380
+ if idx == primary_idx:
381
+ continue
382
+ if is_category_bracket(tokens[idx]) or DATE_RE.match(clean_bracket(tokens[idx])):
383
+ categories[idx] = "sep"
384
+
385
+
386
  def embedded_bracket_episode(token: str) -> Optional[tuple[str, str, str]]:
387
  """Split malformed tokens such as '[Group}Title[658]' into title + episode."""
388
  if episode_number(token) is not None:
 
438
  continue
439
  if is_explicit_season(token):
440
  expanded_categories[idx] = "season"
441
+ prev_idx = idx - 1
442
+ while prev_idx >= 0 and is_separator_token(expanded_tokens[prev_idx]) and expanded_categories[prev_idx] == "title":
443
+ expanded_categories[prev_idx] = "sep"
444
+ prev_idx -= 1
445
 
446
  labels = assign_iob2(expanded_categories)
447
  if len(expanded_tokens) != len(labels):
 
751
  for idx, token in enumerate(tokens):
752
  if categories[idx] == "group":
753
  continue
754
+ if is_category_bracket(token):
755
+ categories[idx] = "sep"
756
+ elif is_resolution(token):
757
  categories[idx] = "resolution"
758
  elif is_source(token):
759
  categories[idx] = "source"
 
769
  return fallback_embedded_episode_sample(tokens, tokenizer) or fallback_no_episode_sample(tokens, tokenizer)
770
  categories[episode_idx] = "episode"
771
  label_context_season_tokens(tokens, categories, episode_idx)
772
+ repair_structured_bracket_title_aliases(tokens, categories, episode_idx)
773
 
774
  # S01E07 is tokenized as S01 + E07 after tokenizer changes. If an older
775
  # token slips through, expand_tokens_and_categories will split it.
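One plausible reading of the `is_source` change above: once category brackets such as `[国漫]` count as noise, the old `all(...)` test would accept a bracket made up only of noise parts, so the new `has_source_part` guard requires at least one real source tag. The sketch below illustrates that interaction with deliberately tiny stand-ins. `SOURCE_RE`, `NOISE`, and `CATEGORY_BRACKETS` here are reduced versions, `is_source_bracket` and its `require_source_part` switch are hypothetical, and the real `is_source` has earlier branches that are not reproduced.

```python
import re

# Reduced stand-ins; the real tables in dmhy_dataset.py are much larger.
SOURCE_RE = re.compile(r"^(?:hevc|avc|x26[45]|aac|flac|bdrip|webrip)$", re.I)
NOISE = {"gb", "big5", "chs", "cht", "jpn"}
CATEGORY_BRACKETS = {"国漫", "國漫", "国创", "國創"}


def has_wrapping_brackets(token: str) -> bool:
    return len(token) >= 2 and token[0] == "[" and token[-1] == "]"


def clean_bracket(token: str) -> str:
    return token[1:-1].strip() if has_wrapping_brackets(token) else token.strip()


def is_category_bracket(token: str) -> bool:
    clean = re.sub(r"[\s._-]+", "", clean_bracket(token))
    return has_wrapping_brackets(token) and clean in CATEGORY_BRACKETS


def is_noise_part(part: str) -> bool:
    return part.lower() in NOISE or part in CATEGORY_BRACKETS


def is_source_bracket(token: str, require_source_part: bool = True) -> bool:
    """Bracket-only branch of the check; False mimics the pre-commit behaviour."""
    if not has_wrapping_brackets(token):
        return False
    parts = [p for p in re.split(r"[\s&+/,._-]+", clean_bracket(token)) if p]
    all_known = bool(parts) and all(SOURCE_RE.match(p) or is_noise_part(p) for p in parts)
    if not require_source_part:
        return all_known
    return any(SOURCE_RE.match(p) for p in parts) and all_known


for token in ["[HEVC_AAC]", "[HEVC GB]", "[CHS]", "[国漫]"]:
    print(token, is_source_bracket(token, require_source_part=False), is_source_bracket(token))
```

In this reduced model, `[CHS]` and `[国漫]` pass the old test but fail the new one, while brackets with at least one codec tag are still accepted.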
exports/anime_filename_parser.metadata.json CHANGED
@@ -8,5 +8,5 @@
  128,
  15
  ],
- "max_abs_diff": 3.3855438232421875e-05
+ "max_abs_diff": 5.65648078918457e-05
  }
exports/anime_filename_parser.onnx CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f9b874fbd4217a190487f512dcc6dd7ce2f0e610147703ca0cddcc0db44fb1c7
+ oid sha256:6d967c5c2305e6737c9e791956a174655deebef2cfa477e081890ebddd56e004
  size 19633926
inference.py CHANGED
@@ -330,6 +330,11 @@ NOISE_META_RE = re.compile(
330
  r"Opus|ASS.*|CHS|CHT|BIG5|GB|JPN?|MP4|MKV|繁中|简中|内封|外挂)$",
331
  re.I,
332
  )
 
 
 
 
 
333
 
334
 
335
  def cn_number_to_int(text: str) -> Optional[int]:
@@ -372,8 +377,11 @@ def looks_like_episode_or_meta(text: str) -> bool:
372
  if not text:
373
  return False
374
  clean = text.strip()
 
375
  return bool(
376
  re.fullmatch(r"(?:EP?|#)?\d{1,4}(?:v\d+)?", clean, re.I)
 
 
377
  or RESOLUTION_RE.search(clean)
378
  or SOURCE_TAG_RE.fullmatch(clean)
379
  or SOURCE_RE.search(clean)
@@ -492,6 +500,10 @@ def apply_rule_assists(filename: str, result: Dict) -> Dict:
492
  if repaired_title:
493
  repaired["title"] = repaired_title
494
 
 
 
 
 
495
  if repaired.get("title") and repaired.get("season") is not None:
496
  repaired["title"] = strip_trailing_season_from_title(repaired["title"], repaired["season"])
497
 
@@ -584,6 +596,56 @@ def source_candidates(filename: str) -> List[str]:
584
  return [value for _priority, _neg_start, value in sorted(deduped.values(), reverse=True)]
585
 
586

587
  def best_structural_episode(filename: str) -> Optional[int]:
588
  priorities = {
589
  "season_episode": 1000,
@@ -635,6 +697,7 @@ def strip_trailing_season_from_title(title: str, season: int) -> str:
635
  rf"\s+[Ss]0*{season_text}$",
636
  rf"\s+Season\s*0*{season_text}$",
637
  rf"\s+0*{season_text}$",
 
638
  ]
639
  cleaned = title
640
  for pattern in patterns:
 
330
  r"Opus|ASS.*|CHS|CHT|BIG5|GB|JPN?|MP4|MKV|繁中|简中|内封|外挂)$",
331
  re.I,
332
  )
333
+ DATE_RE = re.compile(r"^(?:19|20)\d{2}(?:[.\-_年]?(?:0?[1-9]|1[0-2]))?(?:[.\-_月]?(?:0?[1-9]|[12]\d|3[01]))?日?$")
334
+ CATEGORY_BRACKETS = {
335
+ "国漫", "國漫", "国产", "國產", "国产动漫", "國產動漫", "国产动画", "國產動畫",
336
+ "国创", "國創", "中国动漫", "中國動漫", "中国动画", "中國動畫",
337
+ }
338
 
339
 
340
  def cn_number_to_int(text: str) -> Optional[int]:
 
377
  if not text:
378
  return False
379
  clean = text.strip()
380
+ normalized = re.sub(r"[\s._-]+", "", clean)
381
  return bool(
382
  re.fullmatch(r"(?:EP?|#)?\d{1,4}(?:v\d+)?", clean, re.I)
383
+ or DATE_RE.fullmatch(clean)
384
+ or normalized in CATEGORY_BRACKETS
385
  or RESOLUTION_RE.search(clean)
386
  or SOURCE_TAG_RE.fullmatch(clean)
387
  or SOURCE_RE.search(clean)
 
500
  if repaired_title:
501
  repaired["title"] = repaired_title
502
 
503
+ structured_title = infer_structured_bracket_title(filename, group, repaired.get("episode"))
504
+ if structured_title:
505
+ repaired["title"] = structured_title
506
+
507
  if repaired.get("title") and repaired.get("season") is not None:
508
  repaired["title"] = strip_trailing_season_from_title(repaired["title"], repaired["season"])
509
 
 
596
  return [value for _priority, _neg_start, value in sorted(deduped.values(), reverse=True)]
597
 
598
 
599
+ def is_category_text(text: str) -> bool:
600
+ normalized = re.sub(r"[\s._-]+", "", text.strip())
601
+ return normalized in CATEGORY_BRACKETS
602
+
603
+
604
+ def infer_structured_bracket_title(
605
+ filename: str,
606
+ group: Optional[str],
607
+ episode: Optional[int],
608
+ ) -> Optional[str]:
609
+ """Pick the primary title from [group][category][title][alias][year][episode] rows."""
610
+ brackets = bracket_parts(filename)
611
+ if len(brackets) < 4 or episode is None:
612
+ return None
613
+
614
+ start_index = 0
615
+ if group and brackets and brackets[0][0] == group:
616
+ start_index = 1
617
+
618
+ search = brackets[start_index:]
619
+ if not search or not any(is_category_text(text) for text, _start, _end in search[:2]):
620
+ return None
621
+
622
+ episode_index = None
623
+ for idx, (text, _start, _end) in enumerate(brackets):
624
+ if re.fullmatch(rf"(?:EP?|#)?0*{episode}(?:v\d+)?", text.strip(), re.I):
625
+ episode_index = idx
626
+ break
627
+ if episode_index is None:
628
+ return None
629
+
630
+ candidates: List[Tuple[int, str]] = []
631
+ for idx in range(start_index, episode_index):
632
+ text = brackets[idx][0].strip()
633
+ if not text or looks_like_episode_or_meta(text):
634
+ continue
635
+ score = 0
636
+ if SEASON_RE.search(text) or TRAILING_SEQUEL_MARKER_RE.search(text):
637
+ score += 50
638
+ if re.search(r"[\u3400-\u9fff]", text):
639
+ score += 20
640
+ if idx > start_index:
641
+ score += 10
642
+ candidates.append((score, text))
643
+
644
+ if not candidates:
645
+ return None
646
+ return max(candidates, key=lambda item: item[0])[1]
647
+
648
+
649
  def best_structural_episode(filename: str) -> Optional[int]:
650
  priorities = {
651
  "season_episode": 1000,
 
697
  rf"\s+[Ss]0*{season_text}$",
698
  rf"\s+Season\s*0*{season_text}$",
699
  rf"\s+0*{season_text}$",
700
+ rf"\s+第(?:0*{season_text}|{season_text})[季期部章]$",
701
  ]
702
  cleaned = title
703
  for pattern in patterns:
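To make the inference-side change easier to follow, here is a self-contained sketch of the two steps the diff above adds: scoring the brackets between the group and the episode to pick the primary title, then stripping a trailing `第N季` marker. It mirrors the scoring weights (50 / 20 / 10) and the new season pattern from the diff, but everything else is a simplified stand-in: `bracket_texts`, `is_meta`, `pick_primary_title`, `strip_cn_season`, and the small regexes are hypothetical replacements for `bracket_parts`, `looks_like_episode_or_meta`, `SEASON_RE`, and `TRAILING_SEQUEL_MARKER_RE`, whose real definitions are not shown here. It also assumes `season_text` is the plain decimal season number.

```python
import re
from typing import List, Optional, Tuple

# Simplified stand-ins; the real inference.py patterns cover many more cases.
CATEGORY_BRACKETS = {"国漫", "國漫", "国创", "國創"}
SEASON_RE = re.compile(r"第\d+[季期部章]")
SEQUEL_MARKER_RE = re.compile(r"\s[ⅡⅢⅣ]$")
EPISODE_LIKE_RE = re.compile(r"(?:EP?|#)?\d{1,4}(?:v\d+)?", re.I)


def bracket_texts(filename: str) -> List[str]:
    """Return the text inside each [...] pair, in order."""
    return re.findall(r"\[([^\[\]]*)\]", filename)


def is_meta(text: str) -> bool:
    """Category tags and bare numbers (episodes, years) are not titles."""
    return text in CATEGORY_BRACKETS or EPISODE_LIKE_RE.fullmatch(text) is not None


def pick_primary_title(filename: str, group: str, episode: int) -> Optional[str]:
    brackets = bracket_texts(filename)
    start = 1 if brackets and brackets[0] == group else 0
    # Only apply the repair to rows that carry a category bracket up front.
    if not any(text in CATEGORY_BRACKETS for text in brackets[start:start + 2]):
        return None
    episode_re = re.compile(rf"(?:EP?|#)?0*{episode}(?:v\d+)?", re.I)
    episode_index = next(
        (i for i, text in enumerate(brackets) if episode_re.fullmatch(text.strip())),
        None,
    )
    if episode_index is None:
        return None
    candidates: List[Tuple[int, str]] = []
    for idx in range(start, episode_index):
        text = brackets[idx].strip()
        if not text or is_meta(text):
            continue
        score = 0
        if SEASON_RE.search(text) or SEQUEL_MARKER_RE.search(text):
            score += 50  # carries a season or sequel marker
        if re.search(r"[\u3400-\u9fff]", text):
            score += 20  # contains CJK ideographs
        if idx > start:
            score += 10  # not the first bracket after the group
        candidates.append((score, text))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


def strip_cn_season(title: str, season: int) -> str:
    season_text = str(season)
    return re.sub(rf"\s+第(?:0*{season_text}|{season_text})[季期部章]$", "", title)


name = "[GM-Team][国漫][逆天邪神 第2季][Against the Gods Ⅱ][2026][04][HEVC][GB][4K].mp4"
raw_title = pick_primary_title(name, "GM-Team", 4)
print(raw_title, "->", strip_cn_season(raw_title, 2))
```

In this sketch the Chinese bracket scores 80 (season marker, CJK text, not first) against 60 for the English alias, so `逆天邪神 第2季` is chosen and then reduced to `逆天邪神`, which matches the `expected` title of the new regression case.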
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:697d7491b83ef615994e02f11f0f65362c400f5eb6b4be8f43f02435ad43173f
+ oid sha256:347b2f619fd63a71804c4742a069b20acd0cde870fc03cc2ac0f175b06586b72
  size 19142604
parse_eval_metrics.json CHANGED
@@ -1,22 +1,22 @@
1
  {
2
  "sample_count": 2048,
3
  "field_accuracy": {
4
- "group": 1.0,
5
- "title": 0.99658203125,
6
- "season": 0.994140625,
7
- "episode": 0.99609375,
8
- "resolution": 0.998046875,
9
- "source": 0.99365234375,
10
- "special": 0.998046875
11
  },
12
  "field_correct": {
13
- "group": 2048,
14
- "title": 2041,
15
- "season": 2036,
16
- "episode": 2040,
17
- "resolution": 2044,
18
- "source": 2035,
19
- "special": 2044
20
  },
21
  "field_total": {
22
  "group": 2048,
@@ -27,487 +27,487 @@
27
  "source": 2048,
28
  "special": 2048
29
  },
30
- "full_match_accuracy": 0.98046875,
31
- "full_match_correct": 2008,
32
  "full_match_total": 2048,
33
  "failures": [
34
  {
35
- "filename": "[DBD-Raws][Boruto Naruto Next Generations][menu][S13][D2][02][1080P][BDRip][HEVC-10bit][FLAC]",
36
  "errors": {
37
- "season": {
38
- "gold": null,
39
- "pred": "13"
40
  }
41
  },
42
  "gold": {
43
- "group": "DBD-Raws",
44
- "title": "Boruto Naruto Next Generations",
45
  "season": null,
46
- "episode": 2,
47
- "resolution": "1080P",
48
- "source": "BDRip",
49
  "special": null
50
  },
51
  "pred": {
52
- "group": "DBD-Raws",
53
- "title": "Boruto Naruto Next Generations",
54
- "season": 13,
55
- "episode": 2,
56
- "resolution": "1080P",
57
- "source": "BDRip",
58
  "special": null
59
  }
60
  },
61
  {
62
- "filename": "[アニメ BD] ギャラクシーエンジェル 第1期(無印) 第09話「ロストテクノロジーのローストビーフ」(1424x1072 HEVC 10bit FLAC softSub(chi+eng) chap)",
63
  "errors": {
64
- "season": {
65
- "gold": null,
66
- "pred": "1"
67
  }
68
  },
69
  "gold": {
70
- "group": "アニメ BD",
71
- "title": "ギャラクシーエンジェル 第1期(無印) 第09話「ロストテクノロジーのローストビーフ」",
72
  "season": null,
73
  "episode": 9,
74
- "resolution": "1424x1072",
75
- "source": "BD",
76
- "special": null
77
  },
78
  "pred": {
79
- "group": "アニメ BD",
80
- "title": "ギャラクシーエンジェル 第1期(無印) 第09話「ロストテクノロジーのローストビーフ」",
81
- "season": 1,
82
  "episode": 9,
83
- "resolution": "1424x1072",
84
- "source": "BD",
85
  "special": null
86
  }
87
  },
88
  {
89
- "filename": "コメットさん☆ 第11話 「バトンの力」(DVD DivX4.12 QB95 640x480 24f) [CRC32_C09E1AB0]",
90
  "errors": {
91
- "source": {
92
- "gold": "cr",
93
- "pred": "dvd"
 
 
 
 
94
  }
95
  },
96
  "gold": {
97
- "group": null,
98
- "title": "コメットさん☆",
99
  "season": null,
100
- "episode": 11,
101
- "resolution": "640x480",
102
- "source": "CR",
103
- "special": null
104
  },
105
  "pred": {
106
- "group": null,
107
- "title": "コメットさん☆",
108
  "season": null,
109
- "episode": 11,
110
- "resolution": "640x480",
111
- "source": "DVD",
112
- "special": null
113
  }
114
  },
115
  {
116
- "filename": "[Kamigami&Mabors&VCB-Studio] Saenai Heroine no Sodatekata Flat [07][Ma10p_1080p][x265_2aac]",
117
  "errors": {
118
  "source": {
119
- "gold": "aac",
120
- "pred": "x265-2aac"
121
  }
122
  },
123
  "gold": {
124
- "group": "Kamigami&Mabors&VCB-Studio",
125
- "title": "Saenai Heroine no Sodatekata Flat",
126
  "season": null,
127
- "episode": 7,
128
- "resolution": "1080p",
129
- "source": "aac",
130
  "special": null
131
  },
132
  "pred": {
133
- "group": "Kamigami&Mabors&VCB-Studio",
134
- "title": "Saenai Heroine no Sodatekata Flat",
135
  "season": null,
136
- "episode": 7,
137
- "resolution": "1080p",
138
- "source": "x265_2aac",
139
  "special": null
140
  }
141
  },
142
  {
143
- "filename": "[Liuyun&VCB-Studio] Hanasaku Iroha [07][Hi10p_1080p][x264_flac_ac3]",
144
  "errors": {
145
- "source": {
146
- "gold": "flac",
147
- "pred": "x264-flac"
148
  }
149
  },
150
  "gold": {
151
- "group": "Liuyun&VCB-Studio",
152
- "title": "Hanasaku Iroha",
153
  "season": null,
154
- "episode": 7,
155
  "resolution": "1080p",
156
- "source": "flac",
157
- "special": null
158
  },
159
  "pred": {
160
- "group": "Liuyun&VCB-Studio",
161
- "title": "Hanasaku Iroha",
162
- "season": null,
163
- "episode": 7,
164
  "resolution": "1080p",
165
- "source": "x264_flac",
166
- "special": null
167
  }
168
  },
169
  {
170
- "filename": "小新外传4[EP02][2017.06.07]出动!妖怪克星",
171
  "errors": {
172
- "title": {
173
- "gold": "小新外传4 ep02 2017 06",
174
- "pred": "小新外传 ep02 2"
175
- },
176
- "season": {
177
- "gold": null,
178
- "pred": "4"
179
- },
180
- "episode": {
181
- "gold": "7",
182
- "pred": "2"
183
  }
184
  },
185
  "gold": {
186
  "group": null,
187
- "title": "小新外传4 EP02 2017 06",
188
  "season": null,
189
- "episode": 7,
190
- "resolution": null,
191
- "source": null,
192
  "special": null
193
  },
194
  "pred": {
195
  "group": null,
196
- "title": "小新外传 EP02 2",
197
- "season": 4,
198
- "episode": 2,
199
- "resolution": null,
200
- "source": null,
201
  "special": null
202
  }
203
  },
204
  {
205
- "filename": "[GM-Team][国漫][异常生物见闻录][The Record of Unusual Creatures][2019][12][HEVC][GB][3840×2160]",
206
  "errors": {
207
- "resolution": {
208
- "gold": "3840×2160",
209
- "pred": "3840×"
210
  }
211
  },
212
  "gold": {
213
- "group": "GM-Team",
214
- "title": "国漫",
215
  "season": null,
216
- "episode": 12,
217
- "resolution": "3840×2160",
218
- "source": "GB",
219
- "special": null
220
  },
221
  "pred": {
222
- "group": "GM-Team",
223
- "title": "国漫",
224
  "season": null,
225
- "episode": 12,
226
- "resolution": "3840×",
227
- "source": "GB",
228
- "special": null
229
  }
230
  },
231
  {
232
- "filename": "Ⅱ 116 第108次鐘聲已經敲過了嗎?",
233
  "errors": {
234
- "title": {
235
- "gold": "ⅱ 116 第",
236
- "pred": ""
237
  }
238
  },
239
  "gold": {
240
- "group": null,
241
- "title": "Ⅱ 116 第",
242
  "season": null,
243
- "episode": 116,
244
- "resolution": null,
245
- "source": null,
246
- "special": null
247
  },
248
  "pred": {
249
- "group": null,
250
- "title": "",
251
  "season": null,
252
- "episode": 116,
253
- "resolution": null,
254
- "source": null,
255
- "special": null
256
  }
257
  },
258
  {
259
- "filename": "EP08 & EP11 NCED",
260
  "errors": {
261
- "title": {
262
- "gold": "&",
263
- "pred": "ep"
264
  }
265
  },
266
  "gold": {
267
- "group": null,
268
- "title": "&",
269
  "season": null,
270
- "episode": 11,
271
- "resolution": null,
272
- "source": null,
273
- "special": "NCED"
274
  },
275
  "pred": {
276
- "group": null,
277
- "title": "EP",
278
- "season": null,
279
- "episode": 11,
280
- "resolution": null,
281
- "source": null,
282
- "special": "NCED"
283
  }
284
  },
285
  {
286
- "filename": "[S1YURICON] Necronomico no Cosmic Horror Show[06][1080p][WebRip][HEVC_AAC][CHS]",
287
  "errors": {
288
  "season": {
289
- "gold": null,
290
  "pred": "1"
291
  }
292
  },
293
  "gold": {
294
- "group": "S1YURICON",
295
- "title": "Necronomico no Cosmic Horror Show",
296
- "season": null,
297
- "episode": 6,
298
- "resolution": "1080p",
299
- "source": "WebRip",
300
  "special": null
301
  },
302
  "pred": {
303
- "group": "S1YURICON",
304
- "title": "Necronomico no Cosmic Horror Show",
305
  "season": 1,
306
- "episode": 6,
307
- "resolution": "1080p",
308
- "source": "WebRip",
309
  "special": null
310
  }
311
  },
312
  {
313
- "filename": "[FZsub]Gate - Jieitai Kanochi nite, Kaku Tatakaeri 2 - 02(14) (MX 1280x720 x264 AAC)_x264",
314
  "errors": {
315
- "title": {
316
- "gold": "gate - jieitai kanochi nite, kaku tatakaeri 2",
317
- "pred": "gate - jieitai kanochi nite, kaku tatakaeri 2 - 02"
318
  },
319
- "season": {
320
- "gold": "2",
321
- "pred": null
322
  }
323
  },
324
  "gold": {
325
- "group": "FZsub",
326
- "title": "Gate - Jieitai Kanochi nite, Kaku Tatakaeri 2",
327
- "season": 2,
328
  "episode": 14,
329
- "resolution": "1280x720",
330
- "source": "x264",
331
  "special": null
332
  },
333
  "pred": {
334
- "group": "FZsub",
335
- "title": "Gate - Jieitai Kanochi nite, Kaku Tatakaeri 2 - 02",
336
  "season": null,
337
  "episode": 14,
338
- "resolution": "1280x720",
339
- "source": "x264",
340
  "special": null
341
  }
342
  },
343
  {
344
- "filename": "Mobile Suit Gundam SEED Destiny - HD Remaster 2013 Anime Music Clip2 - Life Goes On [BD 1248x702 23.976fps AVC-yuv420p10 FLAC] v2 - yan04000985",
345
  "errors": {
 
 
 
 
346
  "episode": {
347
- "gold": null,
348
- "pred": "23"
349
  }
350
  },
351
  "gold": {
352
- "group": null,
353
- "title": "Mobile Suit Gundam SEED Destiny - HD Remaster 2013 Anime Music Clip2 - Life Goes On",
354
  "season": null,
355
- "episode": null,
356
- "resolution": "1248x702",
357
  "source": "BD",
358
  "special": null
359
  },
360
  "pred": {
361
- "group": null,
362
- "title": "Mobile Suit Gundam SEED Destiny - HD Remaster 2013 Anime Music Clip2 - Life Goes On",
363
  "season": null,
364
- "episode": 23,
365
- "resolution": "1248x702",
366
  "source": "BD",
367
  "special": null
368
  }
369
  },
370
  {
371
- "filename": "Mary_E_Il_Giardino_Segreto_-_07_-_Camilla_[DvdMUX_by_Magic_©2008]",
372
  "errors": {
373
- "source": {
374
- "gold": null,
375
- "pred": "dvd"
376
  }
377
  },
378
  "gold": {
379
- "group": null,
380
- "title": "Mary_E_Il_Giardino_Segreto",
381
  "season": null,
382
- "episode": 7,
383
  "resolution": null,
384
- "source": null,
385
  "special": null
386
  },
387
  "pred": {
388
- "group": null,
389
- "title": "Mary_E_Il_Giardino_Segreto",
390
  "season": null,
391
- "episode": 7,
392
  "resolution": null,
393
- "source": "Dvd",
394
  "special": null
395
  }
396
  },
397
  {
398
- "filename": "(アニメ) アイドル伝説えり子24話 「心をつなぐ輪舞曲」 (DVD 640x480DivX5.02QB93 48kHz128kbps)",
399
  "errors": {
400
- "resolution": {
401
  "gold": null,
402
- "pred": "640x480"
403
  }
404
  },
405
  "gold": {
406
  "group": "アニメ",
407
- "title": "アイドル伝説えり子",
408
  "season": null,
409
- "episode": 24,
410
- "resolution": null,
411
- "source": "DVD",
412
  "special": null
413
  },
414
  "pred": {
415
  "group": "アニメ",
416
- "title": "アイドル伝説えり子",
417
- "season": null,
418
- "episode": 24,
419
  "resolution": "640x480",
420
- "source": "DVD",
421
  "special": null
422
  }
423
  },
424
  {
425
- "filename": "[DMG] 東京レイヴンズ 第06話「days in nest -休日-」 [BDRip][AVC_AAC][720P][CHS](A8161323)",
426
  "errors": {
427
- "episode": {
428
- "gold": "1323",
429
- "pred": "6"
430
  }
431
  },
432
  "gold": {
433
- "group": "DMG",
434
- "title": "東京レイヴンズ 第06話「days in nest -休日-」",
435
  "season": null,
436
- "episode": 1323,
437
- "resolution": "720P",
438
- "source": "BDRip",
439
  "special": null
440
  },
441
  "pred": {
442
- "group": "DMG",
443
- "title": "東京レイヴンズ 第06話「days in nest -休日-」",
444
  "season": null,
445
  "episode": 6,
446
- "resolution": "720P",
447
- "source": "BDRip",
448
  "special": null
449
  }
450
  },
451
  {
452
- "filename": "[S1YURICON] Necronomico no Cosmic Horror Show[05v2][1080p][WebRip][AVC_AAC][CHS]",
453
  "errors": {
454
  "season": {
455
- "gold": null,
456
  "pred": "1"
457
  }
458
  },
459
  "gold": {
460
- "group": "S1YURICON",
461
- "title": "Necronomico no Cosmic Horror Show",
462
- "season": null,
463
- "episode": 5,
464
  "resolution": "1080p",
465
- "source": "WebRip",
466
  "special": null
467
  },
468
  "pred": {
469
- "group": "S1YURICON",
470
- "title": "Necronomico no Cosmic Horror Show",
471
  "season": 1,
472
- "episode": 5,
473
  "resolution": "1080p",
474
- "source": "WebRip",
475
  "special": null
476
  }
477
  },
478
  {
479
- "filename": "Cardcaptor Sakura - 17 [x264-AAC-BD1440x1080p][Sakura][C-W][E2B50799]",
480
  "errors": {
481
- "resolution": {
482
- "gold": null,
483
- "pred": "1080p"
484
- },
485
- "source": {
486
  "gold": null,
487
- "pred": "e2b50799"
488
  }
489
  },
490
  "gold": {
491
- "group": null,
492
- "title": "Cardcaptor Sakura",
493
  "season": null,
494
- "episode": 17,
495
  "resolution": null,
496
- "source": null,
497
  "special": null
498
  },
499
  "pred": {
500
- "group": null,
501
- "title": "Cardcaptor Sakura",
502
- "season": null,
503
- "episode": 17,
504
- "resolution": "1080p",
505
- "source": "E2B50799",
506
  "special": null
507
  }
508
  },
509
  {
510
- "filename": "[Xspitfire911] Tate no Yuusha no Nariagari S01E20 BDRIP 1080p X265 10bit VOSTFR",
511
  "errors": {
512
  "season": {
513
  "gold": null,
@@ -515,80 +515,49 @@
515
  }
516
  },
517
  "gold": {
518
- "group": "Xspitfire911",
519
- "title": "Tate no Yuusha no Nariagari",
520
  "season": null,
521
- "episode": 20,
522
- "resolution": "1080p",
523
- "source": "BDRIP",
524
  "special": null
525
  },
526
  "pred": {
527
- "group": "Xspitfire911",
528
- "title": "Tate no Yuusha no Nariagari",
529
  "season": 1,
530
- "episode": 20,
531
- "resolution": "1080p",
532
- "source": "BDRIP",
533
  "special": null
534
  }
535
  },
536
  {
537
- "filename": "[KTXP][Dungeon ni Deai wo Motomeru no wa Machigatteiru Darou ka IV][13][BIG5][720P][MP4]",
538
  "errors": {
539
- "title": {
540
- "gold": "dungeon ni deai wo motomeru no wa machigatteiru darou ka",
541
- "pred": "dungeon ni deai wo motomeru no wa machigatteiru darou ka iv"
542
- },
543
  "season": {
544
- "gold": "4",
545
- "pred": null
546
- }
547
- },
548
- "gold": {
549
- "group": "KTXP",
550
- "title": "Dungeon ni Deai wo Motomeru no wa Machigatteiru Darou ka",
551
- "season": 4,
552
- "episode": 13,
553
- "resolution": "720P",
554
- "source": "BIG5",
555
- "special": null
556
- },
557
- "pred": {
558
- "group": "KTXP",
559
- "title": "Dungeon ni Deai wo Motomeru no wa Machigatteiru Darou ka IV",
560
- "season": null,
561
- "episode": 13,
562
- "resolution": "720P",
563
- "source": "BIG5",
564
- "special": null
565
- }
566
- },
567
- {
568
- "filename": "[JyFanSub][Fate_Apocrypha][15][GB][1080]p",
569
- "errors": {
570
- "episode": {
571
- "gold": "1080",
572
- "pred": "15"
573
  }
574
  },
575
  "gold": {
576
- "group": "JyFanSub",
577
- "title": "Fate_Apocrypha",
578
  "season": null,
579
- "episode": 1080,
580
- "resolution": null,
581
- "source": "GB",
582
- "special": null
583
  },
584
  "pred": {
585
- "group": "JyFanSub",
586
- "title": "Fate_Apocrypha",
587
- "season": null,
588
- "episode": 15,
589
- "resolution": null,
590
- "source": "GB",
591
- "special": null
592
  }
593
  }
594
  ]
 
1
  {
2
  "sample_count": 2048,
3
  "field_accuracy": {
4
+ "group": 0.99951171875,
5
+ "title": 0.99755859375,
6
+ "season": 0.99609375,
7
+ "episode": 0.998046875,
8
+ "resolution": 1.0,
9
+ "source": 0.99853515625,
10
+ "special": 0.9990234375
11
  },
12
  "field_correct": {
13
+ "group": 2047,
14
+ "title": 2043,
15
+ "season": 2040,
16
+ "episode": 2044,
17
+ "resolution": 2048,
18
+ "source": 2045,
19
+ "special": 2046
20
  },
21
  "field_total": {
22
  "group": 2048,
 
27
  "source": 2048,
28
  "special": 2048
29
  },
30
+ "full_match_accuracy": 0.99072265625,
31
+ "full_match_correct": 2029,
32
  "full_match_total": 2048,
33
  "failures": [
34
  {
35
+ "filename": "[ig]Itai no wa Iya nano de Bougyoryoku ni Kyokufuri Shitai to Omoimasu[WebRip 1920x1080 AVC YUV420 8Bit 1080p AAC].03.TC",
36
  "errors": {
37
+ "episode": {
38
+ "gold": "3",
39
+ "pred": null
40
  }
41
  },
42
  "gold": {
43
+ "group": "ig",
44
+ "title": "Itai no wa Iya nano de Bougyoryoku ni Kyokufuri Shitai to Omoimasu",
45
  "season": null,
46
+ "episode": 3,
47
+ "resolution": "1080p",
48
+ "source": "WebRip",
49
  "special": null
50
  },
51
  "pred": {
52
+ "group": "ig",
53
+ "title": "Itai no wa Iya nano de Bougyoryoku ni Kyokufuri Shitai to Omoimasu",
54
+ "season": null,
55
+ "episode": null,
56
+ "resolution": "1080p",
57
+ "source": "WebRip",
58
  "special": null
59
  }
60
  },
61
  {
62
+ "filename": "[YYDM-11FANS][Nanana's Buried Treasure][preview][09][BDrip][720P][X264-10bit_AAC][34D29ED6]",
63
  "errors": {
64
+ "special": {
65
+ "gold": "ed",
66
+ "pred": null
67
  }
68
  },
69
  "gold": {
70
+ "group": "YYDM-11FANS",
71
+ "title": "Nanana's Buried Treasure",
72
  "season": null,
73
  "episode": 9,
74
+ "resolution": "720P",
75
+ "source": "BDrip",
76
+ "special": "ED"
77
  },
78
  "pred": {
79
+ "group": "YYDM-11FANS",
80
+ "title": "Nanana's Buried Treasure",
81
+ "season": null,
82
  "episode": 9,
83
+ "resolution": "720P",
84
+ "source": "BDrip",
85
  "special": null
86
  }
87
  },
88
  {
89
+ "filename": "[Moozzi2] Madou King Granzort Saigo no Magical Taisen OVA - 01 [ 1990 ] (BD 1440x1080 x.264 Flac)",
90
  "errors": {
91
+ "title": {
92
+ "gold": "madou king granzort saigo no magical taisen ova",
93
+ "pred": "madou king granzort saigo no magical taisen ova - 01 [ 1990"
94
+ },
95
+ "episode": {
96
+ "gold": "1",
97
+ "pred": "1990"
98
  }
99
  },
100
  "gold": {
101
+ "group": "Moozzi2",
102
+ "title": "Madou King Granzort Saigo no Magical Taisen OVA",
103
  "season": null,
104
+ "episode": 1,
105
+ "resolution": "1440x1080",
106
+ "source": "BD",
107
+ "special": "OVA"
108
  },
109
  "pred": {
110
+ "group": "Moozzi2",
111
+ "title": "Madou King Granzort Saigo no Magical Taisen OVA - 01 [ 1990 ",
112
  "season": null,
113
+ "episode": 1990,
114
+ "resolution": "1440x1080",
115
+ "source": "BD",
116
+ "special": "OVA"
117
  }
118
  },
119
  {
120
+ "filename": "[64bitsub][Tensui no Sakuna-hime][08][BDRIP_1920x1080][AVC_FLAC_SUP]",
121
  "errors": {
122
  "source": {
123
+ "gold": "flac",
124
+ "pred": "avc-flac"
125
  }
126
  },
127
  "gold": {
128
+ "group": "64bitsub",
129
+ "title": "Tensui no Sakuna-hime",
130
  "season": null,
131
+ "episode": 8,
132
+ "resolution": "1920x1080",
133
+ "source": "FLAC",
134
  "special": null
135
  },
136
  "pred": {
137
+ "group": "64bitsub",
138
+ "title": "Tensui no Sakuna-hime",
139
  "season": null,
140
+ "episode": 8,
141
+ "resolution": "1920x1080",
142
+ "source": "AVC_FLAC",
143
  "special": null
144
  }
145
  },
146
  {
147
+ "filename": "[VCB-Studio] Shingeki no Kyojin Movie 3 Kakusei no Houkou [Teaser_S3][Ma10p_1080p][x265_flac]",
148
  "errors": {
149
+ "season": {
150
+ "gold": null,
151
+ "pred": "3"
152
  }
153
  },
154
  "gold": {
155
+ "group": "VCB-Studio",
156
+ "title": "Shingeki no Kyojin Movie 3 Kakusei no Houkou [Teaser_S3",
157
  "season": null,
158
+ "episode": 3,
159
  "resolution": "1080p",
160
+ "source": "x265_flac",
161
+ "special": "Movie"
162
  },
163
  "pred": {
164
+ "group": "VCB-Studio",
165
+ "title": "Shingeki no Kyojin Movie 3 Kakusei no Houkou [Teaser_S3",
166
+ "season": 3,
167
+ "episode": 3,
168
  "resolution": "1080p",
169
+ "source": "x265_flac",
170
+ "special": "Movie"
171
  }
172
  },
173
  {
174
+ "filename": "FF:U ファイナルファンタジー:アンリミテッド ~異界の章~ #15 「ジェーン~うごきだすうみパズル」(DVD 640x480 DivX5 QB98 120fps lameVBR)[CRC_5FA44899]",
175
  "errors": {
176
+ "source": {
177
+ "gold": "cr",
178
+ "pred": "dvd"
179
  }
180
  },
181
  "gold": {
182
  "group": null,
183
+ "title": "FF:U ファイナルファンタジー:アンリミテッド ~異界の章~",
184
  "season": null,
185
+ "episode": 15,
186
+ "resolution": "640x480",
187
+ "source": "CR",
188
  "special": null
189
  },
190
  "pred": {
191
  "group": null,
192
+ "title": "FF:U ファイナルファンタジー:アンリミテッド ~異界の章~",
193
+ "season": null,
194
+ "episode": 15,
195
+ "resolution": "640x480",
196
+ "source": "DVD",
197
  "special": null
198
  }
199
  },
200
  {
201
+ "filename": "[OVA]GALLFORCE ガルフォース2 宇宙章 vol2 [DESTRUCTION]",
202
  "errors": {
203
+ "title": {
204
+ "gold": "gallforce ガルフォース2 宇宙章 vol",
205
+ "pred": "gallforce ガルフォース2 宇宙"
206
  }
207
  },
208
  "gold": {
209
+ "group": "OVA",
210
+ "title": "GALLFORCE ガルフォース2 宇宙章 vol",
211
  "season": null,
212
+ "episode": 2,
213
+ "resolution": null,
214
+ "source": null,
215
+ "special": "OVA"
216
  },
217
  "pred": {
218
+ "group": "OVA",
219
+ "title": "GALLFORCE ガルフォース2 宇宙",
220
  "season": null,
221
+ "episode": 2,
222
+ "resolution": null,
223
+ "source": null,
224
+ "special": "OVA"
225
  }
226
  },
227
  {
228
+ "filename": "[病毒].[Fosky_Fansub][Virus_Buster_Serge][DVDrip][12][H264_AAC][640x480][GB&BIG5][F77551D0](ED2000.COM)",
229
  "errors": {
230
+ "special": {
231
+ "gold": "ed",
232
+ "pred": "e"
233
  }
234
  },
235
  "gold": {
236
+ "group": "病毒",
237
+ "title": "Fosky_Fansub",
238
  "season": null,
239
+ "episode": 12,
240
+ "resolution": "640x480",
241
+ "source": "DVDrip",
242
+ "special": "ED"
243
  },
244
  "pred": {
245
+ "group": "病毒",
246
+ "title": "Fosky_Fansub",
247
  "season": null,
248
+ "episode": 12,
249
+ "resolution": "640x480",
250
+ "source": "DVDrip",
251
+ "special": "E"
252
  }
253
  },
254
  {
255
+ "filename": "[DBD-Raws][Shadows House S1][Gekijou][18][1080P][BDRip][HEVC-10bit][FLAC]",
256
  "errors": {
257
+ "season": {
258
+ "gold": null,
259
+ "pred": "1"
260
  }
261
  },
262
  "gold": {
263
+ "group": "DBD-Raws",
264
+ "title": "Shadows House",
265
  "season": null,
266
+ "episode": 18,
267
+ "resolution": "1080P",
268
+ "source": "BDRip",
269
+ "special": null
270
  },
271
  "pred": {
272
+ "group": "DBD-Raws",
273
+ "title": "Shadows House",
274
+ "season": 1,
275
+ "episode": 18,
276
+ "resolution": "1080P",
277
+ "source": "BDRip",
278
+ "special": null
279
  }
280
  },
281
  {
282
+ "filename": "Girls und Panzer - 10.5 (BD 1280x720 AVC AACx2)",
283
  "errors": {
284
  "season": {
285
+ "gold": "10",
286
  "pred": "1"
287
  }
288
  },
289
  "gold": {
290
+ "group": null,
291
+ "title": "Girls und Panzer - 10.5",
292
+ "season": 10,
293
+ "episode": 5,
294
+ "resolution": "1280x720",
295
+ "source": "BD",
296
  "special": null
297
  },
298
  "pred": {
299
+ "group": null,
300
+ "title": "Girls und Panzer - 10.5",
301
  "season": 1,
302
+ "episode": 5,
303
+ "resolution": "1280x720",
304
+ "source": "BD",
305
  "special": null
306
  }
307
  },
308
  {
309
+ "filename": "[POPGO&SumiSora&TxxZ] Ginga Eiyuu Densetsu Die Neue These - Seiran 14 (BDRip 1080P X265 Main10p TrueHDX2 Chap)[A4E18C32]",
310
  "errors": {
311
+ "group": {
312
+ "gold": null,
313
+ "pred": "popgo&sumisora&txxz"
314
  },
315
+ "title": {
316
+ "gold": "popgo&sumisora&txxz",
317
+ "pred": "ginga eiyuu densetsu die neue these - seiran 14"
318
  }
319
  },
320
  "gold": {
321
+ "group": null,
322
+ "title": "POPGO&SumiSora&TxxZ",
323
+ "season": null,
324
  "episode": 14,
325
+ "resolution": "1080P",
326
+ "source": "BDRip",
327
  "special": null
328
  },
329
  "pred": {
330
+ "group": "POPGO&SumiSora&TxxZ",
331
+ "title": "Ginga Eiyuu Densetsu Die Neue These - Seiran 14",
332
  "season": null,
333
  "episode": 14,
334
+ "resolution": "1080P",
335
+ "source": "BDRip",
336
  "special": null
337
  }
338
  },
339
  {
340
+ "filename": "[アニメ BD] Serial Experiments Lain 映像特典 「trailer 01」 (1440x1080 x264 AAC 2ch)",
341
  "errors": {
342
+ "title": {
343
+ "gold": "serial experiments lain 映像特典 「trailer 01」",
344
+ "pred": "serial experiments lain 映像特典 「trailer"
345
+ },
346
  "episode": {
347
+ "gold": "2",
348
+ "pred": "1"
349
  }
350
  },
351
  "gold": {
352
+ "group": "アニメ BD",
353
+ "title": "Serial Experiments Lain 映像特典 「trailer 01」",
354
  "season": null,
355
+ "episode": 2,
356
+ "resolution": "1440x1080",
357
  "source": "BD",
358
  "special": null
359
  },
360
  "pred": {
361
+ "group": "アニメ BD",
362
+ "title": "Serial Experiments Lain 映像特典 「trailer",
363
  "season": null,
364
+ "episode": 1,
365
+ "resolution": "1440x1080",
366
  "source": "BD",
367
  "special": null
368
  }
369
  },
370
  {
371
+ "filename": "[AJZ&BLU][God Eater][05][BIG5][v2] (2)",
372
  "errors": {
373
+ "episode": {
374
+ "gold": "2",
375
+ "pred": "5"
376
  }
377
  },
378
  "gold": {
379
+ "group": "AJZ&BLU",
380
+ "title": "God Eater",
381
  "season": null,
382
+ "episode": 2,
383
  "resolution": null,
384
+ "source": "BIG5",
385
  "special": null
386
  },
387
  "pred": {
388
+ "group": "AJZ&BLU",
389
+ "title": "God Eater",
390
  "season": null,
391
+ "episode": 5,
392
  "resolution": null,
393
+ "source": "BIG5",
394
  "special": null
395
  }
396
  },
397
  {
398
+ "filename": "(アニメ) YAT安心!宇宙旅行1期 第07話 「サバイバル!野生のカネア」 (LD 640x480 WMV9 QB90 24fps)",
399
  "errors": {
400
+ "season": {
401
  "gold": null,
402
+ "pred": "1"
403
  }
404
  },
405
  "gold": {
406
  "group": "アニメ",
407
+ "title": "YAT安心!宇宙旅行",
408
  "season": null,
409
+ "episode": 7,
410
+ "resolution": "640x480",
411
+ "source": null,
412
  "special": null
413
  },
414
  "pred": {
415
  "group": "アニメ",
416
+ "title": "YAT安心!宇宙旅行",
417
+ "season": 1,
418
+ "episode": 7,
419
  "resolution": "640x480",
420
+ "source": null,
421
  "special": null
422
  }
423
  },
424
  {
425
+ "filename": "Lord El-Melloi II-sei no Jikenbo 06 [1AAC021C]",
426
  "errors": {
427
+ "source": {
428
+ "gold": "aac",
429
+ "pred": null
430
  }
431
  },
432
  "gold": {
433
+ "group": null,
434
+ "title": "Lord El-Melloi II-sei no Jikenbo",
435
  "season": null,
436
+ "episode": 6,
437
+ "resolution": null,
438
+ "source": "AAC",
439
  "special": null
440
  },
441
  "pred": {
442
+ "group": null,
443
+ "title": "Lord El-Melloi II-sei no Jikenbo",
444
  "season": null,
445
  "episode": 6,
446
+ "resolution": null,
447
+ "source": null,
448
  "special": null
449
  }
450
  },
451
  {
452
+ "filename": "[Skymoon-Raws] Mashle 2nd Season - 01(13) [ViuTV][WEB-DL][1080p][AVC AAC]",
453
  "errors": {
454
+ "title": {
455
+ "gold": "mashle 2nd season - 01",
456
+ "pred": "mashle 2nd season"
457
+ },
458
  "season": {
459
+ "gold": "2",
460
  "pred": "1"
461
  }
462
  },
463
  "gold": {
464
+ "group": "Skymoon-Raws",
465
+ "title": "Mashle 2nd Season - 01",
466
+ "season": 2,
467
+ "episode": 13,
468
  "resolution": "1080p",
469
+ "source": "WEB-DL",
470
  "special": null
471
  },
472
  "pred": {
473
+ "group": "Skymoon-Raws",
474
+ "title": "Mashle 2nd Season",
475
  "season": 1,
476
+ "episode": 13,
477
  "resolution": "1080p",
478
+ "source": "WEB-DL",
479
  "special": null
480
  }
481
  },
482
  {
483
+ "filename": "【CXRAW】【S17】【Power Rangers RPM】【30】【End Game】【x264 Hi10p AAC】【MP4】",
484
  "errors": {
485
+ "season": {
486
  "gold": null,
487
+ "pred": "17"
488
  }
489
  },
490
  "gold": {
491
+ "group": "CXRAW",
492
+ "title": "S17",
493
  "season": null,
494
+ "episode": 30,
495
  "resolution": null,
496
+ "source": "AAC",
497
  "special": null
498
  },
499
  "pred": {
500
+ "group": "CXRAW",
501
+ "title": "S17",
502
+ "season": 17,
503
+ "episode": 30,
504
+ "resolution": null,
505
+ "source": "AAC",
506
  "special": null
507
  }
508
  },
509
  {
510
+ "filename": "(アニメ) YAT安心!宇宙旅行 第1期 第24話 「モーレツ!かあちゃん珍道中」 (LD 640x480 WMV9 QB90 24fps)",
511
  "errors": {
512
  "season": {
513
  "gold": null,
 
515
  }
516
  },
517
  "gold": {
518
+ "group": "アニメ",
519
+ "title": "YAT安心!宇宙旅行",
520
  "season": null,
521
+ "episode": 24,
522
+ "resolution": "640x480",
523
+ "source": null,
524
  "special": null
525
  },
526
  "pred": {
527
+ "group": "アニメ",
528
+ "title": "YAT安心!宇宙旅行",
529
  "season": 1,
530
+ "episode": 24,
531
+ "resolution": "640x480",
532
+ "source": null,
533
  "special": null
534
  }
535
  },
536
  {
537
+ "filename": "[Snow-Raws] アイドルマスター シンデレラガールズ劇場 第2期 SP17 (DVD 1280x720 HEVC-YUV420P10 FLAC)",
538
  "errors": {
539
  "season": {
540
+ "gold": null,
541
+ "pred": "2"
542
  }
543
  },
544
  "gold": {
545
+ "group": "Snow-Raws",
546
+ "title": "アイドルマスター シンデレラガールズ劇場 第2期 SP17",
547
  "season": null,
548
+ "episode": 17,
549
+ "resolution": "1280x720",
550
+ "source": "DVD",
551
+ "special": "SP"
552
  },
553
  "pred": {
554
+ "group": "Snow-Raws",
555
+ "title": "アイドルマスター シンデレラガールズ劇場 第2期 SP17",
556
+ "season": 2,
557
+ "episode": 17,
558
+ "resolution": "1280x720",
559
+ "source": "DVD",
560
+ "special": "SP"
561
  }
562
  }
563
  ]
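
The summary numbers in the new `parse_eval_metrics.json` are consistent with a plain ratio: each `field_accuracy` value equals `field_correct / field_total`, and `full_match_accuracy` equals `full_match_correct / full_match_total`. The sketch below rechecks that arithmetic with the values shown in the diff. The helper name `recompute_accuracy` is hypothetical and not part of the repository, and it assumes `field_total` is 2048 for every field (the diff shows only some of those lines).

```python
# Values copied from the new parse_eval_metrics.json in this diff.
field_correct = {
    "group": 2047, "title": 2043, "season": 2040, "episode": 2044,
    "resolution": 2048, "source": 2045, "special": 2046,
}
sample_count = 2048  # assumed to be the field_total of every field


def recompute_accuracy(correct: dict[str, int], total: int) -> dict[str, float]:
    """Hypothetical helper: per-field accuracy as correct / total."""
    return {name: count / total for name, count in correct.items()}


field_accuracy = recompute_accuracy(field_correct, sample_count)
full_match_correct = 2029
full_match_accuracy = full_match_correct / sample_count
# Samples with at least one wrong field; the failures list above has this many entries.
failure_count = sample_count - full_match_correct
```

Because the denominator is a power of two, the recomputed floats match the stored values exactly, which makes this a cheap sanity check before copying a checkpoint's metrics files to the repository root.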
run_metadata.json CHANGED
@@ -1,5 +1,5 @@
1
  {
2
- "experiment_name": "dmhy-char-full-relabel",
3
  "data_file": "datasets/AnimeName/dmhy_weak_char.jsonl",
4
  "tokenizer_variant": "char",
5
  "vocab_file": "datasets/AnimeName/vocab.char.json",
@@ -15,7 +15,7 @@
15
  "batch_size": 256,
16
  "learning_rate": 8e-05,
17
  "warmup_steps": 300,
18
- "seed": 48,
19
  "device": "cuda",
20
  "fp16": true,
21
  "gradient_accumulation_steps": 1,
 
1
  {
2
+ "experiment_name": "dmhy-char-guoman-relabel",
3
  "data_file": "datasets/AnimeName/dmhy_weak_char.jsonl",
4
  "tokenizer_variant": "char",
5
  "vocab_file": "datasets/AnimeName/vocab.char.json",
 
15
  "batch_size": 256,
16
  "learning_rate": 8e-05,
17
  "warmup_steps": 300,
18
+ "seed": 52,
19
  "device": "cuda",
20
  "fp16": true,
21
  "gradient_accumulation_steps": 1,
trainer_eval_metrics.json CHANGED
@@ -1,11 +1,11 @@
1
  {
2
- "eval_loss": 0.01631847210228443,
3
- "eval_precision": 0.9799749533444652,
4
- "eval_recall": 0.986698478236683,
5
- "eval_f1": 0.9833252228334185,
6
- "eval_accuracy": 0.9943065860243627,
7
- "eval_runtime": 39.3604,
8
- "eval_samples_per_second": 321.161,
9
- "eval_steps_per_second": 1.27,
10
  "epoch": 2.0
11
  }
 
1
  {
2
+ "eval_loss": 0.005763721186667681,
3
+ "eval_precision": 0.9921522239605195,
4
+ "eval_recall": 0.9946191314105016,
5
+ "eval_f1": 0.9933841461473317,
6
+ "eval_accuracy": 0.9980711558885925,
7
+ "eval_runtime": 45.558,
8
+ "eval_samples_per_second": 277.471,
9
+ "eval_steps_per_second": 1.098,
10
  "epoch": 2.0
11
  }
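
In both the old and the new `trainer_eval_metrics.json`, `eval_f1` agrees with the harmonic mean of `eval_precision` and `eval_recall`. The sketch below rechecks this with the values from the diff. The diff does not show which library computes these metrics, so the formula is an assumption that happens to fit the numbers, and `f1_from` is a hypothetical helper.

```python
import math


def f1_from(precision: float, recall: float) -> float:
    """Hypothetical helper: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values copied from the old and new trainer_eval_metrics.json in this diff.
old = {"precision": 0.9799749533444652, "recall": 0.986698478236683,
       "f1": 0.9833252228334185}
new = {"precision": 0.9921522239605195, "recall": 0.9946191314105016,
       "f1": 0.9933841461473317}

old_f1 = f1_from(old["precision"], old["recall"])
new_f1 = f1_from(new["precision"], new["recall"])
f1_gain = new["f1"] - old["f1"]  # improvement of the new checkpoint in eval F1
```

The gain is about one F1 point between the two checkpoints.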
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:b5aa0df615ce731796aa9934b0505e00a685611be134c071d7b2487d8112dde1
3
  size 5265
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f01503ec029ec161063c2d78a00732c80072525b8d258c7c717b2e21f4f55d93
3
  size 5265