ModerRAS/AniFileBERT

@@ -65,10 +65,12 @@ Optional controls:
 --keep-encoding-noise
 ```
-The output is intended to match `tools/apply_dmhy_template_recipes.py` at the
-record schema level: `filename`, `tokens`, `labels`, `template_id`, `template`,
-plus optional `source_filename`, `path_trimmed`, and
-`dropped_title_candidate_positions`.
+The output record schema is `filename`, `tokens`, `labels`, `template_id`, and
+`template`, plus optional `source_filename`, `path_trimmed`, and
+`dropped_title_candidate_positions`. Clustered recipe rows also include
+`title_spans` and `title_boundary_decisions` metadata so downstream synthetic
+augmentation can distinguish one logical title span from repeated/path title
+slots.
 For low-frequency templates (`count <= --audit-max-count`, default `50`), apply
 uses a conservative gate: records with `no_title`, `multiple_title_spans`,
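
The record schema named on the added lines of this hunk can be sketched as a key check. This is a minimal illustration, not code from the repository: the helper name `check_record_keys` and the constant names are hypothetical, the README text shown here does not specify field types, and so only key names are checked.

```python
# Key names are taken from the README text in this hunk.
# Grouping them into these three constants is an assumption made for illustration.
REQUIRED_KEYS = {"filename", "tokens", "labels", "template_id", "template"}
OPTIONAL_KEYS = {
    "source_filename",
    "path_trimmed",
    "dropped_title_candidate_positions",
}
# Metadata the README says clustered recipe rows also include.
CLUSTERED_KEYS = {"title_spans", "title_boundary_decisions"}


def check_record_keys(record):
    """Return (missing, unknown) key lists for one output record (a dict).

    Hypothetical helper: it only compares key names against the documented
    schema and makes no claim about the value types.
    """
    keys = set(record)
    missing = sorted(REQUIRED_KEYS - keys)
    unknown = sorted(keys - REQUIRED_KEYS - OPTIONAL_KEYS - CLUSTERED_KEYS)
    return missing, unknown
```

A downstream consumer could call `check_record_keys` on each record and skip or log rows where `missing` is non-empty. The field values in any such call would be placeholders, since the source does not show a sample record.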