ModerRAS committed on
Commit 5c84315 · 1 Parent(s): 8c1da73

Document Rust DMHY template recipe schema

tools/rust_dmhy_template_apply/README.md CHANGED
@@ -65,10 +65,12 @@ Optional controls:
  --keep-encoding-noise
  ```
 
- The output is intended to match `tools/apply_dmhy_template_recipes.py` at the
- record schema level: `filename`, `tokens`, `labels`, `template_id`, `template`,
- plus optional `source_filename`, `path_trimmed`, and
- `dropped_title_candidate_positions`.
+ The output record schema is `filename`, `tokens`, `labels`, `template_id`, and
+ `template`, plus optional `source_filename`, `path_trimmed`, and
+ `dropped_title_candidate_positions`. Clustered recipe rows also include
+ `title_spans` and `title_boundary_decisions` metadata so downstream synthetic
+ augmentation can distinguish one logical title span from repeated/path title
+ slots.
 
  For low-frequency templates (`count <= --audit-max-count`, default `50`), apply
  uses a conservative gate: records with `no_title`, `multiple_title_spans`,
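
The record schema named in the README text of this diff can be checked mechanically. The following is a minimal Python sketch and not part of the commit: it uses only the field names listed in the README, while `check_record`, the three grouping constants, and every sample value are hypothetical. The README does not state field types, so the sketch checks key presence only.

```python
# Field names come from the README text in the diff. The grouping constants
# and check_record are hypothetical helpers written for illustration.
REQUIRED_FIELDS = {"filename", "tokens", "labels", "template_id", "template"}
OPTIONAL_FIELDS = {
    "source_filename",
    "path_trimmed",
    "dropped_title_candidate_positions",
}
# Present on clustered recipe rows according to the new README wording.
CLUSTERED_FIELDS = {"title_spans", "title_boundary_decisions"}


def check_record(record: dict) -> dict:
    """Report which documented keys are missing or unexpected in one record."""
    keys = set(record)
    known = REQUIRED_FIELDS | OPTIONAL_FIELDS | CLUSTERED_FIELDS
    return {
        "missing_required": sorted(REQUIRED_FIELDS - keys),
        "unknown": sorted(keys - known),
        # True only when both clustered-row metadata keys are present.
        "has_cluster_metadata": CLUSTERED_FIELDS <= keys,
    }


# Sample values are invented; the README does not specify value types.
sample = {
    "filename": "example.mkv",
    "tokens": ["example", ".", "mkv"],
    "labels": ["TITLE", "O", "O"],
    "template_id": 0,
    "template": "<TITLE>.mkv",
}
report = check_record(sample)
print(report)
```

A check of this kind could be run over each output record to confirm that the Rust tool and `tools/apply_dmhy_template_recipes.py` emit the same set of keys, assuming both write one record per row in a format that loads into a dictionary.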