ModerRAS/AniFileBERT

@@ -116,6 +116,30 @@ cargo run --release --manifest-path tools\schema_v2_synthetic_augment\Cargo.toml
   --manifest data\schema_v2_synthetic_aug.manifest.json
 ```
+The preferred synthetic follow-up is a second training stage that starts from the
+best repaired hard-focus checkpoint; it does not replace hard-focus training. Do
+not combine `--encoded-cache-dir` with `--extra-data-file`: when mixing in
+synthetic augmentation, either use the raw hard-focus JSONL or rebuild a combined
+Rust encoded cache. Run the native Windows Python from `.venv` after confirming
+that CUDA works:
+```powershell
+.\.venv\Scripts\python.exe -m anifilebert.train --tokenizer char `
+  --data-file data\schema_v2_hard_focus_char_seed63.jsonl `
+  --extra-data-file data\schema_v2_synthetic_aug.jsonl `
+  --extra-data-repeat 3 `
+  --vocab-file datasets\AnimeName\vocab.char.json `
+  --save-dir checkpoints\schema-v2-best-hardfocus-synth-pathleaf `
+  --init-model-dir checkpoints\ablation-schema-v2-hardfocus-cache-repaired-from-baseline-seed62-10epoch-rerun\final `
+  --epochs 2 --batch-size 512 --learning-rate 0.00004 --warmup-steps 120 `
+  --max-seq-length 128 --train-split 0.995 --num-workers 0 `
+  --checkpoint-steps 1000 --save-total-limit 3 --no-periodic-eval `
+  --bf16 --auto-find-batch-size `
+  --parse-eval-limit 2048 `
+  --case-eval-file data\parser_regression_cases.json `
+  --case-eval-output reports\schema_v2_best_hardfocus_synth_pathleaf_case_metrics.json `
+  --seed 63 --experiment-name schema-v2-best-hardfocus-synth-pathleaf
+```
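The two preconditions stated before the command (no mixing of `--encoded-cache-dir` with `--extra-data-file`, and a working CUDA setup) could be checked before launching with something like the sketch below. It is not part of this change: the helper names `conflicting_flags` and `cuda_status` are hypothetical, and it assumes the trainer runs on PyTorch, which the diff does not state.

```python
"""Pre-flight checks for the second-stage run (illustrative sketch only)."""
import importlib.util


def conflicting_flags(argv):
    """Return True if argv uses both --encoded-cache-dir and --extra-data-file."""
    # Accept both "--flag value" and "--flag=value" spellings.
    names = {arg.split("=", 1)[0] for arg in argv if arg.startswith("--")}
    return {"--encoded-cache-dir", "--extra-data-file"} <= names


def cuda_status():
    """Describe CUDA/bf16 availability, or report that PyTorch is missing."""
    # Assumption: the training stack is PyTorch-based.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch

    if not torch.cuda.is_available():
        return "cuda unavailable"
    # bf16 support matters because the training command passes --bf16.
    if torch.cuda.is_bf16_supported():
        return "cuda ok, bf16 ok"
    return "cuda ok, bf16 unsupported"
```

Saved to a scratch file and imported from `.\.venv\Scripts\python.exe`, `cuda_status()` gives a quick answer before committing to the run, and `conflicting_flags` can be applied to the argument list you intend to pass.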
 Export for Android:
 ```bash