Tags: Token Classification, Transformers, ONNX, Safetensors, English, Japanese, Chinese, bert, anime, filename-parsing
How to use ModerRAS/AniFileBERT with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="ModerRAS/AniFileBERT")

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("ModerRAS/AniFileBERT")
model = AutoModelForTokenClassification.from_pretrained("ModerRAS/AniFileBERT")
```
Document synthetic hard-focus training recipe
AGENTS.md (changed)
````diff
@@ -116,6 +116,30 @@ cargo run --release --manifest-path tools\schema_v2_synthetic_augment\Cargo.toml
 --manifest data\schema_v2_synthetic_aug.manifest.json
 ```
 
+Preferred synthetic follow-up training is a second stage from the best repaired
+hard-focus checkpoint, not a replacement for hard-focus. Do not combine
+`--encoded-cache-dir` with `--extra-data-file`; use the raw hard-focus JSONL
+when mixing synthetic augmentation, or rebuild a combined Rust encoded cache.
+Use native Windows Python from `.venv` after confirming CUDA works:
+
+```powershell
+.\.venv\Scripts\python.exe -m anifilebert.train --tokenizer char `
+--data-file data\schema_v2_hard_focus_char_seed63.jsonl `
+--extra-data-file data\schema_v2_synthetic_aug.jsonl `
+--extra-data-repeat 3 `
+--vocab-file datasets\AnimeName\vocab.char.json `
+--save-dir checkpoints\schema-v2-best-hardfocus-synth-pathleaf `
+--init-model-dir checkpoints\ablation-schema-v2-hardfocus-cache-repaired-from-baseline-seed62-10epoch-rerun\final `
+--epochs 2 --batch-size 512 --learning-rate 0.00004 --warmup-steps 120 `
+--max-seq-length 128 --train-split 0.995 --num-workers 0 `
+--checkpoint-steps 1000 --save-total-limit 3 --no-periodic-eval `
+--bf16 --auto-find-batch-size `
+--parse-eval-limit 2048 `
+--case-eval-file data\parser_regression_cases.json `
+--case-eval-output reports\schema_v2_best_hardfocus_synth_pathleaf_case_metrics.json `
+--seed 63 --experiment-name schema-v2-best-hardfocus-synth-pathleaf
+```
+
 Export for Android:
 
 ```bash
````
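The recipe says to confirm CUDA works before launching the `.venv` interpreter with `--bf16`. A minimal preflight sketch, assuming the trainer runs on PyTorch (as Transformers token-classification models normally do); `cuda_preflight` is a hypothetical helper, not part of `anifilebert`. It only calls `torch.cuda.is_available`, `torch.cuda.get_device_name`, and `torch.cuda.is_bf16_supported`, and it reports rather than fails when PyTorch is missing, so it can also reveal that the wrong interpreter is active.

```python
"""Hypothetical CUDA/bf16 preflight before running anifilebert.train.

Assumes the trainer uses PyTorch; this is not part of the anifilebert package.
"""
import sys


def cuda_preflight() -> dict:
    """Report the interpreter path plus CUDA and bf16 availability as PyTorch sees it."""
    report = {
        "python": sys.executable,  # should point into the .venv Scripts folder
        "torch": False,
        "cuda": False,
        "bf16": False,
        "device": None,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed for this interpreter (wrong Python?).
        return report
    report["torch"] = True
    report["cuda"] = bool(torch.cuda.is_available())
    if report["cuda"]:
        report["device"] = torch.cuda.get_device_name(0)
        # Relevant to --bf16; newer PyTorch versions may also count emulated bf16.
        report["bf16"] = bool(torch.cuda.is_bf16_supported())
    return report


if __name__ == "__main__":
    for key, value in cuda_preflight().items():
        print(f"{key}: {value}")
```

Run it with the same `.\.venv\Scripts\python.exe` used for training; if `python` does not point into `.venv`, or `cuda` is `False`, fix the environment before starting the run.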
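The recipe's one hard rule is that `--encoded-cache-dir` and `--extra-data-file` must not be used together. As a sketch only, a hypothetical wrapper could reject that combination before calling `python -m anifilebert.train`. `check_synthetic_stage_args` is an illustrative name; the trainer is not known to perform this check itself. The `--init-model-dir` warning encodes the "second stage, not a replacement" guidance as advice rather than an error.

```python
"""Hypothetical argument guard for the synthetic second-stage recipe.

Illustrative only: anifilebert.train is not known to expose this check.
"""
import sys
from typing import List, Sequence


def _has_flag(argv: Sequence[str], flag: str) -> bool:
    # Match both "--flag value" and "--flag=value" spellings.
    return any(arg == flag or arg.startswith(flag + "=") for arg in argv)


def check_synthetic_stage_args(argv: Sequence[str]) -> List[str]:
    """Raise ValueError for the forbidden flag combination; return warnings."""
    if _has_flag(argv, "--encoded-cache-dir") and _has_flag(argv, "--extra-data-file"):
        # AGENTS.md: use the raw hard-focus JSONL when mixing synthetic data,
        # or rebuild a combined Rust encoded cache instead.
        raise ValueError(
            "do not combine --encoded-cache-dir with --extra-data-file; "
            "use the raw hard-focus JSONL or rebuild a combined encoded cache"
        )
    warnings: List[str] = []
    if _has_flag(argv, "--extra-data-file") and not _has_flag(argv, "--init-model-dir"):
        # Synthetic training is meant to start from the best hard-focus checkpoint.
        warnings.append(
            "--extra-data-file without --init-model-dir: synthetic follow-up "
            "is a second stage from the best repaired hard-focus checkpoint"
        )
    return warnings


if __name__ == "__main__":
    # Example: python guard.py --data-file a.jsonl --extra-data-file b.jsonl
    for message in check_synthetic_stage_args(sys.argv[1:]):
        print("warning:", message)
```

Checking flags on the raw argument list keeps the sketch independent of the trainer's own parser; it does not handle argparse prefix abbreviations such as `--encoded-cache`.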