# AniFileBERT Maintenance

This repository is the standalone Hugging Face model repo for AniFileBERT.
MiruPlay includes it as a Git submodule at `tools/anime_parser`.

## Related Repositories

| Repository | URL | Purpose |
|------------|-----|---------|
| AniFileBERT | `https://huggingface.co/ModerRAS/AniFileBERT` | Model, training scripts, ONNX export |
| AnimeName | `https://huggingface.co/datasets/ModerRAS/AnimeName` | Training datasets and manifests |
| MiruPlay | `https://github.com/ModerRAS/MiruPlay` | Android app and runtime integration |

The AnimeName dataset repository is nested inside AniFileBERT as a Git submodule:

```text
AniFileBERT
  datasets/AnimeName -> ModerRAS/AnimeName
```

## Clone

Clone with `--recursive` so that the `datasets/AnimeName` submodule is checked out as well:

```bash
git clone --recursive https://huggingface.co/ModerRAS/AniFileBERT
```

If you cloned without `--recursive`, initialize the submodule afterward:

```bash
git submodule update --init --recursive
```

## Dataset Waterline

Current DMHY snapshot:

```text
labeled_samples: 632002
char_vocab_size: 6199
strict_bio_violations: 0
```

The authoritative dataset files live in `datasets/AnimeName`.
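
To illustrate what `strict_bio_violations: 0` is measuring, here is a minimal sketch of a strict BIO check: an `I-X` tag counts as a violation unless the tag immediately before it is `B-X` or `I-X`. The record layout (a `labels` list on each JSONL line) and the label names used in the usage note are assumptions for illustration only; the actual schema is whatever `datasets/AnimeName` defines, and the repository may compute this statistic differently.

```python
import json
from typing import Iterable, List


def count_strict_bio_violations(labels: List[str]) -> int:
    """Count I- tags that do not continue a B-/I- tag of the same entity type."""
    violations = 0
    previous = "O"  # treat the start of a sequence as outside any entity
    for label in labels:
        if label.startswith("I-"):
            entity = label[2:]
            if previous not in (f"B-{entity}", f"I-{entity}"):
                violations += 1
        previous = label
    return violations


def summarize_jsonl(lines: Iterable[str], label_key: str = "labels") -> dict:
    """Aggregate sample and violation counts over JSONL records.

    Assumes each record stores its tag sequence under `label_key`
    (a hypothetical field name, not confirmed by the dataset).
    """
    samples = 0
    violations = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        samples += 1
        violations += count_strict_bio_violations(record[label_key])
    return {"labeled_samples": samples, "strict_bio_violations": violations}
```

For example, with hypothetical labels, `count_strict_bio_violations(["B-TITLE", "I-TITLE", "O"])` finds no violations, while `["O", "I-TITLE"]` yields one because the `I-TITLE` tag has no preceding `B-TITLE`.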

## Train

The command below continues training from the checkpoint in the repository root (`--init-model-dir .`) using the character tokenizer:

```bash
uv sync
uv run python train.py \
  --tokenizer char \
  --data-file datasets/AnimeName/dmhy_weak_char.jsonl \
  --vocab-file datasets/AnimeName/vocab.char.json \
  --save-dir checkpoints/dmhy-char-full-relabel \
  --init-model-dir . \
  --epochs 2 \
  --batch-size 256 \
  --learning-rate 0.00008 \
  --warmup-steps 300 \
  --max-seq-length 128 \
  --checkpoint-steps 1000 \
  --parse-eval-limit 2048 \
  --seed 48
```
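
As a rough sanity check on these flags, the sketch below estimates how many optimizer steps the run takes. It assumes that all 632002 labeled samples are used for training, that each batch is one optimizer step (single process, no gradient accumulation), and that the last partial batch is kept. None of this is confirmed by `train.py`; if the script holds out an evaluation split, the real numbers will be somewhat lower. The function and key names are illustrative.

```python
import math


def training_schedule(num_samples, batch_size, epochs, warmup_steps, checkpoint_steps):
    """Estimate step counts for a run under the assumptions stated above."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        # share of the run spent in learning-rate warmup
        "warmup_fraction": warmup_steps / total_steps,
        # checkpoints written at multiples of checkpoint_steps
        "periodic_checkpoints": total_steps // checkpoint_steps,
    }


schedule = training_schedule(
    num_samples=632002,
    batch_size=256,
    epochs=2,
    warmup_steps=300,
    checkpoint_steps=1000,
)
print(schedule)
```

Under these assumptions the run is about 2469 steps per epoch and 4938 steps in total, so the 300 warmup steps cover roughly 6% of training and `--checkpoint-steps 1000` would produce four periodic checkpoints before the final one.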

## Publish a New Checkpoint

Copy the final checkpoint files to the repository root (PowerShell):

```powershell
Copy-Item checkpoints/dmhy-char-full-relabel/final/config.json . -Force
Copy-Item checkpoints/dmhy-char-full-relabel/final/model.safetensors . -Force
Copy-Item checkpoints/dmhy-char-full-relabel/final/tokenizer_config.json . -Force
Copy-Item checkpoints/dmhy-char-full-relabel/final/training_args.bin . -Force
Copy-Item checkpoints/dmhy-char-full-relabel/final/vocab.json . -Force
Copy-Item datasets/AnimeName/vocab.char.json .\vocab.char.json -Force
Copy-Item checkpoints/dmhy-char-full-relabel/final/run_metadata.json . -Force
Copy-Item checkpoints/dmhy-char-full-relabel/final/trainer_eval_metrics.json . -Force
Copy-Item checkpoints/dmhy-char-full-relabel/final/parse_eval_metrics.json . -Force
```

The repository does not track a separate `model/` copy. The files in the
repository root are what gets published; the `checkpoints/` directories are
Git-ignored training artifacts.

Then commit and push:

```bash
git add .
git commit -m "Update AniFileBERT checkpoint"
git push origin main
```
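
The copy step above is written for PowerShell, while the rest of this document uses bash. A cross-platform equivalent could look like the sketch below. The helper `publish_checkpoint` and the constant `CHECKPOINT_FILES` are hypothetical names and are not part of the repository; the file list is taken from the `Copy-Item` commands.

```python
import shutil
from pathlib import Path

# Files copied from the final checkpoint directory, as listed in the Copy-Item block.
CHECKPOINT_FILES = [
    "config.json",
    "model.safetensors",
    "tokenizer_config.json",
    "training_args.bin",
    "vocab.json",
    "run_metadata.json",
    "trainer_eval_metrics.json",
    "parse_eval_metrics.json",
]


def publish_checkpoint(final_dir, dataset_dir, repo_root):
    """Copy checkpoint files plus vocab.char.json into repo_root, overwriting existing files."""
    final_dir, dataset_dir, repo_root = Path(final_dir), Path(dataset_dir), Path(repo_root)
    sources = [final_dir / name for name in CHECKPOINT_FILES]
    sources.append(dataset_dir / "vocab.char.json")

    # Check everything up front so a partial checkpoint is never published.
    missing = [str(path) for path in sources if not path.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing checkpoint files: {missing}")

    copied = []
    for source in sources:
        target = repo_root / source.name
        shutil.copy2(source, target)  # overwrites, like Copy-Item -Force
        copied.append(target)
    return copied
```

From the repository root, it would be called as `publish_checkpoint("checkpoints/dmhy-char-full-relabel/final", "datasets/AnimeName", ".")`. Unlike the individual `Copy-Item` commands, this sketch checks for all files first and copies nothing if any are missing.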

## Update the Dataset Submodule

After pushing new files to `ModerRAS/AnimeName`, update the nested pointer:

```bash
git submodule update --remote datasets/AnimeName
git add datasets/AnimeName
git commit -m "Update AnimeName dataset pointer"
git push origin main
```

## Update MiruPlay

From the MiruPlay root:

```bash
git submodule update --remote --recursive tools/anime_parser
git add tools/anime_parser
git commit -m "Update AniFileBERT submodule"
git push origin master
```

If a new ONNX export changed the Android runtime assets, also stage these files:

```text
scraper/src/main/assets/anime_parser/anime_filename_parser.onnx
scraper/src/main/assets/anime_parser/config.json
scraper/src/main/assets/anime_parser/vocab.json
```