# AniFileBERT Maintenance
This repository is the standalone Hugging Face model repo for AniFileBERT.
MiruPlay consumes it as a Git submodule at `tools/anime_parser`.
## Related Repositories
| Repository | URL | Purpose |
|------------|-----|---------|
| AniFileBERT | `https://huggingface.co/ModerRAS/AniFileBERT` | Model, training scripts, ONNX export |
| AnimeName | `https://huggingface.co/datasets/ModerRAS/AnimeName` | Training datasets and manifests |
| MiruPlay | `https://github.com/ModerRAS/MiruPlay` | Android app and runtime integration |
The dataset repository is nested inside the model repository as a Git submodule:
```text
AniFileBERT
  datasets/AnimeName -> ModerRAS/AnimeName
```
## Clone
```bash
git clone --recursive https://huggingface.co/ModerRAS/AniFileBERT
```
If you cloned without `--recursive`, initialize the submodules afterward:
```bash
git submodule update --init --recursive
```
## Dataset Waterline
Current DMHY snapshot:
```text
last_file_id: 689304
next_min_id: 689305
labeled_samples: 263042
mixed_train_samples: 363042
```
The authoritative dataset files live in `datasets/AnimeName`.
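The snapshot values can be sanity-checked with a little arithmetic. The sketch below assumes that `next_min_id` is always `last_file_id + 1` and that `mixed_train_samples` counts at least every labeled sample. Both rules are inferred from the numbers above, not documented elsewhere in this guide, and `check_waterline` is a hypothetical helper name.

```python
# Values copied from the snapshot above.
snapshot = {
    "last_file_id": 689304,
    "next_min_id": 689305,
    "labeled_samples": 263042,
    "mixed_train_samples": 363042,
}


def check_waterline(s: dict) -> int:
    """Validate a snapshot and return how many mixed samples exceed the labeled ones."""
    # Assumption: the next crawl starts right after the last processed file id.
    if s["next_min_id"] != s["last_file_id"] + 1:
        raise ValueError("next_min_id should be last_file_id + 1")
    # Assumption: the mixed training set contains at least the labeled samples.
    extra = s["mixed_train_samples"] - s["labeled_samples"]
    if extra < 0:
        raise ValueError("mixed_train_samples is smaller than labeled_samples")
    return extra


print(check_waterline(snapshot))  # 100000
```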
## Train
```bash
python -m pip install -r requirements.txt
python train.py \
  --data-file datasets/AnimeName/mixed_train.jsonl \
  --vocab-file datasets/AnimeName/vocab.json \
  --save-dir checkpoints/dmhy-finetune \
  --init-model-dir . \
  --epochs 1 \
  --batch-size 128 \
  --learning-rate 0.0003 \
  --warmup-steps 300 \
  --seed 42
```
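Both training inputs live in the `datasets/AnimeName` submodule, so it can help to confirm they exist before starting a run. The following is a minimal pre-flight sketch, not part of the repository's scripts: `missing_inputs` and `count_samples` are hypothetical helpers, and the sample count assumes one JSON sample per non-empty line of the `.jsonl` file.

```python
from pathlib import Path

# Paths taken from the train command above, relative to the repository root.
REQUIRED_INPUTS = [
    "datasets/AnimeName/mixed_train.jsonl",
    "datasets/AnimeName/vocab.json",
]


def missing_inputs(repo_root="."):
    """Return the required input paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in REQUIRED_INPUTS if not (root / p).is_file()]


def count_samples(jsonl_path):
    """Count non-empty lines, assuming one JSON sample per line."""
    with open(jsonl_path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing inputs (is the submodule initialized?):", missing)
    else:
        print("samples:", count_samples(REQUIRED_INPUTS[0]))
```

If the reported count differs from `mixed_train_samples` in the Dataset Waterline section, the submodule pointer may be out of date.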
## Publish a New Checkpoint
Copy the final checkpoint to the repository root:
```powershell
Copy-Item checkpoints/dmhy-finetune/final/config.json . -Force
Copy-Item checkpoints/dmhy-finetune/final/model.safetensors . -Force
Copy-Item checkpoints/dmhy-finetune/final/tokenizer_config.json . -Force
Copy-Item checkpoints/dmhy-finetune/final/training_args.bin . -Force
Copy-Item checkpoints/dmhy-finetune/final/vocab.json . -Force
```
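Where `Copy-Item` is not available, the same copy step can be sketched in Python. This is an illustrative equivalent rather than a script shipped with the repository. `publish_checkpoint` is a hypothetical helper, and it refuses to copy anything if one of the five files is missing from the checkpoint directory.

```python
import shutil
from pathlib import Path

# The five files listed in the PowerShell commands above.
CHECKPOINT_FILES = [
    "config.json",
    "model.safetensors",
    "tokenizer_config.json",
    "training_args.bin",
    "vocab.json",
]


def publish_checkpoint(src="checkpoints/dmhy-finetune/final", dst="."):
    """Copy the checkpoint files from src into dst, overwriting existing copies."""
    src_dir, dst_dir = Path(src), Path(dst)
    # Fail before copying anything if the checkpoint is incomplete.
    missing = [name for name in CHECKPOINT_FILES if not (src_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {src_dir}: {missing}")
    for name in CHECKPOINT_FILES:
        shutil.copy2(src_dir / name, dst_dir / name)
    return [dst_dir / name for name in CHECKPOINT_FILES]
```

Calling `publish_checkpoint()` from the repository root uses the same source and destination paths as the commands above.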
Then commit and push:
```bash
git add .
git commit -m "Update AniFileBERT checkpoint"
git push origin main
```
## Update the Dataset Submodule
After pushing new files to `ModerRAS/AnimeName`, update the nested pointer:
```bash
git submodule update --remote datasets/AnimeName
git add datasets/AnimeName
git commit -m "Update AnimeName dataset pointer"
git push origin main
```
## Update MiruPlay
From the MiruPlay root:
```bash
git submodule update --remote --recursive tools/anime_parser
git add tools/anime_parser
git commit -m "Update AniFileBERT submodule"
git push origin master
```
If a new ONNX export changed Android runtime assets, also stage:
```text
scraper/src/main/assets/anime_parser/anime_filename_parser.onnx
scraper/src/main/assets/anime_parser/config.json
scraper/src/main/assets/anime_parser/vocab.json
```