Tags: Token Classification, Transformers, ONNX, Safetensors, English, Japanese, Chinese, bert, anime, filename-parsing, Eval Results (legacy)
How to use ModerRAS/AniFileBERT with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="ModerRAS/AniFileBERT")

# Or load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("ModerRAS/AniFileBERT")
model = AutoModelForTokenClassification.from_pretrained("ModerRAS/AniFileBERT")
```
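The pipeline above predicts one label per token, and with a character-level tokenizer that means one label per character of the filename. To recover fields such as a title or an episode number, consecutive characters have to be merged back into spans. The sketch below is hypothetical: it assumes a BIO scheme with label names like `B-TITLE`/`I-TITLE` (the real names come from the model's label schema and may differ), and `group_char_labels` is an illustrative helper, not part of Transformers.

```python
from typing import List, Optional, Tuple


def group_char_labels(text: str, labels: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: merge per-character BIO labels into (entity, substring) spans.

    Assumes exactly one label per character and "B-"/"I-"/"O" style labels.
    """
    if len(text) != len(labels):
        raise ValueError("expected exactly one label per character")
    spans: List[Tuple[str, str]] = []
    current_entity: Optional[str] = None
    current_chars: List[str] = []

    def flush() -> None:
        # Close the open span (if any) and reset the accumulator.
        nonlocal current_entity, current_chars
        if current_entity is not None:
            spans.append((current_entity, "".join(current_chars)))
        current_entity, current_chars = None, []

    for ch, label in zip(text, labels):
        if label.startswith("B-") or (label.startswith("I-") and label[2:] != current_entity):
            # A B- tag, or an I- tag that does not continue the open span, starts a new span.
            flush()
            current_entity, current_chars = label[2:], [ch]
        elif label.startswith("I-"):
            current_chars.append(ch)  # continue the open span
        else:
            flush()  # "O" (or any unprefixed label) closes the open span
    flush()
    return spans


# Illustrative labels only; they are not real model output.
example_labels = ["B-TITLE", "I-TITLE", "I-TITLE", "I-TITLE", "O", "O", "O",
                  "B-EPISODE", "I-EPISODE", "O", "O", "O", "O"]
print(group_char_labels("Show - 05.mkv", example_labels))  # [('TITLE', 'Show'), ('EPISODE', '05')]
```

Treating a stray `I-` tag as the start of a new span is a lenient choice; a stricter decoder could instead discard it.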
# AniFileBERT encoded dataset cache

This tool builds split train/eval caches, stored as `.npy` shards, for `anifilebert.train`. It mirrors the Python char-tokenizer encoder used in training: it reads JSONL rows with `filename`, `tokens`, and `labels` fields, projects the labels of the source tokens onto individual characters, and applies the same structural media-label repairs that training uses.
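The projection from source tokens to character labels can be pictured with a minimal sketch. It assumes that each token occurs verbatim and in order inside `filename`, that labels follow a BIO scheme, and that characters not covered by any token (separators such as spaces) are labeled `O`. The function name `project_token_labels` and the example labels are hypothetical, and the tool's structural media-label repairs are not reproduced here.

```python
from typing import List


def project_token_labels(filename: str, tokens: List[str], labels: List[str]) -> List[str]:
    """Hypothetical sketch: spread token-level BIO labels onto characters.

    Assumes every token appears verbatim and in order inside `filename`.
    Characters outside any token (e.g. separators) keep the default "O".
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    char_labels = ["O"] * len(filename)
    cursor = 0
    for token, label in zip(tokens, labels):
        if not token:
            continue  # nothing to project for an empty token
        start = filename.find(token, cursor)
        if start < 0:
            raise ValueError(f"token {token!r} not found after position {cursor}")
        end = start + len(token)
        cursor = end  # later tokens must appear after this one
        if label == "O":
            continue  # characters already default to "O"
        # Keep an explicit B-/I- prefix; treat an unprefixed label as B-.
        if label[:2] in ("B-", "I-"):
            prefix, entity = label[:2], label[2:]
        else:
            prefix, entity = "B-", label
        char_labels[start] = prefix + entity
        for i in range(start + 1, end):
            char_labels[i] = "I-" + entity  # remaining characters continue the span
    return char_labels


print(project_token_labels("Show - 05.mkv", ["Show", "-", "05", ".mkv"],
                           ["B-TITLE", "O", "B-EPISODE", "O"]))
```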
Example:

```powershell
cargo run --release --manifest-path tools\encoded_dataset_cache\Cargo.toml -- `
  --input data\schema_v2_hard_focus_char_seed63.jsonl `
  --vocab-file datasets\AnimeName\vocab.char.json `
  --label-schema-file label_schema.json `
  --output-dir data\encoded_cache\schema_v2_hard_focus_char_seed63 `
  --max-length 128 `
  --train-split 0.95 `
  --seed 63 `
  --shard-size 25000 `
  --threads 16
```
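The `--train-split`, `--seed`, and `--shard-size` flags suggest a seeded shuffle, a train/eval cut (95/5 in the example), and fixed-size shards. A minimal sketch of that bookkeeping follows. It assumes `--shard-size` counts rows and that the train count is rounded down, and it uses Python's `random.Random`, so it will not reproduce the Rust tool's actual row order or shard file names. `plan_shards` is a hypothetical name.

```python
import random
from typing import Dict, List


def plan_shards(num_rows: int, train_split: float, seed: int,
                shard_size: int) -> Dict[str, List[List[int]]]:
    """Hypothetical sketch: shuffle row indices, split train/eval, chunk into shards."""
    if not 0.0 < train_split < 1.0:
        raise ValueError("train_split must be between 0 and 1")
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)  # deterministic for a given seed
    num_train = int(num_rows * train_split)  # rounding down is an assumption
    splits = {"train": indices[:num_train], "eval": indices[num_train:]}
    # Chunk each split into consecutive shards of at most shard_size rows.
    return {
        name: [rows[i:i + shard_size] for i in range(0, len(rows), shard_size)]
        for name, rows in splits.items()
    }


plan = plan_shards(num_rows=120_000, train_split=0.95, seed=63, shard_size=25_000)
for name, shards in plan.items():
    print(name, [len(shard) for shard in shards])
```

Fixing the seed makes the split reproducible across runs, which matters if the eval shards are meant to stay held out from training.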
Use the cache in training:

```powershell
.\.venv\Scripts\python.exe -m anifilebert.train `
  --tokenizer char `
  --data-file data\schema_v2_hard_focus_char_seed63.jsonl `
  --vocab-file datasets\AnimeName\vocab.char.json `
  --encoded-cache-dir data\encoded_cache\schema_v2_hard_focus_char_seed63 `
  --max-seq-length 128
```