# AniFileBERT Maintenance

This repository is the standalone Hugging Face model repo used by MiruPlay as `tools/anime_parser`.

## Related Repositories

| Repository | URL | Purpose |
|------------|-----|---------|
| AniFileBERT | `https://huggingface.co/ModerRAS/AniFileBERT` | Model, training scripts, ONNX export |
| AnimeName | `https://huggingface.co/datasets/ModerRAS/AnimeName` | Training datasets and manifests |
| MiruPlay | `https://github.com/ModerRAS/MiruPlay` | Android app and runtime integration |

Nested structure:

```text
AniFileBERT
  datasets/AnimeName -> ModerRAS/AnimeName
```

## Clone

```bash
git clone --recursive https://huggingface.co/ModerRAS/AniFileBERT
```

After a normal clone:

```bash
git submodule update --init --recursive
```

## Dataset Waterline

Current DMHY snapshot:

```text
last_file_id: 689304
next_min_id: 689305
labeled_samples: 263042
mixed_train_samples: 363042
```

The authoritative dataset files live in `datasets/AnimeName`.

## Train

```bash
python -m pip install -r requirements.txt
python train.py \
  --data-file datasets/AnimeName/mixed_train.jsonl \
  --vocab-file datasets/AnimeName/vocab.json \
  --save-dir checkpoints/dmhy-finetune \
  --init-model-dir . \
  --epochs 1 \
  --batch-size 128 \
  --learning-rate 0.0003 \
  --warmup-steps 300 \
  --seed 42
```

## Publish a New Checkpoint

Copy the final checkpoint to the repository root:

```powershell
Copy-Item checkpoints/dmhy-finetune/final/config.json . -Force
Copy-Item checkpoints/dmhy-finetune/final/model.safetensors . -Force
Copy-Item checkpoints/dmhy-finetune/final/tokenizer_config.json . -Force
Copy-Item checkpoints/dmhy-finetune/final/training_args.bin . -Force
Copy-Item checkpoints/dmhy-finetune/final/vocab.json . -Force
```

Then commit and push:

```bash
git add .
git commit -m "Update AniFileBERT checkpoint"
git push origin main
```

## Update the Dataset Submodule

After pushing new files to `ModerRAS/AnimeName`, update the nested pointer:

```bash
git submodule update --remote datasets/AnimeName
git add datasets/AnimeName
git commit -m "Update AnimeName dataset pointer"
git push origin main
```

## Update MiruPlay

From the MiruPlay root:

```bash
git submodule update --remote --recursive tools/anime_parser
git add tools/anime_parser
git commit -m "Update AniFileBERT submodule"
git push origin master
```

If a new ONNX export changed Android runtime assets, also stage:

```text
scraper/src/main/assets/anime_parser/anime_filename_parser.onnx
scraper/src/main/assets/anime_parser/config.json
scraper/src/main/assets/anime_parser/vocab.json
```
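
As a convenience, the asset paths listed above could be checked before staging. The following is a minimal sketch, not part of either repository: `existing_anime_parser_assets` is a hypothetical helper name, and the only assumption about the layout is the three paths already given in this section.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Asset paths as listed in this section, relative to the MiruPlay root.
ASSET_DIR="scraper/src/main/assets/anime_parser"
ASSET_NAMES="anime_filename_parser.onnx config.json vocab.json"

# Hypothetical helper: print each listed asset that exists under the given
# root (default: current directory) and report missing ones on stderr.
existing_anime_parser_assets() {
  local root="${1:-.}" name
  for name in $ASSET_NAMES; do
    if [ -f "$root/$ASSET_DIR/$name" ]; then
      printf '%s\n' "$ASSET_DIR/$name"
    else
      printf 'missing: %s\n' "$ASSET_DIR/$name" >&2
    fi
  done
}
```

After sourcing this from the MiruPlay root, `git add -- $(existing_anime_parser_assets)` would stage whichever of the three files are present. The unquoted command substitution is safe here only because none of the paths contain spaces.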