Tags: Token Classification, Transformers, ONNX, Safetensors, English, Japanese, Chinese, bert, anime, filename-parsing
How to use ModerRAS/AniFileBERT with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="ModerRAS/AniFileBERT")

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("ModerRAS/AniFileBERT")
model = AutoModelForTokenClassification.from_pretrained("ModerRAS/AniFileBERT")
```

# Codex + Colab Training

Free Colab cannot be used as an always-on remote machine. The practical setup is:

1. Open a Colab GPU runtime when you want to train.
2. Start the lightweight worker in one cell.
3. Give Codex the printed worker URL and token.
4. Codex submits jobs while that Colab session is alive.
5. Checkpoints and manifests stay on Google Drive, so the next session can resume.

## Start a Colab Session

Run this in a Colab code cell:

```python
# Mount Google Drive so checkpoints and manifests survive the session.
from google.colab import drive
drive.mount("/content/drive")

# Clone the repository; "|| true" lets the cell continue if the clone already exists.
!git clone --recursive https://huggingface.co/ModerRAS/AniFileBERT /content/AniFileBERT || true
%cd /content/AniFileBERT
# Bring an existing clone and its submodules up to date.
!git pull --ff-only || true
!git submodule update --init --recursive
# Start the worker; this call keeps the cell running.
!python -m tools.colab_worker
```

The cell prints:

```text
COLAB_WORKER_URL=https://...trycloudflare.com
COLAB_WORKER_TOKEN=...
```

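The two printed lines are plain `KEY=VALUE` pairs, so they can be picked out of the cell output programmatically instead of being copied by hand. A minimal sketch, assuming only the output format shown above; `parse_worker_output` is a hypothetical helper and is not part of the repository's tools:

```python
def parse_worker_output(text):
    """Collect COLAB_WORKER_* settings from the worker cell's printed output.

    Hypothetical helper: it only assumes the KEY=VALUE lines shown above.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip unrelated output such as git or pip messages.
        if not line.startswith("COLAB_WORKER_") or "=" not in line:
            continue
        # Split on the first "=" only, so a token containing "=" stays intact.
        key, _, value = line.partition("=")
        settings[key] = value
    return settings


# Usage, with the elided values left as they appear in the cell output:
#   parse_worker_output("COLAB_WORKER_URL=https://...trycloudflare.com\n"
#                       "COLAB_WORKER_TOKEN=...")
```
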
Keep that cell running. If Colab disconnects, run the cell again. The default
profiles set `checkpoint_steps: 1000` and `resume_from_checkpoint: "auto"`, so
they save a checkpoint every 1000 steps and resume from the latest checkpoint on Drive.

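How `resume_from_checkpoint: "auto"` selects a checkpoint is not described here. One plausible reading is sketched below, assuming checkpoints are saved as `checkpoint-<step>` subdirectories (the naming convention used by the Hugging Face `Trainer`). The function name and the directory layout are assumptions for illustration, not the repository's actual runner code:

```python
import re
from pathlib import Path

# Assumed naming scheme: checkpoint-1000, checkpoint-2000, ...
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(checkpoint_dir):
    """Return the checkpoint subdirectory with the highest step, or None."""
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return None  # first run: nothing to resume from
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        if not (child.is_dir() and match):
            continue
        # Compare steps numerically so checkpoint-10000 beats checkpoint-9000.
        step = int(match.group(1))
        if step > best_step:
            best_step, best_path = step, child
    return best_path
```

With this reading, a fresh session starts from scratch when the directory is empty or missing and otherwise continues from the highest saved step.
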
## Let Codex Submit a Job

On the local machine:

```powershell
$env:ANIFILEBERT_COLAB_URL="https://...trycloudflare.com"
$env:ANIFILEBERT_COLAB_TOKEN="..."
python -m tools.colab_client health
python -m tools.colab_client submit --profile dmhy_regex_finetune --wait
```

Codex can run the same commands from this repository after you provide the URL
and token.

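On a machine without PowerShell, the same two environment variables and client commands can be driven from Python. This is a minimal sketch: the variable names and the `tools.colab_client` arguments are the ones shown above, while `build_client_call` is a hypothetical wrapper that does not exist in the repository:

```python
import os
import sys


def build_client_call(url, token, *client_args):
    """Return (argv, env) for one `python -m tools.colab_client ...` call."""
    # Copy the current environment and add the two worker settings.
    env = dict(os.environ)
    env["ANIFILEBERT_COLAB_URL"] = url
    env["ANIFILEBERT_COLAB_TOKEN"] = token
    # Use the running interpreter so the call stays inside the same venv.
    argv = [sys.executable, "-m", "tools.colab_client", *client_args]
    return argv, env


# Usage, run from the repository root while the Colab cell is alive:
#   import subprocess
#   argv, env = build_client_call(url, token, "health")
#   subprocess.run(argv, env=env, check=True)
#   argv, env = build_client_call(
#       url, token, "submit", "--profile", "dmhy_regex_finetune", "--wait")
#   subprocess.run(argv, env=env, check=True)
```
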
## Profiles

- `colab/configs/dmhy_regex_finetune.json`: default regex tokenizer fine-tune
  from the published root checkpoint.
- `colab/configs/dmhy_char_train.json`: character tokenizer training from
  scratch.

To submit a locally edited profile instead of a named remote profile, pass its path with `--config`:

```powershell
python -m tools.colab_client submit --config colab/configs/dmhy_regex_finetune.json --wait
```

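A locally edited profile can be produced by copying a shipped one and overriding a few fields. The sketch below assumes the profile is a single JSON object and touches only keys named in this document (`checkpoint_steps`, `resume_from_checkpoint`). The helper name and the output filename `my_finetune.json` are hypothetical:

```python
import json
from pathlib import Path


def write_profile_variant(source, destination, **overrides):
    """Copy a JSON profile to `destination` with top-level keys overridden."""
    profile = json.loads(Path(source).read_text(encoding="utf-8"))
    # Only top-level keys are replaced; nested settings are left untouched.
    profile.update(overrides)
    Path(destination).write_text(
        json.dumps(profile, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return profile


# Usage (hypothetical output path), then submit it with --config:
#   write_profile_variant(
#       "colab/configs/dmhy_regex_finetune.json",
#       "colab/configs/my_finetune.json",
#       checkpoint_steps=500,
#   )
```

Writing the variant to a new file leaves the shipped profile unchanged, so `--profile dmhy_regex_finetune` keeps its default behavior.
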
The worker writes per-job logs under:

```text
MyDrive/AniFileBERT/worker/jobs/<job-id>/
```

The training runner writes:

```text
MyDrive/AniFileBERT/checkpoints/<profile-name>/
MyDrive/AniFileBERT/last_run_manifest.json
```
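To check what the last run left behind, the manifest can be read back from the mounted Drive. This sketch assumes Drive is mounted at `/content/drive` as in the Colab cell above and that the manifest is a JSON document. Its fields are not documented here, so the hypothetical helper `load_last_manifest` only loads it and leaves interpretation to the caller:

```python
import json
from pathlib import Path


def load_last_manifest(drive_root="/content/drive"):
    """Return the parsed last_run_manifest.json, or None if it does not exist."""
    manifest_path = (
        Path(drive_root) / "MyDrive" / "AniFileBERT" / "last_run_manifest.json"
    )
    if not manifest_path.is_file():
        return None  # no training run has completed a manifest yet
    return json.loads(manifest_path.read_text(encoding="utf-8"))


# Usage inside Colab after drive.mount("/content/drive"):
#   manifest = load_last_manifest()
#   print(json.dumps(manifest, indent=2) if manifest else "no manifest yet")
```
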