How to use ModerRAS/AniFileBERT with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("token-classification", model="ModerRAS/AniFileBERT")
# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("ModerRAS/AniFileBERT")
model = AutoModelForTokenClassification.from_pretrained("ModerRAS/AniFileBERT")
Codex + Colab Training
Free Colab cannot be used as an always-on remote machine. The practical setup is:
- Open a Colab GPU runtime when you want to train.
- Start the lightweight worker in one cell.
- Give Codex the printed worker URL and token.
- Codex submits jobs while that Colab session is alive.
- Checkpoints and manifests stay on Google Drive, so the next session can resume.
Start a Colab Session
Run this in a Colab code cell:
from google.colab import drive
drive.mount("/content/drive")
!git clone --recursive https://huggingface.co/ModerRAS/AniFileBERT /content/AniFileBERT || true
%cd /content/AniFileBERT
!git pull --ff-only || true
!git submodule update --init --recursive
!python -m tools.colab_worker
The cell prints:
COLAB_WORKER_URL=https://...trycloudflare.com
COLAB_WORKER_TOKEN=...
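If you want to pass the two printed values to the client programmatically instead of copying them by hand, a small parser is enough. This is a minimal sketch that assumes the worker prints exactly the KEY=value lines shown above; parse_worker_output is a hypothetical helper, and it returns the two environment variable names that tools.colab_client reads on the local machine.

```python
import re

# Matches the two lines the worker prints, e.g. "COLAB_WORKER_URL=https://...".
_WORKER_LINE = re.compile(r"^(COLAB_WORKER_URL|COLAB_WORKER_TOKEN)=(\S+)$")

# Worker output key -> environment variable read by tools.colab_client.
_ENV_NAMES = {
    "COLAB_WORKER_URL": "ANIFILEBERT_COLAB_URL",
    "COLAB_WORKER_TOKEN": "ANIFILEBERT_COLAB_TOKEN",
}


def parse_worker_output(text: str) -> dict:
    """Hypothetical helper: map the worker's printed values to client env vars."""
    found = {}
    for line in text.splitlines():
        match = _WORKER_LINE.match(line.strip())
        if match:
            # A later line wins, so a restarted worker's values replace stale ones.
            found[match.group(1)] = match.group(2)
    missing = [key for key in _ENV_NAMES if key not in found]
    if missing:
        raise ValueError(f"worker output is missing: {', '.join(missing)}")
    return {_ENV_NAMES[key]: value for key, value in found.items()}
```

Because the worker URL changes every time the cell is restarted, letting the last printed value win avoids picking up a stale tunnel address from earlier output.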
Keep that cell running. If Colab disconnects, run the cell again. The default profiles set
checkpoint_steps: 1000 and resume_from_checkpoint: "auto", so they save a checkpoint every
1000 steps and resume from the latest checkpoint on Drive.
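The "auto" resume behaviour can be pictured as a lookup of the newest checkpoint directory on Drive. The sketch below assumes the runner stores checkpoints as checkpoint-<step> subdirectories (the Hugging Face Trainer naming convention) under the profile's checkpoint folder; latest_checkpoint and resume_target are hypothetical names, not the runner's actual code.

```python
import re
from pathlib import Path
from typing import Optional, Union

# Assumption: one subdirectory per save, named "checkpoint-<step>"
# (the Hugging Face Trainer convention).
_CHECKPOINT_DIR = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: Union[str, Path]) -> Optional[Path]:
    """Return the checkpoint directory with the highest step, or None if there is none."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = _CHECKPOINT_DIR.match(child.name)
        # Compare steps as integers: a string sort would rank 9000 above 10000.
        if match and child.is_dir() and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path


def resume_target(resume_from_checkpoint, output_dir) -> Optional[Union[str, Path]]:
    """Hypothetical reading of resume_from_checkpoint: "auto" resumes from the
    newest checkpoint if one exists and otherwise starts from scratch (None)."""
    if resume_from_checkpoint == "auto":
        return latest_checkpoint(output_dir)
    return resume_from_checkpoint or None
```

With checkpoint_steps: 1000, a session that disconnects at step 10500 would resume from checkpoint-10000 and redo at most 1000 steps. The transformers library ships a similar scan as transformers.trainer_utils.get_last_checkpoint.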
Let Codex Submit a Job
On the local machine (PowerShell syntax):
$env:ANIFILEBERT_COLAB_URL="https://...trycloudflare.com"
$env:ANIFILEBERT_COLAB_TOKEN="..."
python -m tools.colab_client health
python -m tools.colab_client submit --profile dmhy_regex_finetune --wait
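The PowerShell lines only set two environment variables before calling the client. On a machine without PowerShell, the same sequence can be scripted from Python. This is a sketch: client_commands and run_job are hypothetical helpers, and it assumes tools.colab_client exits with a non-zero status when the health check fails, so that check=True stops before the job is submitted.

```python
import os
import subprocess
import sys


def client_commands(profile: str) -> list:
    """The two tools.colab_client invocations from the PowerShell example."""
    base = [sys.executable, "-m", "tools.colab_client"]
    return [
        base + ["health"],
        base + ["submit", "--profile", profile, "--wait"],
    ]


def run_job(profile: str, url: str, token: str) -> None:
    """Hypothetical wrapper: set the client's env vars, check health, then submit.

    Run from the repository root so that "-m tools.colab_client" resolves.
    """
    env = dict(os.environ, ANIFILEBERT_COLAB_URL=url, ANIFILEBERT_COLAB_TOKEN=token)
    for command in client_commands(profile):
        # check=True raises CalledProcessError on a non-zero exit, so a failed
        # health check (assumed to exit non-zero) prevents the submit step.
        subprocess.run(command, env=env, check=True)
```

Usage: run_job("dmhy_regex_finetune", url, token) with the URL and token printed by the Colab cell. Passing env explicitly keeps the token out of the parent shell's environment.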
Codex can run the same commands from this repository after you provide the URL and token.
Profiles
- colab/configs/dmhy_regex_finetune.json: default regex-tokenizer fine-tune from the published root checkpoint.
- colab/configs/dmhy_char_train.json: character-tokenizer training from scratch.
You can also submit a locally edited profile with --config instead of naming a remote profile with --profile:
python -m tools.colab_client submit --config colab/configs/dmhy_regex_finetune.json --wait
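One way to produce such an edited profile without touching the default is to copy it and override a few keys. This is a minimal sketch that assumes a profile is a flat JSON object with top-level keys such as checkpoint_steps; write_edited_profile and the output filename my_finetune.json are hypothetical.

```python
import json
from pathlib import Path


def write_edited_profile(src: str, dst: str, **overrides) -> Path:
    """Hypothetical helper: copy a JSON profile, apply top-level overrides, save as dst."""
    profile = json.loads(Path(src).read_text(encoding="utf-8"))
    profile.update(overrides)  # replaces top-level keys only; nested dicts are not merged
    out = Path(dst)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(profile, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return out
```

Usage: write_edited_profile("colab/configs/dmhy_regex_finetune.json", "colab/configs/my_finetune.json", checkpoint_steps=500), then submit it with python -m tools.colab_client submit --config colab/configs/my_finetune.json --wait. Writing a new file keeps the published default profile unchanged.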
The worker writes per-job logs under:
MyDrive/AniFileBERT/worker/jobs/<job-id>/
The training runner writes:
MyDrive/AniFileBERT/checkpoints/<profile-name>/
MyDrive/AniFileBERT/last_run_manifest.json
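Inside the Colab session, drive.mount("/content/drive") exposes these MyDrive paths under /content/drive/MyDrive. The helpers below are a hypothetical sketch for locating them; they assume <profile-name> matches the profile's file stem (for example dmhy_regex_finetune), and they load the manifest as generic JSON, assumed to be an object, because its fields are not documented here.

```python
import json
from pathlib import Path

# In Colab, drive.mount("/content/drive") exposes "MyDrive" at this path.
DRIVE_PROJECT = Path("/content/drive/MyDrive/AniFileBERT")


def job_log_dir(job_id: str, root: Path = DRIVE_PROJECT) -> Path:
    """Per-job worker logs: MyDrive/AniFileBERT/worker/jobs/<job-id>/."""
    return root / "worker" / "jobs" / job_id


def checkpoint_dir(profile_name: str, root: Path = DRIVE_PROJECT) -> Path:
    """Training checkpoints: MyDrive/AniFileBERT/checkpoints/<profile-name>/."""
    return root / "checkpoints" / profile_name


def read_last_manifest(root: Path = DRIVE_PROJECT) -> dict:
    """Load last_run_manifest.json (assumed to be a JSON object; schema undocumented)."""
    return json.loads((root / "last_run_manifest.json").read_text(encoding="utf-8"))
```

The root is a parameter so the same helpers work against a locally synced copy of Drive, whose mount point differs by machine.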