# Codex + Colab Training
Free Colab cannot be used as an always-on remote machine. The practical setup is:
1. Open a Colab GPU runtime when you want to train.
2. Start the lightweight worker in one cell.
3. Give Codex the printed worker URL and token.
4. Codex submits jobs while that Colab session is alive.
5. Checkpoints and manifests stay on Google Drive, so the next session can resume.
## Start a Colab Session
Run this in a Colab code cell:
```python
from google.colab import drive
drive.mount("/content/drive")
!git clone --recursive https://huggingface.co/ModerRAS/AniFileBERT /content/AniFileBERT || true
%cd /content/AniFileBERT
!git pull --ff-only || true
!git submodule update --init --recursive
!python -m tools.colab_worker
```
The cell prints:
```text
COLAB_WORKER_URL=https://...trycloudflare.com
COLAB_WORKER_TOKEN=...
```
Keep that cell running. If Colab disconnects, run the cell again. The default
profiles set `checkpoint_steps: 1000` and `resume_from_checkpoint: "auto"`, so
they save every 1000 steps and resume from the latest checkpoint on Drive.
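One way to picture the `"auto"` resume behavior is as a search for the checkpoint with the highest step number. The sketch below is illustrative only: the `checkpoint-<step>` directory naming and the `latest_checkpoint` helper are assumptions, not the repository's actual implementation.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming convention: checkpoint-<step>. The real runner may differ.
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(checkpoint_dir: Path) -> Optional[Path]:
    """Return the checkpoint directory with the highest step, or None."""
    if not checkpoint_dir.is_dir():
        return None
    candidates = []
    for child in checkpoint_dir.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match:
            candidates.append((int(match.group(1)), child))
    if not candidates:
        # Nothing saved yet: a fresh run would start from step 0.
        return None
    # Compare by numeric step so checkpoint-10000 beats checkpoint-9000.
    return max(candidates, key=lambda item: item[0])[1]
```

Comparing the parsed integer rather than the directory name matters because a plain string sort would rank `checkpoint-9000` above `checkpoint-10000`.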
## Let Codex Submit a Job
On the local machine:
```powershell
$env:ANIFILEBERT_COLAB_URL="https://...trycloudflare.com"
$env:ANIFILEBERT_COLAB_TOKEN="..."
python -m tools.colab_client health
python -m tools.colab_client submit --profile dmhy_regex_finetune --wait
```
Codex can run the same commands from this repository after you provide the URL
and token.
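If you prefer not to copy the two values by hand, a small helper can turn the pasted cell output into the environment variables the client reads. This is a minimal sketch: `worker_output_to_env` is a hypothetical helper, not part of the repository, and it assumes the worker prints the two `KEY=value` lines exactly as shown above.

```python
from typing import Dict

# Names printed by the worker cell, mapped to the names the client reads.
_ENV_NAMES = {
    "COLAB_WORKER_URL": "ANIFILEBERT_COLAB_URL",
    "COLAB_WORKER_TOKEN": "ANIFILEBERT_COLAB_TOKEN",
}


def worker_output_to_env(cell_output: str) -> Dict[str, str]:
    """Extract the worker URL and token from pasted Colab cell output."""
    env: Dict[str, str] = {}
    for line in cell_output.splitlines():
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = line.strip().partition("=")
        if sep and key in _ENV_NAMES:
            env[_ENV_NAMES[key]] = value.strip()
    missing = sorted(set(_ENV_NAMES.values()) - set(env))
    if missing:
        raise ValueError(f"missing worker settings: {', '.join(missing)}")
    return env
```

The returned dictionary can be merged into `os.environ` before running `python -m tools.colab_client health`.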
## Profiles
- `colab/configs/dmhy_regex_finetune.json`: default regex tokenizer fine-tune
  from the published root checkpoint.
- `colab/configs/dmhy_char_train.json`: character tokenizer training from
  scratch.
You can also submit a locally edited profile file with `--config` instead of naming a remote profile with `--profile`:
```powershell
python -m tools.colab_client submit --config colab/configs/dmhy_regex_finetune.json --wait
```
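A locally edited profile can be produced by copying an existing one and overriding a few keys. The sketch below assumes that settings such as `checkpoint_steps` are top-level keys in the profile JSON, which this document does not confirm. `write_profile_variant` is a hypothetical helper, not part of the repository.

```python
import json
from pathlib import Path
from typing import Any, Dict


def write_profile_variant(
    source: Path, target: Path, overrides: Dict[str, Any]
) -> Dict[str, Any]:
    """Copy a profile JSON to `target` with top-level keys overridden."""
    profile = json.loads(source.read_text(encoding="utf-8"))
    if not isinstance(profile, dict):
        raise ValueError(f"{source} does not contain a JSON object")
    # Assumption: the overridden settings live at the top level of the file.
    profile.update(overrides)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(
        json.dumps(profile, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return profile
```

For example, calling it with `{"checkpoint_steps": 500}` and a target path of your choosing yields a file you can pass to `submit --config`, leaving the original profile untouched.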
The worker writes per-job logs under:
```text
MyDrive/AniFileBERT/worker/jobs/<job-id>/
```
The training runner writes:
```text
MyDrive/AniFileBERT/checkpoints/<profile-name>/
MyDrive/AniFileBERT/last_run_manifest.json
```
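Inside a Colab session these locations can be resolved relative to the Drive mount point. The helpers below are a hypothetical sketch: the function names are illustrative, and the manifest is loaded as generic JSON because its schema is not described here.

```python
import json
from pathlib import Path
from typing import Any, Optional

# Drive is mounted at /content/drive in the session cell, so "MyDrive"
# paths live below this root.
DRIVE_ROOT = Path("/content/drive/MyDrive/AniFileBERT")


def job_log_dir(job_id: str, root: Path = DRIVE_ROOT) -> Path:
    """Directory where the worker writes logs for one job."""
    return root / "worker" / "jobs" / job_id


def checkpoint_dir(profile_name: str, root: Path = DRIVE_ROOT) -> Path:
    """Directory where the training runner writes checkpoints for a profile."""
    return root / "checkpoints" / profile_name


def load_last_manifest(root: Path = DRIVE_ROOT) -> Optional[Any]:
    """Return the parsed last-run manifest, or None if no run has written one."""
    path = root / "last_run_manifest.json"
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```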