# Codex + Colab Training

A free Colab runtime disconnects when idle and has a limited session lifetime, so it cannot serve as an always-on remote machine. The practical setup is:

1. Open a Colab GPU runtime when you want to train.
2. Start the lightweight worker in one cell.
3. Give Codex the printed worker URL and token.
4. Codex submits jobs while that Colab session is alive.
5. Checkpoints and manifests stay on Google Drive, so the next session can resume.

## Start a Colab Session

Run this in a Colab code cell:

```python
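# Mount Google Drive so checkpoints and manifests outlive this session.
from google.colab import drive
drive.mount("/content/drive")

# Clone on the first run; "|| true" lets the cell continue if the directory already exists.
!git clone --recursive https://huggingface.co/ModerRAS/AniFileBERT /content/AniFileBERT || true
%cd /content/AniFileBERT
# Bring an existing checkout and its submodules up to date.
!git pull --ff-only || true
!git submodule update --init --recursive
# Start the worker; it keeps running and prints the URL and token.
!python -m tools.colab_worker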
```

The cell prints:

```text
COLAB_WORKER_URL=https://...trycloudflare.com
COLAB_WORKER_TOKEN=...
```
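If you capture the cell output as text, the two values can be pulled out programmatically. The following is a minimal sketch, not part of the repository: `parse_worker_output` is a hypothetical helper, and it assumes each value appears on its own `KEY=VALUE` line as shown above.

```python
from typing import Dict

PREFIX = "COLAB_WORKER_"


def parse_worker_output(text: str) -> Dict[str, str]:
    """Collect COLAB_WORKER_* KEY=VALUE lines from captured cell output."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first "=" only, so a token containing "=" stays whole.
        key, sep, value = line.strip().partition("=")
        if sep and key.startswith(PREFIX):
            values[key] = value
    return values
```

Lines without an `=` or with a different prefix are ignored, so ordinary log output mixed into the text does no harm.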

Keep that cell running. If Colab disconnects, run the cell again. The default
profiles set `checkpoint_steps: 1000` and `resume_from_checkpoint: "auto"`, so
they save a checkpoint every 1000 steps and resume from the latest one on Drive.
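To make the `"auto"` behavior concrete, here is a rough sketch of how a runner could pick the latest checkpoint. It is an illustration, not the repository's implementation: the `checkpoint-<step>` directory naming and both function names are assumptions.

```python
import re
from pathlib import Path
from typing import Optional, Union

# Assumption: checkpoints are directories named "checkpoint-<step>".
# The repository's real layout may differ.
_STEP_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(checkpoint_dir: Path) -> Optional[Path]:
    """Return the checkpoint directory with the highest step, or None."""
    if not checkpoint_dir.is_dir():
        return None
    best_step, best_path = -1, None
    for child in checkpoint_dir.iterdir():
        match = _STEP_RE.match(child.name)
        # Compare steps as integers so 10000 ranks above 2000.
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path


def resolve_resume(setting: Union[str, None, bool], checkpoint_dir: Path) -> Optional[Path]:
    """Interpret a resume_from_checkpoint value: "auto", an explicit path, or off."""
    if setting == "auto":
        return latest_checkpoint(checkpoint_dir)
    if setting:
        return Path(setting)
    return None
```

With `checkpoint_steps: 1000`, a disconnect loses at most the steps taken since the last multiple of 1000, provided that checkpoint finished writing to Drive.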

## Let Codex Submit a Job

On the local machine:

```powershell
$env:ANIFILEBERT_COLAB_URL="https://...trycloudflare.com"
$env:ANIFILEBERT_COLAB_TOKEN="..."
python -m tools.colab_client health
python -m tools.colab_client submit --profile dmhy_regex_finetune --wait
```

Codex can run the same commands from this repository after you provide the URL
and token.
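On a machine without PowerShell, the same two commands can be driven from Python. This is a sketch under the assumption that the client reads only the two environment variables shown above; `client_env`, `client_command`, and `run_client` are hypothetical helper names, not part of the repository.

```python
import os
import subprocess
import sys
from typing import Dict, List


def client_env(url: str, token: str) -> Dict[str, str]:
    """Copy the current environment and add the worker URL and token."""
    env = dict(os.environ)
    env["ANIFILEBERT_COLAB_URL"] = url
    env["ANIFILEBERT_COLAB_TOKEN"] = token
    return env


def client_command(*args: str) -> List[str]:
    """Build the argv for `python -m tools.colab_client <args>`."""
    return [sys.executable, "-m", "tools.colab_client", *args]


def run_client(url: str, token: str, *args: str) -> subprocess.CompletedProcess:
    """Run the client from the repository root; raise if it exits non-zero."""
    return subprocess.run(client_command(*args), env=client_env(url, token), check=True)
```

Calling `run_client(url, token, "health")` and then `run_client(url, token, "submit", "--profile", "dmhy_regex_finetune", "--wait")` mirrors the PowerShell session. Passing the token through `env` keeps it out of the command line.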

## Profiles

- `colab/configs/dmhy_regex_finetune.json`: default regex tokenizer fine-tune
  from the published root checkpoint.
- `colab/configs/dmhy_char_train.json`: character tokenizer training from
  scratch.

To submit a locally edited profile instead of a remote one, pass its path with `--config`:

```powershell
python -m tools.colab_client submit --config colab/configs/dmhy_regex_finetune.json --wait
```
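One way to produce an edited profile without touching the original is to write a variant file and pass that path to `--config`. The sketch below assumes a profile is a flat JSON object whose top-level keys include `checkpoint_steps`; `write_profile_variant` and the output file name in the usage note are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def write_profile_variant(source: Path, target: Path, overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Load a profile, apply top-level overrides, and save it under a new name."""
    profile = json.loads(source.read_text(encoding="utf-8"))
    # Assumption: settings are top-level keys, so a shallow update is enough.
    profile.update(overrides)
    target.write_text(json.dumps(profile, indent=2) + "\n", encoding="utf-8")
    return profile
```

For example, `write_profile_variant(Path("colab/configs/dmhy_regex_finetune.json"), Path("colab/configs/my_variant.json"), {"checkpoint_steps": 500})` would save checkpoints twice as often, which may suit sessions that disconnect frequently.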

The worker writes per-job logs under:

```text
MyDrive/AniFileBERT/worker/jobs/<job-id>/
```

The training runner writes:

```text
MyDrive/AniFileBERT/checkpoints/<profile-name>/
MyDrive/AniFileBERT/last_run_manifest.json
```
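From a later Colab session with Drive mounted, these locations can be inspected directly. The following sketch assumes Drive is mounted at `/content/drive` as in the cell above. It makes no assumption about the manifest's fields, and both helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, List, Optional

# Drive mount point from the Colab cell, plus the paths listed above.
DRIVE_ROOT = Path("/content/drive/MyDrive/AniFileBERT")


def load_manifest(root: Path = DRIVE_ROOT) -> Optional[Any]:
    """Return the parsed last_run_manifest.json, or None if no run has written it."""
    path = root / "last_run_manifest.json"
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def list_job_ids(root: Path = DRIVE_ROOT) -> List[str]:
    """Return the names of the per-job log directories, sorted alphabetically."""
    jobs = root / "worker" / "jobs"
    if not jobs.is_dir():
        return []
    return sorted(p.name for p in jobs.iterdir() if p.is_dir())
```

Printing `load_manifest()` at the start of a session is a quick way to confirm that the previous run reached Drive before submitting a job that resumes from it.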