	# Codex + Colab Training

	Free Colab cannot be used as an always-on remote machine. The practical setup is:

	1. Open a Colab GPU runtime when you want to train.
	2. Start the lightweight worker in one cell.
	3. Give Codex the printed worker URL and token.
	4. Codex submits jobs while that Colab session is alive.
	5. Checkpoints and manifests stay on Google Drive, so the next session can resume.

	## Start a Colab Session

	Run this in a Colab code cell:

	```python
	from google.colab import drive
	drive.mount("/content/drive")

	!git clone --recursive https://huggingface.co/ModerRAS/AniFileBERT /content/AniFileBERT || true
	%cd /content/AniFileBERT
	!git pull --ff-only || true
	!git submodule update --init --recursive
	!python -m tools.colab_worker
	```

	The cell prints:

	```text
	COLAB_WORKER_URL=https://...trycloudflare.com
	COLAB_WORKER_TOKEN=...
	```
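
If you would rather script the hand-off than copy the values by hand, the two printed lines can be treated as plain `KEY=VALUE` pairs. The following is a minimal sketch that assumes the output format shown above. `parse_worker_output` is a hypothetical helper, not part of this repository, and the sample values are placeholders.

```python
def parse_worker_output(text: str) -> dict[str, str]:
    """Collect COLAB_WORKER_* KEY=VALUE pairs from the worker cell output."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("COLAB_WORKER_"):
            continue  # skip unrelated log lines
        # Split on the first "=" only, so a token containing "=" stays intact.
        key, sep, value = line.partition("=")
        if sep:
            values[key] = value
    return values


# Placeholder text standing in for what the Colab cell prints.
sample = """
some unrelated log line
COLAB_WORKER_URL=https://example.trycloudflare.com
COLAB_WORKER_TOKEN=placeholder-token
"""
creds = parse_worker_output(sample)
print(creds["COLAB_WORKER_URL"])
```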

	Keep that cell running. If Colab disconnects, run the cell again. The default
	profiles set `checkpoint_steps: 1000` and `resume_from_checkpoint: "auto"`, so
	they save every 1000 steps and resume from the latest Drive checkpoint.
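
One way to picture what `resume_from_checkpoint: "auto"` has to do is "pick the checkpoint with the highest step number". The sketch below illustrates that idea only. It assumes checkpoint folders are named `checkpoint-<step>`, which is a common convention but is not confirmed for this repository, and `latest_checkpoint` is a hypothetical helper rather than the training runner's actual code.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Assumed naming scheme: checkpoint-1000, checkpoint-2000, ...
CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(checkpoint_dir: Path) -> Optional[Path]:
    """Return the checkpoint folder with the highest step, or None if there is none."""
    if not checkpoint_dir.is_dir():
        return None
    best_step, best_path = -1, None
    for child in checkpoint_dir.iterdir():
        match = CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match:
            step = int(match.group(1))  # compare numerically, not as strings
            if step > best_step:
                best_step, best_path = step, child
    return best_path


# Demo on a throwaway directory instead of a real Drive folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for step in (1000, 2000, 10000):
        (root / f"checkpoint-{step}").mkdir()
    latest_name = latest_checkpoint(root).name
    print(latest_name)
```

Comparing the step as an integer matters here: a plain string sort would rank `checkpoint-2000` above `checkpoint-10000`.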

	## Let Codex Submit a Job

	On the local machine:

	```powershell
	$env:ANIFILEBERT_COLAB_URL="https://...trycloudflare.com"
	$env:ANIFILEBERT_COLAB_TOKEN="..."
	python -m tools.colab_client health
	python -m tools.colab_client submit --profile dmhy_regex_finetune --wait
	```

	Codex can run the same commands from this repository after you provide the URL
	and token.
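
If you drive the client from a script, it can help to fail early when either variable is missing. This is a minimal sketch under that assumption. `load_colab_settings` is a hypothetical helper, the values are placeholders, and how `tools.colab_client` reads these variables internally is not shown in this README.

```python
import os

# The two variable names used in the PowerShell example above.
REQUIRED_VARS = ("ANIFILEBERT_COLAB_URL", "ANIFILEBERT_COLAB_TOKEN")


def load_colab_settings(environ=os.environ) -> dict:
    """Return the worker URL and token, or raise if either is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}


# Demo with a plain dict standing in for the real environment.
example_env = {
    "ANIFILEBERT_COLAB_URL": "https://example.trycloudflare.com",
    "ANIFILEBERT_COLAB_TOKEN": "placeholder-token",
}
settings = load_colab_settings(example_env)
print(sorted(settings))
```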

	## Profiles

	- `colab/configs/dmhy_regex_finetune.json`: default regex tokenizer fine-tune
	from the published root checkpoint.
	- `colab/configs/dmhy_char_train.json`: character tokenizer training from
	scratch.

	You can submit a locally edited profile instead of a remote one:

	```powershell
	python -m tools.colab_client submit --config colab/configs/dmhy_regex_finetune.json --wait
	```
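
A locally edited profile can also be produced by a small script instead of by hand. The sketch below assumes the profile is a flat JSON object whose top-level keys include `checkpoint_steps` and `resume_from_checkpoint`. The real files may contain more fields or nesting. `write_profile_variant` and the file name `my_finetune.json` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_profile_variant(source: Path, target: Path, overrides: dict) -> dict:
    """Copy a JSON profile to `target`, replacing the given top-level keys."""
    profile = json.loads(source.read_text(encoding="utf-8"))
    profile.update(overrides)  # top-level keys only; nested values are replaced whole
    target.write_text(json.dumps(profile, indent=2) + "\n", encoding="utf-8")
    return profile


# Demo on a stand-in profile so the sketch runs without the repository.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "dmhy_regex_finetune.json"
    source.write_text(
        json.dumps({"checkpoint_steps": 1000, "resume_from_checkpoint": "auto"}),
        encoding="utf-8",
    )
    target = Path(tmp) / "my_finetune.json"
    edited = write_profile_variant(source, target, {"checkpoint_steps": 500})
    reloaded = json.loads(target.read_text(encoding="utf-8"))
    print(reloaded)
```

The resulting file can then be passed to `submit` with `--config` in the same way as the command above.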

	The worker writes per-job logs under:

	```text
	MyDrive/AniFileBERT/worker/jobs/<job-id>/
	```

	The training runner writes:

	```text
	MyDrive/AniFileBERT/checkpoints/<profile-name>/
	MyDrive/AniFileBERT/last_run_manifest.json
	```
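
To check what a session left behind, you can list the job folders and load the manifest from the Drive layout above. This is a minimal sketch: `summarize_outputs` is a hypothetical helper, and because the manifest schema is not documented here, it only loads the JSON without interpreting any fields.

```python
import json
import tempfile
from pathlib import Path


def summarize_outputs(base: Path) -> dict:
    """List job IDs under worker/jobs and load last_run_manifest.json if present."""
    jobs_dir = base / "worker" / "jobs"
    job_ids = []
    if jobs_dir.is_dir():
        job_ids = sorted(p.name for p in jobs_dir.iterdir() if p.is_dir())
    manifest_path = base / "last_run_manifest.json"
    manifest = None
    if manifest_path.is_file():
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    return {"job_ids": job_ids, "manifest": manifest}


# Demo on a throwaway directory that mimics the layout with placeholder content.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "AniFileBERT"
    (base / "worker" / "jobs" / "job-b").mkdir(parents=True)
    (base / "worker" / "jobs" / "job-a").mkdir(parents=True)
    (base / "last_run_manifest.json").write_text('{"placeholder": true}', encoding="utf-8")
    summary = summarize_outputs(base)
    print(summary["job_ids"])
```

In a Colab session with Drive mounted at `/content/drive` as in the start cell, the base path would be `Path("/content/drive/MyDrive/AniFileBERT")`.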