Google Colab setup — agentic-intent-classifier

1. Runtime

Runtime → Change runtime type → GPU (T4/L4/A100). Then verify:

import torch
print(torch.cuda.is_available(), torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU")

2. Get the code

Option A — clone (if the repo is public or you use a token):

!git clone <YOUR_REPO_URL> protocol
%cd protocol/agentic-intent-classifier

Option B — upload: Zip agentic-intent-classifier/ (including data/, examples/, taxonomy TSV under data/iab-content/ if you use IAB), unzip in Colab, then:

%cd /content/agentic-intent-classifier
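The unzip step in Option B can also be done from Python. The following is a minimal sketch, assuming the archive was uploaded to /content; the archive file name in the usage comment and the helper name `unzip_project` are hypothetical and not part of the repository.

```python
import zipfile
from pathlib import Path


def unzip_project(archive_path, dest_dir="/content"):
    """Extract an uploaded zip archive into dest_dir and return its top-level names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
        # Collect the first path component of every archive entry
        top_level = sorted({name.split("/")[0] for name in zf.namelist() if name})
    return top_level


# Usage (hypothetical archive name):
# unzip_project("/content/agentic-intent-classifier.zip")
```

The returned list makes it easy to confirm that the archive unpacked into a single agentic-intent-classifier folder rather than scattering files directly under /content.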

3. Install dependencies

%pip install -q -r requirements.txt

If you see Torch version conflicts like:

  • torchvision ... requires torch==2.10.0, but you have torch 2.11.0

Pin matching versions (then restart the runtime):

%pip install -q -U torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0
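After restarting the runtime, it may help to confirm which versions are actually installed. This is a small sketch using only the standard library; the helper name `installed_versions` is an illustration, not something the repository provides.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("torch", "torchvision", "torchaudio")):
    """Return {package: installed version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print(installed_versions())
```

If the printed torch version still differs from the one torchvision expects, the pin did not take effect, which usually means the runtime was not restarted.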

If requirements.txt is missing, install manually:

%pip install -q torch transformers datasets accelerate scikit-learn numpy pandas safetensors

4. Optional: quieter TensorFlow / XLA logs

Run before importing combined_inference or anything that pulls TensorFlow:

import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
os.environ["ABSL_MIN_LOG_LEVEL"] = "3"

Harmless CUDA “already registered” lines may still appear; they do not mean training failed.

5. Optional: persist artifacts on Google Drive

from google.colab import drive
drive.mount("/content/drive")

Copy outputs to Drive after training, or symlink the multitask_intent_model_output, artifacts, and iab_classifier_model_output directories to a Drive folder.
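The symlink approach could look roughly like the sketch below. It assumes Drive is already mounted and that the three directory names above are relative to the project root; the Drive folder in the usage comment and the helper name `link_outputs_to_drive` are hypothetical.

```python
import os
from pathlib import Path

# Directory names are taken from the text above
OUTPUT_DIRS = ("multitask_intent_model_output", "artifacts", "iab_classifier_model_output")


def link_outputs_to_drive(project_dir, drive_dir, names=OUTPUT_DIRS):
    """Create each output dir under drive_dir and symlink it into project_dir."""
    project, drive = Path(project_dir), Path(drive_dir)
    linked = []
    for name in names:
        target = drive / name
        link = project / name
        target.mkdir(parents=True, exist_ok=True)
        if link.is_symlink() or link.exists():
            # Leave existing directories or links untouched rather than overwrite them
            continue
        os.symlink(target, link, target_is_directory=True)
        linked.append(name)
    return linked


# Usage (hypothetical Drive folder):
# link_outputs_to_drive("/content/agentic-intent-classifier",
#                       "/content/drive/MyDrive/agentic-intent-classifier-outputs")
```

Running this before training means outputs land on Drive as they are written. Directories that already exist locally are skipped, so those would still need to be copied by hand.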

6. Full pipeline (train + IAB + calibrate + verify + ONNX + smoke test)

From agentic-intent-classifier/:

!python training/run_full_training_pipeline.py --skip-full-eval --complete
  • --skip-full-eval skips the heaviest evaluation pass, which can run out of memory on low-RAM runtimes; remove the flag when you have headroom.
  • --complete also exports the multitask ONNX model, runs pipeline_verify.py, and issues one combined_inference query.

Artifacts-only check (after copying weights in):

!python training/pipeline_verify.py

Single query:

!python combined_inference.py "Which laptop should I buy for college?"

In the output, check meta.iab_mapping_is_placeholder: it is false only if IAB was trained and calibration exists.
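The flag check can be automated. The sketch below assumes that combined_inference.py prints a single JSON object with a top-level meta key; the source does not confirm the exact output format, so treat the sample string and the helper name `iab_is_placeholder` as hypothetical.

```python
import json


def iab_is_placeholder(raw_output):
    """Return the meta.iab_mapping_is_placeholder flag from a JSON result string.

    Returns None when the flag is absent.
    """
    result = json.loads(raw_output)
    return result.get("meta", {}).get("iab_mapping_is_placeholder")


# Hypothetical, heavily trimmed output used only to exercise the helper
sample = '{"meta": {"iab_mapping_is_placeholder": false}}'
print(iab_is_placeholder(sample))
```

A return value of True or None suggests that the IAB model or its calibration artifacts were not found.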

7. Minimal path (intent multitask + calibrate only)

If you only run multitask training and calibration in Colab (no full orchestrator):

!python training/train_multitask_intent.py
!python training/calibrate_confidence.py --head intent_type
!python training/calibrate_confidence.py --head intent_subtype
!python training/calibrate_confidence.py --head decision_phase

A production-ready (“complete”) stack still needs IAB training and IAB calibration (see run_full_training_pipeline.py).
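The minimal path can also be driven from a single Python cell, so that a failed step stops the sequence. This sketch uses only the script paths and --head values listed above; the helper names `minimal_path_commands` and `run_minimal_path` are illustrative.

```python
import subprocess
import sys

# Head names as listed in the commands above
HEADS = ("intent_type", "intent_subtype", "decision_phase")


def minimal_path_commands(python=sys.executable):
    """Build the train command followed by one calibrate command per head."""
    commands = [[python, "training/train_multitask_intent.py"]]
    for head in HEADS:
        commands.append([python, "training/calibrate_confidence.py", "--head", head])
    return commands


def run_minimal_path():
    """Run each step in order; check=True raises on the first non-zero exit code."""
    for cmd in minimal_path_commands():
        subprocess.run(cmd, check=True)
```

Call `run_minimal_path()` from the project root. Building the commands separately from running them makes it possible to print and inspect them first.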

8. Working directory

Always cd to the folder that contains config.py, training/, and data/:

import os
assert os.path.isfile("config.py"), "Wrong directory — cd into agentic-intent-classifier"
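A slightly broader check can cover all three entries named above, not just config.py. This is a minimal sketch; the helper name `missing_project_entries` is hypothetical.

```python
from pathlib import Path


def missing_project_entries(root="."):
    """Return the expected entries (config.py, training/, data/) missing under root."""
    base = Path(root)
    checks = {
        "config.py": (base / "config.py").is_file(),
        "training/": (base / "training").is_dir(),
        "data/": (base / "data").is_dir(),
    }
    return [name for name, ok in checks.items() if not ok]
```

An empty list means the current directory looks like the project root; anything else names what is missing.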