SAGE UI — Colab Quickstart Guide
So you started the SAGE server in Google Colab, got your Ngrok link, and loaded the webpage. Welcome to the SAGE Browser IDE!
Right now, you are looking at the "Control Plane." The AI is essentially a blank slate. To get a "proper agent" that can chat with you, you need to use this interface to prepare data and train the model step by step.
Here is exactly what to do.
Step 1: Open the Terminal & Download Data
To train a model, you need text. We recently added a 5-Billion-Token downloader, but we don't need all 5 billion for a quick test.
- In the SAGE IDE, open the CLI Terminal (click the >_ icon on the left sidebar, or press Ctrl + `).
- Type the following command and press Enter to download a small 1% slice (~50 million tokens):
    python debug/download_5b_tokens.py --output-dir data/raw --scale 0.01
- Watch the terminal. It will take a few minutes to download the General Web, Code, Math, Wikipedia, and Synthetic datasets into data/raw/.
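To get a feel for what --scale means, the arithmetic can be sketched in a few lines. This assumes the flag is a simple fraction of the 5-billion-token total, which matches the 1% / ~50 million figure above. The helpers tokens_for_scale and dir_size_mb are hypothetical and not part of SAGE; the second one just reports how much has landed in data/raw so far.

```python
from pathlib import Path

TOTAL_TOKENS = 5_000_000_000  # the full "5-Billion-Token" download


def tokens_for_scale(scale: float) -> int:
    """Approximate token count for a --scale fraction (assumed to be linear)."""
    if not 0 < scale <= 1:
        raise ValueError("scale must be in (0, 1]")
    return round(TOTAL_TOKENS * scale)


def dir_size_mb(path: str) -> float:
    """Total size of all files under `path` in megabytes (0.0 if it is missing)."""
    root = Path(path)
    if not root.is_dir():
        return 0.0
    total_bytes = sum(p.stat().st_size for p in root.rglob("*") if p.is_file())
    return total_bytes / 1_000_000


print(tokens_for_scale(0.01))  # 50000000
print(f"data/raw currently holds {dir_size_mb('data/raw'):.1f} MB")
```

You can paste this into a Colab cell while the download runs to watch the folder grow.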
Step 2: Train Your Tokenizer
The AI doesn't read English words; it reads "Tokens". It needs to learn its vocabulary from your downloaded data.
- Click the Presets tab (the rocket icon 🚀) on the left sidebar.
- Select Tokenizer Train from the dropdown menu.
- Click the purple Run Job button.
- A new panel will slide out showing you the live logs. Wait until it says "Job finished successfully."
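To make "learning a vocabulary" concrete, here is a toy byte-pair-encoding (BPE) trainer in pure Python. It only illustrates the general idea of building tokens from frequent character pairs. This guide does not say which algorithm the Tokenizer Train preset actually uses, and train_bpe is a hypothetical name.

```python
from collections import Counter


def train_bpe(text: str, num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge rules by fusing the most frequent adjacent pair."""
    # Start with every word split into single characters.
    words = Counter(tuple(word) for word in text.split())
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs across all words.
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken by the pair itself.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        merged_words = Counter()
        for symbols, freq in words.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged_words[tuple(out)] += freq
        words = merged_words
    return merges


print(train_bpe("low low low lower lowest", 2))  # [('o', 'w'), ('l', 'ow')]
```

After two merges the toy vocabulary already contains "low" as a single unit, which is the same effect a real tokenizer achieves at much larger scale.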
Step 3: Fast-Pack Your Data (Sharding)
Training directly from text files is too slow for a GPU. We need to tokenize the text and pack it into high-speed Parquet "shards".
- Go back to the Presets tab.
- Select Build Data Shards.
- Set the shard_size to 2048.
- Click Run Job.
- Wait for the logs to finish. When done, your data is packed and ready!
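Conceptually, packing turns one long token stream into fixed-length rows. Here is a minimal sketch, assuming shard_size means "tokens per packed row" (this guide does not define it) and using plain Python lists in place of Parquet files. The name pack_tokens is hypothetical.

```python
from typing import Iterable, Iterator


def pack_tokens(token_ids: Iterable[int], shard_size: int) -> Iterator[list[int]]:
    """Yield consecutive chunks of exactly `shard_size` tokens."""
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    buffer = []
    for tok in token_ids:
        buffer.append(tok)
        if len(buffer) == shard_size:
            yield buffer
            buffer = []
    # Leftover tokens (fewer than shard_size) are dropped in this sketch.


rows = list(pack_tokens(range(5000), shard_size=2048))
print(len(rows), len(rows[0]))  # 2 2048
```

Fixed-length rows are what let a GPU read batches without any per-example padding or re-tokenizing.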
Step 4: Begin Training the AI
Now it's time to put the GPU to work.
- Go back to the Presets tab.
- Select Training Run.
- You can leave the steps at the default (e.g., 20 for a smoke test, or change it to 2000 for a real micro-run). Make sure disable_wandb is checked so it doesn't ask for a Weights & Biases login.
- Click Run Job.
- The live log viewer will now stream training metrics. You will see loss going down and tokens_per_second showing how fast your Colab T4 GPU is churning through data.
- The trainer automatically saves checkpoints (e.g., ckpt_step_1000.pt) into the runs/ folder.
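If you want to confirm from the terminal or a Colab cell that a checkpoint was written, a small helper can find the newest one. This is a sketch that assumes checkpoint files follow the ckpt_step_<N>.pt naming shown above and live somewhere under runs/. The name latest_checkpoint is hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

CKPT_PATTERN = re.compile(r"^ckpt_step_(\d+)\.pt$")


def latest_checkpoint(runs_dir: str) -> Optional[Path]:
    """Return the checkpoint with the highest step number under `runs_dir`, or None."""
    root = Path(runs_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for path in root.rglob("ckpt_step_*.pt"):
        match = CKPT_PATTERN.match(path.name)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path


print(latest_checkpoint("runs"))
```

Comparing step numbers as integers matters here: sorting the filenames as text would rank step 200 above step 1000.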
Step 5: Chat with Your New Agent
Once training has run for a decent number of steps and a checkpoint is saved, the model is ready to talk!
- Click the Chat tab (the speech bubble icon 💬) on the left sidebar.
- Type a message like "What is Python?" and hit Enter.
- The UI will send this prompt to the backend, run inference using your newly trained checkpoint and tokenizer, and stream the generated response back to your screen.
(Note: If you only trained for 20 steps, the AI will probably respond with random gibberish. Real reasoning requires thousands of steps over billions of tokens!)
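If you would rather call the backend directly than use the Chat tab, the sketch below builds the same JSON payload that the launcher's optional smoke test (later in this guide) posts to /generate: a list of input_ids plus max_new_tokens. Note that this endpoint takes token IDs, not raw text. The response format and any authentication the server may require are not documented in this guide, so treat both as assumptions; build_generate_request and generate are hypothetical helper names.

```python
import json
import urllib.request


def build_generate_request(base_url, input_ids, max_new_tokens=16):
    """Build (but do not send) a POST request for the /generate endpoint."""
    payload = {"input_ids": list(input_ids), "max_new_tokens": max_new_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, input_ids, max_new_tokens=16, timeout=300):
    """Send the request and return the decoded JSON body (its shape is server-defined)."""
    req = build_generate_request(base_url, input_ids, max_new_tokens)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_generate_request("http://127.0.0.1:8000", [1, 42, 99], max_new_tokens=4)
print(req.get_method(), req.full_url)  # POST http://127.0.0.1:8000/generate
```

Calling generate(...) only works while the GPU server is running, since the launcher reports that /generate is unavailable on the CPU server.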
Connect your models with ngrok for a public URL
Do not forget to set up an ngrok authentication token for this!
# Colab one-cell launcher for the real SAGE server
# Before running:
# 1. In Colab, open the Secrets panel (Key icon on the left) and add your NGROK_AUTHTOKEN
# 2. If you want /generate, switch Colab to a T4 GPU runtime
import os
import sys
import time
import atexit
import subprocess
import importlib
import secrets
from pathlib import Path
REPO_URL = "https://huggingface.co/sage002/sage"
REPO_DIR = Path("/content/sage")
PORT = 8000
RUN_GENERATE_SMOKE = False
def run(cmd, cwd=None):
    print("+", " ".join(cmd))
    subprocess.run(cmd, cwd=cwd, check=True)
# 1. Clone or update repo
if not REPO_DIR.exists():
    run(["git", "clone", REPO_URL, str(REPO_DIR)])
else:
    run(["git", "-C", str(REPO_DIR), "pull", "--ff-only"])
# 2. Install dependencies
run([sys.executable, "-m", "pip", "install", "-q", "-U", "pip"])
run([
    sys.executable, "-m", "pip", "install", "-q",
    "fastapi>=0.110.0", "uvicorn>=0.29.0", "python-multipart>=0.0.9",
    "pydantic>=2.7.0", "pyyaml>=6.0.1", "psutil>=5.9.8",
    "pyngrok>=7.2.0", "requests>=2.31.0",
])
try:
    import torch
except ImportError:
    run([sys.executable, "-m", "pip", "install", "-q", "torch>=2.1.0"])
    importlib.invalidate_caches()  # let the import system see the fresh install
    import torch
# Refresh path caches so the cell can instantly import newly installed modules
importlib.invalidate_caches()
import requests
from pyngrok import ngrok
# 3. Retrieve Ngrok token securely via Colab Secrets (or fallback to environment variable)
try:
    from google.colab import userdata
    NGROK_AUTHTOKEN = userdata.get("NGROK_AUTHTOKEN")
except Exception:
    NGROK_AUTHTOKEN = os.environ.get("NGROK_AUTHTOKEN")
if not NGROK_AUTHTOKEN:
    raise ValueError("Missing NGROK_AUTHTOKEN. Please add it to your Colab Secrets.")
# 4. Supply necessary SAGE environment variables for the server
env = os.environ.copy()
env["SAGE_WEB_PASSWORD"] = env.get("SAGE_WEB_PASSWORD") or secrets.token_urlsafe(12)
env["SAGE_MODEL_CONFIG"] = env.get("SAGE_MODEL_CONFIG", "configs/model/1b.yaml")
env["SAGE_CHECKPOINT_DIR"] = env.get("SAGE_CHECKPOINT_DIR", "runs/sage-1b")
env["SAGE_TOKENIZER_MODEL"] = env.get("SAGE_TOKENIZER_MODEL", "tokenizer/tokenizer.model")
USE_GPU_SERVER = torch.cuda.is_available()
APP_TARGET = "serve.server:app" if USE_GPU_SERVER else "serve.server_cpu:app"
print(f"GPU available: {USE_GPU_SERVER}")
print(f"Starting app target: {APP_TARGET}")
print(f"SAGE_WEB_PASSWORD: {env['SAGE_WEB_PASSWORD']} <-- Use this to login to the IDE")
# 5. Start Uvicorn Server attached to the log file via Popen
log_path = REPO_DIR / "uvicorn.log"
log_file = open(log_path, "w", encoding="utf-8")
server_proc = subprocess.Popen(
    [
        sys.executable, "-m", "uvicorn",
        APP_TARGET,
        "--host", "0.0.0.0",
        "--port", str(PORT),
    ],
    cwd=str(REPO_DIR),
    env=env,  # Required: Passes the SAGE environment variables to Uvicorn
    stdout=log_file,
    stderr=subprocess.STDOUT,
)

def cleanup():
    global server_proc, log_file
    print("Cleaning up...")
    try:
        ngrok.disconnect(public_url)
        ngrok.kill()
    except Exception:
        pass
    if server_proc and server_proc.poll() is None:
        server_proc.terminate()
        try:
            server_proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            server_proc.kill()
    try:
        log_file.close()
    except Exception:
        pass
    print("Cleanup complete.")

atexit.register(cleanup)
# 6. Wait for health check success
health_url = f"http://127.0.0.1:{PORT}/health"
for _ in range(60):
    if server_proc.poll() is not None:
        log_file.flush()
        raise RuntimeError("Uvicorn exited early.\n\n" + log_path.read_text(encoding="utf-8", errors="ignore"))
    try:
        r = requests.get(health_url, timeout=2)
        if r.ok:
            print("Local health OK:", r.json())
            break
    except Exception:
        pass
    time.sleep(2)
else:
    log_file.flush()
    raise TimeoutError("Server did not become healthy.\n\n" + log_path.read_text(encoding="utf-8", errors="ignore"))
# 7. Start ngrok HTTPS tunnel
try:
    ngrok.kill()
    ngrok.set_auth_token(NGROK_AUTHTOKEN)
    tunnel = ngrok.connect(addr=PORT, proto="http", bind_tls=True)  # Forces HTTPS UI which stops browser mixed-content blocks
    public_url = tunnel.public_url
    print("\n==============================================")
    print(" SAGE DASHBOARD ")
    print("==============================================")
    print(f"URL: {public_url}")
    print(f"PWD: {env['SAGE_WEB_PASSWORD']}")
    print("==============================================\n")
    if USE_GPU_SERVER:
        print("Generate :", f"{public_url}/generate")
    else:
        print("Wait: Generate is not available on CPU server in this repo")
        print("Switch Colab to a GPU runtime if you want /generate.")
except Exception as e:
    print("Could not start Ngrok: ", e)
# Optional /generate smoke test
if USE_GPU_SERVER and RUN_GENERATE_SMOKE:
    print("\nRunning /generate smoke test...")
    try:
        resp = requests.post(
            f"http://127.0.0.1:{PORT}/generate",
            json={"input_ids": [1, 42, 99], "max_new_tokens": 4},
            timeout=300,
        )
        print("Generate response:", resp.json())
    except Exception as e:
        print("Generate timeout or failure:", e)
print(f"\nServer log path: {log_path}")
print("The server will continuously run until you stop the Code Cell manually.")
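
The health-check loop in step 6 of the launcher is a general "poll until ready" pattern. As a sketch, it can be pulled into a small reusable helper. The name wait_until is hypothetical, and its defaults mirror the launcher's 60 tries at 2-second intervals. Unlike the launcher, it does not watch for the server process exiting early.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], attempts: int = 60, delay: float = 2.0) -> bool:
    """Call `check` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        try:
            if check():
                return True
        except Exception:
            pass  # treat errors (e.g. connection refused) as "not ready yet"
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Demo with a fake check that succeeds on the third call.
calls = {"n": 0}


def ready_on_third_try() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(ready_on_third_try, attempts=5, delay=0.0))  # True
```

With requests installed, the launcher's check would look like wait_until(lambda: requests.get(health_url, timeout=2).ok).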
Pro-Tips for the IDE
- Command Palette: Press Ctrl + K anywhere to quickly jump between tools.
- Function Inspector: You can click the Book 📖 icon on the right to browse the actual Python codebase from within the browser while your model trains.
- Stop a stray training job: Go to the Jobs panel (the clipboard icon) and click the red "Stop" button on any running task to free up your GPU.