SAGE UI — Colab Quickstart Guide

So you started the SAGE server in Google Colab, got your Ngrok link, and loaded the webpage. Welcome to the SAGE Browser IDE!

Right now, you are looking at the "Control Plane." The AI is essentially a blank slate. To get a "proper agent" that can chat with you, you need to use this interface to prepare data and train the model step-by-step.

Here is exactly what to do.

Step 1: Open the Terminal & Download Data

To train a model, you need text. We recently added a 5-Billion-Token downloader, but we don't need all 5 billion for a quick test.

In the SAGE IDE, open the CLI Terminal (click the >_ icon on the left sidebar, or press `Ctrl + ``).
Type the following command and press Enter to download a small 1% slice (~50 Million tokens):
```
python debug/download_5b_tokens.py --output-dir data/raw --scale 0.01
```
Watch the terminal. It will take a few minutes to download the General Web, Code, Math, Wikipedia, and Synthetic datasets into data/raw/.

Step 2: Train Your Tokenizer

The AI doesn't read English words; it reads "Tokens". It needs to learn its vocabulary from your downloaded data.

Click the Presets tab (the rocket icon 🚀) on the left sidebar.
Select Tokenizer Train from the dropdown menu.
Click the purple Run Job button.
A new panel will slide out showing you the live logs. Wait until it says Job finished successfully.

Step 3: Fast-Pack Your Data (Sharding)

Training directly from text files is too slow for a GPU. We need to tokenize the text and pack it into high-speed Parquet "shards".

Go back to the Presets tab.
Select Build Data Shards.
Set the shard_size to 2048.
Click Run Job.
Wait for the logs to finish. When done, your data is packed and ready!

Step 4: Begin Training the AI

Now it's time to put the GPU to work.

Go back to the Presets tab.
Select Training Run.
You can leave the steps at the default (e.g., 20 for a smoke test, or change it to 2000 for a real micro-run). Make sure disable_wandb is checked so it doesn't ask for a Weights & Biases login.
Click Run Job.
The live log viewer will now stream training metrics. You will see loss going down and tokens_per_second showing how fast your Colab T4 GPU is churning through data.
The trainer automatically saves checkpoints (e.g., ckpt_step_1000.pt) into the runs/ folder.

Step 5: Chat with Your New Agent

Once the training has run for a decent amount of steps and a checkpoint is saved, the model is ready to talk!

Click the Chat tab (the speech bubble icon 💬) on the left sidebar.
Type a message like "What is Python?" and hit Enter.
The UI will send this prompt to the backend, run inference using your newly trained checkpoint and tokenizer, and stream the generated response back to your screen.

(Note: If you only trained for 20 steps, the AI will probably respond with random gibberish. Real reasoning requires thousands of steps over billions of tokens!)

Connect your models with ngrok for public IP

~ Do not forgot to set up and ngrok authentication token with this!

# Colab one-cell launcher for the real SAGE server
# Before running:
# 1. In Colab, open the Secrets panel (Key icon on the left) and add your NGROK_AUTHTOKEN
# 2. If you want /generate, switch Colab to a T4 GPU runtime

import os
import sys
import time
import atexit
import subprocess
import importlib
import secrets
from pathlib import Path

REPO_URL = "https://huggingface.co/sage002/sage"
REPO_DIR = Path("/content/sage")
PORT = 8000
RUN_GENERATE_SMOKE = False

def run(cmd, cwd=None):
    print("+", " ".join(cmd))
    subprocess.run(cmd, cwd=cwd, check=True)

# 1. Clone or update repo
if not REPO_DIR.exists():
    run(["git", "clone", REPO_URL, str(REPO_DIR)])
else:
    run(["git", "-C", str(REPO_DIR), "pull", "--ff-only"])

# 2. Install dependencies
run([sys.executable, "-m", "pip", "install", "-q", "-U", "pip"])
run([
    sys.executable, "-m", "pip", "install", "-q",
    "fastapi>=0.110.0", "uvicorn>=0.29.0", "python-multipart>=0.0.9",
    "pydantic>=2.7.0", "pyyaml>=6.0.1", "psutil>=5.9.8",
    "pyngrok>=7.2.0", "requests>=2.31.0"
])

try:
    import torch
except ImportError:
    run([sys.executable, "-m", "pip", "install", "-q", "torch>=2.1.0"])
    import torch

# Refresh path caches so the cell can instantly import newly installed modules
importlib.invalidate_caches()
import requests
from pyngrok import ngrok

# 3. Retrieve Ngrok token securely via Colab Secrets (or fallback to environment variable)
try:
    from google.colab import userdata
    NGROK_AUTHTOKEN = userdata.get("NGROK_AUTHTOKEN")
except Exception:
    NGROK_AUTHTOKEN = os.environ.get("NGROK_AUTHTOKEN")

if not NGROK_AUTHTOKEN:
    raise ValueError("Missing NGROK_AUTHTOKEN. Please add it to your Colab Secrets.")

# 4. Supply necessary SAGE environment variables for the server
env = os.environ.copy()
env["SAGE_WEB_PASSWORD"] = env.get("SAGE_WEB_PASSWORD") or secrets.token_urlsafe(12)
env["SAGE_MODEL_CONFIG"] = env.get("SAGE_MODEL_CONFIG", "configs/model/1b.yaml")
env["SAGE_CHECKPOINT_DIR"] = env.get("SAGE_CHECKPOINT_DIR", "runs/sage-1b")
env["SAGE_TOKENIZER_MODEL"] = env.get("SAGE_TOKENIZER_MODEL", "tokenizer/tokenizer.model")

USE_GPU_SERVER = torch.cuda.is_available()
APP_TARGET = "serve.server:app" if USE_GPU_SERVER else "serve.server_cpu:app"

print(f"GPU available: {USE_GPU_SERVER}")
print(f"Starting app target: {APP_TARGET}")
print(f"SAGE_WEB_PASSWORD: {env['SAGE_WEB_PASSWORD']}  <-- Use this to login to the IDE")

# 5. Start Uvicorn Server attached to the log file via Popen
log_path = REPO_DIR / "uvicorn.log"
log_file = open(log_path, "w", encoding="utf-8")

server_proc = subprocess.Popen(
    [
        sys.executable, "-m", "uvicorn",
        APP_TARGET,
        "--host", "0.0.0.0",
        "--port", str(PORT),
    ],
    cwd=str(REPO_DIR),
    env=env,                    # Required: Passes the SAGE environment variables to Uvicorn
    stdout=log_file,
    stderr=subprocess.STDOUT,
)

def cleanup():
    global server_proc, log_file
    print("Cleaning up...")
    try:
        ngrok.disconnect(public_url)
        ngrok.kill()
    except Exception:
        pass
    if server_proc and server_proc.poll() is None:
        server_proc.terminate()
        try:
            server_proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            server_proc.kill()
    try:
        log_file.close()
    except Exception:
        pass
    print("Cleanup complete.")

atexit.register(cleanup)

# 6. Wait for health check success
health_url = f"http://127.0.0.1:{PORT}/health"
for _ in range(60):
    if server_proc.poll() is not None:
        log_file.flush()
        raise RuntimeError("Uvicorn exited early.\n\n" + log_path.read_text(encoding="utf-8", errors="ignore"))
    try:
        r = requests.get(health_url, timeout=2)
        if r.ok:
            print("Local health OK:", r.json())
            break
    except Exception:
        pass
    time.sleep(2)
else:
    log_file.flush()
    raise TimeoutError("Server did not become healthy.\n\n" + log_path.read_text(encoding="utf-8", errors="ignore"))

# 7. Start Ngrok HTTPs Tunnel
try:
    ngrok.kill()
    ngrok.set_auth_token(NGROK_AUTHTOKEN)
    tunnel = ngrok.connect(addr=PORT, proto="http", bind_tls=True) # Forces HTTPS UI which stops browser mixed-content blocks
    public_url = tunnel.public_url

    print("\n============================================")
    print("        SAGE DASHBOARD        ")
    print("==============================================")
    print(f"URL: {public_url}")
    print(f"PWD: {env['SAGE_WEB_PASSWORD']}")
    print("==============================================\n")

    if USE_GPU_SERVER:
        print("Generate          :", f"{public_url}/generate")
    else:
        print("Wait: Generate is not available on CPU server in this repo")
        print("Switch Colab to a GPU runtime if you want /generate.")
except Exception as e:
    print("Could not start Ngrok: ", e)


# Optional /generate smoke test
if USE_GPU_SERVER and RUN_GENERATE_SMOKE:
    print("\nRunning /generate smoke test...")
    try:
        resp = requests.post(
            f"http://127.0.0.1:{PORT}/generate",
            json={"input_ids": [1, 42, 99], "max_new_tokens": 4},
            timeout=300,
        )
        print("Generate response:", resp.json())
    except Exception as e:
        print("Generate timeout or failure:", e)


print(f"\nServer log path: {log_path}")
print("The server will continuously run until you stop the Code Cell manually.")

Pro-Tips for the IDE

Command Palette: Press Ctrl + K anywhere to quickly jump between tools.
Function Inspector: You can click the Book 📖 icon on the right to browse the actual Python codebase from within the browser while your model trains.
Stop a stray training job: Go to the Jobs panel (the clipboard icon) and click the red "Stop" button on any running task to free up your GPU.