# SAGE UI — Colab Quickstart Guide So you started the SAGE server in Google Colab, got your Ngrok link, and loaded the webpage. Welcome to the **SAGE Browser IDE**! Right now, you are looking at the "Control Plane." The AI is essentially a blank slate. To get a "proper agent" that can chat with you, you need to use this interface to prepare data and train the model step-by-step. Here is exactly what to do. --- ### Step 1: Open the Terminal & Download Data To train a model, you need text. We recently added a 5-Billion-Token downloader, but we don't need all 5 billion for a quick test. 1. In the SAGE IDE, open the **CLI Terminal** (click the `>_` icon on the left sidebar, or press `Ctrl + \``). 2. Type the following command and press Enter to download a small 1% slice (~50 Million tokens): ```bash python debug/download_5b_tokens.py --output-dir data/raw --scale 0.01 ``` 3. Watch the terminal. It will take a few minutes to download the General Web, Code, Math, Wikipedia, and Synthetic datasets into `data/raw/`. ### Step 2: Train Your Tokenizer The AI doesn't read English words; it reads "Tokens". It needs to learn its vocabulary from your downloaded data. 1. Click the **Presets** tab (the rocket icon 🚀) on the left sidebar. 2. Select **Tokenizer Train** from the dropdown menu. 3. Click the purple **Run Job** button. 4. A new panel will slide out showing you the live logs. Wait until it says `Job finished successfully`. ### Step 3: Fast-Pack Your Data (Sharding) Training directly from text files is too slow for a GPU. We need to tokenize the text and pack it into high-speed Parquet "shards". 1. Go back to the **Presets** tab. 2. Select **Build Data Shards**. 3. Set the `shard_size` to `2048`. 4. Click **Run Job**. 5. Wait for the logs to finish. When done, your data is packed and ready! ### Step 4: Begin Training the AI Now it's time to put the GPU to work. 1. Go back to the **Presets** tab. 2. Select **Training Run**. 3. You can leave the steps at the default (e.g., `20` for a smoke test, or change it to `2000` for a real micro-run). Make sure `disable_wandb` is checked so it doesn't ask for a Weights & Biases login. 4. Click **Run Job**. 5. The live log viewer will now stream training metrics. You will see `loss` going down and `tokens_per_second` showing how fast your Colab T4 GPU is churning through data. 6. The trainer automatically saves checkpoints (e.g., `ckpt_step_1000.pt`) into the `runs/` folder. ### Step 5: Chat with Your New Agent Once the training has run for a decent amount of steps and a checkpoint is saved, the model is ready to talk! 1. Click the **Chat** tab (the speech bubble icon 💬) on the left sidebar. 2. Type a message like _"What is Python?"_ and hit Enter. 3. The UI will send this prompt to the backend, run inference using your newly trained checkpoint and tokenizer, and stream the generated response back to your screen. _(Note: If you only trained for 20 steps, the AI will probably respond with random gibberish. Real reasoning requires thousands of steps over billions of tokens!)_ --- # Connect your models with ngrok for public IP ~ Do not forgot to set up and ngrok authentication token with this! ```bash # Colab one-cell launcher for the real SAGE server # Before running: # 1. In Colab, open the Secrets panel (Key icon on the left) and add your NGROK_AUTHTOKEN # 2. If you want /generate, switch Colab to a T4 GPU runtime import os import sys import time import atexit import subprocess import importlib import secrets from pathlib import Path REPO_URL = "https://huggingface.co/sage002/sage" REPO_DIR = Path("/content/sage") PORT = 8000 RUN_GENERATE_SMOKE = False def run(cmd, cwd=None): print("+", " ".join(cmd)) subprocess.run(cmd, cwd=cwd, check=True) # 1. Clone or update repo if not REPO_DIR.exists(): run(["git", "clone", REPO_URL, str(REPO_DIR)]) else: run(["git", "-C", str(REPO_DIR), "pull", "--ff-only"]) # 2. Install dependencies run([sys.executable, "-m", "pip", "install", "-q", "-U", "pip"]) run([ sys.executable, "-m", "pip", "install", "-q", "fastapi>=0.110.0", "uvicorn>=0.29.0", "python-multipart>=0.0.9", "pydantic>=2.7.0", "pyyaml>=6.0.1", "psutil>=5.9.8", "pyngrok>=7.2.0", "requests>=2.31.0" ]) try: import torch except ImportError: run([sys.executable, "-m", "pip", "install", "-q", "torch>=2.1.0"]) import torch # Refresh path caches so the cell can instantly import newly installed modules importlib.invalidate_caches() import requests from pyngrok import ngrok # 3. Retrieve Ngrok token securely via Colab Secrets (or fallback to environment variable) try: from google.colab import userdata NGROK_AUTHTOKEN = userdata.get("NGROK_AUTHTOKEN") except Exception: NGROK_AUTHTOKEN = os.environ.get("NGROK_AUTHTOKEN") if not NGROK_AUTHTOKEN: raise ValueError("Missing NGROK_AUTHTOKEN. Please add it to your Colab Secrets.") # 4. Supply necessary SAGE environment variables for the server env = os.environ.copy() env["SAGE_WEB_PASSWORD"] = env.get("SAGE_WEB_PASSWORD") or secrets.token_urlsafe(12) env["SAGE_MODEL_CONFIG"] = env.get("SAGE_MODEL_CONFIG", "configs/model/1b.yaml") env["SAGE_CHECKPOINT_DIR"] = env.get("SAGE_CHECKPOINT_DIR", "runs/sage-1b") env["SAGE_TOKENIZER_MODEL"] = env.get("SAGE_TOKENIZER_MODEL", "tokenizer/tokenizer.model") USE_GPU_SERVER = torch.cuda.is_available() APP_TARGET = "serve.server:app" if USE_GPU_SERVER else "serve.server_cpu:app" print(f"GPU available: {USE_GPU_SERVER}") print(f"Starting app target: {APP_TARGET}") print(f"SAGE_WEB_PASSWORD: {env['SAGE_WEB_PASSWORD']} <-- Use this to login to the IDE") # 5. Start Uvicorn Server attached to the log file via Popen log_path = REPO_DIR / "uvicorn.log" log_file = open(log_path, "w", encoding="utf-8") server_proc = subprocess.Popen( [ sys.executable, "-m", "uvicorn", APP_TARGET, "--host", "0.0.0.0", "--port", str(PORT), ], cwd=str(REPO_DIR), env=env, # Required: Passes the SAGE environment variables to Uvicorn stdout=log_file, stderr=subprocess.STDOUT, ) def cleanup(): global server_proc, log_file print("Cleaning up...") try: ngrok.disconnect(public_url) ngrok.kill() except Exception: pass if server_proc and server_proc.poll() is None: server_proc.terminate() try: server_proc.wait(timeout=10) except subprocess.TimeoutExpired: server_proc.kill() try: log_file.close() except Exception: pass print("Cleanup complete.") atexit.register(cleanup) # 6. Wait for health check success health_url = f"http://127.0.0.1:{PORT}/health" for _ in range(60): if server_proc.poll() is not None: log_file.flush() raise RuntimeError("Uvicorn exited early.\n\n" + log_path.read_text(encoding="utf-8", errors="ignore")) try: r = requests.get(health_url, timeout=2) if r.ok: print("Local health OK:", r.json()) break except Exception: pass time.sleep(2) else: log_file.flush() raise TimeoutError("Server did not become healthy.\n\n" + log_path.read_text(encoding="utf-8", errors="ignore")) # 7. Start Ngrok HTTPs Tunnel try: ngrok.kill() ngrok.set_auth_token(NGROK_AUTHTOKEN) tunnel = ngrok.connect(addr=PORT, proto="http", bind_tls=True) # Forces HTTPS UI which stops browser mixed-content blocks public_url = tunnel.public_url print("\n============================================") print(" SAGE DASHBOARD ") print("==============================================") print(f"URL: {public_url}") print(f"PWD: {env['SAGE_WEB_PASSWORD']}") print("==============================================\n") if USE_GPU_SERVER: print("Generate :", f"{public_url}/generate") else: print("Wait: Generate is not available on CPU server in this repo") print("Switch Colab to a GPU runtime if you want /generate.") except Exception as e: print("Could not start Ngrok: ", e) # Optional /generate smoke test if USE_GPU_SERVER and RUN_GENERATE_SMOKE: print("\nRunning /generate smoke test...") try: resp = requests.post( f"http://127.0.0.1:{PORT}/generate", json={"input_ids": [1, 42, 99], "max_new_tokens": 4}, timeout=300, ) print("Generate response:", resp.json()) except Exception as e: print("Generate timeout or failure:", e) print(f"\nServer log path: {log_path}") print("The server will continuously run until you stop the Code Cell manually.") ``` --- ### Pro-Tips for the IDE - **Command Palette:** Press `Ctrl + K` anywhere to quickly jump between tools. - **Function Inspector:** You can click the Book 📖 icon on the right to browse the actual Python codebase from within the browser while your model trains. - **Stop a stray training job:** Go to the **Jobs** panel (the clipboard icon) and click the red "Stop" button on any running task to free up your GPU.