sage / docs /Colab_Guide.md

SAGE model repository : Updating some model checkpoints

773566b verified about 1 month ago

9.12 kB

	# SAGE UI — Colab Quickstart Guide

	So you started the SAGE server in Google Colab, got your Ngrok link, and loaded the webpage. Welcome to the SAGE Browser IDE!

	Right now, you are looking at the "Control Plane." The AI is essentially a blank slate. To get a "proper agent" that can chat with you, you need to use this interface to prepare data and train the model step-by-step.

	Here is exactly what to do.

	---

	### Step 1: Open the Terminal & Download Data

	To train a model, you need text. We recently added a 5-Billion-Token downloader, but we don't need all 5 billion for a quick test.

	1. In the SAGE IDE, open the CLI Terminal (click the `>_` icon on the left sidebar, or press `Ctrl + \``).
	2. Type the following command and press Enter to download a small 1% slice (~50 Million tokens):
	```bash
	python debug/download_5b_tokens.py --output-dir data/raw --scale 0.01
	```
	3. Watch the terminal. It will take a few minutes to download the General Web, Code, Math, Wikipedia, and Synthetic datasets into `data/raw/`.

	### Step 2: Train Your Tokenizer

	The AI doesn't read English words; it reads "Tokens". It needs to learn its vocabulary from your downloaded data.

	1. Click the Presets tab (the rocket icon 🚀) on the left sidebar.
	2. Select Tokenizer Train from the dropdown menu.
	3. Click the purple Run Job button.
	4. A new panel will slide out showing you the live logs. Wait until it says `Job finished successfully`.

	### Step 3: Fast-Pack Your Data (Sharding)

	Training directly from text files is too slow for a GPU. We need to tokenize the text and pack it into high-speed Parquet "shards".

	1. Go back to the Presets tab.
	2. Select Build Data Shards.
	3. Set the `shard_size` to `2048`.
	4. Click Run Job.
	5. Wait for the logs to finish. When done, your data is packed and ready!

	### Step 4: Begin Training the AI

	Now it's time to put the GPU to work.

	1. Go back to the Presets tab.
	2. Select Training Run.
	3. You can leave the steps at the default (e.g., `20` for a smoke test, or change it to `2000` for a real micro-run). Make sure `disable_wandb` is checked so it doesn't ask for a Weights & Biases login.
	4. Click Run Job.
	5. The live log viewer will now stream training metrics. You will see `loss` going down and `tokens_per_second` showing how fast your Colab T4 GPU is churning through data.
	6. The trainer automatically saves checkpoints (e.g., `ckpt_step_1000.pt`) into the `runs/` folder.

	### Step 5: Chat with Your New Agent

	Once the training has run for a decent amount of steps and a checkpoint is saved, the model is ready to talk!

	1. Click the Chat tab (the speech bubble icon 💬) on the left sidebar.
	2. Type a message like _"What is Python?"_ and hit Enter.
	3. The UI will send this prompt to the backend, run inference using your newly trained checkpoint and tokenizer, and stream the generated response back to your screen.

	_(Note: If you only trained for 20 steps, the AI will probably respond with random gibberish. Real reasoning requires thousands of steps over billions of tokens!)_

	---
	# Connect your models with ngrok for public IP
	~ Do not forgot to set up and ngrok authentication token with this!


	```bash
	# Colab one-cell launcher for the real SAGE server
	# Before running:
	# 1. In Colab, open the Secrets panel (Key icon on the left) and add your NGROK_AUTHTOKEN
	# 2. If you want /generate, switch Colab to a T4 GPU runtime

	import os
	import sys
	import time
	import atexit
	import subprocess
	import importlib
	import secrets
	from pathlib import Path

	REPO_URL = "https://huggingface.co/sage002/sage"
	REPO_DIR = Path("/content/sage")
	PORT = 8000
	RUN_GENERATE_SMOKE = False

	def run(cmd, cwd=None):
	print("+", " ".join(cmd))
	subprocess.run(cmd, cwd=cwd, check=True)

	# 1. Clone or update repo
	if not REPO_DIR.exists():
	run(["git", "clone", REPO_URL, str(REPO_DIR)])
	else:
	run(["git", "-C", str(REPO_DIR), "pull", "--ff-only"])

	# 2. Install dependencies
	run([sys.executable, "-m", "pip", "install", "-q", "-U", "pip"])
	run([
	sys.executable, "-m", "pip", "install", "-q",
	"fastapi>=0.110.0", "uvicorn>=0.29.0", "python-multipart>=0.0.9",
	"pydantic>=2.7.0", "pyyaml>=6.0.1", "psutil>=5.9.8",
	"pyngrok>=7.2.0", "requests>=2.31.0"
	])

	try:
	import torch
	except ImportError:
	run([sys.executable, "-m", "pip", "install", "-q", "torch>=2.1.0"])
	import torch

	# Refresh path caches so the cell can instantly import newly installed modules
	importlib.invalidate_caches()
	import requests
	from pyngrok import ngrok

	# 3. Retrieve Ngrok token securely via Colab Secrets (or fallback to environment variable)
	try:
	from google.colab import userdata
	NGROK_AUTHTOKEN = userdata.get("NGROK_AUTHTOKEN")
	except Exception:
	NGROK_AUTHTOKEN = os.environ.get("NGROK_AUTHTOKEN")

	if not NGROK_AUTHTOKEN:
	raise ValueError("Missing NGROK_AUTHTOKEN. Please add it to your Colab Secrets.")

	# 4. Supply necessary SAGE environment variables for the server
	env = os.environ.copy()
	env["SAGE_WEB_PASSWORD"] = env.get("SAGE_WEB_PASSWORD") or secrets.token_urlsafe(12)
	env["SAGE_MODEL_CONFIG"] = env.get("SAGE_MODEL_CONFIG", "configs/model/1b.yaml")
	env["SAGE_CHECKPOINT_DIR"] = env.get("SAGE_CHECKPOINT_DIR", "runs/sage-1b")
	env["SAGE_TOKENIZER_MODEL"] = env.get("SAGE_TOKENIZER_MODEL", "tokenizer/tokenizer.model")

	USE_GPU_SERVER = torch.cuda.is_available()
	APP_TARGET = "serve.server:app" if USE_GPU_SERVER else "serve.server_cpu:app"

	print(f"GPU available: {USE_GPU_SERVER}")
	print(f"Starting app target: {APP_TARGET}")
	print(f"SAGE_WEB_PASSWORD: {env['SAGE_WEB_PASSWORD']} <-- Use this to login to the IDE")

	# 5. Start Uvicorn Server attached to the log file via Popen
	log_path = REPO_DIR / "uvicorn.log"
	log_file = open(log_path, "w", encoding="utf-8")

	server_proc = subprocess.Popen(
	[
	sys.executable, "-m", "uvicorn",
	APP_TARGET,
	"--host", "0.0.0.0",
	"--port", str(PORT),
	],
	cwd=str(REPO_DIR),
	env=env, # Required: Passes the SAGE environment variables to Uvicorn
	stdout=log_file,
	stderr=subprocess.STDOUT,
	)

	def cleanup():
	global server_proc, log_file
	print("Cleaning up...")
	try:
	ngrok.disconnect(public_url)
	ngrok.kill()
	except Exception:
	pass
	if server_proc and server_proc.poll() is None:
	server_proc.terminate()
	try:
	server_proc.wait(timeout=10)
	except subprocess.TimeoutExpired:
	server_proc.kill()
	try:
	log_file.close()
	except Exception:
	pass
	print("Cleanup complete.")

	atexit.register(cleanup)

	# 6. Wait for health check success
	health_url = f"http://127.0.0.1:{PORT}/health"
	for _ in range(60):
	if server_proc.poll() is not None:
	log_file.flush()
	raise RuntimeError("Uvicorn exited early.\n\n" + log_path.read_text(encoding="utf-8", errors="ignore"))
	try:
	r = requests.get(health_url, timeout=2)
	if r.ok:
	print("Local health OK:", r.json())
	break
	except Exception:
	pass
	time.sleep(2)
	else:
	log_file.flush()
	raise TimeoutError("Server did not become healthy.\n\n" + log_path.read_text(encoding="utf-8", errors="ignore"))

	# 7. Start Ngrok HTTPs Tunnel
	try:
	ngrok.kill()
	ngrok.set_auth_token(NGROK_AUTHTOKEN)
	tunnel = ngrok.connect(addr=PORT, proto="http", bind_tls=True) # Forces HTTPS UI which stops browser mixed-content blocks
	public_url = tunnel.public_url

	print("\n============================================")
	print(" SAGE DASHBOARD ")
	print("==============================================")
	print(f"URL: {public_url}")
	print(f"PWD: {env['SAGE_WEB_PASSWORD']}")
	print("==============================================\n")

	if USE_GPU_SERVER:
	print("Generate :", f"{public_url}/generate")
	else:
	print("Wait: Generate is not available on CPU server in this repo")
	print("Switch Colab to a GPU runtime if you want /generate.")
	except Exception as e:
	print("Could not start Ngrok: ", e)


	# Optional /generate smoke test
	if USE_GPU_SERVER and RUN_GENERATE_SMOKE:
	print("\nRunning /generate smoke test...")
	try:
	resp = requests.post(
	f"http://127.0.0.1:{PORT}/generate",
	json={"input_ids": [1, 42, 99], "max_new_tokens": 4},
	timeout=300,
	)
	print("Generate response:", resp.json())
	except Exception as e:
	print("Generate timeout or failure:", e)


	print(f"\nServer log path: {log_path}")
	print("The server will continuously run until you stop the Code Cell manually.")

	```
	---

	### Pro-Tips for the IDE

	- Command Palette: Press `Ctrl + K` anywhere to quickly jump between tools.
	- Function Inspector: You can click the Book 📖 icon on the right to browse the actual Python codebase from within the browser while your model trains.
	- Stop a stray training job: Go to the Jobs panel (the clipboard icon) and click the red "Stop" button on any running task to free up your GPU.