# SAGE UI — Colab Quickstart Guide

So you started the SAGE server in Google Colab, got your ngrok link, and loaded the webpage. Welcome to the **SAGE Browser IDE**!

Right now, you are looking at the "Control Plane." The AI is essentially a blank slate. To get a "proper agent" that can chat with you, you need to use this interface to prepare data and train the model step-by-step.

Here is exactly what to do.

---

### Step 1: Open the Terminal & Download Data

To train a model, you need text. We recently added a 5-billion-token downloader, but a quick test doesn't need all 5 billion.

1. In the SAGE IDE, open the **CLI Terminal** (click the `>_` icon on the left sidebar, or press ``Ctrl + ` ``).
2. Type the following command and press Enter to download a small 1% slice (~50 million tokens):
   ```bash
   python debug/download_5b_tokens.py --output-dir data/raw --scale 0.01
   ```
3. Watch the terminal. It will take a few minutes to download the General Web, Code, Math, Wikipedia, and Synthetic datasets into `data/raw/`.
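
Once the download finishes, it can help to confirm that files actually landed in `data/raw/`. The sketch below is a generic directory check, not part of SAGE; the helper name `summarize_dir` is hypothetical, and it makes no assumption about the file names or formats the downloader produces.

```python
from pathlib import Path


def summarize_dir(root):
    """Return (file_count, total_bytes) for all regular files under root."""
    root = Path(root)
    if not root.is_dir():
        return 0, 0
    files = [p for p in root.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


# Reports 0 files if the download has not been run yet.
count, size = summarize_dir("data/raw")
print(f"{count} files, {size / 1e6:.1f} MB under data/raw")
```

If this reports zero files, scroll back through the terminal output for download errors before moving on.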

### Step 2: Train Your Tokenizer

The AI doesn't read English words directly; it reads "tokens," integer IDs that stand for pieces of text. The tokenizer has to learn that vocabulary from your downloaded data.

1. Click the **Presets** tab (the rocket icon 🚀) on the left sidebar.
2. Select **Tokenizer Train** from the dropdown menu.
3. Click the purple **Run Job** button.
4. A new panel will slide out showing you the live logs. Wait until it says `Job finished successfully`.
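
To make the idea of "tokens" concrete, here is a toy word-level illustration. It is not SAGE's tokenizer: real tokenizers usually learn subword units, and the function names below are made up for this example.

```python
def build_vocab(corpus):
    """Assign an integer id to each distinct whitespace-separated word."""
    vocab = {"<unk>": 0}  # id 0 is reserved for words never seen in training
    for word in corpus.split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def encode(text, vocab):
    """Map text to token ids, falling back to <unk> for unseen words."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.split()]


vocab = build_vocab("the cat sat on the mat")
print(encode("the cat sat", vocab))  # [1, 2, 3]
print(encode("the dog sat", vocab))  # [1, 0, 3]
```

The second call shows why the vocabulary has to be learned from representative data: anything the tokenizer never saw collapses into the unknown token.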

### Step 3: Fast-Pack Your Data (Sharding)

Tokenizing raw text files during training would leave the GPU waiting on data. Instead, we tokenize the text once up front and pack it into high-speed Parquet "shards".

1. Go back to the **Presets** tab.
2. Select **Build Data Shards**.
3. Set the `shard_size` to `2048`.
4. Click **Run Job**.
5. Wait for the logs to finish. When done, your data is packed and ready!
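
Conceptually, packing means concatenating tokenized documents into one stream and cutting it into fixed-size blocks. The sketch below shows one common way to do that under stated assumptions: it treats the block size as a sequence length (whether `shard_size` means that or rows per shard is not stated here), it uses a hypothetical `eos_id` separator, and it leaves out the Parquet writing entirely.

```python
def pack_tokens(documents, block_size, eos_id=0):
    """Concatenate tokenized documents and split them into fixed-size blocks.

    Each document is followed by eos_id; a trailing partial block is dropped.
    """
    stream = []
    for doc in documents:
        stream.extend(doc)
        stream.append(eos_id)
    n_blocks = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]


blocks = pack_tokens([[5, 6, 7], [8, 9], [10, 11, 12, 13]], block_size=4)
print(blocks)  # [[5, 6, 7, 0], [8, 9, 0, 10], [11, 12, 13, 0]]
```

Because every block has the same length, the trainer can read batches without any padding or per-step tokenization work.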

### Step 4: Begin Training the AI

Now it's time to put the GPU to work.

1. Go back to the **Presets** tab.
2. Select **Training Run**.
3. Leave the steps at the default (e.g., `20`) for a smoke test, or raise it to `2000` for a real micro-run. Make sure `disable_wandb` is checked so the job doesn't prompt for a Weights & Biases login.
4. Click **Run Job**.
5. The live log viewer will now stream training metrics. You will see `loss` going down and `tokens_per_second` showing how fast your Colab T4 GPU is churning through data.
6. The trainer automatically saves checkpoints (e.g., `ckpt_step_1000.pt`) into the `runs/` folder.
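
To see which checkpoint is the most recent, you can scan `runs/` for files matching the `ckpt_step_<N>.pt` pattern mentioned above. This is a minimal sketch: the helper name `latest_checkpoint` is hypothetical, and it assumes checkpoints may sit in subfolders of `runs/`. It compares step numbers numerically, since a plain alphabetical sort would rank `ckpt_step_200.pt` above `ckpt_step_1000.pt`.

```python
import re
from pathlib import Path

CKPT_PATTERN = re.compile(r"ckpt_step_(\d+)\.pt$")


def latest_checkpoint(run_dir):
    """Return the checkpoint path with the highest step number, or None."""
    root = Path(run_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for path in root.rglob("ckpt_step_*.pt"):
        match = CKPT_PATTERN.search(path.name)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path


# Prints None until the trainer has written at least one checkpoint.
print(latest_checkpoint("runs"))
```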

### Step 5: Chat with Your New Agent

Once training has run for a reasonable number of steps and a checkpoint is saved, the model is ready to talk!

1. Click the **Chat** tab (the speech bubble icon 💬) on the left sidebar.
2. Type a message like _"What is Python?"_ and hit Enter.
3. The UI will send this prompt to the backend, run inference using your newly trained checkpoint and tokenizer, and stream the generated response back to your screen.

_(Note: If you only trained for 20 steps, the AI will probably respond with random gibberish. Real reasoning requires thousands of steps over billions of tokens!)_
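
If you would rather poke the backend from a notebook cell, the launcher script's optional smoke test posts `input_ids` and `max_new_tokens` to `/generate` on the GPU server. The sketch below builds that same request with the standard library. Whether the Chat tab uses this endpoint, whether authentication is required, and what the response looks like are not covered here, so treat those as open questions; the helper name is hypothetical.

```python
import json
import urllib.request


def build_generate_request(base_url, input_ids, max_new_tokens=16):
    """Build (but do not send) a JSON POST request for the /generate endpoint."""
    payload = {"input_ids": list(input_ids), "max_new_tokens": max_new_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_request("http://127.0.0.1:8000", [1, 42, 99], max_new_tokens=4)
print(req.get_method(), req.full_url)  # POST http://127.0.0.1:8000/generate

# To actually send it (requires the GPU server to be running):
# with urllib.request.urlopen(req, timeout=300) as resp:
#     print(json.loads(resp.read()))
```

Note that the endpoint takes token ids rather than raw text, so a real prompt would first have to go through the trained tokenizer.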

---
### Expose Your Server with ngrok for a Public URL
Don't forget to set up an ngrok authentication token before running this cell!


```python
# Colab one-cell launcher for the real SAGE server
# Before running:
# 1. In Colab, open the Secrets panel (Key icon on the left) and add your NGROK_AUTHTOKEN
# 2. If you want /generate, switch Colab to a T4 GPU runtime

import os
import sys
import time
import atexit
import subprocess
import importlib
import secrets
from pathlib import Path

REPO_URL = "https://huggingface.co/sage002/sage"
REPO_DIR = Path("/content/sage")
PORT = 8000
RUN_GENERATE_SMOKE = False

def run(cmd, cwd=None):
    print("+", " ".join(cmd))
    subprocess.run(cmd, cwd=cwd, check=True)

# 1. Clone or update repo
if not REPO_DIR.exists():
    run(["git", "clone", REPO_URL, str(REPO_DIR)])
else:
    run(["git", "-C", str(REPO_DIR), "pull", "--ff-only"])

# 2. Install dependencies
run([sys.executable, "-m", "pip", "install", "-q", "-U", "pip"])
run([
    sys.executable, "-m", "pip", "install", "-q",
    "fastapi>=0.110.0", "uvicorn>=0.29.0", "python-multipart>=0.0.9",
    "pydantic>=2.7.0", "pyyaml>=6.0.1", "psutil>=5.9.8",
    "pyngrok>=7.2.0", "requests>=2.31.0"
])

try:
    import torch
except ImportError:
    run([sys.executable, "-m", "pip", "install", "-q", "torch>=2.1.0"])
    import torch

# Refresh path caches so the cell can instantly import newly installed modules
importlib.invalidate_caches()
import requests
from pyngrok import ngrok

# 3. Retrieve Ngrok token securely via Colab Secrets (or fallback to environment variable)
try:
    from google.colab import userdata
    NGROK_AUTHTOKEN = userdata.get("NGROK_AUTHTOKEN")
except Exception:
    NGROK_AUTHTOKEN = os.environ.get("NGROK_AUTHTOKEN")

if not NGROK_AUTHTOKEN:
    raise ValueError("Missing NGROK_AUTHTOKEN. Please add it to your Colab Secrets.")

# 4. Supply necessary SAGE environment variables for the server
env = os.environ.copy()
env["SAGE_WEB_PASSWORD"] = env.get("SAGE_WEB_PASSWORD") or secrets.token_urlsafe(12)
env["SAGE_MODEL_CONFIG"] = env.get("SAGE_MODEL_CONFIG", "configs/model/1b.yaml")
env["SAGE_CHECKPOINT_DIR"] = env.get("SAGE_CHECKPOINT_DIR", "runs/sage-1b")
env["SAGE_TOKENIZER_MODEL"] = env.get("SAGE_TOKENIZER_MODEL", "tokenizer/tokenizer.model")

USE_GPU_SERVER = torch.cuda.is_available()
APP_TARGET = "serve.server:app" if USE_GPU_SERVER else "serve.server_cpu:app"

print(f"GPU available: {USE_GPU_SERVER}")
print(f"Starting app target: {APP_TARGET}")
print(f"SAGE_WEB_PASSWORD: {env['SAGE_WEB_PASSWORD']}  <-- Use this to login to the IDE")

# 5. Start Uvicorn Server attached to the log file via Popen
log_path = REPO_DIR / "uvicorn.log"
log_file = open(log_path, "w", encoding="utf-8")

server_proc = subprocess.Popen(
    [
        sys.executable, "-m", "uvicorn",
        APP_TARGET,
        "--host", "0.0.0.0",
        "--port", str(PORT),
    ],
    cwd=str(REPO_DIR),
    env=env,                    # Required: Passes the SAGE environment variables to Uvicorn
    stdout=log_file,
    stderr=subprocess.STDOUT,
)

def cleanup():
    print("Cleaning up...")
    try:
        # public_url only exists if the tunnel in step 7 came up
        url = globals().get("public_url")
        if url:
            ngrok.disconnect(url)
        ngrok.kill()
    except Exception:
        pass
    if server_proc and server_proc.poll() is None:
        server_proc.terminate()
        try:
            server_proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            server_proc.kill()
    try:
        log_file.close()
    except Exception:
        pass
    print("Cleanup complete.")

atexit.register(cleanup)

# 6. Wait for health check success
health_url = f"http://127.0.0.1:{PORT}/health"
for _ in range(60):
    if server_proc.poll() is not None:
        log_file.flush()
        raise RuntimeError("Uvicorn exited early.\n\n" + log_path.read_text(encoding="utf-8", errors="ignore"))
    try:
        r = requests.get(health_url, timeout=2)
        if r.ok:
            print("Local health OK:", r.json())
            break
    except Exception:
        pass
    time.sleep(2)
else:
    log_file.flush()
    raise TimeoutError("Server did not become healthy.\n\n" + log_path.read_text(encoding="utf-8", errors="ignore"))

# 7. Start ngrok HTTPS tunnel
try:
    ngrok.kill()
    ngrok.set_auth_token(NGROK_AUTHTOKEN)
    tunnel = ngrok.connect(addr=PORT, proto="http", bind_tls=True) # Forces HTTPS UI which stops browser mixed-content blocks
    public_url = tunnel.public_url

    print("\n============================================")
    print("        SAGE DASHBOARD        ")
    print("==============================================")
    print(f"URL: {public_url}")
    print(f"PWD: {env['SAGE_WEB_PASSWORD']}")
    print("==============================================\n")

    if USE_GPU_SERVER:
        print("Generate          :", f"{public_url}/generate")
    else:
        print("Note: /generate is not available on the CPU server in this repo.")
        print("Switch Colab to a GPU runtime if you want /generate.")
except Exception as e:
    print("Could not start Ngrok: ", e)


# Optional /generate smoke test
if USE_GPU_SERVER and RUN_GENERATE_SMOKE:
    print("\nRunning /generate smoke test...")
    try:
        resp = requests.post(
            f"http://127.0.0.1:{PORT}/generate",
            json={"input_ids": [1, 42, 99], "max_new_tokens": 4},
            timeout=300,
        )
        print("Generate response:", resp.json())
    except Exception as e:
        print("Generate timeout or failure:", e)


print(f"\nServer log path: {log_path}")
print("The server keeps running in the background after this cell finishes; restart or disconnect the runtime to stop it.")

```
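
If the dashboard misbehaves, the quickest diagnostic is usually the end of the server log that the launcher writes to `uvicorn.log` inside the repo folder. A minimal sketch for reading it from another cell follows; the `tail` helper is hypothetical and simply returns the last few lines of any text file.

```python
from collections import deque
from pathlib import Path


def tail(path, n=20):
    """Return the last n lines of a text file as a list of strings."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open("r", encoding="utf-8", errors="ignore") as fh:
        # deque with maxlen keeps only the final n lines while streaming the file
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


# Path matches REPO_DIR / "uvicorn.log" from the launcher above.
for line in tail("/content/sage/uvicorn.log", n=20):
    print(line)
```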
---

### Pro-Tips for the IDE

- **Command Palette:** Press `Ctrl + K` anywhere to quickly jump between tools.
- **Function Inspector:** You can click the Book 📖 icon on the right to browse the actual Python codebase from within the browser while your model trains.
- **Stop a stray training job:** Go to the **Jobs** panel (the clipboard icon) and click the red "Stop" button on any running task to free up your GPU.