# TD Quick Start: Rent a GPU and Go

## What You Need (One-Time Setup)

1. **vast.ai account**: sign up at vast.ai and add credit ($10-20 to start)
2. **HuggingFace account**: sign up at huggingface.co (any username works; it doesn't have to be your real name)
3. **HuggingFace token**: Settings → Access Tokens → New Token → **Write** access
4. **ntfy.sh app** on your phone (you already have this)

## One-Time: Upload Your Code to Private HuggingFace

Do this once from your computer. After this, your code lives in a private repo that only you can see.

```bash
# Install the tool
pip install huggingface_hub

# Log in (paste your token when asked)
huggingface-cli login

# Upload everything
HF_USER=your_hf_username bash upload_to_hf.sh
```

Now your td_lang, td_fuse, .td files, and deploy script are all in a private HuggingFace repo. Nobody can see them except you.

**When you update your code**, just run `HF_USER=your_hf_username bash upload_to_hf.sh` again; it overwrites the repo with the latest version.
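
The contents of `upload_to_hf.sh` aren't shown in this guide. If you'd rather do the upload from Python, here is a minimal sketch using `huggingface_hub`'s `HfApi.create_repo` and `HfApi.upload_folder`. It assumes the repo is named `td-toolkit` (the name the download step below uses); the `upload_toolkit` function and its `api` parameter (which lets you swap in a fake client for testing) are hypothetical, not part of the script.

```python
import os


def upload_toolkit(folder: str, hf_user: str, repo_name: str = "td-toolkit", api=None) -> str:
    """Hypothetical helper: create (or reuse) a private repo and upload `folder` to it.

    `repo_name` defaults to the name used by the download step; `api`
    defaults to a real huggingface_hub.HfApi client.
    """
    if not os.path.isdir(folder):
        raise FileNotFoundError(f"not a directory: {folder}")
    if api is None:
        from huggingface_hub import HfApi  # picks up the token saved by the CLI login
        api = HfApi()
    repo_id = f"{hf_user}/{repo_name}"
    # exist_ok=True makes re-runs safe; private=True keeps the repo hidden.
    api.create_repo(repo_id=repo_id, private=True, exist_ok=True)
    # upload_folder adds or updates files; by default it does not delete
    # files that are in the repo but no longer exist locally.
    api.upload_folder(folder_path=folder, repo_id=repo_id, commit_message="Update TD toolkit")
    return repo_id
```

Usage from the folder that holds your code: `upload_toolkit(".", "your_hf_username")`.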

## Every Time: Rent GPU → 3 Commands → Done

### 1. Rent a GPU on vast.ai

Go to vast.ai β†’ Console β†’ Search for:
- **GPU:** RTX 4090 (24GB) or A100 (40GB+)
- **Image:** Pick one with PyTorch pre-installed (like `pytorch/pytorch`)
- **Storage:** At least 100GB disk
- **Cost:** ~$0.40-0.80/hr for a 4090

Click **RENT** and wait for it to start (~1-2 minutes).

### 2. Connect to the GPU

vast.ai gives you an SSH command (the port and host below are only an example). Copy and paste it into your terminal:
```bash
ssh -p 12345 root@ssh1.vast.ai
```

### 3. Run these 3 commands

```bash
# Set your token
export HF_TOKEN=hf_your_token_here

# Download your code from HuggingFace (takes ~10 seconds)
pip install huggingface_hub -q && python -c "
from huggingface_hub import snapshot_download
snapshot_download('YOUR_USERNAME/td-toolkit', local_dir='/workspace/td')
"

# Go!
cd /workspace/td && bash deploy.sh demo_autopilot.td
```
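
The one-liner above relies on `snapshot_download` reading `HF_TOKEN` from the environment, and a missing or mistyped token is the usual reason the download fails. If you'd like it to fail fast with a clear message instead, a small Python sketch could wrap the same call. The `fetch_toolkit` name and the `hf_` prefix check are illustrative assumptions; `download` defaults to the real `snapshot_download` but can be replaced for testing.

```python
import os
from typing import Callable, Mapping, Optional


def fetch_toolkit(repo_id: str, local_dir: str = "/workspace/td",
                  env: Mapping[str, str] = os.environ,
                  download: Optional[Callable[..., str]] = None) -> str:
    """Hypothetical helper: check HF_TOKEN, then download the private repo into local_dir."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; run: export HF_TOKEN=hf_...")
    if not token.startswith("hf_"):
        # HuggingFace user access tokens start with "hf_"; anything else
        # is usually a copy-paste mistake.
        raise RuntimeError("HF_TOKEN does not look like a HuggingFace token")
    if download is None:
        from huggingface_hub import snapshot_download
        download = snapshot_download
    # Pass the token explicitly so the check above and the download agree.
    return download(repo_id, local_dir=local_dir, token=token)
```

Usage on the GPU, after `export HF_TOKEN=...`: `fetch_toolkit("YOUR_USERNAME/td-toolkit")`.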

That's it. Put your phone down. ntfy.sh sends you updates as it runs.
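
Before walking away, you can confirm your phone is actually subscribed by sending yourself a test message. ntfy.sh's public HTTP API publishes a message with a plain `POST https://ntfy.sh/<topic>`, where the request body is the message and an optional `Title` header sets the title. A minimal standard-library sketch follows; `my-td-topic` is a placeholder for whatever topic your .td file notifies, and both function names are hypothetical.

```python
import urllib.request
from typing import Optional


def build_ntfy_request(topic: str, message: str, title: Optional[str] = None,
                       server: str = "https://ntfy.sh") -> urllib.request.Request:
    """Build a POST to ntfy: the topic is the URL path, the message is the body."""
    if not topic or "/" in topic:
        raise ValueError("topic must be a single path segment")
    headers = {"Title": title} if title else {}
    return urllib.request.Request(f"{server.rstrip('/')}/{topic}",
                                  data=message.encode("utf-8"),
                                  headers=headers, method="POST")


def send_ntfy(topic: str, message: str, title: Optional[str] = None) -> int:
    """Send the notification over the network and return the HTTP status code."""
    with urllib.request.urlopen(build_ntfy_request(topic, message, title), timeout=10) as resp:
        return resp.status
```

Running `send_ntfy("my-td-topic", "test from the GPU", title="TD")` should make your phone buzz; if it doesn't, the topic names don't match.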

### 4. When it's done

Your model is saved to Google Drive automatically if rclone is set up on the GPU (see below) and your .td file saves to a `gdrive:` path. Otherwise it stays on the GPU at `final_model/`.

## Setting Up Google Drive (Optional, One-Time per GPU)

On the GPU machine after SSHing in:
```bash
rclone config
```
1. Type `n` for new remote
2. Name it `gdrive`
3. Pick `Google Drive` from the list
4. Follow the prompts (it gives you a URL to visit in your browser)
5. Done: `save base to "gdrive:TD/models/final"` now works in your .td files

**Tip:** You can save the rclone config to your HuggingFace repo too, so you don't have to set it up every time.
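
One way to follow the tip: copy your rclone config into the toolkit folder before running `upload_to_hf.sh`, then put it back on each new GPU. By default on Linux rclone reads `$XDG_CONFIG_HOME/rclone/rclone.conf`, falling back to `~/.config/rclone/rclone.conf` (`rclone config file` prints the path it actually uses). The sketch below handles the restore step and assumes you stored the file as `rclone.conf` at the top of the repo; that file name and the function are hypothetical. The file holds your Google Drive access token, so it belongs only in the private repo.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def restore_rclone_config(toolkit_dir: str, dest: Optional[str] = None) -> Path:
    """Hypothetical helper: copy a saved rclone.conf to rclone's default location."""
    src = Path(toolkit_dir) / "rclone.conf"  # assumed file name inside the repo
    if not src.is_file():
        raise FileNotFoundError(f"no saved rclone config at {src}")
    if dest is None:
        # rclone's usual Linux default: $XDG_CONFIG_HOME/rclone/rclone.conf,
        # falling back to ~/.config/rclone/rclone.conf
        config_home = os.environ.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
        dest = os.path.join(config_home, "rclone", "rclone.conf")
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest_path)
    dest_path.chmod(0o600)  # the file contains OAuth tokens; keep it owner-only
    return dest_path
```

After downloading your code, `restore_rclone_config("/workspace/td")` would let you skip `rclone config` on that machine.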

## Quick Reference

| Command | What it does |
|---------|-------------|
| `bash deploy.sh my_file.td` | Full setup + run |
| `python -m td_lang check my_file.td` | Check syntax only |
| `python -m td_lang info my_file.td` | Show plan without running |
| `python -m td_lang run my_file.td` | Run (skip deploy setup) |
| `python -m td_lang run my_file.td --dry` | Compile but don't execute |
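
Because `check` is cheap and a typo found mid-rental costs money, you may want to chain the commands above so nothing runs unless the syntax check passes. A Python sketch using `subprocess`; `run_pipeline` and its `runner` parameter are hypothetical, and it assumes (this guide doesn't confirm it) that `python -m td_lang check` exits with a non-zero status on errors.

```python
import subprocess
import sys
from typing import Callable, List


def run_pipeline(td_file: str, dry: bool = False,
                 runner: Callable[[List[str]], int] = subprocess.call) -> int:
    """Hypothetical wrapper: run `td_lang check`, and only if it succeeds, `td_lang run`."""
    base = [sys.executable, "-m", "td_lang"]
    code = runner(base + ["check", td_file])
    if code != 0:
        print(f"check failed for {td_file} (exit {code}); not running", file=sys.stderr)
        return code
    cmd = base + ["run", td_file] + (["--dry"] if dry else [])
    return runner(cmd)
```

For example, `run_pipeline("my_file.td", dry=True)` compiles without executing, but only after the check passes.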

## If Something Goes Wrong

- **OOM (out of memory):** your .td file's `on_error` block handles this by retrying with smaller batches
- **Model download fails:** check that `HF_TOKEN` is set correctly
- **ntfy not working:** check that the ntfy app is installed on your phone and subscribed to the right topic
- **GPU disconnects:** SSH back in; your files are still there as long as the instance wasn't destroyed. Run `deploy.sh` again and td_lang picks up from the last snapshot
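
The exact behavior of `on_error` is defined by td_lang. Purely to illustrate the retry-with-smaller-batches idea, here is a generic Python sketch (not td_lang's implementation) that halves the batch size after each out-of-memory error. `train_fn` is a hypothetical callable; with a recent PyTorch (1.13+) you would pass `oom_errors=(torch.cuda.OutOfMemoryError,)`, while the built-in `MemoryError` is the default so the sketch runs without a GPU.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_batch_backoff(train_fn: Callable[[int], T], batch_size: int,
                           min_batch_size: int = 1,
                           oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,)) -> Tuple[T, int]:
    """Call train_fn(batch_size), halving batch_size after each OOM error.

    Returns (result, batch_size_that_worked). Re-raises the error once
    halving would drop below min_batch_size.
    """
    while True:
        try:
            return train_fn(batch_size), batch_size
        except oom_errors:
            if batch_size // 2 < min_batch_size:
                raise
            batch_size //= 2
            # With PyTorch you would typically also call torch.cuda.empty_cache() here.
```

Halving is a common choice because it converges quickly: starting from 32, a job that only fits 8 succeeds on the third attempt.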

## Cost Estimate

For the full `demo_autopilot.td` pipeline (merge 4 models + 5 training loops):
- **RTX 4090:** ~$0.50/hr × ~30-40 hrs = ~$15-20
- **A100 40GB:** ~$1.00/hr × ~20-30 hrs = ~$20-30
- **Budget cap in .td file:** Set `max_cost = 160.00` to prevent runaway costs
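
The estimates above are just hourly rate × hours. A tiny Python sketch of that arithmetic, plus how many hours a given `max_cost` cap buys; both function names are illustrative, and the rates are the rough figures from this section, not live vast.ai prices.

```python
from typing import Tuple


def estimate_cost(rate_per_hr: float, hours_low: float, hours_high: float) -> Tuple[float, float]:
    """Return the (low, high) cost in dollars for a run lasting hours_low..hours_high."""
    return rate_per_hr * hours_low, rate_per_hr * hours_high


def hours_within_budget(rate_per_hr: float, max_cost: float) -> float:
    """How many hours a budget cap buys at a given hourly rate."""
    return max_cost / rate_per_hr
```

For example, `estimate_cost(0.50, 30, 40)` gives `(15.0, 20.0)`, and `hours_within_budget(0.50, 160.00)` gives `320.0`: at 4090 rates the $160 cap only trips if a run goes far past its expected 30-40 hours.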