# TD Quick Start: Rent a GPU and Go
## What You Need (One-Time Setup)
1. **vast.ai account**: sign up at vast.ai, add credit ($10-20 to start)
2. **HuggingFace account**: sign up at huggingface.co (any username works; it doesn't have to be your real name)
3. **HuggingFace token**: Settings → Access Tokens → New Token → **Write** access
4. **ntfy.sh app** on your phone (you already have this)
## One-Time: Upload Your Code to Private HuggingFace
Do this once from your computer. After this, your code lives in a private repo that only you can see.
```bash
# Install the tool
pip install huggingface_hub
# Log in (paste your token when asked)
huggingface-cli login
# Upload everything
HF_USER=your_hf_username bash upload_to_hf.sh
```
Now your td_lang, td_fuse, .td files, and deploy script are all in a private HuggingFace repo. Nobody can see them except you.
**When you update your code**, just run `upload_to_hf.sh` again; it overwrites the repo with the latest version.
## Every Time: Rent GPU → 3 Commands → Done
### 1. Rent a GPU on vast.ai
Go to vast.ai → Console → Search for:
- **GPU:** RTX 4090 (24GB) or A100 (40GB+)
- **Image:** Pick one with PyTorch pre-installed (like `pytorch/pytorch`)
- **Storage:** At least 100GB disk
- **Cost:** ~$0.40-0.80/hr for a 4090
Click **RENT** and wait for it to start (~1-2 minutes).
### 2. Connect to the GPU
vast.ai gives you an SSH command. Copy and paste it into your terminal:
```
ssh -p 12345 root@ssh1.vast.ai
```
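Once connected, it can be worth confirming that the instance has the disk space you rented before starting a long run. The sketch below uses only the Python standard library. The `/workspace` path follows the commands in this guide; the function names and the free-space check (as a stand-in for the 100GB disk recommendation) are illustrative assumptions, not part of the TD toolkit.

```python
import shutil
from pathlib import Path


def free_disk_gb(path="/workspace"):
    """Return free space in GB at `path`, falling back to "/" if it is missing."""
    target = Path(path)
    if not target.exists():
        target = Path("/")
    return shutil.disk_usage(str(target)).free / 1e9


def has_enough_disk(path="/workspace", min_gb=100):
    """True if at least `min_gb` GB are free (100 GB is this guide's minimum)."""
    return free_disk_gb(path) >= min_gb


if __name__ == "__main__":
    print(f"Free disk: {free_disk_gb():.0f} GB")
    if not has_enough_disk():
        print("Less than 100 GB free; consider an instance with more storage.")
```

If the check reports too little space, it is cheaper to stop the instance and rent a larger one now than to find out partway through a run.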
### 3. Run these 3 commands
```bash
# Set your token (huggingface_hub reads HF_TOKEN from the environment)
export HF_TOKEN=hf_your_token_here
# Download your code from HuggingFace (takes ~10 seconds)
pip install huggingface_hub -q && python -c "
from huggingface_hub import snapshot_download
snapshot_download('YOUR_USERNAME/td-toolkit', local_dir='/workspace/td')
"
# Go!
cd /workspace/td && bash deploy.sh demo_autopilot.td
```
That's it. Put your phone down. ntfy.sh sends you updates as it runs.
### 4. When it's done
Your model gets saved to Google Drive automatically (if rclone is configured in the .td file). Otherwise it stays on the GPU at `final_model/`.
## Setting Up Google Drive (Optional, One-Time per GPU)
On the GPU machine after SSHing in:
```bash
rclone config
```
1. Type `n` for new remote
2. Name it `gdrive`
3. Pick `Google Drive` from the list
4. Follow the prompts (it gives you a URL to visit in your browser)
5. Done. Now `save base to "gdrive:TD/models/final"` works in your .td files
**Tip:** You can save the rclone config to your HuggingFace repo too, so you don't have to set it up every time.
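Before relying on the Google Drive save, you can confirm that the `gdrive` remote actually exists on this machine. The sketch below wraps `rclone listremotes`, which prints one `name:` per line; the helper names are hypothetical and not part of td_lang.

```python
import shutil
import subprocess


def parse_remotes(listremotes_output):
    """Turn `rclone listremotes` output (one `name:` per line) into bare names."""
    return [
        line.strip().rstrip(":")
        for line in listremotes_output.splitlines()
        if line.strip()
    ]


def has_remote(name="gdrive"):
    """True if rclone is installed and a remote called `name` is configured."""
    if shutil.which("rclone") is None:
        return False
    result = subprocess.run(
        ["rclone", "listremotes"], capture_output=True, text=True
    )
    return result.returncode == 0 and name in parse_remotes(result.stdout)


if __name__ == "__main__":
    print("gdrive remote configured:", has_remote())
```

The parsing is kept separate from the subprocess call so it can be checked without rclone installed.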
## Quick Reference
| Command | What it does |
|---------|-------------|
| `bash deploy.sh my_file.td` | Full setup + run |
| `python -m td_lang check my_file.td` | Check syntax only |
| `python -m td_lang info my_file.td` | Show plan without running |
| `python -m td_lang run my_file.td` | Run (skip deploy setup) |
| `python -m td_lang run my_file.td --dry` | Compile but don't execute |
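The commands in the table can be chained so that a run only starts after the syntax check passes. This is a minimal sketch, assuming `td_lang` exits with a non-zero status when `check` fails (not confirmed by this guide); `td_command` and `check_then_run` are hypothetical helper names.

```python
import subprocess
import sys


def td_command(action, td_file, dry=False):
    """Build the argument list for `python -m td_lang <action> <file>`."""
    if action not in ("check", "info", "run"):
        raise ValueError(f"unknown td_lang action: {action}")
    cmd = [sys.executable, "-m", "td_lang", action, td_file]
    if dry:
        # The table above lists --dry only together with `run`.
        if action != "run":
            raise ValueError("--dry is only listed for the run action")
        cmd.append("--dry")
    return cmd


def check_then_run(td_file, dry=False):
    """Run the syntax check first; start the run only if the check succeeds."""
    # Assumption: a failed check is reported through a non-zero exit status.
    if subprocess.run(td_command("check", td_file)).returncode != 0:
        return False
    return subprocess.run(td_command("run", td_file, dry=dry)).returncode == 0
```

For example, `check_then_run("my_file.td", dry=True)` would check the file and then compile it without executing.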
## If Something Goes Wrong
- **OOM (out of memory):** Your .td file's `on_error` block handles this by retrying with smaller batches
- **Model download fails:** Check that `HF_TOKEN` is set correctly
- **ntfy not working:** Check your phone has the ntfy app and you're subscribed to the right topic
- **GPU disconnects:** Re-SSH in; your files are still there. Run `deploy.sh` again and td_lang picks up from the last snapshot
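For the `HF_TOKEN` case, a quick local sanity check can catch the most common mistakes before you retry the download. This is a sketch only: it assumes current HuggingFace user tokens start with `hf_`, it does not contact HuggingFace, and `token_problem` is a hypothetical helper name.

```python
import os


def token_problem(env=None):
    """Return a short description of what looks wrong with HF_TOKEN, or None."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "")
    if not token:
        return "HF_TOKEN is not set; run: export HF_TOKEN=..."
    if token != token.strip():
        return "HF_TOKEN has leading or trailing whitespace"
    if token == "hf_your_token_here":
        return "HF_TOKEN is still the placeholder from this guide"
    if not token.startswith("hf_"):
        return "HF_TOKEN does not start with 'hf_'; it may be copied incorrectly"
    return None


if __name__ == "__main__":
    print(token_problem() or "HF_TOKEN looks plausible (not verified online)")
```

A token that passes this check can still be rejected, for example if it was revoked or lacks access to the private repo.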
## Cost Estimate
For the full `demo_autopilot.td` pipeline (merge 4 models + 5 training loops):
- **RTX 4090:** ~$0.50/hr × ~30-40 hrs = ~$15-20
- **A100 40GB:** ~$1.00/hr × ~20-30 hrs = ~$20-30
- **Budget cap in .td file:** Set `max_cost = 160.00` to prevent runaway costs
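
The estimates above are hourly rate times hours, and the same arithmetic shows how much headroom a budget cap leaves. A minimal sketch with illustrative function names; how td_lang actually enforces `max_cost` is not described here, so `hours_until_cap` simply assumes the cap is compared against rate times elapsed hours.

```python
def cost_range(rate_per_hr, hours_low, hours_high):
    """Return (low, high) estimated cost in dollars for a range of run times."""
    return (rate_per_hr * hours_low, rate_per_hr * hours_high)


def hours_until_cap(rate_per_hr, max_cost):
    """Hours of rental a budget cap allows at a given hourly rate."""
    return max_cost / rate_per_hr


if __name__ == "__main__":
    print("RTX 4090:", cost_range(0.50, 30, 40))   # (15.0, 20.0)
    print("A100 40GB:", cost_range(1.00, 20, 30))  # (20.0, 30.0)
    print("Hours at $0.50/hr under a 160.00 cap:", hours_until_cap(0.50, 160.00))
```

At $0.50/hr, a cap of 160.00 corresponds to 320 hours, well above the 30-40 hour estimate, so a lower cap may suit a first run.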