TD Quick Start: Rent a GPU and Go
What You Need (One-Time Setup)
- vast.ai account: sign up at vast.ai, add credit ($10-20 to start)
- HuggingFace account: sign up at huggingface.co (use any username, doesn't have to be your real name)
- HuggingFace token: Settings → Access Tokens → New Token → Write access
- ntfy.sh app on your phone (you already have this)
One-Time: Upload Your Code to Private HuggingFace
Do this once from your computer. After this, your code lives in a private repo that only you can see.
# Install the tool
pip install huggingface_hub
# Log in (paste your token when asked)
huggingface-cli login
# Upload everything
HF_USER=your_hf_username bash upload_to_hf.sh
Now your td_lang, td_fuse, .td files, and deploy script are all in a private HuggingFace repo. Nobody can see them except you.
When you update your code, just run upload_to_hf.sh again. It overwrites the repo with the latest version.
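For readers who want to see what an upload like this involves, here is a minimal Python sketch using huggingface_hub. It is not the contents of upload_to_hf.sh, which this guide does not show. The repo name td-toolkit is taken from the download command later in this guide, and the function names and folder argument are illustrative assumptions.

```python
def repo_id_for(user: str, repo_name: str = "td-toolkit") -> str:
    """Build the 'username/repo' id used by the Hugging Face Hub."""
    user = user.strip()
    if not user:
        raise ValueError("HF_USER is empty")
    return f"{user}/{repo_name}"


def upload_private(folder: str, user: str) -> str:
    """Create the private repo if it does not exist, then upload the folder."""
    # Imported lazily so repo_id_for works without the package installed.
    from huggingface_hub import HfApi

    api = HfApi()  # uses the token saved by `huggingface-cli login`
    repo_id = repo_id_for(user)
    # exist_ok=True makes re-running safe: the repo is reused, not recreated.
    api.create_repo(repo_id, private=True, exist_ok=True)
    api.upload_folder(folder_path=folder, repo_id=repo_id)
    return repo_id
```

Calling upload_private(".", "your_hf_username") from the project folder would push the current directory. Re-running uploads the files again, which matches the "run it again to update" behavior described above.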
Every Time: Rent GPU → 3 Commands → Done
1. Rent a GPU on vast.ai
Go to vast.ai → Console → Search for:
- GPU: RTX 4090 (24GB) or A100 (40GB+)
- Image: Pick one with PyTorch pre-installed (like pytorch/pytorch)
- Storage: At least 100GB disk
- Cost: ~$0.40-0.80/hr for a 4090
Click RENT and wait for it to start (~1-2 minutes).
2. Connect to the GPU
vast.ai gives you an SSH command. Copy and paste it into your terminal:
ssh -p 12345 root@ssh1.vast.ai
3. Run these 3 commands
# Set your token
export HF_TOKEN=hf_your_token_here
# Download your code from HuggingFace (takes ~10 seconds)
pip install huggingface_hub -q && python -c "
from huggingface_hub import snapshot_download
snapshot_download('YOUR_USERNAME/td-toolkit', local_dir='/workspace/td')
"
# Go!
cd /workspace/td && bash deploy.sh demo_autopilot.td
That's it. Put your phone down. ntfy.sh sends you updates as it runs.
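If you want to test that your phone receives notifications before starting a long run, the sketch below publishes a message the way ntfy.sh accepts them in general: an HTTP POST to https://ntfy.sh/<topic> with the message as the body. This is not necessarily how td_lang sends its updates, and the topic name in the usage note is hypothetical; use the topic you are subscribed to.

```python
import urllib.request


def build_ntfy_request(topic: str, message: str, title: str = "") -> urllib.request.Request:
    """Prepare (but do not send) a POST that publishes a message to an ntfy topic."""
    headers = {"Title": title} if title else {}
    return urllib.request.Request(
        url=f"https://ntfy.sh/{topic}",
        data=message.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def notify(topic: str, message: str, title: str = "") -> int:
    """Send the notification and return the HTTP status code."""
    request = build_ntfy_request(topic, message, title)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

For example, notify("my-td-runs", "test from the GPU box", title="TD") should make your phone buzz within a few seconds if the subscription is set up correctly.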
4. When it's done
Your model gets saved to Google Drive automatically (if rclone is configured in the .td file). Otherwise it stays on the GPU at final_model/.
Setting Up Google Drive (Optional, One-Time per GPU)
On the GPU machine after SSHing in:
rclone config
- Type n for new remote
- Name it gdrive
- Pick Google Drive from the list
- Follow the prompts (it gives you a URL to visit in your browser)
- Done. Now save base to "gdrive:TD/models/final" works in your .td files
Tip: You can save the rclone config to your HuggingFace repo too, so you don't have to set it up every time.
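If a run finished before rclone was configured and the model is still sitting in final_model/, it can be copied by hand once the gdrive remote exists. The sketch below wraps the standard rclone copy command; the remote path reuses the example from this section, and the function names are illustrative.

```python
import subprocess
from typing import List


def rclone_copy_command(local_dir: str, remote_path: str) -> List[str]:
    """Build the argument list for `rclone copy <local> <remote> --progress`."""
    return ["rclone", "copy", local_dir, remote_path, "--progress"]


def copy_model(local_dir: str = "final_model",
               remote_path: str = "gdrive:TD/models/final") -> None:
    """Copy the model directory to the remote; raises if rclone exits non-zero."""
    subprocess.run(rclone_copy_command(local_dir, remote_path), check=True)
```

rclone copy only transfers files that are missing or changed on the remote, so running it a second time after an interrupted upload is safe.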
Quick Reference
| Command | What it does |
|---|---|
| bash deploy.sh my_file.td | Full setup + run |
| python -m td_lang check my_file.td | Check syntax only |
| python -m td_lang info my_file.td | Show plan without running |
| python -m td_lang run my_file.td | Run (skip deploy setup) |
| python -m td_lang run my_file.td --dry | Compile but don't execute |
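Since GPU time is billed by the hour, it can be worth running the syntax check before the real run. The sketch below chains the two commands from the table. It assumes that td_lang check exits with a non-zero status when the file has errors, which this guide does not state; the helper names are illustrative.

```python
import subprocess
import sys
from typing import List


def td_command(action: str, td_file: str, dry: bool = False) -> List[str]:
    """Build a `python -m td_lang <action> <file>` command line."""
    if action not in {"check", "info", "run"}:
        raise ValueError(f"unknown td_lang action: {action}")
    cmd = [sys.executable, "-m", "td_lang", action, td_file]
    if dry:
        cmd.append("--dry")
    return cmd


def check_then_run(td_file: str) -> int:
    """Run the syntax check first; start the real run only if it passes."""
    check = subprocess.run(td_command("check", td_file))
    if check.returncode != 0:  # assumption: non-zero exit means the check failed
        return check.returncode
    return subprocess.run(td_command("run", td_file)).returncode
```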
If Something Goes Wrong
- OOM (out of memory): Your .td file's on_error block handles this by retrying with smaller batches
- Model download fails: Check your HF_TOKEN is set correctly
- ntfy not working: Check your phone has the ntfy app and you're subscribed to the right topic
- GPU disconnects: Re-SSH in, your files are still there. Run deploy.sh again and td_lang picks up from the last snapshot
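For the "model download fails" case, a quick preflight check can catch the most common token mistakes before anything is downloaded. This is a sketch, not part of td_lang. It assumes your token uses the hf_ prefix that current HuggingFace access tokens carry, and the whoami call needs network access.

```python
import os
from typing import Mapping, Optional


def token_problem(env: Mapping[str, str]) -> Optional[str]:
    """Describe what looks wrong with HF_TOKEN, or return None if it looks fine."""
    token = env.get("HF_TOKEN")
    if not token:
        return "HF_TOKEN is not set; run: export HF_TOKEN=hf_..."
    if token != token.strip():
        return "HF_TOKEN has leading or trailing whitespace"
    if not token.startswith("hf_"):
        return "HF_TOKEN does not start with 'hf_'; check for a copy-paste error"
    if token == "hf_your_token_here":
        return "HF_TOKEN is still the placeholder from this guide"
    return None


def token_is_accepted() -> bool:
    """Ask the Hub whether it accepts the token (needs network access)."""
    from huggingface_hub import whoami  # lazy import: optional dependency

    try:
        whoami(token=os.environ["HF_TOKEN"])
        return True
    except Exception:
        return False
```

Running print(token_problem(os.environ)) on the GPU machine prints None when the variable looks sane, and a short hint otherwise.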
Cost Estimate
For the full demo_autopilot.td pipeline (merge 4 models + 5 training loops):
- RTX 4090: ~$0.50/hr × ~30-40 hrs = ~$15-20
- A100 40GB: ~$1.00/hr × ~20-30 hrs = ~$20-30
- Budget cap in .td file: Set max_cost = 160.00 to prevent runaway costs
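The estimates above are hourly rate times run time. The short sketch below reproduces that arithmetic and shows how many hours a given max_cost would allow at each rate, which helps when picking a cap for your own runs. The rates and hour ranges are the rough figures from this section, not measurements, and the function names are illustrative.

```python
from typing import Tuple


def cost_range(rate_per_hr: float, min_hours: float, max_hours: float) -> Tuple[float, float]:
    """Return the (low, high) dollar cost for a run lasting min..max hours."""
    return (rate_per_hr * min_hours, rate_per_hr * max_hours)


def hours_until_cap(max_cost: float, rate_per_hr: float) -> float:
    """Return how many hours of rental a budget cap allows at a given rate."""
    return max_cost / rate_per_hr


print(cost_range(0.50, 30, 40))       # RTX 4090 -> (15.0, 20.0)
print(cost_range(1.00, 20, 30))       # A100 40GB -> (20.0, 30.0)
print(hours_until_cap(160.00, 0.50))  # -> 320.0 hours on a 4090
print(hours_until_cap(160.00, 1.00))  # -> 160.0 hours on an A100
```

A cap of 160.00 therefore sits well above the expected cost of the demo pipeline. It acts as a safety net against a runaway job, and you can lower it if you want a tighter limit.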