# TD Quick Start: Rent a GPU and Go

## What You Need (One-Time Setup)

1. **vast.ai account**: sign up at vast.ai and add credit ($10-20 to start).
2. **Hugging Face account**: sign up at huggingface.co (any username works; it doesn't have to be your real name).
3. **Hugging Face token**: Settings → Access Tokens → New Token → **Write** access.
4. **ntfy.sh app** on your phone (you already have this).

## One-Time: Upload Your Code to a Private Hugging Face Repo

Do this once from your computer. Afterwards your code lives in a private repo that only you can see.

```bash
# Install the Hugging Face client library and CLI
pip install huggingface_hub

# Log in (paste your Write token when asked)
huggingface-cli login

# Upload everything
HF_USER=your_hf_username bash upload_to_hf.sh
```

Your td_lang, td_fuse, .td files, and deploy script are now in a private Hugging Face repo that nobody but you can see.

**When you update your code**, run `upload_to_hf.sh` again; it pushes the latest version to the same repo.

## Every Time: Rent GPU → 3 Commands → Done

### 1. Rent a GPU on vast.ai

Go to vast.ai → Console and search for:

- **GPU:** RTX 4090 (24GB) or A100 (40GB+)
- **Image:** one with PyTorch pre-installed (such as `pytorch/pytorch`)
- **Storage:** at least 100GB of disk
- **Cost:** roughly $0.40-0.80/hr for a 4090

Click **RENT** and wait for the instance to start (about 1-2 minutes).

### 2. Connect to the GPU

vast.ai gives you an SSH command. Copy and paste it into your terminal (your port and host will differ from this example):

```bash
ssh -p 12345 root@ssh1.vast.ai
```

### 3. Run these 3 commands

```bash
# Set your token (huggingface_hub reads HF_TOKEN from the environment)
export HF_TOKEN=hf_your_token_here

# Download your code from Hugging Face (takes ~10 seconds)
pip install huggingface_hub -q && python -c "
from huggingface_hub import snapshot_download
snapshot_download('YOUR_USERNAME/td-toolkit', local_dir='/workspace/td')
"

# Go!
cd /workspace/td && bash deploy.sh demo_autopilot.td
```

Replace `hf_your_token_here` with your token and `YOUR_USERNAME` with your Hugging Face username (the same one you passed as `HF_USER` when uploading).

That's it. You can put your phone down; ntfy.sh sends you updates as the run progresses.

### 4. When it's done

If your .td file saves to a `gdrive:` path and rclone is configured on the machine (see below), your model is uploaded to Google Drive automatically. Otherwise it stays on the rented machine at `final_model/`, so copy it off before you destroy the instance.

## Setting Up Google Drive (Optional, One-Time per Instance)

On the GPU machine, after SSHing in:

```bash
rclone config
```

1. Type `n` for a new remote.
2. Name it `gdrive`.
3. Pick `Google Drive` from the list.
4. Follow the prompts. The GPU machine has no browser, so when rclone asks whether to authenticate with a web browser (auto config), answer `n` and follow its instructions for finishing authorization from your own computer.
5. Done: `save base to "gdrive:TD/models/final"` now works in your .td files.

**Tip:** You can also keep the rclone config in your private Hugging Face repo so you don't have to set it up on every new instance. Run `rclone config file` to see where the config is stored. It contains your Google Drive credentials, so make sure that repo stays private.

## Quick Reference

| Command | What it does |
|---------|-------------|
| `bash deploy.sh my_file.td` | Full setup + run |
| `python -m td_lang check my_file.td` | Check syntax only |
| `python -m td_lang info my_file.td` | Show plan without running |
| `python -m td_lang run my_file.td` | Run (skip deploy setup) |
| `python -m td_lang run my_file.td --dry` | Compile but don't execute |

## If Something Goes Wrong

- **OOM (out of memory):** your .td file's `on_error` block handles this by retrying with smaller batches.
- **Model download fails:** check that `HF_TOKEN` is set correctly.
- **ntfy not working:** check that the ntfy app is installed on your phone and subscribed to the right topic.
- **GPU disconnects:** SSH back in; your files are still there. Run `deploy.sh` again and td_lang picks up from the last snapshot.

## Cost Estimate

For the full `demo_autopilot.td` pipeline (merge 4 models + 5 training loops):

- **RTX 4090:** ~$0.50/hr × ~30-40 hrs = ~$15-20
- **A100 40GB:** ~$1.00/hr × ~20-30 hrs = ~$20-30
- **Budget cap:** set `max_cost = 160.00` in the .td file to prevent runaway costs.
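Each cost line above is just hourly rate × runtime. The sketch below redoes that arithmetic so you can plug in other GPUs or prices, and it also shows how many hours of runtime a `max_cost` cap covers at a given rate. The helper names `cost_range` and `hours_until_cap` are hypothetical, the rates are the rough figures from this page rather than live vast.ai prices, and it assumes only the hourly rate is billed (any separate storage or bandwidth charges are not included).

```python
from __future__ import annotations

# Hypothetical helpers for rough GPU rental cost estimates.
# Assumes cost = hourly rate * hours; other charges are ignored.


def cost_range(rate_per_hr: float, hours_low: float, hours_high: float) -> tuple[float, float]:
    """Return the (low, high) total cost in dollars for a runtime range."""
    return rate_per_hr * hours_low, rate_per_hr * hours_high


def hours_until_cap(rate_per_hr: float, max_cost: float) -> float:
    """Return how many hours of runtime the budget cap covers at this rate."""
    return max_cost / rate_per_hr


if __name__ == "__main__":
    # Rough rates and runtimes from the Cost Estimate list (not live prices).
    estimates = {
        "RTX 4090": (0.50, 30, 40),
        "A100 40GB": (1.00, 20, 30),
    }
    max_cost = 160.00
    for gpu, (rate, lo, hi) in estimates.items():
        low, high = cost_range(rate, lo, hi)
        cap_hours = hours_until_cap(rate, max_cost)
        print(f"{gpu}: ${low:.2f}-${high:.2f}; ${max_cost:.2f} cap covers {cap_hours:.0f} hrs")
```

With the numbers above, a $160 cap covers about 320 hours on a 4090 at $0.50/hr, which is far beyond the 30-40 hour estimate, so it only guards against a run that goes badly wrong.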
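For the "ntfy not working" case in the troubleshooting list, it can help to check the phone side on its own, without td_lang in the loop. ntfy.sh accepts a plain HTTP POST to `https://ntfy.sh/<topic>` with the message as the request body, and a `Title` header sets the notification title. The sketch below assumes you know the topic name your runs publish to (this page doesn't say where td_lang or `deploy.sh` sets it); the function name `build_test_notification` and the message text are hypothetical.

```python
import sys
import urllib.request

NTFY_SERVER = "https://ntfy.sh"


def build_test_notification(topic: str, message: str = "td test ping") -> urllib.request.Request:
    """Build (but do not send) a POST that publishes `message` to `topic`."""
    return urllib.request.Request(
        url=f"{NTFY_SERVER}/{topic}",
        data=message.encode("utf-8"),  # ntfy uses the request body as the message
        method="POST",
        headers={"Title": "TD quick start"},  # optional notification title
    )


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Only hits the network when a topic is given on the command line.
        req = build_test_notification(sys.argv[1])
        with urllib.request.urlopen(req, timeout=10) as resp:
            print("ntfy responded with HTTP", resp.status)
    else:
        print("usage: python ntfy_test.py <your-topic>")
```

Saved as `ntfy_test.py` and run with your topic name, a phone subscribed to that topic should show "td test ping" within a few seconds. If it does, the app side works and the problem is more likely the topic configured for your runs.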