# 🚀 CommitGuard: Comprehensive GCP Deployment & Training Guide (NVIDIA L4)

This document is a deep-dive, step-by-step manual for deploying the CommitGuard environment and training pipeline to a Google Cloud Platform (GCP) instance. We target an **NVIDIA L4 GPU** (GCP's G2 machine series; the closest GCP equivalent to AWS's A10G) to run **GRPO (Group Relative Policy Optimization)** on the Llama-3.2-3B model.

---

## 📋 1. Prerequisites: Setting Up Your Toolbox

Before you touch the cloud, make sure your local environment and external accounts are configured. These are the building blocks of the entire run.

### A. GCP Account & Project Setup

* **Active Project:** You must have a GCP project created. Note your `PROJECT_ID`.
* **GPU Quota:** By default, GCP projects have a quota of 0 GPUs. Navigate to `IAM & Admin > Quotas` and request a limit increase for NVIDIA L4 GPUs in your desired region (e.g., `us-central1`). **Do this at least 24 hours in advance**, since quota reviews are not instant.

### B. Weights & Biases (WandB) for Visualization

* **Why?** RL training can be unstable. WandB lets you monitor the reward and KL-divergence curves in real time from your browser.
* **Action:** Create a free account at [wandb.ai](https://wandb.ai), open your settings, and copy your **API Key**.

### C. Hugging Face Account & Llama Access

* **Model Gating:** Llama-3.2-3B is a gated model. Visit the [model page](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) and apply for access. Approval usually takes 30-60 minutes.
* **Access Token:** Generate a "Write" token in your Hugging Face settings so the VM can download the model and upload your finished adapters.

### D. Local gcloud CLI Initialization

* **Installation:** Install the Google Cloud SDK on your laptop.
* **Authentication:** Run `gcloud auth login` and `gcloud config set project [YOUR_PROJECT_ID]`. This lets your local terminal talk to GCP.

---

## 🛠️ Step 1: Provisioning the High-Performance VM

We are using the **g2-standard-4** machine type.
It is purpose-built for AI workloads.

### Detailed Breakdown of the Creation Command

* **`--machine-type=g2-standard-4`:** Provides 4 vCPUs and 16 GB of system RAM, ensuring the CPU doesn't bottleneck the GPU.
* **`--accelerator=type=nvidia-l4,count=1`:** Attaches the L4 GPU. Its 24 GB of VRAM is the "Goldilocks" zone for 3B-parameter models: enough to hold the model plus the multiple generations the GRPO algorithm requires.
* **`--image-family=common-cu121`:** Uses a specialized Google image that ships with **CUDA 12.1 and NVIDIA drivers pre-installed**, saving you roughly 30 minutes of manual driver installation.
* **`--provisioning-model=SPOT`:** **CRITICAL FOR BUDGET.** Spot instances use excess capacity and are roughly 60-70% cheaper than on-demand instances. If Google reclaims the instance, your 50-step checkpoints ensure you don't lose much progress.

```bash
gcloud compute instances create commitguard-trainer \
  --project=[PROJECT_ID] \
  --zone=us-central1-a \
  --machine-type=g2-standard-4 \
  --accelerator=type=nvidia-l4,count=1 \
  --image-project=ml-images \
  --image-family=common-cu121 \
  --boot-disk-size=100GB \
  --boot-disk-type=pd-balanced \
  --maintenance-policy=TERMINATE \
  --provisioning-model=SPOT
```

---

## 🏗️ Step 2: Environment Preparation

Once the VM shows "Running," we turn it into a specialized CommitGuard lab.

### A. Secure Connection (SSH)

Connect to the machine's terminal:

```bash
gcloud compute ssh commitguard-trainer --zone=us-central1-a
```

### B. Repository & Virtual Environment

We isolate our dependencies to prevent conflicts with system-level Python packages.

```bash
# Clone the project
git clone https://github.com/[YOUR_USER]/commitguard.git
cd commitguard

# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Authenticate with Hugging Face (required for gated Llama models)
huggingface-cli login
```

### C. Installing the "Train" Stack

`pip install -e ".[train]"` installs the `commitguard` package in "editable" mode along with the optional training libraries such as `torch`, `peft`, and `trl`.

```bash
pip install -U pip
pip install -e ".[train]"

# Flash Attention 2: a specialized attention kernel that makes Llama training
# significantly faster and more memory-efficient on this hardware.
pip install flash-attn --no-build-isolation
```

---

## 📡 Step 3: Launching the Verifiable Reward Server

CommitGuard uses **RLVR (Reinforcement Learning with Verifiable Rewards)**. The model doesn't just "guess" whether it's right; it submits an action to a server that calculates a reward based on hard evidence.

### Running in the Background

Since training takes hours, we run the server in the background using the `&` operator.

```bash
# Start the server
python -m commitguard_env.server &

# Verify health: this ensures the database and API are ready.
# If this fails, the trainer will hang indefinitely.
curl http://localhost:8000/health
# You should see: {"status":"healthy"}
```

---

## 🧠 Step 4: Executing the GRPO Training Run

GRPO is a reinforcement learning algorithm. It asks the model to generate 4 different answers for the same code diff, compares them against each other, and rewards the ones that follow the XML format and correctly identify the vulnerability.

### Hyperparameter Explanation

* **`--steps 500`:** The model produces roughly 2,000 completions (4 generations x 500 steps).
* **4-bit quantization:** Handled automatically by the script. It compresses the model weights so they fit into GPU memory with minimal accuracy loss.
* **LoRA `r=8`:** Low-Rank Adaptation. Instead of training all 3 billion parameters, we train only about 5 million adapter weights, which keeps training stable and fast.
* **`--live`:** Tells the script to fetch rewards from the server we started in Step 3.
```bash
# Log in to WandB so your graphs show up online
export WANDB_API_KEY=[YOUR_WANDB_KEY]

python scripts/train_grpo.py \
  --model_name "meta-llama/Llama-3.2-3B-Instruct" \
  --output_dir "./outputs/commitguard-final" \
  --steps 500 \
  --live \
  --wandb "commitguard-rlvr"
```

---

## 💾 Step 5: Post-Run Weight Management & Cleanup

Once the 500 steps are complete, the "brain" of your agent exists as a LoRA adapter in the `./outputs` folder.

### A. Permanent Storage (Hugging Face)

The VM's disk is temporary. Move your weights to Hugging Face immediately.

```bash
huggingface-cli login --token [YOUR_HF_TOKEN]
huggingface-cli upload [HF_USERNAME]/commitguard-llama3b-adapter ./outputs/commitguard-final
```

### B. Cost Control: Deleting the VM

**DO NOT FORGET THIS STEP.** An idle GPU instance costs money every hour.

```bash
# Exit the VM
exit

# Delete it from your local terminal
gcloud compute instances delete commitguard-trainer --zone=us-central1-a
```

---

## 🆘 Critical Troubleshooting

### "CUDA Out of Memory"

* **Symptom:** Training crashes with a long traceback ending in `OutOfMemoryError`.
* **Fix:** The "group" in GRPO is currently set to 4 generations. Open `scripts/train_grpo.py` and change `num_generations=4` to `num_generations=2`. This roughly halves the memory needed for generation.

### "Connection Refused"

* **Symptom:** The reward function returns -1.0 for everything or throws errors.
* **Fix:** The environment server crashed or was never started. Run `ps aux | grep server` to check whether it is still running, and restart it if not (see Step 3).

### The "Midnight Fallback"

If the 3B model is too slow for the submission deadline:

* Switch to the **Qwen2.5-1.5B** model. It uses the same XML format but is roughly 2x faster.
* Command: `python scripts/train_grpo.py --model_name "Qwen/Qwen2.5-1.5B-Instruct" ...`

---

## ✅ Final Success Checklist

1. [ ] **Health check:** `curl` returns healthy.
2. [ ] **WandB tracking:** You can see the `reward` curve moving on the website.
3. [ ] **Checkpoints:** You see folders like `checkpoint-50` and `checkpoint-100` in the output directory.
4. [ ] **Clean exit:** The VM is deleted after the adapter is uploaded to Hugging Face.