# Cloud Deployment Guide for SmolLM3 DPO Training

This guide provides the exact sequence of commands to deploy and run SmolLM3 DPO training for 6 epochs on a cloud computing instance.

## Prerequisites

### Cloud Instance Requirements

- **GPU**: NVIDIA A100, H100, or similar (16GB+ VRAM)
- **RAM**: 64GB+ system memory
- **Storage**: 100GB+ SSD storage
- **OS**: Ubuntu 20.04 or 22.04

### Required Information

Before starting, gather these details:

- Your Hugging Face username
- Your Hugging Face token (with write permissions)
- Your Trackio Space URL (if using monitoring)

## Step-by-Step Deployment

### Step 1: Launch Cloud Instance

Choose your cloud provider and launch an instance:

#### AWS (g5.2xlarge or g5.4xlarge)

```bash
# Launch an instance with Ubuntu 22.04 and an appropriate GPU
aws ec2 run-instances \
  --image-id ami-0c7217cdde317cfec \
  --instance-type g5.2xlarge \
  --key-name your-key-pair \
  --security-group-ids sg-xxxxxxxxx
```

#### Google Cloud (n1-standard-8 with T4/V100)

```bash
gcloud compute instances create smollm3-dpo \
  --zone=us-central1-a \
  --machine-type=n1-standard-8 \
  --accelerator="type=nvidia-tesla-t4,count=1" \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud
```

#### Azure (Standard_NC6s_v3)

```bash
az vm create \
  --resource-group your-rg \
  --name smollm3-dpo \
  --image Canonical:0001-com-ubuntu-server-jammy:22_04-lts:latest \
  --size Standard_NC6s_v3 \
  --admin-username azureuser
```

### Step 2: Connect to Instance

```bash
# SSH to your instance
ssh -i your-key.pem ubuntu@your-instance-ip

# Or for Azure
ssh azureuser@your-instance-ip
```

### Step 3: Update System and Install Dependencies

```bash
# Update system
sudo apt-get update
sudo apt-get upgrade -y

# Install system dependencies
sudo apt-get install -y git curl wget unzip python3 python3-pip python3-venv

# Install the NVIDIA Container Toolkit (GPU drivers are usually pre-installed
# on cloud GPU images; install them separately if nvidia-smi is unavailable)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
```

### Step 4: Clone Repository and Setup Environment

```bash
# Clone your repository
git clone https://github.com/your-username/flexai-finetune.git
cd flexai-finetune

# Create virtual environment
python3 -m venv smollm3_env
source smollm3_env/bin/activate

# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install project dependencies
pip install -r requirements.txt

# Install additional DPO dependencies (quote the specifiers so the shell
# does not treat ">=" as a redirection)
pip install "trl>=0.7.0"
pip install "peft>=0.4.0"
pip install "accelerate>=0.20.0"
```

### Step 5: Configure Authentication

```bash
# Set your Hugging Face token
export HF_TOKEN="your_huggingface_token_here"

# Login to Hugging Face
huggingface-cli login --token "$HF_TOKEN"
```

### Step 6: Create Configuration Files

Create the DPO configuration file:

```bash
cat > config/train_smollm3_dpo_6epochs.py << 'EOF'
"""
SmolLM3 DPO Training Configuration - 6 Epochs
Optimized for cloud deployment
"""

from config.train_smollm3_dpo import SmolLM3DPOConfig

config = SmolLM3DPOConfig(
    # Model configuration
    model_name="HuggingFaceTB/SmolLM3-3B",
    max_seq_length=4096,
    use_flash_attention=True,
    use_gradient_checkpointing=True,

    # Training configuration
    batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    weight_decay=0.01,
    warmup_steps=100,
    max_iters=None,  # Will be calculated based on epochs
    eval_interval=100,
    log_interval=10,
    save_interval=500,

    # DPO configuration
    beta=0.1,
    max_prompt_length=2048,

    # Optimizer configuration
    optimizer="adamw",
    beta1=0.9,
    beta2=0.95,
    eps=1e-8,

    # Scheduler configuration
    scheduler="cosine",
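    # Note: assuming the standard cosine-with-warmup behavior, the learning
    # rate warms up linearly for `warmup_steps`, then decays from
    # `learning_rate` (5e-6) toward `min_lr` below over the run.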
    min_lr=1e-6,

    # Mixed precision
    fp16=True,
    bf16=False,

    # Logging and saving
    save_steps=500,
    eval_steps=100,
    logging_steps=10,
    save_total_limit=3,

    # Evaluation
    eval_strategy="steps",
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    load_best_model_at_end=True,

    # Data configuration
    data_dir="smoltalk_dataset",
    train_file="train.json",
    validation_file="validation.json",

    # Chat template configuration
    use_chat_template=True,
    chat_template_kwargs={
        "enable_thinking": False,
        "add_generation_prompt": True
    },

    # Trackio monitoring configuration
    enable_tracking=True,
    trackio_url="https://your-trackio-space.hf.space",  # Change this
    trackio_token=None,
    log_artifacts=True,
    log_metrics=True,
    log_config=True,
    experiment_name="smollm3_dpo_6epochs"
)
EOF
```

### Step 7: Download and Prepare Dataset

```bash
# Create dataset preparation script
cat > prepare_dataset.py << 'EOF'
from datasets import load_dataset
import json
import os

# Load SmolTalk dataset
print('Loading SmolTalk dataset...')
dataset = load_dataset('HuggingFaceTB/smoltalk')

# Create dataset directory
os.makedirs('smoltalk_dataset', exist_ok=True)

# Convert to DPO format (preference pairs)
def convert_to_dpo_format(example):
    # For SmolTalk, we'll create preference pairs based on response quality
    # This is a simplified example - you may need to adjust based on your needs
    return {
        'prompt': example.get('prompt', ''),
        'chosen': example.get('chosen', ''),
        'rejected': example.get('rejected', '')
    }

# Process train split
train_data = []
for example in dataset['train']:
    dpo_example = convert_to_dpo_format(example)
    if dpo_example['prompt'] and dpo_example['chosen'] and dpo_example['rejected']:
        train_data.append(dpo_example)

# Process validation split
val_data = []
for example in dataset['validation']:
    dpo_example = convert_to_dpo_format(example)
    if dpo_example['prompt'] and dpo_example['chosen'] and dpo_example['rejected']:
        val_data.append(dpo_example)

# Save to files
with open('smoltalk_dataset/train.json', 'w') as f:
    json.dump(train_data, f, indent=2)

with open('smoltalk_dataset/validation.json', 'w') as f:
    json.dump(val_data, f, indent=2)

print(f'Dataset prepared: {len(train_data)} train samples, {len(val_data)} validation samples')
EOF

# Run dataset preparation
python prepare_dataset.py
```

### Step 8: Calculate Training Parameters

```bash
# Calculate training steps based on epochs
TOTAL_SAMPLES=$(python -c "import json; data=json.load(open('smoltalk_dataset/train.json')); print(len(data))")
BATCH_SIZE=2
GRADIENT_ACCUMULATION_STEPS=8
MAX_EPOCHS=6

EFFECTIVE_BATCH_SIZE=$((BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS))
STEPS_PER_EPOCH=$((TOTAL_SAMPLES / EFFECTIVE_BATCH_SIZE))
MAX_STEPS=$((STEPS_PER_EPOCH * MAX_EPOCHS))

echo "Training Configuration:"
echo "  Total samples: $TOTAL_SAMPLES"
echo "  Effective batch size: $EFFECTIVE_BATCH_SIZE"
echo "  Steps per epoch: $STEPS_PER_EPOCH"
echo "  Total training steps: $MAX_STEPS"
echo "  Training epochs: $MAX_EPOCHS"
```

### Step 9: Start DPO Training

```bash
# Start training with all parameters
python train.py config/train_smollm3_dpo_6epochs.py \
  --dataset_dir smoltalk_dataset \
  --out_dir /output-checkpoint \
  --init_from scratch \
  --max_iters $MAX_STEPS \
  --batch_size $BATCH_SIZE \
  --learning_rate 5e-6 \
  --gradient_accumulation_steps $GRADIENT_ACCUMULATION_STEPS \
  --max_seq_length 4096 \
  --save_steps 500 \
  --eval_steps 100 \
  --logging_steps 10 \
  --enable_tracking \
  --trackio_url "https://your-trackio-space.hf.space" \
  --experiment_name "smollm3_dpo_6epochs"
```

### Step 10: Push Model to Hugging Face Hub

```bash
# Push the trained model
python push_to_huggingface.py /output-checkpoint "your-username/smollm3-dpo-6epochs" \
  --token "$HF_TOKEN" \
  --trackio-url "https://your-trackio-space.hf.space" \
  --experiment-name "smollm3_dpo_6epochs"
```

### Step 11: Test the Uploaded Model

```bash
# Test the model
python -c "
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

print('Loading uploaded model...')
model = AutoModelForCausalLM.from_pretrained(
    'your-username/smollm3-dpo-6epochs',
    torch_dtype=torch.float16,
    device_map='auto'
)
tokenizer = AutoTokenizer.from_pretrained('your-username/smollm3-dpo-6epochs')

print('Testing model generation...')
prompt = 'Hello, how are you?'
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f'Prompt: {prompt}')
print(f'Response: {response}')
print('✅ Model test completed successfully!')
"
```

## Complete One-Line Deployment

If you want to run everything automatically, use the deployment script:

```bash
# Make the script executable
chmod +x cloud_deployment.sh

# Edit the configuration in the script first
nano cloud_deployment.sh
# Change these variables:
# - REPO_NAME="your-username/smollm3-dpo-6epochs"
# - TRACKIO_URL="https://your-trackio-space.hf.space"
# - HF_TOKEN="your_hf_token_here"

# Run the complete deployment
./cloud_deployment.sh
```

## Monitoring and Debugging

### Check GPU Usage

```bash
# Monitor GPU usage during training
watch -n 1 nvidia-smi
```

### Check Training Logs

```bash
# Monitor training progress
tail -f training.log

# Check system resources
htop
```

### Monitor Trackio

```bash
# Check if Trackio is logging properly
curl -s "https://your-trackio-space.hf.space" | grep -i "experiment"
```

## Expected Timeline

- **Setup**: 15-30 minutes
- **Dataset preparation**: 5-10 minutes
- **Training (6 epochs)**: 4-8 hours (depending on GPU)
- **Model upload**: 10-30 minutes
- **Testing**: 5-10 minutes

## Troubleshooting

### Common Issues

#### 1. Out of Memory (OOM)

```bash
# Reduce batch size
BATCH_SIZE=1
GRADIENT_ACCUMULATION_STEPS=16

# Or use gradient checkpointing
# Already enabled in config
```

#### 2. Slow Training

```bash
# Check GPU utilization
nvidia-smi

# Check if mixed precision is working
# Look for "fp16" in training logs
```

#### 3. Dataset Issues

```bash
# Check dataset format
head -n 5 smoltalk_dataset/train.json

# Count training samples (train.json is a pretty-printed JSON array,
# so wc -l would count lines, not samples)
python -c "import json; print(len(json.load(open('smoltalk_dataset/train.json'))))"
```

#### 4. Authentication Issues

```bash
# Test HF token (whoami fails if the token is invalid)
python -c "
from huggingface_hub import HfApi
api = HfApi(token='$HF_TOKEN')
print(api.whoami()['name'])
print('Token is valid!')
"
```

## Cost Estimation

Approximate on-demand prices; actual rates vary by region and over time, so check your provider's current pricing.

### AWS (g5.2xlarge)
- **Instance**: ~$0.526/hour
- **Training time**: 6 hours
- **Total cost**: ~$3.16

### Google Cloud (n1-standard-8 + T4)
- **Instance**: ~$0.38/hour
- **Training time**: 6 hours
- **Total cost**: ~$2.28

### Azure (Standard_NC6s_v3)
- **Instance**: ~$0.90/hour
- **Training time**: 6 hours
- **Total cost**: ~$5.40

## Next Steps

After successful deployment:

1. **Monitor training** in your Trackio Space
2. **Check model repository** on Hugging Face Hub
3. **Test the model** with different prompts
4. **Share your model** with the community
5. **Iterate and improve** based on results

## Support

- **Training issues**: Check logs and GPU utilization
- **Upload issues**: Verify HF token and repository permissions
- **Monitoring issues**: Check Trackio Space configuration
- **Performance issues**: Adjust batch size and learning rate

Your SmolLM3 DPO model will be ready for use after training completes!
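## Appendix: What `beta` Controls

The `beta=0.1` value set in Step 6 scales how strongly DPO rewards the policy for preferring `chosen` over `rejected` relative to the reference model. As a minimal pure-Python sketch (not the trainer's actual implementation; it assumes per-sequence log-probabilities are already computed), the per-example DPO loss is `-log(sigmoid(beta * margin))`:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    The margin is how much more the policy prefers `chosen` over
    `rejected` than the reference model does; all inputs are
    sequence log-probabilities.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# At the start of training the policy equals the reference, so the margin
# is 0 and the loss is -log(0.5) ≈ 0.693 — a handy sanity check for the
# first logged training step.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 3))  # → 0.693
```

A larger `beta` makes the loss more sensitive to the margin, pulling the policy further from the reference per update; `0.1` is a common conservative default.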