# 🚀 Local GPU Setup Guide (RTX 4050 Edition)

This cheat sheet walks you through running the SEO Analyzer on your **NVIDIA RTX 4050** so that inference runs on the GPU instead of the CPU.

## ✅ Prerequisites

1. **NVIDIA Drivers**: Make sure your GPU driver is up to date (for example, via GeForce Experience).
2. **Python 3.10 or 3.11**: Installed and added to PATH.
3. **Git**: To clone the repository.

---

## 🛠️ Step 1: Clone & Setup

Open your terminal (PowerShell or CMD) and run:

```powershell
# 1. Clone the repository (if you haven't already)
git clone https://huggingface.co/spaces/ihtesham0345/key_word_Fast_API
cd key_word_Fast_API

# 2. Create a virtual environment (Recommended)
python -m venv venv
.\venv\Scripts\activate
```

---

## ⚡ Step 2: Install GPU-Enabled PyTorch (Crucial!)

By default, `pip install torch` might install the CPU-only build. We need the CUDA build.

```powershell
# Uninstall any existing CPU version
pip uninstall torch torchvision torchaudio -y

# Install PyTorch with CUDA 12.1 support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

*Verify installation:*

```powershell
python -c "import torch; ok = torch.cuda.is_available(); print('CUDA Available:', ok); print('Device:', torch.cuda.get_device_name(0) if ok else 'CPU')"
```

*It should say `CUDA Available: True` and `Device: NVIDIA GeForce RTX 4050 Laptop GPU`.*

---

## 📦 Step 3: Install Other Dependencies

Now install the rest of the app requirements.

```powershell
pip install -r requirements.txt
```

---

## 🚀 Step 4: Run the Server

Launch the API. It will automatically detect your GPU.

```powershell
python -m uvicorn main:app --reload
```

---

## 🧠 Model Recommendations for RTX 4050 (6GB)

Your card comfortably fits small to medium models.

### Option A: Ultra Speed (Current)

* **Model**: `Qwen/Qwen2.5-0.5B-Instruct`
* **Speed**: Fastest of the three options
* **VRAM**: ~1 GB

### Option B: The "Goldilocks" (Recommended)

Upgrade to the 1.5B model for smarter results.

1. Open `services/analyzer.py`
2. Change line 14:
   ```python
   MODEL_ID = "Qwen/Qwen2.5-1.5B-Instruct"
   ```
3. Save the file. Because the server runs with `--reload`, it restarts and downloads the model automatically (about 3 GB).

### Option C: Max Intelligence (Quantized)

Run the 7B model with 4-bit quantization for the strongest results of the three options.

1. Install bitsandbytes (and `accelerate`, if `requirements.txt` doesn't already pull it in): `pip install bitsandbytes accelerate`
2. Update `services/analyzer.py`:
   ```python
   from transformers import BitsAndBytesConfig

   MODEL_ID = "Qwen/Qwen2.5-7B-Instruct"

   # Update pipeline config: load the weights in 4-bit via bitsandbytes
   pipe = pipeline(
       ...,
       model_kwargs={"quantization_config": BitsAndBytesConfig(load_in_4bit=True)},
   )
   ```

---

## ❓ Troubleshooting

- **Out of Memory (OOM)**: If you get a CUDA OOM error, close other apps (Chrome uses GPU!) or switch to a smaller model.
- **Slow Speed**: Ensure your laptop is plugged in and in "Performance Mode".
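
If you are unsure whether a model will hit the OOM error above, a quick back-of-the-envelope estimate can help before you download anything. The sketch below is only an approximation under stated assumptions: it uses nominal parameter counts (0.5B, 1.5B, 7B), assumes 16-bit weights for the unquantized options, and counts the weights alone, ignoring activations, the KV cache, and CUDA overhead. The helper name `estimate_weight_gb` is made up for this guide and is not part of the app.

```python
def estimate_weight_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights only, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


VRAM_GB = 6.0  # RTX 4050 laptop GPU, as assumed throughout this guide

# (label, nominal parameters in billions, bits per weight) -- nominal values, not measured
options = [
    ("Option A: 0.5B, 16-bit", 0.5, 16),
    ("Option B: 1.5B, 16-bit", 1.5, 16),
    ("Option C: 7B, 4-bit", 7.0, 4),
    ("7B, 16-bit (unquantized)", 7.0, 16),
]

for label, params, bits in options:
    gb = estimate_weight_gb(params, bits)
    verdict = "fits" if gb < VRAM_GB else "does not fit"
    print(f"{label}: ~{gb:.1f} GB of weights, {verdict} in {VRAM_GB:.0f} GB")
```

Real usage will be higher than these figures, so treat a result close to 6 GB as a warning sign. The last row shows why Option C relies on 4-bit quantization: at 16 bits, the 7B weights alone would need roughly 14 GB.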