
🚀 Local GPU Setup Guide (RTX 4050 Edition)

This cheat sheet guides you through running the SEO Analyzer on your NVIDIA RTX 4050 for lightning-fast inference.

✅ Prerequisites

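  1. NVIDIA Drivers: Ensure your NVIDIA GPU driver is up to date (for example via GeForce Experience). Running nvidia-smi in a terminal should list the RTX 4050 and the installed driver version.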
  2. Python 3.10 or 3.11: Installed and added to PATH.
  3. Git: To clone the repository.

🛠️ Step 1: Clone & Setup

Open your terminal (PowerShell or CMD) and run:

# 1. Clone the repository (if you haven't already)
git clone https://huggingface.co/spaces/ihtesham0345/key_word_Fast_API
cd key_word_Fast_API

# 2. Create a virtual environment (Recommended)
python -m venv venv
.\venv\Scripts\activate

⚑ Step 2: Install GPU-Enabled PyTorch (Crucial!)

On Windows, a plain pip install torch typically installs the CPU-only build. We need the CUDA build.

# Uninstall any existing CPU version
pip uninstall torch torchvision torchaudio -y

# Install PyTorch with CUDA 12.1 support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Verify installation:

python -c "import torch; ok = torch.cuda.is_available(); print('CUDA Available:', ok); print('Device:', torch.cuda.get_device_name(0) if ok else 'CPU')"

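It should print CUDA Available: True and Device: NVIDIA GeForce RTX 4050 Laptop GPU. If it prints False, the CPU-only build is probably still installed or the driver is outdated; repeat the uninstall and install commands above and check nvidia-smi.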
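For a slightly more detailed check than the one-liner, the same verification can be sketched as a small script. This is only an illustration: the function name describe_device and the report keys are made up for this guide, and the script falls back to a CPU report when PyTorch is missing.

```python
def describe_device():
    """Return a small report on the PyTorch install and the visible GPU."""
    report = {
        "torch_installed": False,
        "cuda_build": None,       # CUDA version PyTorch was built against
        "cuda_available": False,
        "device_name": "CPU",
        "total_vram_gib": None,   # total GPU memory in GiB (1024**3 bytes)
    }
    try:
        import torch
    except ImportError:
        return report  # PyTorch is not installed in this environment

    report["torch_installed"] = True
    report["cuda_build"] = torch.version.cuda  # None on CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        report["device_name"] = props.name
        report["total_vram_gib"] = round(props.total_memory / 1024**3, 1)
    return report


if __name__ == "__main__":
    for key, value in describe_device().items():
        print(f"{key}: {value}")
```

If cuda_build is None while torch_installed is True, the CPU-only wheel is installed and Step 2 needs to be repeated.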


📦 Step 3: Install Other Dependencies

Now install the rest of the app requirements.

pip install -r requirements.txt

🚀 Step 4: Run the Server

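Launch the API. It will automatically detect your GPU. By default, uvicorn serves the app at http://127.0.0.1:8000, and --reload restarts the server whenever a source file changes.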

python -m uvicorn main:app --reload
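To confirm the server actually came up, a quick probe from a second terminal can help. The sketch below assumes uvicorn's default address (127.0.0.1:8000) and FastAPI's default /docs page; both are assumptions about this app's configuration, and is_server_up is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def is_server_up(url: str = "http://127.0.0.1:8000/docs", timeout: float = 2.0) -> bool:
    """Return True if anything answers HTTP at `url`, False otherwise."""
    # Bypass any system proxy settings: this is a localhost check.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a 2xx status (e.g. 404).
        return True
    except OSError:
        # Connection refused, timeout, DNS failure, and similar errors.
        return False


if __name__ == "__main__":
    print("Server reachable:", is_server_up())
```

The first start can take a while if the model still has to be downloaded, so a False result right after launch does not necessarily mean something is broken.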

🧠 Model Recommendations for RTX 4050 (6GB)

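With 6 GB of VRAM, your card comfortably fits models up to about 1.5B parameters in 16-bit precision; larger models such as 7B need quantization to fit.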
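The sizes quoted for the options below follow from simple arithmetic: weight memory is roughly parameter count times bytes per parameter. A minimal sketch, assuming nominal parameter counts (0.5B, 1.5B, 7B) and counting weights only; activations, the KV cache, layers left unquantized, and the CUDA context add to these figures, so treat them as lower bounds.

```python
def weight_size_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


VRAM_GB = 6.0  # RTX 4050 laptop GPU

# (label, nominal parameters in billions, bits per parameter)
OPTIONS = [
    ("Option A: Qwen2.5-0.5B, 16-bit", 0.5, 16),
    ("Option B: Qwen2.5-1.5B, 16-bit", 1.5, 16),
    ("Option C: Qwen2.5-7B, 4-bit", 7.0, 4),
]

for label, params, bits in OPTIONS:
    size = weight_size_gb(params, bits)
    headroom = VRAM_GB - size
    print(f"{label}: ~{size:.1f} GB weights, ~{headroom:.1f} GB left for everything else")
```

The same function shows why the 7B model needs quantization: at 16 bits it would need about 14 GB for weights alone.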

Option A: Ultra Speed (Current)

  • Model: Qwen/Qwen2.5-0.5B-Instruct
  • Speed: Fastest of the three options
  • VRAM: ~1 GB

Option B: The "Goldilocks" (Recommended)

Upgrade to the 1.5B model for smarter results.

  1. Open services/analyzer.py
  2. Change line 14:
    MODEL_ID = "Qwen/Qwen2.5-1.5B-Instruct" 
    
  3. Save the file. If the server is running with --reload, it restarts and downloads the new model (about 3 GB) the first time it loads.
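If you switch models often, editing the file each time gets tedious. One possible alternative, sketched here as an assumption rather than something the repository already does, is to read the model ID from an environment variable and fall back to the current default. The variable name MODEL_ID and the helper resolve_model_id are hypothetical.

```python
import os

DEFAULT_MODEL_ID = "Qwen/Qwen2.5-0.5B-Instruct"  # Option A, the current default


def resolve_model_id(env=None) -> str:
    """Pick the model ID from the MODEL_ID variable, else use the default."""
    env = os.environ if env is None else env
    value = env.get("MODEL_ID", "").strip()
    return value or DEFAULT_MODEL_ID


# In services/analyzer.py this would replace the hard-coded assignment.
MODEL_ID = resolve_model_id()

if __name__ == "__main__":
    print("Using model:", MODEL_ID)
```

In PowerShell you would then set $env:MODEL_ID = "Qwen/Qwen2.5-1.5B-Instruct" before starting the server.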

Option C: Max Intelligence (Quantized)

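Run the 7B model using 4-bit quantization (arguably smarter than GPT-3.5, though no benchmark is given here). The 4-bit weights of a 7B model take at least 3.5 GB, so expect a tight fit in 6 GB.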

  1. Install bitsandbytes and accelerate (transformers needs both for 4-bit loading): pip install bitsandbytes accelerate
  2. Update services/analyzer.py:
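    from transformers import BitsAndBytesConfig, pipeline

    MODEL_ID = "Qwen/Qwen2.5-7B-Instruct"
    # Update pipeline config: quantize the weights to 4-bit at load time
    pipe = pipeline(
        ...,
        model_kwargs={"quantization_config": BitsAndBytesConfig(load_in_4bit=True)},
    )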
    

❓ Troubleshooting

  • Out of Memory (OOM): If you get a CUDA OOM error, close other apps (Chrome uses GPU!) or switch to a smaller model.
  • Slow Speed: Ensure your laptop is plugged in and in "Performance Mode".
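When chasing an OOM error, it helps to see how much VRAM is actually free before the model loads. A minimal sketch using torch.cuda.mem_get_info; the helper name vram_usage_gib is made up for this guide, and it returns None when PyTorch or CUDA is unavailable.

```python
def vram_usage_gib():
    """Return (free_gib, total_gib) for GPU 0, or None if CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed
    if not torch.cuda.is_available():
        return None  # CPU-only build or no usable GPU
    free_bytes, total_bytes = torch.cuda.mem_get_info(0)
    return round(free_bytes / 1024**3, 2), round(total_bytes / 1024**3, 2)


if __name__ == "__main__":
    usage = vram_usage_gib()
    if usage is None:
        print("CUDA is not available; nothing to report.")
    else:
        free_gib, total_gib = usage
        print(f"Free VRAM: {free_gib} GiB of {total_gib} GiB")
```

If the free figure is well below the total before the server starts, another application (a browser, a game launcher) is holding GPU memory.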