| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="UTF-8"> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| <title>Deploy LLMs on NVIDIA RTX - Complete Guide</title> |
| <link rel="icon" type="image/x-icon" href="/static/favicon.ico"> |
| <script src="https://cdn.tailwindcss.com"></script> |
| <link href="https://unpkg.com/aos@2.3.1/dist/aos.css" rel="stylesheet"> |
| <script src="https://unpkg.com/aos@2.3.1/dist/aos.js"></script> |
| <script src="https://cdn.jsdelivr.net/npm/feather-icons/dist/feather.min.js"></script> |
| <script src="https://unpkg.com/feather-icons"></script> |
| <script> |
| tailwind.config = { |
| theme: { |
| extend: { |
| colors: { |
| 'nvidia-green': '#76b900', |
| 'ai-purple': '#8a2be2', |
| 'dark-bg': '#0f172a', |
| } |
| } |
| } |
| } |
| </script> |
| <style> |
| .gradient-bg { |
| background: linear-gradient(135deg, #0f172a 0%, #1e293b 50%, #334155 100%); |
| } |
| .gpu-card { |
| transition: transform 0.3s ease, box-shadow 0.3s ease; |
| } |
| .gpu-card:hover { |
| transform: translateY(-5px); |
| box-shadow: 0 20px 25px -5px rgba(118, 185, 0, 0.1), 0 10px 10px -5px rgba(118, 185, 0, 0.04); |
| } |
| .step-card { |
| border-left: 4px solid #76b900; |
| } |
| .challenge-card { |
| border-left: 4px solid #8a2be2; |
| } |
| .code-block { |
| font-family: 'Monaco', 'Menlo', 'Ubuntu Mono', monospace; |
| background-color: #1e293b; |
| border-radius: 0.5rem; |
| padding: 1rem; |
| overflow-x: auto; |
| } |
| </style> |
| </head> |
| <body class="gradient-bg text-gray-100 min-h-screen"> |
| |
| <nav class="bg-dark-bg/80 backdrop-blur-md sticky top-0 z-50 border-b border-gray-700"> |
| <div class="container mx-auto px-4 py-4"> |
| <div class="flex justify-between items-center"> |
| <div class="flex items-center space-x-2"> |
| <i data-feather="cpu" class="text-nvidia-green"></i> |
| <span class="text-xl font-bold">LLMDeploy</span> |
| </div> |
| <div class="hidden md:flex space-x-8"> |
| <a href="#requirements" class="hover:text-nvidia-green transition-colors">Requirements</a> |
| <a href="#guide" class="hover:text-nvidia-green transition-colors">Guide</a> |
| <a href="#optimization" class="hover:text-nvidia-green transition-colors">Optimization</a> |
| <a href="#troubleshooting" class="hover:text-nvidia-green transition-colors">Troubleshooting</a> |
| </div> |
| <button class="md:hidden"> |
| <i data-feather="menu"></i> |
| </button> |
| </div> |
| </div> |
| </nav> |
|
|
| |
| <section class="py-20 px-4"> |
| <div class="container mx-auto text-center" data-aos="fade-up"> |
| <div class="flex justify-center mb-6"> |
| <div class="bg-nvidia-green/20 p-4 rounded-full"> |
| <i data-feather="cpu" class="w-12 h-12 text-nvidia-green"></i> |
| </div> |
| </div> |
| <h1 class="text-4xl md:text-6xl font-bold mb-6">Deploy LLMs on Your NVIDIA RTX</h1> |
| <p class="text-xl text-gray-300 mb-8 max-w-3xl mx-auto"> |
| A comprehensive guide to running large language models on your PC with NVIDIA RTX graphics cards |
| </p> |
| <div class="flex flex-col sm:flex-row justify-center gap-4"> |
| <a href="#guide" class="bg-nvidia-green text-dark-bg px-8 py-3 rounded-lg font-semibold hover:bg-nvidia-green/90 transition-colors"> |
| Get Started |
| </a> |
| <a href="#requirements" class="border border-nvidia-green text-nvidia-green px-8 py-3 rounded-lg font-semibold hover:bg-nvidia-green/10 transition-colors"> |
| Check Requirements |
| </a> |
| </div> |
| </div> |
| </section> |
|
|
| |
| <section id="requirements" class="py-16 px-4"> |
| <div class="container mx-auto"> |
| <h2 class="text-3xl font-bold text-center mb-12" data-aos="fade-up">🖥️ Hardware and Software Requirements</h2> |
| |
| <div class="grid md:grid-cols-2 gap-8 mb-12"> |
| |
| <div class="bg-gray-800/50 backdrop-blur-sm rounded-xl p-6" data-aos="fade-right"> |
| <div class="flex items-center mb-4"> |
| <i data-feather="hard-drive" class="text-nvidia-green mr-3"></i> |
| <h3 class="text-2xl font-semibold">Hardware</h3> |
| </div> |
| <ul class="space-y-3"> |
| <li class="flex items-start"> |
| <i data-feather="check-circle" class="text-nvidia-green mr-2 mt-1"></i> |
| <span><strong>GPU:</strong> NVIDIA RTX series (2060, 3070, 4090, etc.)</span> |
| </li> |
| <li class="flex items-start"> |
| <i data-feather="check-circle" class="text-nvidia-green mr-2 mt-1"></i> |
| <span><strong>VRAM:</strong> 8GB+ recommended (see chart below)</span> |
| </li> |
| <li class="flex items-start"> |
| <i data-feather="check-circle" class="text-nvidia-green mr-2 mt-1"></i> |
| <span><strong>RAM:</strong> 16GB minimum, 32GB+ recommended</span> |
| </li> |
| <li class="flex items-start"> |
| <i data-feather="check-circle" class="text-nvidia-green mr-2 mt-1"></i> |
| <span><strong>Storage:</strong> SSD highly recommended</span> |
| </li> |
| </ul> |
| </div> |
|
|
| |
| <div class="bg-gray-800/50 backdrop-blur-sm rounded-xl p-6" data-aos="fade-left"> |
| <div class="flex items-center mb-4"> |
| <i data-feather="code" class="text-nvidia-green mr-3"></i> |
| <h3 class="text-2xl font-semibold">Software</h3> |
| </div> |
| <ul class="space-y-3"> |
| <li class="flex items-start"> |
| <i data-feather="check-circle" class="text-nvidia-green mr-2 mt-1"></i> |
| <span><strong>OS:</strong> Windows 10/11 or Linux (Ubuntu)</span> |
| </li> |
| <li class="flex items-start"> |
| <i data-feather="check-circle" class="text-nvidia-green mr-2 mt-1"></i> |
| <span><strong>Drivers:</strong> Latest NVIDIA drivers</span> |
| </li> |
| <li class="flex items-start"> |
| <i data-feather="check-circle" class="text-nvidia-green mr-2 mt-1"></i> |
| <span><strong>Python:</strong> 3.10 or 3.11 (Miniconda recommended)</span> |
| </li> |
| <li class="flex items-start"> |
| <i data-feather="check-circle" class="text-nvidia-green mr-2 mt-1"></i> |
| <span><strong>Git:</strong> For downloading repositories</span> |
| </li> |
| </ul> |
| </div> |
| </div> |
|
|
| |
| <div class="bg-gray-800/50 backdrop-blur-sm rounded-xl p-6" data-aos="fade-up"> |
| <h3 class="text-xl font-semibold mb-6 text-center">VRAM Requirements Guide</h3> |
| <div class="grid md:grid-cols-3 gap-6"> |
| <div class="gpu-card bg-gray-700/50 p-6 rounded-lg text-center"> |
| <div class="w-16 h-16 bg-nvidia-green/20 rounded-full flex items-center justify-center mx-auto mb-4"> |
| <i data-feather="bar-chart" class="text-nvidia-green"></i> |
| </div> |
| <h4 class="font-semibold mb-2">8GB VRAM</h4> |
| <p class="text-sm text-gray-300">7B models with heavy quantization</p> |
| </div> |
| <div class="gpu-card bg-gray-700/50 p-6 rounded-lg text-center"> |
| <div class="w-16 h-16 bg-nvidia-green/30 rounded-full flex items-center justify-center mx-auto mb-4"> |
| <i data-feather="bar-chart-2" class="text-nvidia-green"></i> |
| </div> |
| <h4 class="font-semibold mb-2">12-16GB VRAM</h4> |
| <p class="text-sm text-gray-300">7B to 13B models with good quality</p> |
| </div> |
| <div class="gpu-card bg-gray-700/50 p-6 rounded-lg text-center"> |
| <div class="w-16 h-16 bg-nvidia-green/40 rounded-full flex items-center justify-center mx-auto mb-4"> |
| <i data-feather="bar-chart-2" class="text-nvidia-green"></i> |
| </div> |
| <h4 class="font-semibold mb-2">24GB VRAM</h4> |
| <p class="text-sm text-gray-300">34B to 70B models with less quantization</p> |
| </div> |
| </div> |
| </div> |
| </div> |
| </section> |
|
|
| |
| <section id="guide" class="py-16 px-4 bg-gray-900/50"> |
| <div class="container mx-auto"> |
| <h2 class="text-3xl font-bold text-center mb-12" data-aos="fade-up">⚙️ Step-by-Step Deployment Guide</h2> |
| |
| <div class="space-y-8"> |
| |
| <div class="step-card bg-gray-800/50 backdrop-blur-sm rounded-xl p-6" data-aos="fade-right"> |
| <div class="flex items-center mb-4"> |
| <div class="w-8 h-8 bg-nvidia-green rounded-full flex items-center justify-center mr-4"> |
| <span class="font-bold text-dark-bg">1</span> |
| </div> |
| <h3 class="text-2xl font-semibold">Install Prerequisites</h3> |
| </div> |
| <div class="grid md:grid-cols-2 gap-6"> |
| <div> |
| <h4 class="font-semibold mb-3 text-nvidia-green">Essential Downloads</h4> |
| <ul class="space-y-2"> |
| <li class="flex items-center"> |
| <i data-feather="download" class="w-4 h-4 mr-2"></i> |
| <a href="https://www.nvidia.com/Download/index.aspx" class="hover:text-nvidia-green">NVIDIA Drivers</a> |
| </li> |
| <li class="flex items-center"> |
| <i data-feather="download" class="w-4 h-4 mr-2"></i> |
| <a href="https://git-scm.com/downloads" class="hover:text-nvidia-green">Git</a> |
| </li> |
| <li class="flex items-center"> |
| <i data-feather="download" class="w-4 h-4 mr-2"></i> |
| <a href="https://docs.anaconda.com/free/miniconda/miniconda-install/" class="hover:text-nvidia-green">Miniconda</a> |
| </li> |
| </ul> |
| </div> |
| <div> |
| <h4 class="font-semibold mb-3 text-nvidia-green">Verification</h4> |
| <p class="text-sm text-gray-300">After installation, verify everything works by opening Anaconda Prompt and running:</p> |
| <div class="code-block mt-2"> |
| <code class="text-green-400">nvidia-smi</code> |
| </div> |
| </div> |
| </div> |
| </div> |
|
|
| |
| <div class="step-card bg-gray-800/50 backdrop-blur-sm rounded-xl p-6" data-aos="fade-left"> |
| <div class="flex items-center mb-4"> |
| <div class="w-8 h-8 bg-nvidia-green rounded-full flex items-center justify-center mr-4"> |
| <span class="font-bold text-dark-bg">2</span> |
| </div> |
| <h3 class="text-2xl font-semibold">Install Text Generation WebUI</h3> |
| </div> |
| <div class="space-y-4"> |
| <div class="code-block"> |
| <code class="text-green-400">git clone https://github.com/oobabooga/text-generation-webui</code> |
| </div> |
| <div class="code-block"> |
| <code class="text-green-400">cd text-generation-webui</code> |
| </div> |
| <div class="code-block"> |
| <code class="text-green-400">start_windows.bat # For Windows</code><br> |
| <code class="text-green-400">start_linux.sh # For Linux</code> |
| </div> |
| <p class="text-sm text-gray-300">This will take 10-30 minutes as it downloads all dependencies.</p> |
| </div> |
| </div> |
|
|
| |
| <div class="step-card bg-gray-800/50 backdrop-blur-sm rounded-xl p-6" data-aos="fade-right"> |
| <div class="flex items-center mb-4"> |
| <div class="w-8 h-8 bg-nvidia-green rounded-full flex items-center justify-center mr-4"> |
| <span class="font-bold text-dark-bg">3</span> |
| </div> |
| <h3 class="text-2xl font-semibold">Download an LLM Model</h3> |
| </div> |
| <div class="grid md:grid-cols-2 gap-6"> |
| <div> |
| <h4 class="font-semibold mb-3 text-nvidia-green">Model Formats</h4> |
| <div class="space-y-3"> |
| <div class="bg-gray-700/50 p-3 rounded"> |
| <h5 class="font-semibold text-green-400">GGUF (Recommended)</h5> |
| <p class="text-sm">Flexible, can run on CPU/GPU split</p> |
| </div> |
| <div class="bg-gray-700/50 p-3 rounded"> |
| <h5 class="font-semibold text-blue-400">GPTQ/AWQ</h5> |
| <p class="text-sm">GPU-only, faster if model fits VRAM</p> |
| </div> |
| </div> |
| </div> |
| <div> |
| <h4 class="font-semibold mb-3 text-nvidia-green">Download Command</h4> |
| <div class="code-block"> |
| <code class="text-green-400">python download-model.py TheBloke/Llama-3-8B-Instruct-GGUF</code> |
| </div> |
| <p class="text-sm text-gray-300 mt-2">Visit <a href="https://huggingface.co/TheBloke" class="text-nvidia-green hover:underline">TheBloke's Hugging Face</a> for models</p> |
| </div> |
| </div> |
| </div> |
|
|
| |
| <div class="step-card bg-gray-800/50 backdrop-blur-sm rounded-xl p-6" data-aos="fade-left"> |
| <div class="flex items-center mb-4"> |
| <div class="w-8 h-8 bg-nvidia-green rounded-full flex items-center justify-center mr-4"> |
| <span class="font-bold text-dark-bg">4</span> |
| </div> |
| <h3 class="text-2xl font-semibold">Load and Run the Model</h3> |
| </div> |
| <div class="space-y-4"> |
| <ol class="list-decimal list-inside space-y-2 pl-4"> |
| <li>Run startup script again</li> |
| <li>Open the provided URL in browser (e.g., http://127.0.0.1:7860)</li> |
| <li>Go to <strong>Model</strong> tab and refresh</li> |
| <li>Select your downloaded model</li> |
| <li>Set <strong>n-gpu-layers</strong> (start with 99 for GGUF)</li> |
| <li>Click <strong>Load</strong></li> |
| <li>Start chatting in the <strong>Text generation</strong> tab! 🚀</li> |
| </ol> |
| </div> |
| </div> |
| </div> |
| </div> |
| </section> |
|
|
| |
| <section id="optimization" class="py-16 px-4"> |
| <div class="container mx-auto"> |
| <h2 class="text-3xl font-bold text-center mb-12" data-aos="fade-up">🚀 Performance Optimization</h2> |
| |
| <div class="grid md:grid-cols-3 gap-8"> |
| <div class="bg-gray-800/50 backdrop-blur-sm rounded-xl p-6 text-center" data-aos="fade-up" data-aos-delay="0"> |
| <div class="w-16 h-16 bg-nvidia-green/20 rounded-full flex items-center justify-center mx-auto mb-4"> |
| <i data-feather="zap" class="text-nvidia-green"></i> |
| </div> |
| <h3 class="font-semibold mb-3">Right Quantization</h3> |
| <p class="text-sm text-gray-300">Use Q4_K_M for 7B/8B models. Perfect balance of size and quality.</p> |
| </div> |
|
|
| <div class="bg-gray-800/50 backdrop-blur-sm rounded-xl p-6 text-center" data-aos="fade-up" data-aos-delay="100"> |
| <div class="w-16 h-16 bg-nvidia-green/20 rounded-full flex items-center justify-center mx-auto mb-4"> |
| <i data-feather="sliders" class="text-nvidia-green"></i> |
| </div> |
| <h3 class="font-semibold mb-3">Context Length</h3> |
| <p class="text-sm text-gray-300">Lower truncation_length from 4096 to 2048 if VRAM is limited.</p> |
| </div> |
|
|
| <div class="bg-gray-800/50 backdrop-blur-sm rounded-xl p-6 text-center" data-aos="fade-up" data-aos-delay="200"> |
| <div class="w-16 h-16 bg-nvidia-green/20 rounded-full flex items-center justify-center mx-auto mb-4"> |
| <i data-feather="layers" class="text-nvidia-green"></i> |
| </div> |
| <h3 class="font-semibold mb-3">GPU Offloading</h3> |
| <p class="text-sm text-gray-300">Use n-gpu-layers slider to balance between VRAM and RAM.</p> |
| </div> |
| </div> |
| </div> |
| </section> |
|
|
| |
| <section id="troubleshooting" class="py-16 px-4 bg-gray-900/50"> |
| <div class="container mx-auto"> |
| <h2 class="text-3xl font-bold text-center mb-12" data-aos="fade-up">🤕 Common Challenges and Solutions</h2> |
| |
| <div class="space-y-6"> |
| |
| <div class="challenge-card bg-gray-800/50 backdrop-blur-sm rounded-xl p-6" data-aos="fade-up"> |
| <div class="flex items-center mb-4"> |
| <i data-feather="alert-triangle" class="text-red-400 mr-3"></i> |
| <h3 class="text-xl font-semibold">"Out of Memory" Error</h3> |
| </div> |
| <div class="grid md:grid-cols-2 gap-6"> |
| <div> |
| <h4 class="font-semibold mb-2 text-red-400">Causes</h4> |
| <ul class="space-y-1 text-sm"> |
| <li>• Model too large for VRAM</li> |
| <li>• Context length too high</li> |
| <li>• Insufficient GPU layers setting</li> |
| </ul> |
| </div> |
| <div> |
| <h4 class="font-semibold mb-2 text-green-400">Solutions</h4> |
| <ul class="space-y-1 text-sm"> |
| <li>• Use smaller model or higher quantization</li> |
| <li>• Lower n-gpu-layers value</li> |
| <li>• Reduce context length</li> |
| </ul> |
| </div> |
| </div> |
| </div> |
|
|
| |
| <div class="challenge-card bg-gray-800/50 backdrop-blur-sm rounded-xl p-6" data-aos="fade-up" data-aos-delay="100"> |
| <div class="flex items-center mb-4"> |
| <i data-feather="clock" class="text-yellow-400 mr-3"></i> |
| <h3 class="text-xl font-semibold">Very Slow Inference Speed</h3> |
| </div> |
| <div class="grid md:grid-cols-2 gap-6"> |
| <div> |
| <h4 class="font-semibold mb-2 text-yellow-400">Causes</h4> |
| <ul class="space-y-1 text-sm"> |
| <li>• Model running on CPU instead of GPU</li> |
| <li>• Wrong model format for setup</li> |
| <li>• Outdated drivers</li> |
| </ul> |
| </div> |
| <div> |
| <h4 class="font-semibold mb-2 text-green-400">Solutions</h4> |
| <ul class="space-y-1 text-sm"> |
| <li>• Check GPU layers in console output</li> |
| <li>• Use GPU-native formats (GGUF/GPTQ)</li> |
| <li>• Update NVIDIA drivers</li> |
| </ul> |
| </div> |
| </div> |
| </div> |
|
|
| |
| <div class="challenge-card bg-gray-800/50 backdrop-blur-sm rounded-xl p-6" data-aos="fade-up" data-aos-delay="200"> |
| <div class="flex items-center mb-4"> |
| <i data-feather="x-circle" class="text-red-400 mr-3"></i> |
| <h3 class="text-xl font-semibold">Installation Fails or CUDA Errors</h3> |
| </div> |
| <div class="grid md:grid-cols-2 gap-6"> |
| <div> |
| <h4 class="font-semibold mb-2 text-red-400">Causes</h4> |
| <ul class="space-y-1 text-sm"> |
| <li>• Driver/CUDA/PyTorch version mismatch</li> |
| <li>• Corrupted installation</li> |
| <li>• Missing dependencies</li> |
| </ul> |
| </div> |
| <div> |
| <h4 class="font-semibold mb-2 text-green-400">Solutions</h4> |
| <ul class="space-y-1 text-sm"> |
| <li>• Delete installer_files and repositories folders</li> |
| <li>• Re-run installation script</li> |
| <li>• Ensure latest NVIDIA drivers</li> |
| </ul> |
| </div> |
| </div> |
| </div> |
|
|
| |
| <div class="bg-ai-purple/20 backdrop-blur-sm rounded-xl p-6 border border-ai-purple/30" data-aos="fade-up"> |
| <div class="flex items-center mb-4"> |
| <i data-feather="package" class="text-ai-purple mr-3"></i> |
| <h3 class="text-xl font-semibold">Simpler Alternatives</h3> |
| </div> |
| <p class="mb-4">If the setup is too complex, try these user-friendly alternatives:</p> |
| <div class="grid md:grid-cols-2 gap-4"> |
| <div class="bg-gray-700/50 p-4 rounded"> |
| <h4 class="font-semibold text-ai-purple mb-2">LM Studio</h4> |
| <p class="text-sm">Graphical interface for downloading and running GGUF models</p> |
| </div> |
| <div class="bg-gray-700/50 p-4 rounded"> |
| <h4 class="font-semibold text-ai-purple mb-2">Ollama</h4> |
| <p class="text-sm">Simple command-line tool for local LLM deployment</p> |
| </div> |
| </div> |
| </div> |
| </div> |
| </div> |
| </section> |
|
|
| |
| <footer class="bg-dark-bg py-12 px-4"> |
| <div class="container mx-auto text-center"> |
| <div class="flex justify-center mb-6"> |
| <i data-feather="cpu" class="text-nvidia-green"></i> |
| </div> |
| <p class="text-gray-400 mb-4">Your guide to local LLM deployment on NVIDIA RTX</p> |
| <div class="flex justify-center space-x-6 mb-6"> |
| <a href="#" class="text-gray-400 hover:text-nvidia-green transition-colors"> |
| <i data-feather="github"></i> |
| </a> |
| <a href="#" class="text-gray-400 hover:text-nvidia-green transition-colors"> |
| <i data-feather="twitter"></i> |
| </a> |
| <a href="#" class="text-gray-400 hover:text-nvidia-green transition-colors"> |
| <i data-feather="discord"></i> |
| </a> |
| </div> |
| <p class="text-gray-500 text-sm">© 2024 LLMDeploy Guide. All rights reserved.</p> |
| </div> |
| </footer> |
|
|
| <script> |
| AOS.init({ |
| duration: 800, |
| once: true |
| }); |
| </script> |
| <script> |
| feather.replace(); |
| </script> |
| </body> |
| </html> |
|
|