---
title: DeepSeek Coder V2 Lite 16B
emoji: 💻
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
---
# 🚀 o87Dev - Maximum Capacity Deployment

**Strategy:** Deploy the largest viable model (`DeepSeek-Coder-V2-Lite-Instruct-16B-Q4_K_M`) on Hugging Face's free CPU tier.
## ⚙️ Technical Details

- **Model:** DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf (10.4GB)
- **Quantization:** Q4_K_M (best quality-to-size trade-off for the free tier)
- **Loader:** `llama-cpp-python` (CPU-optimized), loaded as in the sketch below
- **Context:** 2048 tokens (capped for free-tier stability)
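
A minimal loading sketch under those settings. The model path mirrors the file name above; `n_threads=2` is an assumption about the free-tier vCPU count, so adjust it to whatever hardware your Space actually reports:

```python
# Minimal loading sketch. The GGUF path mirrors the file listed above;
# n_threads=2 is an assumed vCPU count for the free CPU tier.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf",  # ~10.4GB on disk
    n_ctx=2048,       # context window capped for free-tier stability
    n_threads=2,      # assumed vCPU count on the free tier
    verbose=False,
)
```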
## 📊 Performance Expectations

- **First load:** ~60-120 seconds (the model is read from disk into RAM)
- **Inference speed:** ~2-5 tokens/second on CPU
- **Memory usage:** ~12-14GB of the 16GB available
## 🎯 Usage Tips

1. The first request triggers the model load, so expect a delay (see the timings above)
2. Keep prompts under 500 tokens for best results
3. Use a temperature of 0.7-0.9 for creative tasks (see the sketch after this list)
4. Monitor memory usage in the Space logs
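
A short example applying tips 2 and 3 with the `llm` object loaded earlier. The prompt is purely illustrative; `create_completion` is llama-cpp-python's standard completion call:

```python
# Illustrative generation call: short prompt, temperature in the 0.7-0.9 band.
output = llm.create_completion(
    prompt="Write a Python function that reverses a linked list.",
    max_tokens=512,
    temperature=0.8,   # 0.7-0.9 suits creative tasks; go lower for deterministic code
)
print(output["choices"][0]["text"])
```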
## 🔗 Integration

This Space serves as the primary AI endpoint for the o87Dev local API server; a client-side sketch follows.
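
A minimal sketch of how the o87Dev server might call this Space via `gradio_client`. The Space ID and `api_name` below are assumptions; check the Space's "Use via API" panel for the actual endpoint signature:

```python
# Hypothetical client call from the o87Dev API server.
from gradio_client import Client

client = Client("o87Dev/deepseek-coder-v2-lite-16b")  # assumed Space ID
result = client.predict(
    "Explain Python decorators.",   # user prompt
    api_name="/predict",            # assumed endpoint name; verify in the Space UI
)
print(result)
```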