---
title: DeepSeek Coder V2 Lite 16B
emoji: 💻
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
---

# 🚀 o87Dev - Maximum Capacity Deployment

**Strategy:** Deploy the largest viable model (`DeepSeek-Coder-V2-Lite-Instruct-16B-Q4_K_M`) on Hugging Face's free CPU tier.

## ⚙️ Technical Details

- **Model:** DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf (10.4GB)
- **Quantization:** Q4_K_M (optimal quality/size trade-off for the free tier)
- **Loader:** `llama-cpp-python` (CPU-optimized)
- **Context:** 2048 tokens (maximum for free-tier stability)

## 📊 Performance Expectations

- **First load:** ~60-120 seconds (model loads from disk)
- **Inference speed:** ~2-5 tokens/second on CPU
- **Memory usage:** ~12-14GB of the 16GB available

## 🎯 Usage Tips

1. The first request triggers the model load (be patient)
2. Keep prompts under 500 tokens for best results
3. Use a temperature of 0.7-0.9 for creative tasks
4. Monitor memory usage in the Space logs

## 🔗 Integration

This Space serves as the primary AI endpoint for the o87Dev local API server.
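
The first-load behavior described above (one slow load, then fast requests) is typically implemented as a lazy singleton in the app. A minimal sketch, assuming a `factory` callable that wraps the actual `Llama(model_path=..., n_ctx=2048)` construction — the names here are illustrative, not taken from the Space's real `app.py`:

```python
# Lazy, load-once model holder: the first request pays the ~60-120s
# load cost; every later request reuses the cached instance.
_model = None

def get_model(factory):
    """Return the shared model, creating it via `factory` on first call.

    In the real app, `factory` might be something like (hypothetical,
    matching the settings listed above):
        lambda: Llama(model_path="DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf",
                      n_ctx=2048)
    """
    global _model
    if _model is None:
        _model = factory()  # slow path: runs only once per process
    return _model
```

Because the model lives in a module-level variable, all Gradio requests in the same process share one instance, which also keeps memory within the ~12-14GB budget noted above.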
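
Usage tip 2 (keep prompts under 500 tokens) can be checked cheaply before sending a request, using the common rough heuristic of ~4 characters per token. The helper name and the heuristic constant are assumptions for illustration, not part of the Space's code:

```python
def within_prompt_budget(prompt: str, max_tokens: int = 500,
                         chars_per_token: float = 4.0) -> bool:
    """Rough pre-flight check: estimate token count from character length.

    This is a heuristic only; the true count depends on the model's
    tokenizer, so err on the short side given the 2048-token context.
    """
    return len(prompt) / chars_per_token <= max_tokens
```

For example, a 100-character prompt (~25 estimated tokens) passes, while a 3000-character prompt (~750 estimated tokens) fails the check.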