---
title: DeepSeek Coder V2 Lite 16B
emoji: 💻
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
---

# 🚀 o87Dev - Maximum Capacity Deployment

**Strategy:** Deploy the largest viable model (`DeepSeek-Coder-V2-Lite-Instruct-16B-Q4_K_M`) on Hugging Face's free CPU tier.

## ⚙️ Technical Details

- **Model:** DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf (10.4GB)
- **Quantization:** Q4_K_M (optimal quality/size trade-off for the free tier)
- **Loader:** `llama-cpp-python` (CPU-optimized)
- **Context:** 2048 tokens (maximum for free-tier stability)

## 📊 Performance Expectations

- **First load:** ~60-120 seconds (model loads from disk)
- **Inference speed:** ~2-5 tokens/second on CPU
- **Memory usage:** ~12-14GB of the 16GB available

## 🎯 Usage Tips

1. The first request triggers the model load (be patient)
2. Keep prompts under 500 tokens for best results
3. Use a temperature of 0.7-0.9 for creative tasks
4. Monitor memory usage in the Space logs

## 🔗 Integration

This Space serves as the primary AI endpoint for the o87Dev local API server.
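
The first-load behavior described above (one slow load, then fast requests) is typically implemented as a lazy singleton in the app. A minimal sketch, assuming a `factory` callable that wraps the actual `Llama(model_path=..., n_ctx=2048)` construction — the names here are illustrative, not taken from the Space's real `app.py`:

```python
# Lazy, load-once model holder: the first request pays the ~60-120s
# load cost; every later request reuses the cached instance.
_model = None

def get_model(factory):
    """Return the shared model, creating it via `factory` on first call.

    In the real app, `factory` might be something like (hypothetical,
    matching the settings listed above):
        lambda: Llama(model_path="DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf",
                      n_ctx=2048)
    """
    global _model
    if _model is None:
        _model = factory()  # slow path: runs only once per process
    return _model
```

Because the model lives in a module-level variable, all Gradio requests in the same process share one instance, which also keeps memory within the ~12-14GB budget noted above.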
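
Usage tip 2 (keep prompts under 500 tokens) can be checked cheaply before sending a request, using the common rough heuristic of ~4 characters per token. The helper name and the heuristic constant are assumptions for illustration, not part of the Space's code:

```python
def within_prompt_budget(prompt: str, max_tokens: int = 500,
                         chars_per_token: float = 4.0) -> bool:
    """Rough pre-flight check: estimate token count from character length.

    This is a heuristic only; the true count depends on the model's
    tokenizer, so err on the short side given the 2048-token context.
    """
    return len(prompt) / chars_per_token <= max_tokens
```

For example, a 100-character prompt (~25 estimated tokens) passes, while a 3000-character prompt (~750 estimated tokens) fails the check.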