---
title: DeepSeek Coder V2 Lite 16B
emoji: 💻
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
---

# 🚀 o87Dev - Maximum Capacity Deployment

**Strategy:** deploy the largest viable model, DeepSeek-Coder-V2-Lite-Instruct-16B (Q4_K_M), on Hugging Face's free CPU tier.

## ⚙️ Technical Details

- **Model:** DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf (10.4 GB)
- **Quantization:** Q4_K_M (best quality/size trade-off for the free tier)
- **Loader:** llama-cpp-python (CPU-optimized)
- **Context:** 2048 tokens (maximum for free-tier stability)
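A minimal sketch of how `app.py` might initialize the model with llama-cpp-python. The `n_ctx=2048` value matches the context limit above; the `n_threads` count is an assumption about the free CPU tier, and the model filename is taken from this README.

```python
# Hypothetical loading code for app.py; parameter choices are
# assumptions except n_ctx, which matches the limit stated above.
MODEL_PATH = "DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf"

def llama_kwargs(path: str = MODEL_PATH) -> dict:
    """Keyword arguments for llama_cpp.Llama, kept in one place."""
    return {
        "model_path": path,
        "n_ctx": 2048,    # max context for free-tier stability (see above)
        "n_threads": 2,   # assumed vCPU count on the free CPU tier
        "verbose": False,
    }

def load_model():
    from llama_cpp import Llama  # lazy import: heavy native dependency
    return Llama(**llama_kwargs())
```

Keeping the kwargs in a helper makes it easy to tune `n_ctx` or `n_threads` without touching the loading path.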

## 📊 Performance Expectations

- **First load:** ~60-120 seconds (model loads from disk)
- **Inference speed:** ~2-5 tokens/second on CPU
- **Memory usage:** ~12-14 GB of the 16 GB available
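To check whether a deployment actually lands in the 2-5 tokens/second range, throughput can be computed from a timed generation. This helper is illustrative, not part of the Space's code:

```python
import time

def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Throughput of a timed generation; guards against zero elapsed time."""
    return n_tokens / elapsed_s if elapsed_s > 0 else 0.0

# Usage: wrap a generation call with time.perf_counter().
start = time.perf_counter()
# ... run the model here; suppose it emitted 300 tokens ...
elapsed = time.perf_counter() - start
rate = tokens_per_second(300, 100.0)  # 300 tokens in 100 s -> 3.0 tok/s
```

A rate of 3.0 tokens/second sits inside the expected CPU range above.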

## 🎯 Usage Tips

1. The first request triggers the model load, so be patient.
2. Keep prompts under 500 tokens for best results.
3. Use a temperature of 0.7-0.9 for creative tasks.
4. Monitor memory usage in the Space logs.
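Tips 2 and 3 can be applied in code. This sketch clamps the prompt length (using a rough word count as a stand-in for the 500-token guideline; a real implementation would use the model's tokenizer) and picks a temperature in the suggested range. The `generate` function assumes `llm` is a `llama_cpp.Llama` instance:

```python
def clamp_prompt(prompt: str, max_words: int = 500) -> str:
    # Word-based approximation of the 500-token guideline above;
    # substitute the model tokenizer for an exact count.
    words = prompt.split()
    return " ".join(words[:max_words])

def generate(llm, prompt: str) -> str:
    # llm: a llama_cpp.Llama instance already loaded in the Space.
    out = llm.create_completion(
        clamp_prompt(prompt),
        max_tokens=256,
        temperature=0.8,  # middle of the 0.7-0.9 range suggested above
    )
    return out["choices"][0]["text"]
```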

## 🔗 Integration

This Space serves as the primary AI endpoint for the o87Dev local API server.
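A client could reach the endpoint with `gradio_client`. The Space id and `api_name` below are assumptions for illustration, not confirmed values from this README:

```python
SPACE_ID = "truegleai/deepseek-coder-api"  # assumed Space id

def ask_space(prompt: str, space_id: str = SPACE_ID) -> str:
    # Calls the Space's Gradio API; "/predict" is the common default
    # endpoint name and may differ in the actual app.
    from gradio_client import Client  # lazy import: network-bound client
    client = Client(space_id)
    return client.predict(prompt, api_name="/predict")
```

Expect the first call to block for the 60-120 second model load noted above.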