How to use from Unsloth Studio
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for truegleai/deepseek-coder-api to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for truegleai/deepseek-coder-api to start chatting
Use Unsloth Studio on Hugging Face Spaces
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for truegleai/deepseek-coder-api to start chatting

🚀 o87Dev - Maximum Capacity Deployment

Strategy: Deploy the largest model that still fits on Hugging Face's free CPU tier: DeepSeek-Coder-V2-Lite-Instruct (16B parameters) quantized to Q4_K_M.

⚙️ Technical Details

  • Model: DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf (10.4 GB)
  • Quantization: Q4_K_M (a good quality/size trade-off for the free tier)
  • Loader: llama-cpp-python (CPU-optimized)
  • Context: 2048 tokens (capped to keep the free tier stable)
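
The settings above map roughly onto the `Llama` constructor in llama-cpp-python. The following is a minimal sketch, not the Space's actual code: the local file path, the `loader_kwargs` and `load_model` helper names, and the choice to use every available CPU core are assumptions made for illustration.

```python
import os

# Assumed local path; the Space may store the file elsewhere.
MODEL_PATH = "DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf"


def loader_kwargs(model_path=MODEL_PATH, n_ctx=2048, n_threads=None):
    """Build keyword arguments for llama_cpp.Llama on a CPU-only host."""
    return {
        "model_path": model_path,
        "n_ctx": n_ctx,                # context window from the list above
        "n_threads": n_threads or os.cpu_count() or 1,
        "n_gpu_layers": 0,             # keep every layer on the CPU
    }


def load_model(**overrides):
    """Load the GGUF file; llama_cpp is imported lazily so that the
    configuration can be inspected without the package installed."""
    from llama_cpp import Llama

    return Llama(**{**loader_kwargs(), **overrides})
```

Calling `load_model()` is the slow step, since it reads the whole 10.4 GB file; `loader_kwargs()` alone is cheap and can be logged at startup.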

📊 Performance Expectations

  • First load: ~60-120 seconds (the model is read from disk into memory)
  • Inference speed: ~2-5 tokens/second on CPU
  • Memory usage: ~12-14 GB of the 16 GB available
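
To turn the throughput figure into a wait time, divide the number of generated tokens by the tokens-per-second range. The helper below is a rough sketch; the function name is hypothetical, and the estimate covers generation only, not the initial model load or prompt processing.

```python
def generation_seconds(new_tokens, slow_tps=2.0, fast_tps=5.0):
    """Return (best_case, worst_case) seconds to generate new_tokens,
    using the ~2-5 tokens/second range quoted above."""
    if new_tokens < 0:
        raise ValueError("new_tokens must be non-negative")
    return new_tokens / fast_tps, new_tokens / slow_tps


best, worst = generation_seconds(256)
print(f"256 tokens: {best:.0f}-{worst:.0f} s")  # prints "256 tokens: 51-128 s"
```

A 256-token answer therefore takes roughly one to two minutes, which is worth accounting for in client timeouts.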

🎯 Usage Tips

  1. The first request triggers the model load, so expect it to take a minute or two
  2. Keep prompts under 500 tokens for best results; the context window is only 2048 tokens
  3. Use a temperature of 0.7-0.9 for creative tasks
  4. Monitor memory usage in the Space logs
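
Tips 1 and 2 can be sketched in code as a lazy loader plus a context-budget check. This is an illustration under assumptions, not the Space's implementation: `LazyModel` and `completion_budget` are hypothetical names, and the loader is passed in as a plain callable so the pattern works with any backend.

```python
import threading

N_CTX = 2048  # context window used by this deployment


class LazyModel:
    """Defer an expensive model load until the first request needs it."""

    def __init__(self, loader):
        self._loader = loader          # zero-argument callable returning the model
        self._model = None
        self._lock = threading.Lock()

    def get(self):
        # Double-checked locking: concurrent first requests load only once.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._loader()
        return self._model


def completion_budget(prompt_tokens, requested, n_ctx=N_CTX):
    """Clamp the requested completion length so that the prompt plus
    the completion still fits inside the context window."""
    if prompt_tokens >= n_ctx:
        raise ValueError("prompt alone fills the context window")
    return min(requested, n_ctx - prompt_tokens)
```

With a 500-token prompt, `completion_budget(500, 4000)` leaves 1548 tokens for the answer, which is one reason shorter prompts behave better here.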

🔗 Integration

This Space serves as the primary AI endpoint for the o87Dev local API server.

Model info: GGUF format, 16B params, deepseek2 architecture, 4-bit quantization
