---
title: Anthropic Compatible API
emoji: 🤖
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
---
# Anthropic-Compatible API

A production-ready, self-hosted API that provides full Anthropic Messages API compatibility, serving the Qwen2.5-Coder-7B model on a llama.cpp backend.

**Live Dashboard:** https://likhonsheikh-anthropic-compatible-api.hf.space

## Features
| Feature | Description |
|---|---|
| Full Anthropic API | Complete Messages API compatibility |
| OpenAI API | Dual compatibility with OpenAI Chat API |
| Streaming (SSE) | Real-time token streaming |
| Tool Use | Function calling / tool use support |
| Extended Thinking | `<thinking>` block support for reasoning |
| Request Queue | Concurrency control with priority |
| Prompt Caching | LRU cache for system prompts |
| Multi-Model | Hot-swap between models |
| Live Dashboard | Built-in web UI with playground |
| Logs Viewer | Real-time API logs |
## Quick Start

### 1. Claude Code CLI

The easiest way to use this API with Claude Code:
```bash
# Set environment variables
export ANTHROPIC_API_KEY="any-key"
export ANTHROPIC_BASE_URL="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"

# Run Claude Code
claude "Write a Python script that reads a CSV file"

# Or with explicit model
claude --model qwen2.5-coder-7b "Explain this code"
```
**Persistent Configuration** (add to `~/.bashrc` or `~/.zshrc`):

```bash
# Anthropic-Compatible API Configuration
export ANTHROPIC_API_KEY="any-key"
export ANTHROPIC_BASE_URL="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
```

### 2. Python SDK
```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

# Basic message
message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello! Write a hello world in Python."}]
)
print(message.content[0].text)

# With system prompt
message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    system="You are a helpful coding assistant. Always include comments in your code.",
    messages=[{"role": "user", "content": "Write a function to calculate factorial"}]
)
print(message.content[0].text)
```
### 3. Streaming Response

```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

with client.messages.stream(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a detailed explanation of recursion"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
### 4. Tool Use / Function Calling

```python
import anthropic
import json

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
]

message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)

if message.stop_reason == "tool_use":
    for block in message.content:
        if block.type == "tool_use":
            print(f"Tool: {block.name}")
            print(f"Input: {json.dumps(block.input, indent=2)}")
```
### 5. Extended Thinking

```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 1024},
    messages=[{"role": "user", "content": "Solve step by step: What is 15% of 240?"}]
)

for block in message.content:
    if block.type == "thinking":
        print("=== THINKING ===")
        print(block.thinking)
    elif block.type == "text":
        print("=== ANSWER ===")
        print(block.text)
```
### 6. TypeScript/JavaScript

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: 'any-key',
  baseURL: 'https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic'
});

const message = await client.messages.create({
  model: 'qwen2.5-coder-7b',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }]
});

console.log(message.content[0].text);
```
### 7. cURL

```bash
curl -X POST "https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic/v1/messages" \
  -H "Content-Type: application/json" \
  -H "x-api-key: any-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "qwen2.5-coder-7b",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
### 8. OpenAI SDK (Alternative)

```python
from openai import OpenAI

client = OpenAI(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/v1"
)

response = client.chat.completions.create(
    model="qwen2.5-coder-7b",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=1024
)
print(response.choices[0].message.content)
```
## API Reference

### Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Dashboard with status & playground |
| GET | `/health` | Health check with queue/cache stats |
| GET | `/logs?lines=100` | View API logs |
| GET | `/queue/status` | Request queue statistics |
| GET | `/models/status` | Loaded models information |
| POST | `/models/{id}/load` | Manually load a model |
| POST | `/models/{id}/unload` | Unload a model |
| GET | `/anthropic/v1/models` | List models (Anthropic format) |
| POST | `/anthropic/v1/messages` | Create message (Anthropic API) |
| POST | `/anthropic/v1/messages/count_tokens` | Count tokens |
| GET | `/v1/models` | List models (OpenAI format) |
| POST | `/v1/chat/completions` | Chat completion (OpenAI API) |
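The status endpoints are plain GET routes, so any HTTP client can poll them. A minimal sketch using the `requests` library (the exact JSON fields returned are whatever the server exposes; this example just prints them):

```python
import requests

BASE = "https://likhonsheikh-anthropic-compatible-api.hf.space"

# Health check with queue/cache stats.
print(requests.get(f"{BASE}/health", timeout=60).json())

# Request queue statistics.
print(requests.get(f"{BASE}/queue/status", timeout=60).json())
```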
### Request Format

```json
{
  "model": "qwen2.5-coder-7b",
  "max_tokens": 1024,
  "messages": [{"role": "user", "content": "Hello!"}],
  "system": "You are a helpful assistant.",
  "temperature": 0.7,
  "stream": false,
  "tools": [...],
  "thinking": {"type": "enabled", "budget_tokens": 1024}
}
```
### Response Format

```json
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "qwen2.5-coder-7b",
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 10, "output_tokens": 25}
}
```
## Model Info
| Property | Value |
|---|---|
| Model | Qwen2.5-Coder-7B-Instruct |
| Format | GGUF (Q4_K_M quantization) |
| Parameters | 7 Billion |
| Context Length | 8,192 tokens |
| Backend | llama.cpp |
| Optimized For | Code, tool use, agent workflows |
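With an 8,192-token context window, it can be worth checking how large a prompt is before sending it. A minimal sketch against the `count_tokens` endpoint listed above, assuming it takes the same `model` + `messages` payload as the Anthropic API (the response shape is also an assumption):

```python
import requests

BASE = "https://likhonsheikh-anthropic-compatible-api.hf.space"

payload = {
    "model": "qwen2.5-coder-7b",
    "messages": [{"role": "user", "content": "Summarize this long document ..."}],
}

# Count tokens without generating a completion.
resp = requests.post(f"{BASE}/anthropic/v1/messages/count_tokens", json=payload, timeout=60)
print(resp.json())  # expected to include something like {"input_tokens": ...}
```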
## Troubleshooting
| Issue | Solution |
|---|---|
| Connection Timeout | Space may be sleeping. First request wakes it (~30s) |
| 503 Queue Full | Too many requests. Retry in a few seconds |
| Slow Response | CPU-based, expect ~10-30 tokens/second |
| Tool Use Issues | Ensure each tool's input_schema is a valid JSON Schema |
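Because the Space can be asleep or briefly return 503 when the queue is full, a client-side retry with a short backoff helps, on top of the retries the Anthropic SDK already performs by default. A minimal sketch (the retry counts and delays are arbitrary):

```python
import time
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic",
)

def create_with_retry(attempts: int = 5, delay: float = 5.0, **kwargs):
    """Retry on 5xx responses (queue full) and connection errors (sleeping Space)."""
    for attempt in range(attempts):
        try:
            return client.messages.create(**kwargs)
        except (anthropic.InternalServerError, anthropic.APIConnectionError):
            if attempt == attempts - 1:
                raise
            time.sleep(delay * (attempt + 1))  # linear backoff

message = create_with_retry(
    model="qwen2.5-coder-7b",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)
```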
## License
Apache 2.0 | Built with llama.cpp + FastAPI by Matrix Agent