---
title: Anthropic Compatible API
emoji: 🤖
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
---
# Anthropic-Compatible API
A **production-ready, self-hosted API** that provides full **Anthropic Messages API compatibility**, backed by the Qwen2.5-Coder-7B model running on a llama.cpp backend.
> **Live Dashboard**: [https://likhonsheikh-anthropic-compatible-api.hf.space](https://likhonsheikh-anthropic-compatible-api.hf.space)
## Features
| Feature | Description |
|---------|-------------|
| **Full Anthropic API** | Complete Messages API compatibility |
| **OpenAI API** | Dual compatibility with OpenAI Chat API |
| **Streaming (SSE)** | Real-time token streaming |
| **Tool Use** | Function calling / tool use support |
| **Extended Thinking** | `<thinking>` block support for reasoning |
| **Request Queue** | Concurrency control with priority |
| **Prompt Caching** | LRU cache for system prompts |
| **Multi-Model** | Hot-swap between models |
| **Live Dashboard** | Built-in web UI with playground |
| **Logs Viewer** | Real-time API logs |
---
## Quick Start
### 1. Claude Code CLI
The easiest way to use this API with Claude Code:
```bash
# Set environment variables
export ANTHROPIC_API_KEY="any-key"
export ANTHROPIC_BASE_URL="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
# Run Claude Code
claude "Write a Python script that reads a CSV file"
# Or with explicit model
claude --model qwen2.5-coder-7b "Explain this code"
```
**Persistent Configuration** (add to `~/.bashrc` or `~/.zshrc`):
```bash
# Anthropic-Compatible API Configuration
export ANTHROPIC_API_KEY="any-key"
export ANTHROPIC_BASE_URL="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
```
### 2. Python SDK
```python
import anthropic
client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

# Basic message
message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello! Write a hello world in Python."}]
)
print(message.content[0].text)

# With system prompt
message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    system="You are a helpful coding assistant. Always include comments in your code.",
    messages=[{"role": "user", "content": "Write a function to calculate factorial"}]
)
print(message.content[0].text)
```
### 3. Streaming Response
```python
import anthropic
client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

with client.messages.stream(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a detailed explanation of recursion"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
### 4. Tool Use / Function Calling
```python
import anthropic
import json
client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
]

message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)

if message.stop_reason == "tool_use":
    for block in message.content:
        if block.type == "tool_use":
            print(f"Tool: {block.name}")
            print(f"Input: {json.dumps(block.input, indent=2)}")
```
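After detecting a `tool_use` block, the usual Messages API pattern is to run the tool yourself and send a `tool_result` block back in a follow-up `user` turn. Below is a minimal local sketch of that round trip; the `run_get_weather` implementation and the `toolu_01` id are made up for illustration, and the `tool_result` shape follows the standard Anthropic Messages format:

```python
import json

def run_get_weather(location, unit="celsius"):
    # Hypothetical local implementation standing in for a real weather API.
    return {"location": location, "temperature": 18, "unit": unit}

def build_tool_result_turn(assistant_content, tool_outputs):
    """Build the follow-up user turn echoing tool results back to the model.

    assistant_content: content blocks from the previous assistant response
                       (dicts with "type", "id", "name", "input").
    tool_outputs: mapping of tool_use id -> result object.
    """
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": json.dumps(tool_outputs[block["id"]]),
            }
            for block in assistant_content
            if block["type"] == "tool_use"
        ],
    }

# Example with a fabricated tool_use block such as the API might return:
assistant_content = [
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
     "input": {"location": "Tokyo"}}
]
outputs = {"toolu_01": run_get_weather(**assistant_content[0]["input"])}
turn = build_tool_result_turn(assistant_content, outputs)
print(turn["content"][0]["tool_use_id"])  # toolu_01
```

The resulting `turn` is appended to the `messages` list and sent in a second `client.messages.create(...)` call so the model can compose its final answer from the tool output.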
### 5. Extended Thinking
```python
import anthropic
client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 1024},
    messages=[{"role": "user", "content": "Solve step by step: What is 15% of 240?"}]
)

for block in message.content:
    if block.type == "thinking":
        print("=== THINKING ===")
        print(block.thinking)
    elif block.type == "text":
        print("=== ANSWER ===")
        print(block.text)
```
### 6. TypeScript/JavaScript
```typescript
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic({
  apiKey: 'any-key',
  baseURL: 'https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic'
});

const message = await client.messages.create({
  model: 'qwen2.5-coder-7b',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }]
});

console.log(message.content[0].text);
```
### 7. cURL
```bash
curl -X POST "https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic/v1/messages" \
  -H "Content-Type: application/json" \
  -H "x-api-key: any-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "qwen2.5-coder-7b",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
### 8. OpenAI SDK (Alternative)
```python
from openai import OpenAI
client = OpenAI(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/v1"
)

response = client.chat.completions.create(
    model="qwen2.5-coder-7b",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=1024
)
print(response.choices[0].message.content)
```
---
## API Reference
### Endpoints
| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/` | Dashboard with status & playground |
| `GET` | `/health` | Health check with queue/cache stats |
| `GET` | `/logs?lines=100` | View API logs |
| `GET` | `/queue/status` | Request queue statistics |
| `GET` | `/models/status` | Loaded models information |
| `POST` | `/models/{id}/load` | Manually load a model |
| `POST` | `/models/{id}/unload` | Unload a model |
| `GET` | `/anthropic/v1/models` | List models (Anthropic format) |
| `POST` | `/anthropic/v1/messages` | Create message (Anthropic API) |
| `POST` | `/anthropic/v1/messages/count_tokens` | Count tokens |
| `GET` | `/v1/models` | List models (OpenAI format) |
| `POST` | `/v1/chat/completions` | Chat completion (OpenAI API) |
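Since the Space sleeps when idle, it can help to poll `/health` before sending real traffic. A small poller sketch with an injectable `fetch` callable (the exact shape of the `/health` payload is an assumption here; adjust the `"status"` check to whatever the endpoint actually returns):

```python
import time

def wait_until_healthy(fetch, attempts=5, delay=2.0):
    """Poll a health endpoint until it reports OK, or give up.

    fetch: zero-argument callable returning the decoded /health JSON
           (assumed here to contain a "status" field).
    Returns True once healthy, False if all attempts fail.
    """
    for i in range(attempts):
        try:
            if fetch().get("status") == "ok":
                return True
        except Exception:
            pass  # Space may still be waking up; treat errors as "not yet"
        if i < attempts - 1:
            time.sleep(delay)
    return False

# Exercised with a stub that "wakes up" on the third call:
calls = {"n": 0}
def stub_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("still waking")
    return {"status": "ok"}

print(wait_until_healthy(stub_fetch, attempts=5, delay=0))  # True
```

In practice `fetch` would be something like `lambda: requests.get(f"{BASE}/health", timeout=10).json()`.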
### Request Format
```json
{
"model": "qwen2.5-coder-7b",
"max_tokens": 1024,
"messages": [{"role": "user", "content": "Hello!"}],
"system": "You are a helpful assistant.",
"temperature": 0.7,
"stream": false,
"tools": [...],
"thinking": {"type": "enabled", "budget_tokens": 1024}
}
```
### Response Format
```json
{
"id": "msg_abc123",
"type": "message",
"role": "assistant",
"content": [{"type": "text", "text": "Hello!"}],
"model": "qwen2.5-coder-7b",
"stop_reason": "end_turn",
"usage": {"input_tokens": 10, "output_tokens": 25}
}
```
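When calling the raw HTTP endpoint without an SDK, the response above can be unpacked with a few lines of plain Python. A sketch using the sample payload shown (the helper name `extract_reply` is our own, not part of any SDK):

```python
def extract_reply(response):
    """Concatenate text blocks and total up token usage from a
    Messages-format response dict."""
    text = "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )
    usage = response.get("usage", {})
    total = usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
    return text, total

# The sample response from above:
sample = {
    "id": "msg_abc123",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "qwen2.5-coder-7b",
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 10, "output_tokens": 25},
}
text, total_tokens = extract_reply(sample)
print(text, total_tokens)  # Hello! 35
```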
---
## Model Info
| Property | Value |
|----------|-------|
| **Model** | Qwen2.5-Coder-7B-Instruct |
| **Format** | GGUF (Q4_K_M quantization) |
| **Parameters** | 7 Billion |
| **Context Length** | 8,192 tokens |
| **Backend** | llama.cpp |
| **Optimized For** | Code, tool use, agent workflows |
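With an 8,192-token context window, long prompts plus the requested completion can overflow. The `/anthropic/v1/messages/count_tokens` endpoint gives exact counts; for a cheap local pre-flight check, a rough characters-per-token heuristic (an assumption, roughly 4 chars/token for English text) can be sketched like this:

```python
def fits_context(prompt, max_new_tokens, context_len=8192, chars_per_token=4):
    """Rough pre-flight check against the 8,192-token context window.

    Uses an approximate chars-per-token ratio (an assumption; use the
    count_tokens endpoint for exact numbers) and leaves room for the
    requested completion.
    """
    est_prompt_tokens = len(prompt) // chars_per_token + 1
    return est_prompt_tokens + max_new_tokens <= context_len

print(fits_context("Hello!", max_new_tokens=1024))     # True
print(fits_context("x" * 40000, max_new_tokens=1024))  # False
```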
---
## Troubleshooting
| Issue | Solution |
|-------|----------|
| Connection Timeout | Space may be sleeping. First request wakes it (~30s) |
| 503 Queue Full | Too many requests. Retry in a few seconds |
| Slow Response | CPU-based, expect ~10-30 tokens/second |
| Tool Use Issues | Ensure valid JSON schema |
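For the transient failures above (cold-start timeouts, 503 "queue full"), a client-side retry with exponential backoff usually suffices. A generic sketch, demonstrated with a stub rather than a live request:

```python
import time

def with_retries(call, attempts=4, base_delay=1.0):
    """Retry a request with exponential backoff for transient failures
    such as 503 "queue full" or cold-start timeouts.

    call: zero-argument callable that raises on a retryable failure.
    """
    for i in range(attempts):
        try:
            return call()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i))  # 1s, 2s, 4s, ...

# Exercised with a stub that fails twice, then succeeds:
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("503 queue full")
    return "ok"

print(with_retries(flaky, base_delay=0))  # ok
```

In real use, `call` would wrap the `client.messages.create(...)` invocation; narrowing the `except` to the SDK's specific error types is preferable to the bare `Exception` used in this sketch.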
---
## License
Apache 2.0 | Built with llama.cpp + FastAPI by Matrix Agent