---
title: Anthropic Compatible API
emoji: 🤖
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
---

# Anthropic-Compatible API

A **production-ready, self-hosted API** that provides full **Anthropic Messages API compatibility**, running the Qwen2.5-Coder-7B model on a llama.cpp backend.

> **Live Dashboard**: [https://likhonsheikh-anthropic-compatible-api.hf.space](https://likhonsheikh-anthropic-compatible-api.hf.space)

## Features

| Feature | Description |
|---------|-------------|
| **Full Anthropic API** | Complete Messages API compatibility |
| **OpenAI API** | Dual compatibility with OpenAI Chat API |
| **Streaming (SSE)** | Real-time token streaming |
| **Tool Use** | Function calling / tool use support |
| **Extended Thinking** | `<thinking>` block support for reasoning |
| **Request Queue** | Concurrency control with priority |
| **Prompt Caching** | LRU cache for system prompts |
| **Multi-Model** | Hot-swap between models |
| **Live Dashboard** | Built-in web UI with playground |
| **Logs Viewer** | Real-time API logs |

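The table lists an LRU cache for system prompts. The server's actual implementation is not shown in this README, so the following is only a minimal sketch of the general technique, assuming the cache is keyed on a hash of the system prompt text. The class name `PromptCache` and its `capacity` parameter are hypothetical.

```python
import hashlib
from collections import OrderedDict
from typing import Optional


class PromptCache:
    """Hypothetical LRU cache keyed on a hash of the system prompt (illustration only)."""

    def __init__(self, capacity: int = 32) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries: "OrderedDict[str, object]" = OrderedDict()

    @staticmethod
    def _key(system_prompt: str) -> str:
        # Hash the prompt so long prompts do not become long dictionary keys.
        return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()

    def get(self, system_prompt: str) -> Optional[object]:
        key = self._key(system_prompt)
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, system_prompt: str, value: object) -> None:
        key = self._key(system_prompt)
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
```

Reading an entry refreshes it, so prompts that are reused often stay cached while rarely used ones are evicted first.
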
---

## Quick Start

### 1. Claude Code CLI

The easiest way to use this API with Claude Code:

```bash
# Set environment variables
export ANTHROPIC_API_KEY="any-key"
export ANTHROPIC_BASE_URL="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"

# Run Claude Code
claude "Write a Python script that reads a CSV file"

# Or with explicit model
claude --model qwen2.5-coder-7b "Explain this code"
```

**Persistent Configuration** (add to `~/.bashrc` or `~/.zshrc`):

```bash
# Anthropic-Compatible API Configuration
export ANTHROPIC_API_KEY="any-key"
export ANTHROPIC_BASE_URL="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
```

### 2. Python SDK

```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

# Basic message
message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello! Write a hello world in Python."}]
)
print(message.content[0].text)

# With system prompt
message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    system="You are a helpful coding assistant. Always include comments in your code.",
    messages=[{"role": "user", "content": "Write a function to calculate factorial"}]
)
print(message.content[0].text)
```

### 3. Streaming Response

```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

with client.messages.stream(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a detailed explanation of recursion"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```

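The SDK hides the wire format of the stream. If you read the raw SSE response yourself (for example from `curl` with `"stream": true`), the text arrives piece by piece in `content_block_delta` events. The sketch below assumes this server emits the same event shapes as Anthropic's streaming Messages API; the function name `iter_text_deltas` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from the raw SSE lines of a streamed message."""
    for raw in lines:
        line = raw.strip()
        # Only `data:` lines carry JSON payloads; skip `event:` lines and blanks.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON keep-alive or terminator lines
        if not isinstance(event, dict):
            continue
        if event.get("type") != "content_block_delta":
            continue
        delta = event.get("delta", {})
        # Text is delivered as `text_delta`; other delta types are ignored here.
        if delta.get("type") == "text_delta":
            yield delta.get("text", "")
```

Joining the yielded fragments with `"".join(...)` reconstructs the full text of the reply.
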
### 4. Tool Use / Function Calling

```python
import anthropic
import json

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
]

message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)

if message.stop_reason == "tool_use":
    for block in message.content:
        if block.type == "tool_use":
            print(f"Tool: {block.name}")
            print(f"Input: {json.dumps(block.input, indent=2)}")
```

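The example above stops once the model asks for a tool. In the Anthropic Messages format, the usual next step is to run the tool yourself and send its output back as a `tool_result` block in a new user turn. The sketch below only builds that follow-up `messages` list. It assumes this server accepts `tool_result` blocks the same way the Anthropic API does, and the helper name `build_tool_followup` is hypothetical.

```python
from typing import Any, Dict, List


def build_tool_followup(
    user_prompt: str,
    tool_use_id: str,
    tool_name: str,
    tool_input: Dict[str, Any],
    tool_output: str,
) -> List[Dict[str, Any]]:
    """Build the messages list for the turn that returns a tool result."""
    return [
        # 1. The original user question.
        {"role": "user", "content": user_prompt},
        # 2. The assistant turn that requested the tool, echoed back unchanged.
        {
            "role": "assistant",
            "content": [
                {"type": "tool_use", "id": tool_use_id, "name": tool_name, "input": tool_input}
            ],
        },
        # 3. The tool output, linked to the request through tool_use_id.
        {
            "role": "user",
            "content": [
                {"type": "tool_result", "tool_use_id": tool_use_id, "content": tool_output}
            ],
        },
    ]
```

Pass the returned list as `messages=` in a second `client.messages.create(...)` call, together with the same `tools`, so the model can turn the tool output into a final answer.
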
### 5. Extended Thinking

```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 1024},
    messages=[{"role": "user", "content": "Solve step by step: What is 15% of 240?"}]
)

for block in message.content:
    if block.type == "thinking":
        print("=== THINKING ===")
        print(block.thinking)
    elif block.type == "text":
        print("=== ANSWER ===")
        print(block.text)
```

### 6. TypeScript/JavaScript

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: 'any-key',
  baseURL: 'https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic'
});

const message = await client.messages.create({
  model: 'qwen2.5-coder-7b',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }]
});

// Content blocks are a union type, so narrow to a text block before reading `.text`.
const first = message.content[0];
if (first?.type === 'text') {
  console.log(first.text);
}
```

### 7. cURL

```bash
curl -X POST "https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic/v1/messages" \
  -H "Content-Type: application/json" \
  -H "x-api-key: any-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "qwen2.5-coder-7b",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

### 8. OpenAI SDK (Alternative)

```python
from openai import OpenAI

client = OpenAI(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/v1"
)

response = client.chat.completions.create(
    model="qwen2.5-coder-7b",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=1024
)
print(response.choices[0].message.content)
```

---

## API Reference

### Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/` | Dashboard with status & playground |
| `GET` | `/health` | Health check with queue/cache stats |
| `GET` | `/logs?lines=100` | View API logs |
| `GET` | `/queue/status` | Request queue statistics |
| `GET` | `/models/status` | Loaded models information |
| `POST` | `/models/{id}/load` | Manually load a model |
| `POST` | `/models/{id}/unload` | Unload a model |
| `GET` | `/anthropic/v1/models` | List models (Anthropic format) |
| `POST` | `/anthropic/v1/messages` | Create message (Anthropic API) |
| `POST` | `/anthropic/v1/messages/count_tokens` | Count tokens |
| `GET` | `/v1/models` | List models (OpenAI format) |
| `POST` | `/v1/chat/completions` | Chat completion (OpenAI API) |

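Endpoints without an SDK wrapper in the examples above, such as `count_tokens`, can be called with the standard library alone. The sketch below builds the request separately from sending it. The headers mirror the cURL example; the helper names are hypothetical, and the shape of the response body is not documented in this README, so it is returned as plain parsed JSON.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "https://likhonsheikh-anthropic-compatible-api.hf.space"


def build_count_tokens_request(
    payload: Dict[str, Any], api_key: str = "any-key"
) -> urllib.request.Request:
    """Build (but do not send) a POST request to the count_tokens endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}/anthropic/v1/messages/count_tokens",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )


def count_tokens(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request and return the parsed JSON body (performs a network call)."""
    request = build_count_tokens_request(payload)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

A typical payload is the same `model` and `messages` you would send to `/anthropic/v1/messages`.
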
### Request Format

```json
{
  "model": "qwen2.5-coder-7b",
  "max_tokens": 1024,
  "messages": [{"role": "user", "content": "Hello!"}],
  "system": "You are a helpful assistant.",
  "temperature": 0.7,
  "stream": false,
  "tools": [...],
  "thinking": {"type": "enabled", "budget_tokens": 1024}
}
```

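Most fields in the request body are optional. A small builder that includes them only when set keeps payloads minimal. The sketch below is an illustration, not part of this project: the function name `build_message_request` is hypothetical, and the check that the thinking budget stays below `max_tokens` mirrors a constraint of Anthropic's own API that this server may or may not enforce.

```python
from typing import Any, Dict, List, Optional


def build_message_request(
    messages: List[Dict[str, Any]],
    max_tokens: int,
    model: str = "qwen2.5-coder-7b",
    system: Optional[str] = None,
    temperature: Optional[float] = None,
    stream: bool = False,
    thinking_budget: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a Messages request body, leaving out optional fields that are unset."""
    body: Dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": messages,
        "stream": stream,
    }
    if system is not None:
        body["system"] = system
    if temperature is not None:
        body["temperature"] = temperature
    if thinking_budget is not None:
        # Assumption: the budget must leave room for the answer, as in Anthropic's API.
        if thinking_budget >= max_tokens:
            raise ValueError("thinking budget must be smaller than max_tokens")
        body["thinking"] = {"type": "enabled", "budget_tokens": thinking_budget}
    return body
```

The result can be serialized with `json.dumps` and sent as the body of a `POST` to `/anthropic/v1/messages`.
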
### Response Format

```json
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "qwen2.5-coder-7b",
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 10, "output_tokens": 25}
}
```

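Because `content` is a list of typed blocks, a raw JSON response may contain `thinking` or `tool_use` blocks alongside `text`. When working with the parsed JSON rather than the SDK objects, a small helper can collect just the text. This is a minimal sketch; the name `extract_text` is hypothetical.

```python
from typing import Any, Dict


def extract_text(response: Dict[str, Any]) -> str:
    """Concatenate the text of every `text` block in a parsed Messages response."""
    return "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )
```

Blocks of any other type are skipped, so the helper is safe to use on responses produced with tools or extended thinking enabled.
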
---

## Model Info

| Property | Value |
|----------|-------|
| **Model** | Qwen2.5-Coder-7B-Instruct |
| **Format** | GGUF (Q4_K_M quantization) |
| **Parameters** | 7 Billion |
| **Context Length** | 8,192 tokens |
| **Backend** | llama.cpp |
| **Optimized For** | Code, tool use, agent workflows |

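The context length bounds how large `max_tokens` can usefully be for a given prompt. The sketch below assumes that prompt tokens and generated tokens share the 8,192-token window, which is how a llama.cpp context normally behaves but is not stated in this README. The helper names are hypothetical, and the input token count would come from the `count_tokens` endpoint.

```python
CONTEXT_LENGTH = 8192  # from the Model Info table


def max_output_budget(input_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens remain for the reply after the prompt is counted."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return max(context_length - input_tokens, 0)


def fits_context(
    input_tokens: int, max_tokens: int, context_length: int = CONTEXT_LENGTH
) -> bool:
    """Check whether the prompt plus the requested output fits in the window."""
    return input_tokens + max_tokens <= context_length
```

For example, a 7,500-token prompt leaves 8,192 - 7,500 = 692 tokens, so a request with `max_tokens=1024` would not fit under this assumption.
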
---

## Troubleshooting

| Issue | Solution |
|-------|----------|
| Connection Timeout | The Space may be sleeping; the first request wakes it (~30s) |
| 503 Queue Full | Too many concurrent requests; retry after a few seconds |
| Slow Response | Inference runs on CPU; expect ~10-30 tokens/second |
| Tool Use Issues | Ensure each tool's `input_schema` is valid JSON Schema |

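Both the wake-up timeout and the `503 Queue Full` case are transient, so a retry with exponential backoff is a reasonable client-side response. The following is a minimal sketch, not part of this project: `retry_with_backoff` and its parameters are hypothetical, and the `sleep` argument exists only so the delay can be replaced in tests.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on the given exceptions with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ... by default
    raise AssertionError("unreachable")
```

With the Python SDK, `call` could be a lambda wrapping `client.messages.create(...)` and `retry_on` could be `(anthropic.APIStatusError, anthropic.APIConnectionError)`. The SDK client also has its own `max_retries` option, which may be enough for simple cases.
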
---

## License

Apache 2.0 | Built with llama.cpp + FastAPI by Matrix Agent