---
title: Anthropic Compatible API
emoji: 🤖
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
---

# Anthropic-Compatible API

A **production-ready, self-hosted API** that provides full **Anthropic Messages API compatibility** using the Qwen2.5-Coder-7B model with a llama.cpp backend.

> **Live Dashboard**: [https://likhonsheikh-anthropic-compatible-api.hf.space](https://likhonsheikh-anthropic-compatible-api.hf.space)

## Features

| Feature | Description |
|---------|-------------|
| **Full Anthropic API** | Complete Messages API compatibility |
| **OpenAI API** | Dual compatibility with the OpenAI Chat API |
| **Streaming (SSE)** | Real-time token streaming |
| **Tool Use** | Function calling / tool use support |
| **Extended Thinking** | `thinking` block support for reasoning |
| **Request Queue** | Concurrency control with priority |
| **Prompt Caching** | LRU cache for system prompts |
| **Multi-Model** | Hot-swap between models |
| **Live Dashboard** | Built-in web UI with playground |
| **Logs Viewer** | Real-time API logs |

---

## Quick Start

### 1. Claude Code CLI

The easiest way to use this API is through Claude Code:

```bash
# Set environment variables
export ANTHROPIC_API_KEY="any-key"
export ANTHROPIC_BASE_URL="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"

# Run Claude Code
claude "Write a Python script that reads a CSV file"

# Or with an explicit model
claude --model qwen2.5-coder-7b "Explain this code"
```

**Persistent Configuration** (add to `~/.bashrc` or `~/.zshrc`):

```bash
# Anthropic-Compatible API Configuration
export ANTHROPIC_API_KEY="any-key"
export ANTHROPIC_BASE_URL="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
```
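Claude Code is only one client; any tool that speaks the Anthropic Messages API can point at the same base URL. As a sketch of what such a client sends under the hood, the request can be assembled with the standard library alone (header names and body shape follow the Messages API; the actual send is left commented out to avoid a live call):

```python
import json
import os

BASE_URL = os.environ.get(
    "ANTHROPIC_BASE_URL",
    "https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic",
)

def build_messages_request(prompt: str, model: str = "qwen2.5-coder-7b"):
    """Assemble the URL, headers, and JSON body for a Messages API call."""
    url = f"{BASE_URL}/v1/messages"
    headers = {
        "x-api-key": os.environ.get("ANTHROPIC_API_KEY", "any-key"),
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    body = {
        "model": model,
        "max_tokens": 256,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, json.dumps(body)

url, headers, payload = build_messages_request("Hello!")
print(url)
# Send with any HTTP client, e.g.:
# import urllib.request
# urllib.request.urlopen(urllib.request.Request(url, payload.encode(), headers))
```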
### 2. Python SDK

```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

# Basic message
message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello! Write a hello world in Python."}]
)
print(message.content[0].text)

# With a system prompt
message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    system="You are a helpful coding assistant. Always include comments in your code.",
    messages=[{"role": "user", "content": "Write a function to calculate factorial"}]
)
print(message.content[0].text)
```

### 3. Streaming Response

```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

with client.messages.stream(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a detailed explanation of recursion"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```

### 4. Tool Use / Function Calling

```python
import anthropic
import json

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
]

message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)

if message.stop_reason == "tool_use":
    for block in message.content:
        if block.type == "tool_use":
            print(f"Tool: {block.name}")
            print(f"Input: {json.dumps(block.input, indent=2)}")
```
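The tool-use example above stops after printing the model's tool call. To complete the loop, you run the tool yourself and return its output in a `tool_result` content block on the next user turn. A sketch of that follow-up message, with a hypothetical local `get_weather` implementation standing in for a real lookup:

```python
import json

def get_weather(location, unit="celsius"):
    # Hypothetical stand-in for a real weather service
    return {"location": location, "temperature": 22, "unit": unit}

def tool_result_message(tool_use_id, result):
    """Build the user-turn message that returns a tool's output to the model."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,  # must match the id of the tool_use block
            "content": json.dumps(result),
        }],
    }

# After message.stop_reason == "tool_use", something like:
#   followup = tool_result_message(block.id, get_weather(**block.input))
# then call client.messages.create(...) again with the assistant turn and
# this message appended to the conversation.
followup = tool_result_message("toolu_demo", get_weather("Tokyo"))  # demo id
print(followup["content"][0]["type"])  # → tool_result
```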
### 5. Extended Thinking

```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 1024},
    messages=[{"role": "user", "content": "Solve step by step: What is 15% of 240?"}]
)

for block in message.content:
    if block.type == "thinking":
        print("=== THINKING ===")
        print(block.thinking)
    elif block.type == "text":
        print("=== ANSWER ===")
        print(block.text)
```

### 6. TypeScript/JavaScript

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: 'any-key',
  baseURL: 'https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic'
});

const message = await client.messages.create({
  model: 'qwen2.5-coder-7b',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }]
});

console.log(message.content[0].text);
```

### 7. cURL

```bash
curl -X POST "https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic/v1/messages" \
  -H "Content-Type: application/json" \
  -H "x-api-key: any-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "qwen2.5-coder-7b",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
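With `"stream": true` in the request body, the same endpoint returns server-sent events instead of a single JSON body. If you are not using an SDK, the stream can be parsed by hand; a minimal sketch, assuming the standard Anthropic event shapes (`content_block_delta` events carrying `text_delta` payloads):

```python
import json

def parse_sse_events(lines):
    """Yield streamed text from 'event:'/'data:' line pairs of an
    Anthropic-style SSE response."""
    event = None
    for line in lines:
        if line.startswith("event:"):
            event = line.split(":", 1)[1].strip()
        elif line.startswith("data:"):
            data = json.loads(line.split(":", 1)[1].strip())
            if event == "content_block_delta" and data["delta"]["type"] == "text_delta":
                yield data["delta"]["text"]

# Example input: two text deltas as they would arrive on the wire
sample = [
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}}',
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo"}}',
]
print("".join(parse_sse_events(sample)))  # → Hello
```

With a real HTTP client, you would feed the response's decoded lines into `parse_sse_events` instead of `sample`.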
### 8. OpenAI SDK (Alternative)

```python
from openai import OpenAI

client = OpenAI(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/v1"
)

response = client.chat.completions.create(
    model="qwen2.5-coder-7b",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=1024
)
print(response.choices[0].message.content)
```

---

## API Reference

### Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/` | Dashboard with status & playground |
| `GET` | `/health` | Health check with queue/cache stats |
| `GET` | `/logs?lines=100` | View API logs |
| `GET` | `/queue/status` | Request queue statistics |
| `GET` | `/models/status` | Loaded models information |
| `POST` | `/models/{id}/load` | Manually load a model |
| `POST` | `/models/{id}/unload` | Unload a model |
| `GET` | `/anthropic/v1/models` | List models (Anthropic format) |
| `POST` | `/anthropic/v1/messages` | Create message (Anthropic API) |
| `POST` | `/anthropic/v1/messages/count_tokens` | Count tokens |
| `GET` | `/v1/models` | List models (OpenAI format) |
| `POST` | `/v1/chat/completions` | Chat completion (OpenAI API) |

### Request Format

```json
{
  "model": "qwen2.5-coder-7b",
  "max_tokens": 1024,
  "messages": [{"role": "user", "content": "Hello!"}],
  "system": "You are a helpful assistant.",
  "temperature": 0.7,
  "stream": false,
  "tools": [...],
  "thinking": {"type": "enabled", "budget_tokens": 1024}
}
```

### Response Format

```json
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "qwen2.5-coder-7b",
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 10, "output_tokens": 25}
}
```

---

## Model Info

| Property | Value |
|----------|-------|
| **Model** | Qwen2.5-Coder-7B-Instruct |
| **Format** | GGUF (Q4_K_M quantization) |
| **Parameters** | 7 billion |
| **Context Length** | 8,192 tokens |
| **Backend** | llama.cpp |
| **Optimized For** | Code, tool use, agent workflows |

---

## Troubleshooting

| Issue | Solution |
|-------|----------|
| Connection timeout | The Space may be sleeping; the first request wakes it (~30 s) |
| 503 Queue Full | Too many concurrent requests; retry in a few seconds |
| Slow response | Inference is CPU-based; expect ~10-30 tokens/second |
| Tool use issues | Ensure the tool's `input_schema` is valid JSON Schema |

---

## License

Apache 2.0 | Built with llama.cpp + FastAPI by Matrix Agent