---
title: Anthropic Compatible API
emoji: 🤖
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
---

# Anthropic-Compatible API

A **production-ready, self-hosted API** that provides **Anthropic Messages API compatibility**, serving the Qwen2.5-Coder-7B model through a llama.cpp backend.

> **Live Dashboard**: [https://likhonsheikh-anthropic-compatible-api.hf.space](https://likhonsheikh-anthropic-compatible-api.hf.space)

## Features

| Feature | Description |
|---------|-------------|
| **Full Anthropic API** | Complete Messages API compatibility |
| **OpenAI API** | Dual compatibility with OpenAI Chat API |
| **Streaming (SSE)** | Real-time token streaming |
| **Tool Use** | Function calling / tool use support |
| **Extended Thinking** | `<thinking>` block support for reasoning |
| **Request Queue** | Concurrency control with priority |
| **Prompt Caching** | LRU cache for system prompts |
| **Multi-Model** | Hot-swap between models |
| **Live Dashboard** | Built-in web UI with playground |
| **Logs Viewer** | Real-time API logs |

---

## Quick Start

### 1. Claude Code CLI

The easiest way to use this API with Claude Code:

```bash
# Set environment variables
export ANTHROPIC_API_KEY="any-key"
export ANTHROPIC_BASE_URL="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"

# Run Claude Code
claude "Write a Python script that reads a CSV file"

# Or with explicit model
claude --model qwen2.5-coder-7b "Explain this code"
```

**Persistent Configuration** (add to `~/.bashrc` or `~/.zshrc`):

```bash
# Anthropic-Compatible API Configuration
export ANTHROPIC_API_KEY="any-key"
export ANTHROPIC_BASE_URL="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
```

### 2. Python SDK

```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

# Basic message
message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello! Write a hello world in Python."}]
)
print(message.content[0].text)

# With system prompt
message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    system="You are a helpful coding assistant. Always include comments in your code.",
    messages=[{"role": "user", "content": "Write a function to calculate factorial"}]
)
print(message.content[0].text)
```

### 3. Streaming Response

```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

with client.messages.stream(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a detailed explanation of recursion"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
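
For clients that cannot use the SDK, the stream can also be read as raw server-sent events. The sketch below assumes the server emits the same `content_block_delta` / `text_delta` event shapes as Anthropic's Messages API, which this README does not spell out. `extract_text_deltas` and the sample lines are hypothetical and only illustrate the parsing step.

```python
import json
from typing import Iterable, Iterator


def extract_text_deltas(sse_lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from raw SSE lines (assumed Anthropic event shapes)."""
    for line in sse_lines:
        line = line.strip()
        # Only "data:" lines carry JSON payloads; skip "event:" lines and blanks.
        if not line.startswith("data:"):
            continue
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # ignore keep-alives or malformed chunks
        if not isinstance(event, dict) or event.get("type") != "content_block_delta":
            continue
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            yield delta.get("text", "")


# Hypothetical sample of what the stream might look like on the wire.
sample = [
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
]
print("".join(extract_text_deltas(sample)))
```

In a real client the lines would come from an HTTP response opened with `"stream": true` in the request body rather than from a list.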

### 4. Tool Use / Function Calling

```python
import anthropic
import json

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
]

message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)

if message.stop_reason == "tool_use":
    for block in message.content:
        if block.type == "tool_use":
            print(f"Tool: {block.name}")
            print(f"Input: {json.dumps(block.input, indent=2)}")
```
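
The example above stops once the tool call is printed. A possible next step is to run the tool locally and send its output back as a `tool_result` block. The sketch below assumes the server accepts Anthropic-style `tool_result` content blocks. The `get_weather` implementation, `build_tool_result_turn`, and the sample blocks are hypothetical. It works on plain dicts so that it runs without a network connection.

```python
from typing import Any, Callable, Dict, List


def get_weather(location: str, unit: str = "celsius") -> str:
    """Hypothetical local tool; returns canned data, not a real forecast."""
    return f"22 degrees {unit} and clear in {location}"


def build_tool_result_turn(
    content_blocks: List[Dict[str, Any]],
    handlers: Dict[str, Callable[..., str]],
) -> Dict[str, Any]:
    """Build the user message that answers every tool_use block in a turn."""
    results = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue  # text blocks need no reply
        handler = handlers.get(block["name"])
        if handler is None:
            result = {"content": f"Unknown tool: {block['name']}", "is_error": True}
        else:
            result = {"content": handler(**block["input"])}
        results.append({"type": "tool_result", "tool_use_id": block["id"], **result})
    return {"role": "user", "content": results}


# Hypothetical assistant turn, shaped like the content of a tool_use response.
assistant_blocks = [
    {"type": "text", "text": "Let me check."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
     "input": {"location": "Tokyo"}},
]
turn = build_tool_result_turn(assistant_blocks, {"get_weather": get_weather})
print(turn)
```

To continue the conversation, append the assistant turn and then this user turn to `messages`, and call `client.messages.create` again with the same `tools`.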

### 5. Extended Thinking

```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 1024},
    messages=[{"role": "user", "content": "Solve step by step: What is 15% of 240?"}]
)

for block in message.content:
    if block.type == "thinking":
        print("=== THINKING ===")
        print(block.thinking)
    elif block.type == "text":
        print("=== ANSWER ===")
        print(block.text)
```

### 6. TypeScript/JavaScript

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: 'any-key',
  baseURL: 'https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic'
});

const message = await client.messages.create({
  model: 'qwen2.5-coder-7b',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }]
});

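// content is a union of block types; narrow to a text block before reading .text
const first = message.content[0];
if (first.type === 'text') {
  console.log(first.text);
}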
```

### 7. cURL

```bash
curl -X POST "https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic/v1/messages" \
  -H "Content-Type: application/json" \
  -H "x-api-key: any-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "qwen2.5-coder-7b",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

### 8. OpenAI SDK (Alternative)

```python
from openai import OpenAI

client = OpenAI(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/v1"
)

response = client.chat.completions.create(
    model="qwen2.5-coder-7b",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=1024
)
print(response.choices[0].message.content)
```

---

## API Reference

### Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/` | Dashboard with status & playground |
| `GET` | `/health` | Health check with queue/cache stats |
| `GET` | `/logs?lines=100` | View API logs |
| `GET` | `/queue/status` | Request queue statistics |
| `GET` | `/models/status` | Loaded models information |
| `POST` | `/models/{id}/load` | Manually load a model |
| `POST` | `/models/{id}/unload` | Unload a model |
| `GET` | `/anthropic/v1/models` | List models (Anthropic format) |
| `POST` | `/anthropic/v1/messages` | Create message (Anthropic API) |
| `POST` | `/anthropic/v1/messages/count_tokens` | Count tokens |
| `GET` | `/v1/models` | List models (OpenAI format) |
| `POST` | `/v1/chat/completions` | Chat completion (OpenAI API) |
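
The status endpoints in the table can be polled without any SDK. The following is a minimal sketch using only the standard library. The helper names `endpoint_url` and `get_json` are hypothetical, and it assumes that `/health` and `/queue/status` return JSON bodies, which the table does not state. Only the URL construction runs here, so the block makes no network calls.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://likhonsheikh-anthropic-compatible-api.hf.space"


def endpoint_url(path: str, **query: object) -> str:
    """Join the Space base URL with an endpoint path and optional query string."""
    url = BASE_URL.rstrip("/") + "/" + path.lstrip("/")
    if query:
        url += "?" + urlencode(query)
    return url


def get_json(path: str, timeout: float = 60.0, **query: object) -> dict:
    """GET an endpoint and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(endpoint_url(path, **query), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building URLs is offline; get_json("/health") would perform the actual request.
print(endpoint_url("/health"))
print(endpoint_url("/logs", lines=100))
```

The generous default timeout is a guess meant to leave room for a sleeping Space to wake up.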

### Request Format

```json
{
  "model": "qwen2.5-coder-7b",
  "max_tokens": 1024,
  "messages": [{"role": "user", "content": "Hello!"}],
  "system": "You are a helpful assistant.",
  "temperature": 0.7,
  "stream": false,
  "tools": [...],
  "thinking": {"type": "enabled", "budget_tokens": 1024}
}
```

### Response Format

```json
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "qwen2.5-coder-7b",
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 10, "output_tokens": 25}
}
```
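
When calling the endpoint without an SDK, the response body has to be unpacked by hand. The sketch below pulls the text and token usage out of a body shaped like the one above. `summarize_response` is a hypothetical helper, and the field names are taken from that example only.

```python
import json
from typing import Any, Dict


def summarize_response(body: str) -> Dict[str, Any]:
    """Extract the concatenated text and total token count from a response body."""
    data = json.loads(body)
    # A response may hold several blocks; keep only the text ones.
    text = "".join(
        block.get("text", "")
        for block in data.get("content", [])
        if block.get("type") == "text"
    )
    usage = data.get("usage", {})
    return {
        "text": text,
        "stop_reason": data.get("stop_reason"),
        "total_tokens": usage.get("input_tokens", 0) + usage.get("output_tokens", 0),
    }


# Same shape as the Response Format example.
raw = json.dumps({
    "id": "msg_abc123",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "qwen2.5-coder-7b",
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 10, "output_tokens": 25},
})
print(summarize_response(raw))
```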

---

## Model Info

| Property | Value |
|----------|-------|
| **Model** | Qwen2.5-Coder-7B-Instruct |
| **Format** | GGUF (Q4_K_M quantization) |
| **Parameters** | 7 Billion |
| **Context Length** | 8,192 tokens |
| **Backend** | llama.cpp |
| **Optimized For** | Code, tool use, agent workflows |
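
The 8,192-token context length limits how large `max_tokens` can usefully be. The sketch below assumes that prompt tokens and generated tokens share one window, which is typical for llama.cpp but is not confirmed here for this server. `clamp_max_tokens` is a hypothetical helper; the prompt size could come from the `count_tokens` endpoint, for example.

```python
CONTEXT_LENGTH = 8192  # from the Model Info table


def clamp_max_tokens(
    input_tokens: int,
    requested: int,
    context_length: int = CONTEXT_LENGTH,
) -> int:
    """Return the largest max_tokens <= requested that still fits the window.

    Assumes prompt and completion tokens count against the same limit.
    """
    if input_tokens >= context_length:
        raise ValueError("prompt alone fills or exceeds the context window")
    return min(requested, context_length - input_tokens)


print(clamp_max_tokens(500, 1024))   # short prompt: request is unchanged
print(clamp_max_tokens(7000, 2048))  # long prompt: only 1192 tokens remain
```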

---

## Troubleshooting

| Issue | Solution |
|-------|----------|
| Connection Timeout | The Space may be asleep. The first request wakes it (about 30 s), so retry after a short wait |
| 503 Queue Full | The request queue is full. Retry after a few seconds |
| Slow Response | Inference runs on CPU; expect roughly 10-30 tokens/second |
| Tool Use Issues | Check that each tool's `input_schema` is a valid JSON Schema |
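
The 503 and timeout rows both come down to retrying after a pause. A minimal sketch of exponential backoff follows. `call_with_retry` and its `send` argument are hypothetical, and the delays are illustrative rather than tuned for this server. `send` stands in for whatever function performs the HTTP request and returns a `(status_code, body)` pair.

```python
import time
from typing import Callable, Tuple


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-503 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 503:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))  # 1 s, 2 s, 4 s, ...
    return status, body  # still 503 after the final attempt


# Simulated server: queue full twice, then success. Delays are recorded, not slept.
responses = iter([(503, "queue full"), (503, "queue full"), (200, "ok")])
delays = []
result = call_with_retry(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

When using the Anthropic Python SDK, the client's `max_retries` and `timeout` options may already cover this case.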

---

## License

Apache 2.0 | Built with llama.cpp + FastAPI by Matrix Agent