---
title: Anthropic Compatible API
emoji: 🤖
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
---
# Anthropic-Compatible API

A **production-ready, self-hosted API** that provides full **Anthropic Messages API compatibility**, serving the Qwen2.5-Coder-7B model on a llama.cpp backend.

> **Live Dashboard**: [https://likhonsheikh-anthropic-compatible-api.hf.space](https://likhonsheikh-anthropic-compatible-api.hf.space)
## Features

| Feature | Description |
|---------|-------------|
| **Full Anthropic API** | Complete Messages API compatibility |
| **OpenAI API** | Dual compatibility with the OpenAI Chat API |
| **Streaming (SSE)** | Real-time token streaming |
| **Tool Use** | Function calling / tool use support |
| **Extended Thinking** | `<thinking>` block support for reasoning |
| **Request Queue** | Concurrency control with priority |
| **Prompt Caching** | LRU cache for system prompts |
| **Multi-Model** | Hot-swap between models |
| **Live Dashboard** | Built-in web UI with playground |
| **Logs Viewer** | Real-time API logs |

---
## Quick Start

### 1. Claude Code CLI

The easiest way to use this API with Claude Code:

```bash
# Set environment variables
export ANTHROPIC_API_KEY="any-key"
export ANTHROPIC_BASE_URL="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"

# Run Claude Code
claude "Write a Python script that reads a CSV file"

# Or with an explicit model
claude --model qwen2.5-coder-7b "Explain this code"
```

**Persistent configuration** (add to `~/.bashrc` or `~/.zshrc`):

```bash
# Anthropic-Compatible API configuration
export ANTHROPIC_API_KEY="any-key"
export ANTHROPIC_BASE_URL="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
```
### 2. Python SDK

```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic",
)

# Basic message
message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello! Write a hello world in Python."}],
)
print(message.content[0].text)

# With a system prompt
message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    system="You are a helpful coding assistant. Always include comments in your code.",
    messages=[{"role": "user", "content": "Write a function to calculate factorial"}],
)
print(message.content[0].text)
```
### 3. Streaming Response

```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic",
)

with client.messages.stream(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a detailed explanation of recursion"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
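Under the hood, the streaming endpoint emits Server-Sent Events, which the SDK's `text_stream` helper assembles for you. As a rough sketch of what that looks like (event names follow Anthropic's published streaming format; the sample payload below is hand-written, not captured from this server):

```python
import json

def parse_sse(raw: str):
    """Yield (event, data) pairs from a raw SSE stream."""
    for chunk in raw.strip().split("\n\n"):
        event, data = None, None
        for line in chunk.splitlines():
            if line.startswith("event: "):
                event = line[len("event: "):]
            elif line.startswith("data: "):
                data = json.loads(line[len("data: "):])
        if event is not None:
            yield event, data

# Hand-written sample of two text deltas
sample = (
    "event: content_block_delta\n"
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hel"}}\n'
    "\n"
    "event: content_block_delta\n"
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "lo"}}\n'
)

# Concatenate the text deltas, as the SDK's text_stream does
text = "".join(
    d["delta"]["text"] for e, d in parse_sse(sample) if e == "content_block_delta"
)
print(text)  # Hello
```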
### 4. Tool Use / Function Calling

```python
import anthropic
import json

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic",
)

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    }
]

message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
)

if message.stop_reason == "tool_use":
    for block in message.content:
        if block.type == "tool_use":
            print(f"Tool: {block.name}")
            print(f"Input: {json.dumps(block.input, indent=2)}")
```
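After you run the tool yourself, the result goes back to the model in a `tool_result` block so it can produce the final answer. A minimal sketch of the follow-up payload (the id and weather values below are placeholders; in practice, copy the id from the `tool_use` block the model returned):

```python
import json

tool_use_id = "toolu_placeholder"             # placeholder: use block.id from the response
tool_output = {"temp_c": 21, "sky": "clear"}  # placeholder: whatever get_weather returned

# Replay the conversation, then append the tool result as a user turn
follow_up = [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_use",
                "id": tool_use_id,
                "name": "get_weather",
                "input": {"location": "Tokyo"},
            }
        ],
    },
    {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": tool_use_id,
                "content": json.dumps(tool_output),
            }
        ],
    },
]

# The final call would then be:
# message = client.messages.create(model="qwen2.5-coder-7b", max_tokens=1024,
#                                  tools=tools, messages=follow_up)
```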
### 5. Extended Thinking

```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic",
)

message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 1024},
    messages=[{"role": "user", "content": "Solve step by step: What is 15% of 240?"}],
)

for block in message.content:
    if block.type == "thinking":
        print("=== THINKING ===")
        print(block.thinking)
    elif block.type == "text":
        print("=== ANSWER ===")
        print(block.text)
```
### 6. TypeScript/JavaScript

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: 'any-key',
  baseURL: 'https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic'
});

const message = await client.messages.create({
  model: 'qwen2.5-coder-7b',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }]
});

console.log(message.content[0].text);
```
### 7. cURL

```bash
curl -X POST "https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic/v1/messages" \
  -H "Content-Type: application/json" \
  -H "x-api-key: any-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "qwen2.5-coder-7b",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
### 8. OpenAI SDK (Alternative)

```python
from openai import OpenAI

client = OpenAI(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/v1",
)

response = client.chat.completions.create(
    model="qwen2.5-coder-7b",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```
---

## API Reference

### Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/` | Dashboard with status & playground |
| `GET` | `/health` | Health check with queue/cache stats |
| `GET` | `/logs?lines=100` | View API logs |
| `GET` | `/queue/status` | Request queue statistics |
| `GET` | `/models/status` | Loaded models information |
| `POST` | `/models/{id}/load` | Manually load a model |
| `POST` | `/models/{id}/unload` | Unload a model |
| `GET` | `/anthropic/v1/models` | List models (Anthropic format) |
| `POST` | `/anthropic/v1/messages` | Create a message (Anthropic API) |
| `POST` | `/anthropic/v1/messages/count_tokens` | Count tokens |
| `GET` | `/v1/models` | List models (OpenAI format) |
| `POST` | `/v1/chat/completions` | Chat completion (OpenAI API) |
### Request Format

```json
{
  "model": "qwen2.5-coder-7b",
  "max_tokens": 1024,
  "messages": [{"role": "user", "content": "Hello!"}],
  "system": "You are a helpful assistant.",
  "temperature": 0.7,
  "stream": false,
  "tools": [...],
  "thinking": {"type": "enabled", "budget_tokens": 1024}
}
```
### Response Format

```json
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "qwen2.5-coder-7b",
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 10, "output_tokens": 25}
}
```
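A response in this format can be consumed without any SDK. A small sketch (the payload below is hand-written to match the format shown; `content` may also carry `tool_use` or `thinking` blocks, which the text join skips):

```python
import json

# Hand-written payload matching the response format above
raw = """
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "qwen2.5-coder-7b",
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 10, "output_tokens": 25}
}
"""
resp = json.loads(raw)

# Concatenate text blocks only; non-text blocks are skipped
text = "".join(b["text"] for b in resp["content"] if b["type"] == "text")
total_tokens = resp["usage"]["input_tokens"] + resp["usage"]["output_tokens"]

print(text)          # Hello!
print(total_tokens)  # 35
```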
---

## Model Info

| Property | Value |
|----------|-------|
| **Model** | Qwen2.5-Coder-7B-Instruct |
| **Format** | GGUF (Q4_K_M quantization) |
| **Parameters** | 7 billion |
| **Context Length** | 8,192 tokens |
| **Backend** | llama.cpp |
| **Optimized For** | Code, tool use, agent workflows |
---

## Troubleshooting

| Issue | Solution |
|-------|----------|
| Connection timeout | The Space may be sleeping; the first request wakes it (~30 s) |
| 503 Queue Full | Too many concurrent requests; retry after a few seconds |
| Slow response | Inference is CPU-based; expect ~10-30 tokens/second |
| Tool use issues | Ensure `input_schema` is a valid JSON Schema |
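Both the cold-start timeout and the 503 queue-full case respond well to retries with backoff. A minimal sketch, independent of any particular client; `send` and `is_retryable` are illustrative names for whatever performs the request and classifies its errors:

```python
import time

def with_retries(send, is_retryable, attempts=4, base_delay=1.0):
    """Call send(), retrying retryable errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return send()
        except Exception as exc:
            if attempt == attempts - 1 or not is_retryable(exc):
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Demo with a stand-in request that fails twice before succeeding
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("503 queue full")
    return "ok"

result = with_retries(flaky, lambda e: "503" in str(e), base_delay=0.01)
print(result)  # ok
```

In real use, `send` would wrap `client.messages.create(...)` and `is_retryable` would check for a 503 status or a connection timeout.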
---

## License

Apache 2.0 | Built with llama.cpp + FastAPI by Matrix Agent