---
title: Anthropic Compatible API
emoji: 🤖
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
---

# Anthropic-Compatible API

A **production-ready, self-hosted API** that provides full **Anthropic Messages API compatibility** using the Qwen2.5-Coder-7B model with a llama.cpp backend.

> **Live Dashboard**: [https://likhonsheikh-anthropic-compatible-api.hf.space](https://likhonsheikh-anthropic-compatible-api.hf.space)

## Features

| Feature | Description |
|---------|-------------|
| **Full Anthropic API** | Complete Messages API compatibility |
| **OpenAI API** | Dual compatibility with the OpenAI Chat API |
| **Streaming (SSE)** | Real-time token streaming |
| **Tool Use** | Function calling / tool use support |
| **Extended Thinking** | `thinking` block support for reasoning |
| **Request Queue** | Concurrency control with priority |
| **Prompt Caching** | LRU cache for system prompts |
| **Multi-Model** | Hot-swap between models |
| **Live Dashboard** | Built-in web UI with playground |
| **Logs Viewer** | Real-time API logs |

---

## Quick Start

### 1. Claude Code CLI

The easiest way to use this API is through Claude Code:

```bash
# Set environment variables
export ANTHROPIC_API_KEY="any-key"
export ANTHROPIC_BASE_URL="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"

# Run Claude Code
claude "Write a Python script that reads a CSV file"

# Or with an explicit model
claude --model qwen2.5-coder-7b "Explain this code"
```

**Persistent Configuration** (add to `~/.bashrc` or `~/.zshrc`):

```bash
# Anthropic-Compatible API Configuration
export ANTHROPIC_API_KEY="any-key"
export ANTHROPIC_BASE_URL="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
```
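Claude Code is only one client; any tool that speaks the Anthropic Messages API can point at the same base URL. As a sketch of what such a client sends under the hood, the request can be assembled with the standard library alone (header names and body shape follow the Messages API; the actual send is left commented out to avoid a live call):

```python
import json
import os

BASE_URL = os.environ.get(
    "ANTHROPIC_BASE_URL",
    "https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic",
)

def build_messages_request(prompt: str, model: str = "qwen2.5-coder-7b"):
    """Assemble the URL, headers, and JSON body for a Messages API call."""
    url = f"{BASE_URL}/v1/messages"
    headers = {
        "x-api-key": os.environ.get("ANTHROPIC_API_KEY", "any-key"),
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    body = {
        "model": model,
        "max_tokens": 256,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, json.dumps(body)

url, headers, payload = build_messages_request("Hello!")
print(url)
# Send with any HTTP client, e.g.:
# import urllib.request
# urllib.request.urlopen(urllib.request.Request(url, payload.encode(), headers))
```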
### 2. Python SDK

```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

# Basic message
message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello! Write a hello world in Python."}]
)
print(message.content[0].text)

# With a system prompt
message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    system="You are a helpful coding assistant. Always include comments in your code.",
    messages=[{"role": "user", "content": "Write a function to calculate factorial"}]
)
print(message.content[0].text)
```

### 3. Streaming Response

```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

with client.messages.stream(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a detailed explanation of recursion"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```

### 4. Tool Use / Function Calling

```python
import anthropic
import json

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
]

message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)

if message.stop_reason == "tool_use":
    for block in message.content:
        if block.type == "tool_use":
            print(f"Tool: {block.name}")
            print(f"Input: {json.dumps(block.input, indent=2)}")
```
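The tool-use example above stops after printing the model's tool call. To complete the loop, you run the tool yourself and return its output in a `tool_result` content block on the next user turn. A sketch of that follow-up message, with a hypothetical local `get_weather` implementation standing in for a real lookup:

```python
import json

def get_weather(location, unit="celsius"):
    # Hypothetical stand-in for a real weather service
    return {"location": location, "temperature": 22, "unit": unit}

def tool_result_message(tool_use_id, result):
    """Build the user-turn message that returns a tool's output to the model."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,  # must match the id of the tool_use block
            "content": json.dumps(result),
        }],
    }

# After message.stop_reason == "tool_use", something like:
#   followup = tool_result_message(block.id, get_weather(**block.input))
# then call client.messages.create(...) again with the assistant turn and
# this message appended to the conversation.
followup = tool_result_message("toolu_demo", get_weather("Tokyo"))  # demo id
print(followup["content"][0]["type"])  # → tool_result
```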
### 5. Extended Thinking

```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
)

message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 1024},
    messages=[{"role": "user", "content": "Solve step by step: What is 15% of 240?"}]
)

for block in message.content:
    if block.type == "thinking":
        print("=== THINKING ===")
        print(block.thinking)
    elif block.type == "text":
        print("=== ANSWER ===")
        print(block.text)
```

### 6. TypeScript/JavaScript

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: 'any-key',
  baseURL: 'https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic'
});

const message = await client.messages.create({
  model: 'qwen2.5-coder-7b',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }]
});

console.log(message.content[0].text);
```

### 7. cURL

```bash
curl -X POST "https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic/v1/messages" \
  -H "Content-Type: application/json" \
  -H "x-api-key: any-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "qwen2.5-coder-7b",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
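With `"stream": true` in the request body, the same endpoint returns server-sent events instead of a single JSON body. If you are not using an SDK, the stream can be parsed by hand; a minimal sketch, assuming the standard Anthropic event shapes (`content_block_delta` events carrying `text_delta` payloads):

```python
import json

def parse_sse_events(lines):
    """Yield streamed text from 'event:'/'data:' line pairs of an
    Anthropic-style SSE response."""
    event = None
    for line in lines:
        if line.startswith("event:"):
            event = line.split(":", 1)[1].strip()
        elif line.startswith("data:"):
            data = json.loads(line.split(":", 1)[1].strip())
            if event == "content_block_delta" and data["delta"]["type"] == "text_delta":
                yield data["delta"]["text"]

# Example input: two text deltas as they would arrive on the wire
sample = [
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}}',
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo"}}',
]
print("".join(parse_sse_events(sample)))  # → Hello
```

With a real HTTP client, you would feed the response's decoded lines into `parse_sse_events` instead of `sample`.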
### 8. OpenAI SDK (Alternative)

```python
from openai import OpenAI

client = OpenAI(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/v1"
)

response = client.chat.completions.create(
    model="qwen2.5-coder-7b",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=1024
)
print(response.choices[0].message.content)
```

---

## API Reference

### Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/` | Dashboard with status & playground |
| `GET` | `/health` | Health check with queue/cache stats |
| `GET` | `/logs?lines=100` | View API logs |
| `GET` | `/queue/status` | Request queue statistics |
| `GET` | `/models/status` | Loaded models information |
| `POST` | `/models/{id}/load` | Manually load a model |
| `POST` | `/models/{id}/unload` | Unload a model |
| `GET` | `/anthropic/v1/models` | List models (Anthropic format) |
| `POST` | `/anthropic/v1/messages` | Create message (Anthropic API) |
| `POST` | `/anthropic/v1/messages/count_tokens` | Count tokens |
| `GET` | `/v1/models` | List models (OpenAI format) |
| `POST` | `/v1/chat/completions` | Chat completion (OpenAI API) |

### Request Format

```json
{
  "model": "qwen2.5-coder-7b",
  "max_tokens": 1024,
  "messages": [{"role": "user", "content": "Hello!"}],
  "system": "You are a helpful assistant.",
  "temperature": 0.7,
  "stream": false,
  "tools": [...],
  "thinking": {"type": "enabled", "budget_tokens": 1024}
}
```

### Response Format

```json
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "qwen2.5-coder-7b",
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 10, "output_tokens": 25}
}
```

---

## Model Info

| Property | Value |
|----------|-------|
| **Model** | Qwen2.5-Coder-7B-Instruct |
| **Format** | GGUF (Q4_K_M quantization) |
| **Parameters** | 7 billion |
| **Context Length** | 8,192 tokens |
| **Backend** | llama.cpp |
| **Optimized For** | Code, tool use, agent workflows |

---

## Troubleshooting

| Issue | Solution |
|-------|----------|
| Connection timeout | The Space may be sleeping; the first request wakes it (~30 s) |
| 503 Queue Full | Too many concurrent requests; retry in a few seconds |
| Slow response | Inference is CPU-based; expect ~10-30 tokens/second |
| Tool use issues | Ensure the tool's `input_schema` is valid JSON Schema |

---

## License

Apache 2.0 | Built with llama.cpp + FastAPI by Matrix Agent