---
title: Anthropic Compatible API
emoji: 🤖
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
---
# Anthropic-Compatible API

A **production-ready, self-hosted API** that provides full **Anthropic Messages API compatibility**, serving the Qwen2.5-Coder-7B model on a llama.cpp backend.

> **Live Dashboard**: [https://likhonsheikh-anthropic-compatible-api.hf.space](https://likhonsheikh-anthropic-compatible-api.hf.space)
## Features

| Feature | Description |
|---------|-------------|
| **Full Anthropic API** | Complete Messages API compatibility |
| **OpenAI API** | Dual compatibility with the OpenAI Chat API |
| **Streaming (SSE)** | Real-time token streaming |
| **Tool Use** | Function calling / tool use support |
| **Extended Thinking** | `<thinking>` block support for reasoning |
| **Request Queue** | Concurrency control with priority |
| **Prompt Caching** | LRU cache for system prompts |
| **Multi-Model** | Hot-swap between models |
| **Live Dashboard** | Built-in web UI with playground |
| **Logs Viewer** | Real-time API logs |

---
## Quick Start

### 1. Claude Code CLI

The easiest way to use this API with Claude Code:

```bash
# Set environment variables
export ANTHROPIC_API_KEY="any-key"
export ANTHROPIC_BASE_URL="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"

# Run Claude Code
claude "Write a Python script that reads a CSV file"

# Or with an explicit model
claude --model qwen2.5-coder-7b "Explain this code"
```

**Persistent configuration** (add to `~/.bashrc` or `~/.zshrc`):

```bash
# Anthropic-Compatible API configuration
export ANTHROPIC_API_KEY="any-key"
export ANTHROPIC_BASE_URL="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic"
```
### 2. Python SDK

```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic",
)

# Basic message
message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello! Write a hello world in Python."}],
)
print(message.content[0].text)

# With a system prompt
message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    system="You are a helpful coding assistant. Always include comments in your code.",
    messages=[{"role": "user", "content": "Write a function to calculate factorial"}],
)
print(message.content[0].text)
```
### 3. Streaming Response

```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic",
)

with client.messages.stream(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a detailed explanation of recursion"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
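Under the hood, the streaming endpoint emits Server-Sent Events, which the SDK's `text_stream` helper assembles for you. As a rough sketch of what that looks like (event names follow Anthropic's published streaming format; the sample payload below is hand-written, not captured from this server):

```python
import json

def parse_sse(raw: str):
    """Yield (event, data) pairs from a raw SSE stream."""
    for chunk in raw.strip().split("\n\n"):
        event, data = None, None
        for line in chunk.splitlines():
            if line.startswith("event: "):
                event = line[len("event: "):]
            elif line.startswith("data: "):
                data = json.loads(line[len("data: "):])
        if event is not None:
            yield event, data

# Hand-written sample of two text deltas
sample = (
    "event: content_block_delta\n"
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hel"}}\n'
    "\n"
    "event: content_block_delta\n"
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "lo"}}\n'
)

# Concatenate the text deltas, as the SDK's text_stream does
text = "".join(
    d["delta"]["text"] for e, d in parse_sse(sample) if e == "content_block_delta"
)
print(text)  # Hello
```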
### 4. Tool Use / Function Calling

```python
import anthropic
import json

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic",
)

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    }
]

message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
)

if message.stop_reason == "tool_use":
    for block in message.content:
        if block.type == "tool_use":
            print(f"Tool: {block.name}")
            print(f"Input: {json.dumps(block.input, indent=2)}")
```
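After you run the tool yourself, the result goes back to the model in a `tool_result` block so it can produce the final answer. A minimal sketch of the follow-up payload (the id and weather values below are placeholders; in practice, copy the id from the `tool_use` block the model returned):

```python
import json

tool_use_id = "toolu_placeholder"             # placeholder: use block.id from the response
tool_output = {"temp_c": 21, "sky": "clear"}  # placeholder: whatever get_weather returned

# Replay the conversation, then append the tool result as a user turn
follow_up = [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_use",
                "id": tool_use_id,
                "name": "get_weather",
                "input": {"location": "Tokyo"},
            }
        ],
    },
    {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": tool_use_id,
                "content": json.dumps(tool_output),
            }
        ],
    },
]

# The final call would then be:
# message = client.messages.create(model="qwen2.5-coder-7b", max_tokens=1024,
#                                  tools=tools, messages=follow_up)
```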
### 5. Extended Thinking

```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic",
)

message = client.messages.create(
    model="qwen2.5-coder-7b",
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 1024},
    messages=[{"role": "user", "content": "Solve step by step: What is 15% of 240?"}],
)

for block in message.content:
    if block.type == "thinking":
        print("=== THINKING ===")
        print(block.thinking)
    elif block.type == "text":
        print("=== ANSWER ===")
        print(block.text)
```
### 6. TypeScript/JavaScript

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: 'any-key',
  baseURL: 'https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic'
});

const message = await client.messages.create({
  model: 'qwen2.5-coder-7b',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }]
});

console.log(message.content[0].text);
```
### 7. cURL

```bash
curl -X POST "https://likhonsheikh-anthropic-compatible-api.hf.space/anthropic/v1/messages" \
  -H "Content-Type: application/json" \
  -H "x-api-key: any-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "qwen2.5-coder-7b",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
### 8. OpenAI SDK (Alternative)

```python
from openai import OpenAI

client = OpenAI(
    api_key="any-key",
    base_url="https://likhonsheikh-anthropic-compatible-api.hf.space/v1",
)

response = client.chat.completions.create(
    model="qwen2.5-coder-7b",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```
---

## API Reference

### Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/` | Dashboard with status & playground |
| `GET` | `/health` | Health check with queue/cache stats |
| `GET` | `/logs?lines=100` | View API logs |
| `GET` | `/queue/status` | Request queue statistics |
| `GET` | `/models/status` | Loaded models information |
| `POST` | `/models/{id}/load` | Manually load a model |
| `POST` | `/models/{id}/unload` | Unload a model |
| `GET` | `/anthropic/v1/models` | List models (Anthropic format) |
| `POST` | `/anthropic/v1/messages` | Create a message (Anthropic API) |
| `POST` | `/anthropic/v1/messages/count_tokens` | Count tokens |
| `GET` | `/v1/models` | List models (OpenAI format) |
| `POST` | `/v1/chat/completions` | Chat completion (OpenAI API) |
### Request Format

```json
{
  "model": "qwen2.5-coder-7b",
  "max_tokens": 1024,
  "messages": [{"role": "user", "content": "Hello!"}],
  "system": "You are a helpful assistant.",
  "temperature": 0.7,
  "stream": false,
  "tools": [...],
  "thinking": {"type": "enabled", "budget_tokens": 1024}
}
```
### Response Format

```json
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "qwen2.5-coder-7b",
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 10, "output_tokens": 25}
}
```
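A response in this format can be consumed without any SDK. A small sketch (the payload below is hand-written to match the format shown; `content` may also carry `tool_use` or `thinking` blocks, which the text join skips):

```python
import json

# Hand-written payload matching the response format above
raw = """
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "qwen2.5-coder-7b",
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 10, "output_tokens": 25}
}
"""
resp = json.loads(raw)

# Concatenate text blocks only; non-text blocks are skipped
text = "".join(b["text"] for b in resp["content"] if b["type"] == "text")
total_tokens = resp["usage"]["input_tokens"] + resp["usage"]["output_tokens"]

print(text)          # Hello!
print(total_tokens)  # 35
```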
---

## Model Info

| Property | Value |
|----------|-------|
| **Model** | Qwen2.5-Coder-7B-Instruct |
| **Format** | GGUF (Q4_K_M quantization) |
| **Parameters** | 7 billion |
| **Context Length** | 8,192 tokens |
| **Backend** | llama.cpp |
| **Optimized For** | Code, tool use, agent workflows |
---

## Troubleshooting

| Issue | Solution |
|-------|----------|
| Connection timeout | The Space may be sleeping; the first request wakes it (~30 s) |
| 503 Queue Full | Too many concurrent requests; retry after a few seconds |
| Slow response | Inference is CPU-based; expect ~10-30 tokens/second |
| Tool use issues | Ensure `input_schema` is a valid JSON Schema |
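Both the cold-start timeout and the 503 queue-full case respond well to retries with backoff. A minimal sketch, independent of any particular client; `send` and `is_retryable` are illustrative names for whatever performs the request and classifies its errors:

```python
import time

def with_retries(send, is_retryable, attempts=4, base_delay=1.0):
    """Call send(), retrying retryable errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return send()
        except Exception as exc:
            if attempt == attempts - 1 or not is_retryable(exc):
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Demo with a stand-in request that fails twice before succeeding
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("503 queue full")
    return "ok"

result = with_retries(flaky, lambda e: "503" in str(e), base_delay=0.01)
print(result)  # ok
```

In real use, `send` would wrap `client.messages.create(...)` and `is_retryable` would check for a 503 status or a connection timeout.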
---

## License

Apache 2.0 | Built with llama.cpp + FastAPI by Matrix Agent