---
title: Anthropic Compatible API
emoji: 🤖
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
---

# Anthropic-Compatible API

A lightweight, CPU-based API endpoint that provides Anthropic Messages API compatibility using the SmolLM2-135M model.

## Features

  • βœ… Full Anthropic Messages API compatibility
  • βœ… Streaming support (SSE)
  • βœ… Token counting endpoint
  • βœ… Ultra-lightweight (135M parameters)
  • βœ… CPU-optimized
  • βœ… No GPU required

## API Endpoints

### Create Message

```
POST /v1/messages
```

#### Example Request

```bash
curl -X POST "https://YOUR_SPACE.hf.space/v1/messages" \
  -H "Content-Type: application/json" \
  -H "x-api-key: your-api-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "smollm2-135m",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```
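
The same request can be issued from Python using only the standard library. In this sketch the Space URL is a placeholder, and the request is built but not sent so the snippet stays self-contained; uncomment the last lines to call a live deployment:

```python
import json
import urllib.request

# Placeholder: replace with your own Space URL.
BASE_URL = "https://YOUR_SPACE.hf.space"

payload = {
    "model": "smollm2-135m",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/messages",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "x-api-key": "any-key",  # placeholder key, as in the SDK examples below
        "anthropic-version": "2023-06-01",
    },
    method="POST",
)

# Sending the request (requires a live deployment):
# with urllib.request.urlopen(req) as resp:
#     message = json.loads(resp.read())
#     print(message["content"][0]["text"])
```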

#### Streaming Example

```bash
curl -X POST "https://YOUR_SPACE.hf.space/v1/messages" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smollm2-135m",
    "max_tokens": 256,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Tell me a short story"}
    ]
  }'
```
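
With `"stream": true`, the response arrives as server-sent events (`event:`/`data:` line pairs, as in Anthropic's streaming format). A minimal parsing sketch, run against a hard-coded sample rather than a live connection; a real client should read and parse the stream incrementally:

```python
import json

def parse_sse(raw: str):
    """Parse Anthropic-style server-sent events into (event, data) pairs."""
    events = []
    for block in raw.strip().split("\n\n"):
        event, data = None, None
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = json.loads(line[len("data:"):].strip())
        if event is not None:
            events.append((event, data))
    return events

# Example: reassemble generated text from content_block_delta events.
sample = (
    'event: content_block_delta\n'
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Once"}}\n'
    '\n'
    'event: content_block_delta\n'
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": " upon"}}\n'
)
text = "".join(
    data["delta"]["text"]
    for event, data in parse_sse(sample)
    if event == "content_block_delta"
)
print(text)  # -> Once upon
```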

## SDK Compatibility

### Python

```python
import anthropic

client = anthropic.Anthropic(
    api_key="any-key",
    base_url="https://YOUR_SPACE.hf.space"
)

message = client.messages.create(
    model="smollm2-135m",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello!"}]
)
print(message.content[0].text)
```

### TypeScript/JavaScript

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: 'any-key',
  baseURL: 'https://YOUR_SPACE.hf.space'
});

const message = await client.messages.create({
  model: 'smollm2-135m',
  max_tokens: 256,
  messages: [{ role: 'user', content: 'Hello!' }]
});
console.log(message.content[0].text);
```

## Model Info

- **Model**: HuggingFaceTB/SmolLM2-135M-Instruct
- **Parameters**: 135 million
- **Optimized for**: CPU inference
- **Context Length**: 2048 tokens
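
The 2048-token context limit means long conversations must be truncated client-side. A rough sketch that keeps only the most recent messages under a budget (illustrative only: this counts characters, while the real limit is measured in model tokens, and a production trimmer should also preserve a valid user/assistant alternation):

```python
def trim_to_budget(messages, max_chars=4096):
    """Keep the most recent messages whose combined content fits the budget.

    Character counts are a crude stand-in for token counts; a real client
    should measure actual tokens with the model's tokenizer.
    """
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest to oldest
        cost = len(msg["content"])
        if used + cost > max_chars:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order

history = [
    {"role": "user", "content": "a" * 5000},   # too old/large to keep
    {"role": "assistant", "content": "Sure."},
    {"role": "user", "content": "What now?"},
]
trimmed = trim_to_budget(history)
# The oversized oldest message is dropped; the recent ones fit the budget.
```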

## Rate Limits

This is a free CPU-based endpoint. Please be mindful of usage.


Built with ❤️ by Matrix Agent