megharudushi commited on
Commit
4825115
·
verified ·
1 Parent(s): 3352264

Upload folder using huggingface_hub

Browse files
Files changed (4) hide show
  1. Dockerfile +43 -0
  2. README.md +131 -5
  3. app.py +899 -0
  4. requirements.txt +15 -0
Dockerfile ADDED
@@ -0,0 +1,43 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # HuggingFace Spaces Dockerfile
2
+ # Optimized for free CPU tier (2 vCPU, 16GB RAM)
3
+
4
+ FROM python:3.10-slim
5
+
6
+ # Set working directory
7
+ WORKDIR /app
8
+
9
+ # Install system dependencies
10
+ RUN apt-get update && apt-get install -y \
11
+ build-essential \
12
+ git \
13
+ && rm -rf /var/lib/apt/lists/*
14
+
15
+ # Create non-root user for HF Spaces
16
+ RUN useradd -m -u 1000 user
17
+ USER user
18
+
19
+ # Set environment variables
20
+ ENV HOME=/home/user \
21
+ PATH=/home/user/.local/bin:$PATH \
22
+ PYTHONUNBUFFERED=1 \
23
+ TRANSFORMERS_CACHE=/home/user/.cache/huggingface
24
+
25
+ # Copy requirements first for better caching
26
+ COPY --chown=user:user requirements.txt .
27
+
28
+ # Install Python dependencies
29
+ RUN pip install --no-cache-dir --upgrade pip && \
30
+ pip install --no-cache-dir -r requirements.txt
31
+
32
+ # Copy application code
33
+ COPY --chown=user:user app.py .
34
+
35
+ # Expose port for HF Spaces
36
+ EXPOSE 7860
37
+
38
+ # Health check
39
+ HEALTHCHECK --interval=30s --timeout=30s --start-period=60s --retries=3 \
40
+ CMD curl -f http://localhost:7860/health || exit 1
41
+
42
+ # Run the application
43
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,10 +1,136 @@
1
  ---
2
- title: Free Coding Api
3
- emoji: 🏆
4
- colorFrom: indigo
5
- colorTo: yellow
6
  sdk: docker
7
  pinned: false
 
8
  ---
9
 
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
+ title: Free Coding API
3
+ emoji: 🚀
4
+ colorFrom: blue
5
+ colorTo: purple
6
  sdk: docker
7
  pinned: false
8
+ license: mit
9
  ---
10
 
11
+ # 🚀 Free Coding API
12
+
13
+ **OpenAI & Anthropic Compatible API for Coding Tasks**
14
+
15
+ Built with skills, not money! This is a free, open-source API endpoint that runs on HuggingFace Spaces, providing coding assistance similar to OpenAI Codex and Claude Code.
16
+
17
+ ## Features
18
+
19
+ - ✅ **OpenAI API Compatible** (`/v1/chat/completions`)
20
+ - ✅ **Anthropic API Compatible** (`/v1/messages`)
21
+ - ✅ **Streaming Support** (SSE)
22
+ - ✅ **Coding Optimized** (Qwen2.5-Coder model)
23
+ - ✅ **100% Free** (Runs on HF Spaces free tier)
24
+
25
+ ## Quick Start
26
+
27
+ ### Using OpenAI SDK
28
+
29
+ ```python
30
+ from openai import OpenAI
31
+
32
+ client = OpenAI(
33
+ base_url="https://YOUR-SPACE.hf.space/v1",
34
+ api_key="sk-free-coding-api"
35
+ )
36
+
37
+ response = client.chat.completions.create(
38
+ model="gpt-4", # Mapped to Qwen2.5-Coder
39
+ messages=[
40
+ {"role": "system", "content": "You are an expert Python developer."},
41
+ {"role": "user", "content": "Write a function to find prime numbers"}
42
+ ],
43
+ stream=True
44
+ )
45
+
46
+ for chunk in response:
47
+ if chunk.choices[0].delta.content:
48
+ print(chunk.choices[0].delta.content, end="")
49
+ ```
50
+
51
+ ### Using Anthropic SDK
52
+
53
+ ```python
54
+ import anthropic
55
+
56
+ client = anthropic.Anthropic(
57
+ base_url="https://YOUR-SPACE.hf.space",
58
+ api_key="sk-free-coding-api"
59
+ )
60
+
61
+ response = client.messages.create(
62
+ model="claude-3-sonnet", # Mapped to Qwen2.5-Coder
63
+ max_tokens=1024,
64
+ messages=[
65
+ {"role": "user", "content": "Write a REST API in FastAPI"}
66
+ ]
67
+ )
68
+
69
+ print(response.content[0].text)
70
+ ```
71
+
72
+ ### Using cURL
73
+
74
+ ```bash
75
+ # OpenAI format
76
+ curl -X POST https://YOUR-SPACE.hf.space/v1/chat/completions \
77
+ -H "Content-Type: application/json" \
78
+ -H "Authorization: Bearer sk-free-coding-api" \
79
+ -d '{
80
+ "model": "gpt-4",
81
+ "messages": [{"role": "user", "content": "Hello, write Python code"}]
82
+ }'
83
+
84
+ # Anthropic format
85
+ curl -X POST https://YOUR-SPACE.hf.space/v1/messages \
86
+ -H "Content-Type: application/json" \
87
+ -H "x-api-key: sk-free-coding-api" \
88
+ -d '{
89
+ "model": "claude-3-sonnet",
90
+ "max_tokens": 1024,
91
+ "messages": [{"role": "user", "content": "Hello, write Python code"}]
92
+ }'
93
+ ```
94
+
95
+ ## API Endpoints
96
+
97
+ | Endpoint | Method | Description |
98
+ |----------|--------|-------------|
99
+ | `/v1/chat/completions` | POST | OpenAI-compatible chat |
100
+ | `/v1/messages` | POST | Anthropic-compatible messages |
101
+ | `/v1/models` | GET | List available models |
102
+ | `/health` | GET | Health check |
103
+ | `/docs` | GET | Swagger UI documentation |
104
+
105
+ ## Supported Models
106
+
107
+ All model names are aliases mapped to `Qwen2.5-Coder-1.5B-Instruct`:
108
+
109
+ **OpenAI aliases:** `gpt-4`, `gpt-4-turbo`, `gpt-3.5-turbo`, `codex`, `code-davinci-002`
110
+
111
+ **Anthropic aliases:** `claude-3-opus`, `claude-3-sonnet`, `claude-3-haiku`, `claude-3-5-sonnet`, `claude-code`
112
+
113
+ ## Environment Variables
114
+
115
+ | Variable | Default | Description |
116
+ |----------|---------|-------------|
117
+ | `MODEL_ID` | `Qwen/Qwen2.5-Coder-1.5B-Instruct` | HuggingFace model to use |
118
+ | `API_KEY` | `sk-free-coding-api` | API key for authentication |
119
+
120
+ ## Limitations
121
+
122
+ Running on HF Spaces free tier:
123
+ - **CPU only** (2 vCPU, 16GB RAM)
124
+ - **Response time**: 10-30 seconds for typical requests
125
+ - **Max context**: ~4K tokens
126
+ - **Best for**: Code generation, debugging, explanations
127
+
128
+ ## Deploy Your Own
129
+
130
+ 1. Fork this Space
131
+ 2. (Optional) Set environment variables in Space Settings
132
+ 3. Your API is ready at `https://YOUR-USERNAME-YOUR-SPACE.hf.space`
133
+
134
+ ## License
135
+
136
+ MIT License - Build with skills, not money! 🚀
app.py ADDED
@@ -0,0 +1,899 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ HuggingFace Spaces - OpenAI & Anthropic Compatible Coding API
3
+ A free, skills-only API endpoint for coding tasks (like Codex/Claude Code)
4
+ Author: Matrix Agent
5
+
6
+ Features:
7
+ - Full OpenAI API compatibility (/v1/chat/completions)
8
+ - Full Anthropic API compatibility (/v1/messages)
9
+ - Optimized for coding tasks
10
+ - Runs on free HF Spaces (2 vCPU, 16GB RAM)
11
+
12
+ API Specifications verified against:
13
+ - OpenAI: https://platform.openai.com/docs/api-reference/chat/create
14
+ - Anthropic: https://docs.anthropic.com/en/api/messages
15
+ """
16
+
17
+ import os
18
+ import time
19
+ import uuid
20
+ import json
21
+ import asyncio
22
+ from typing import List, Optional, Union, Dict, Any, AsyncGenerator
23
+ from contextlib import asynccontextmanager
24
+
25
+ import torch
26
+ from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer
27
+ from threading import Thread
28
+
29
+ from fastapi import FastAPI, HTTPException, Header, Request, Response
30
+ from fastapi.middleware.cors import CORSMiddleware
31
+ from fastapi.responses import StreamingResponse, JSONResponse
32
+ from pydantic import BaseModel, Field
33
+
34
+ # ============================================================================
35
+ # Configuration
36
+ # ============================================================================
37
+
38
+ MODEL_ID = os.getenv("MODEL_ID", "Qwen/Qwen2.5-Coder-1.5B-Instruct")
39
+ ANTHROPIC_VERSION = "2023-06-01" # Standard Anthropic API version
40
+
41
+ MODEL_ALIASES = {
42
+ # OpenAI-style model names -> actual model
43
+ "gpt-4": MODEL_ID,
44
+ "gpt-4-turbo": MODEL_ID,
45
+ "gpt-4o": MODEL_ID,
46
+ "gpt-4o-mini": MODEL_ID,
47
+ "gpt-3.5-turbo": MODEL_ID,
48
+ "codex": MODEL_ID,
49
+ "code-davinci-002": MODEL_ID,
50
+ "o1": MODEL_ID,
51
+ "o1-mini": MODEL_ID,
52
+ # Anthropic-style model names
53
+ "claude-3-opus-20240229": MODEL_ID,
54
+ "claude-3-sonnet-20240229": MODEL_ID,
55
+ "claude-3-haiku-20240307": MODEL_ID,
56
+ "claude-3-5-sonnet-20241022": MODEL_ID,
57
+ "claude-3-5-haiku-20241022": MODEL_ID,
58
+ "claude-3-opus": MODEL_ID,
59
+ "claude-3-sonnet": MODEL_ID,
60
+ "claude-3-haiku": MODEL_ID,
61
+ "claude-3-5-sonnet": MODEL_ID,
62
+ "claude-code": MODEL_ID,
63
+ }
64
+
65
+ API_KEY = os.getenv("API_KEY", "sk-free-coding-api")
66
+ MAX_TOKENS_DEFAULT = 2048
67
+ TEMPERATURE_DEFAULT = 0.7
68
+
69
+ # ============================================================================
70
+ # Global Model Instance
71
+ # ============================================================================
72
+
73
+ model = None
74
+ tokenizer = None
75
+
76
+ def load_model():
77
+ """Load model with CPU optimization"""
78
+ global model, tokenizer
79
+
80
+ print(f"🚀 Loading model: {MODEL_ID}")
81
+ print(f"📊 Device: CPU (Free HF Spaces)")
82
+
83
+ tokenizer = AutoTokenizer.from_pretrained(
84
+ MODEL_ID,
85
+ trust_remote_code=True,
86
+ padding_side="left"
87
+ )
88
+
89
+ if tokenizer.pad_token is None:
90
+ tokenizer.pad_token = tokenizer.eos_token
91
+
92
+ # Load with CPU optimizations for 16GB RAM
93
+ model = AutoModelForCausalLM.from_pretrained(
94
+ MODEL_ID,
95
+ torch_dtype=torch.float32,
96
+ device_map="cpu",
97
+ trust_remote_code=True,
98
+ low_cpu_mem_usage=True,
99
+ )
100
+
101
+ model.eval()
102
+ print("✅ Model loaded successfully!")
103
+ return model, tokenizer
104
+
105
+ # ============================================================================
106
+ # Pydantic Models - OpenAI Compatible (Full Spec)
107
+ # ============================================================================
108
+
109
+ class OpenAIContentPart(BaseModel):
110
+ """Content part for multimodal messages"""
111
+ type: str # "text", "image_url"
112
+ text: Optional[str] = None
113
+ image_url: Optional[Dict[str, str]] = None
114
+
115
+ class OpenAIMessage(BaseModel):
116
+ """OpenAI message format - supports both string and array content"""
117
+ role: str # "system", "user", "assistant", "tool"
118
+ content: Optional[Union[str, List[OpenAIContentPart]]] = None
119
+ name: Optional[str] = None
120
+ tool_calls: Optional[List[Dict]] = None
121
+ tool_call_id: Optional[str] = None
122
+
123
+ class OpenAIResponseFormat(BaseModel):
124
+ """Response format specification"""
125
+ type: str = "text" # "text", "json_object", "json_schema"
126
+ json_schema: Optional[Dict] = None
127
+
128
+ class OpenAIChatRequest(BaseModel):
129
+ """Full OpenAI Chat Completions request spec"""
130
+ model: str
131
+ messages: List[OpenAIMessage]
132
+ # Generation parameters
133
+ temperature: Optional[float] = Field(default=1.0, ge=0, le=2)
134
+ top_p: Optional[float] = Field(default=1.0, ge=0, le=1)
135
+ n: Optional[int] = Field(default=1, ge=1, le=10)
136
+ stream: Optional[bool] = False
137
+ stop: Optional[Union[str, List[str]]] = None
138
+ max_tokens: Optional[int] = None
139
+ max_completion_tokens: Optional[int] = None # Newer parameter
140
+ presence_penalty: Optional[float] = Field(default=0, ge=-2, le=2)
141
+ frequency_penalty: Optional[float] = Field(default=0, ge=-2, le=2)
142
+ logit_bias: Optional[Dict[str, float]] = None
143
+ logprobs: Optional[bool] = False
144
+ top_logprobs: Optional[int] = None
145
+ # Additional parameters
146
+ user: Optional[str] = None
147
+ seed: Optional[int] = None
148
+ tools: Optional[List[Dict]] = None
149
+ tool_choice: Optional[Union[str, Dict]] = None
150
+ response_format: Optional[OpenAIResponseFormat] = None
151
+ # Stream options
152
+ stream_options: Optional[Dict] = None
153
+
154
+ class OpenAIChoiceMessage(BaseModel):
155
+ role: str = "assistant"
156
+ content: Optional[str] = None
157
+ tool_calls: Optional[List[Dict]] = None
158
+
159
+ class OpenAIChoice(BaseModel):
160
+ index: int
161
+ message: OpenAIChoiceMessage
162
+ finish_reason: Optional[str] = None # "stop", "length", "tool_calls", "content_filter"
163
+ logprobs: Optional[Dict] = None
164
+
165
+ class OpenAIStreamChoice(BaseModel):
166
+ index: int
167
+ delta: Dict
168
+ finish_reason: Optional[str] = None
169
+ logprobs: Optional[Dict] = None
170
+
171
+ class OpenAIUsage(BaseModel):
172
+ prompt_tokens: int
173
+ completion_tokens: int
174
+ total_tokens: int
175
+ prompt_tokens_details: Optional[Dict] = None
176
+ completion_tokens_details: Optional[Dict] = None
177
+
178
+ class OpenAIChatResponse(BaseModel):
179
+ """Full OpenAI Chat Completions response spec"""
180
+ id: str
181
+ object: str = "chat.completion"
182
+ created: int
183
+ model: str
184
+ choices: List[OpenAIChoice]
185
+ usage: Optional[OpenAIUsage] = None
186
+ system_fingerprint: Optional[str] = None
187
+ service_tier: Optional[str] = None
188
+
189
+ class OpenAIStreamResponse(BaseModel):
190
+ id: str
191
+ object: str = "chat.completion.chunk"
192
+ created: int
193
+ model: str
194
+ choices: List[OpenAIStreamChoice]
195
+ system_fingerprint: Optional[str] = None
196
+
197
+ class OpenAIModelInfo(BaseModel):
198
+ id: str
199
+ object: str = "model"
200
+ created: int
201
+ owned_by: str = "hf-spaces"
202
+
203
+ class OpenAIModelsResponse(BaseModel):
204
+ object: str = "list"
205
+ data: List[OpenAIModelInfo]
206
+
207
+ # ============================================================================
208
+ # Pydantic Models - Anthropic Compatible (Full Spec)
209
+ # ============================================================================
210
+
211
+ class AnthropicTextBlock(BaseModel):
212
+ """Text content block"""
213
+ type: str = "text"
214
+ text: str
215
+
216
+ class AnthropicImageSource(BaseModel):
217
+ """Image source for vision"""
218
+ type: str = "base64"
219
+ media_type: str # "image/jpeg", "image/png", "image/webp", "image/gif"
220
+ data: str
221
+
222
+ class AnthropicImageBlock(BaseModel):
223
+ """Image content block"""
224
+ type: str = "image"
225
+ source: AnthropicImageSource
226
+
227
+ class AnthropicToolUseBlock(BaseModel):
228
+ """Tool use content block"""
229
+ type: str = "tool_use"
230
+ id: str
231
+ name: str
232
+ input: Dict
233
+
234
+ class AnthropicToolResultBlock(BaseModel):
235
+ """Tool result content block"""
236
+ type: str = "tool_result"
237
+ tool_use_id: str
238
+ content: Union[str, List[Dict]]
239
+
240
+ # Union type for all content blocks
241
+ AnthropicContentBlock = Union[AnthropicTextBlock, AnthropicImageBlock, Dict]
242
+
243
+ class AnthropicMessage(BaseModel):
244
+ """Anthropic message format"""
245
+ role: str # "user", "assistant"
246
+ content: Union[str, List[AnthropicContentBlock]]
247
+
248
+ class AnthropicTool(BaseModel):
249
+ """Tool definition"""
250
+ name: str
251
+ description: Optional[str] = None
252
+ input_schema: Dict
253
+
254
+ class AnthropicToolChoice(BaseModel):
255
+ """Tool choice specification"""
256
+ type: str # "auto", "any", "tool"
257
+ name: Optional[str] = None
258
+
259
+ class AnthropicRequest(BaseModel):
260
+ """Full Anthropic Messages API request spec"""
261
+ model: str
262
+ messages: List[AnthropicMessage]
263
+ max_tokens: int # Required in Anthropic API
264
+ # Optional parameters
265
+ system: Optional[Union[str, List[Dict]]] = None
266
+ temperature: Optional[float] = Field(default=1.0, ge=0, le=1)
267
+ top_p: Optional[float] = Field(default=0.999, ge=0, le=1)
268
+ top_k: Optional[int] = None
269
+ stream: Optional[bool] = False
270
+ stop_sequences: Optional[List[str]] = None
271
+ # Tool use
272
+ tools: Optional[List[AnthropicTool]] = None
273
+ tool_choice: Optional[AnthropicToolChoice] = None
274
+ # Metadata
275
+ metadata: Optional[Dict] = None
276
+
277
+ class AnthropicResponseContent(BaseModel):
278
+ type: str = "text"
279
+ text: Optional[str] = None
280
+ # For tool_use
281
+ id: Optional[str] = None
282
+ name: Optional[str] = None
283
+ input: Optional[Dict] = None
284
+
285
+ class AnthropicUsage(BaseModel):
286
+ input_tokens: int
287
+ output_tokens: int
288
+
289
+ class AnthropicResponse(BaseModel):
290
+ """Full Anthropic Messages API response spec"""
291
+ id: str
292
+ type: str = "message"
293
+ role: str = "assistant"
294
+ model: str
295
+ content: List[AnthropicResponseContent]
296
+ stop_reason: Optional[str] = None # "end_turn", "max_tokens", "stop_sequence", "tool_use"
297
+ stop_sequence: Optional[str] = None
298
+ usage: AnthropicUsage
299
+
300
+ # ============================================================================
301
+ # Content Parsing Utilities
302
+ # ============================================================================
303
+
304
+ def extract_text_from_openai_content(content: Union[str, List, None]) -> str:
305
+ """Extract text from OpenAI message content (string or array)"""
306
+ if content is None:
307
+ return ""
308
+ if isinstance(content, str):
309
+ return content
310
+ if isinstance(content, list):
311
+ text_parts = []
312
+ for part in content:
313
+ if isinstance(part, dict):
314
+ if part.get("type") == "text":
315
+ text_parts.append(part.get("text", ""))
316
+ elif hasattr(part, "type") and part.type == "text":
317
+ text_parts.append(part.text or "")
318
+ return "\n".join(text_parts)
319
+ return str(content)
320
+
321
+ def extract_text_from_anthropic_content(content: Union[str, List]) -> str:
322
+ """Extract text from Anthropic message content (string or array)"""
323
+ if isinstance(content, str):
324
+ return content
325
+ if isinstance(content, list):
326
+ text_parts = []
327
+ for block in content:
328
+ if isinstance(block, dict):
329
+ if block.get("type") == "text":
330
+ text_parts.append(block.get("text", ""))
331
+ elif hasattr(block, "type") and block.type == "text":
332
+ text_parts.append(block.text or "")
333
+ return "\n".join(text_parts)
334
+ return str(content)
335
+
336
+ def extract_system_prompt_anthropic(system: Union[str, List[Dict], None]) -> str:
337
+ """Extract system prompt from Anthropic format"""
338
+ if system is None:
339
+ return ""
340
+ if isinstance(system, str):
341
+ return system
342
+ if isinstance(system, list):
343
+ # System can be array of text blocks
344
+ text_parts = []
345
+ for block in system:
346
+ if isinstance(block, dict) and block.get("type") == "text":
347
+ text_parts.append(block.get("text", ""))
348
+ return "\n".join(text_parts)
349
+ return ""
350
+
351
+ # ============================================================================
352
+ # Message Formatting
353
+ # ============================================================================
354
+
355
+ def format_messages_for_model(
356
+ messages: List[Dict],
357
+ system_prompt: Optional[str] = None
358
+ ) -> str:
359
+ """Format messages for the model using chat template"""
360
+ formatted_messages = []
361
+
362
+ if system_prompt:
363
+ formatted_messages.append({"role": "system", "content": system_prompt})
364
+
365
+ for msg in messages:
366
+ role = msg.get("role", "user")
367
+ content = msg.get("content", "")
368
+
369
+ # Map tool role to assistant for compatibility
370
+ if role == "tool":
371
+ role = "user"
372
+
373
+ formatted_messages.append({"role": role, "content": content})
374
+
375
+ # Use tokenizer's chat template if available
376
+ if hasattr(tokenizer, 'apply_chat_template') and tokenizer.chat_template:
377
+ try:
378
+ return tokenizer.apply_chat_template(
379
+ formatted_messages,
380
+ tokenize=False,
381
+ add_generation_prompt=True
382
+ )
383
+ except Exception:
384
+ pass
385
+
386
+ # Fallback: Simple format
387
+ prompt = ""
388
+ for msg in formatted_messages:
389
+ role = msg["role"]
390
+ content = msg["content"]
391
+ if role == "system":
392
+ prompt += f"<|system|>\n{content}\n"
393
+ elif role == "user":
394
+ prompt += f"<|user|>\n{content}\n"
395
+ elif role == "assistant":
396
+ prompt += f"<|assistant|>\n{content}\n"
397
+ prompt += "<|assistant|>\n"
398
+ return prompt
399
+
400
+ # ============================================================================
401
+ # Generation Logic
402
+ # ============================================================================
403
+
404
+ def generate_response(
405
+ prompt: str,
406
+ max_tokens: int = MAX_TOKENS_DEFAULT,
407
+ temperature: float = TEMPERATURE_DEFAULT,
408
+ top_p: float = 0.95,
409
+ top_k: Optional[int] = None,
410
+ stop: Optional[List[str]] = None,
411
+ ) -> tuple[str, int, int, str]:
412
+ """
413
+ Generate response from the model
414
+ Returns: (response_text, input_tokens, output_tokens, stop_reason)
415
+ """
416
+ inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=4096)
417
+ input_length = inputs.input_ids.shape[1]
418
+
419
+ # Generation config
420
+ gen_kwargs = {
421
+ "max_new_tokens": max_tokens,
422
+ "temperature": max(temperature, 0.01),
423
+ "top_p": top_p,
424
+ "do_sample": temperature > 0,
425
+ "pad_token_id": tokenizer.pad_token_id,
426
+ "eos_token_id": tokenizer.eos_token_id,
427
+ }
428
+
429
+ if top_k is not None and top_k > 0:
430
+ gen_kwargs["top_k"] = top_k
431
+
432
+ with torch.no_grad():
433
+ outputs = model.generate(inputs.input_ids, **gen_kwargs)
434
+
435
+ # Decode only the new tokens
436
+ generated_tokens = outputs[0][input_length:]
437
+ response_text = tokenizer.decode(generated_tokens, skip_special_tokens=True)
438
+
439
+ output_length = len(generated_tokens)
440
+ stop_reason = "stop" # Default
441
+
442
+ # Handle stop sequences
443
+ if stop:
444
+ for stop_seq in stop:
445
+ if stop_seq in response_text:
446
+ response_text = response_text.split(stop_seq)[0]
447
+ stop_reason = "stop"
448
+ break
449
+
450
+ # Check if max tokens reached
451
+ if output_length >= max_tokens:
452
+ stop_reason = "length"
453
+
454
+ return response_text.strip(), input_length, output_length, stop_reason
455
+
456
+ async def generate_stream(
457
+ prompt: str,
458
+ max_tokens: int = MAX_TOKENS_DEFAULT,
459
+ temperature: float = TEMPERATURE_DEFAULT,
460
+ top_p: float = 0.95,
461
+ top_k: Optional[int] = None,
462
+ ) -> AsyncGenerator[str, None]:
463
+ """Stream generation for real-time responses"""
464
+ inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=4096)
465
+
466
+ streamer = TextIteratorStreamer(tokenizer, skip_special_tokens=True, skip_prompt=True)
467
+
468
+ gen_kwargs = {
469
+ "max_new_tokens": max_tokens,
470
+ "temperature": max(temperature, 0.01),
471
+ "top_p": top_p,
472
+ "do_sample": temperature > 0,
473
+ "pad_token_id": tokenizer.pad_token_id,
474
+ "eos_token_id": tokenizer.eos_token_id,
475
+ "streamer": streamer,
476
+ }
477
+
478
+ if top_k is not None and top_k > 0:
479
+ gen_kwargs["top_k"] = top_k
480
+
481
+ thread = Thread(target=lambda: model.generate(inputs.input_ids, **gen_kwargs))
482
+ thread.start()
483
+
484
+ for text in streamer:
485
+ yield text
486
+
487
+ thread.join()
488
+
489
+ # ============================================================================
490
+ # FastAPI Application
491
+ # ============================================================================
492
+
493
+ @asynccontextmanager
494
+ async def lifespan(app: FastAPI):
495
+ """Load model on startup"""
496
+ load_model()
497
+ yield
498
+
499
+ app = FastAPI(
500
+ title="Free Coding API",
501
+ description="OpenAI & Anthropic compatible API for coding tasks",
502
+ version="1.0.0",
503
+ lifespan=lifespan
504
+ )
505
+
506
+ app.add_middleware(
507
+ CORSMiddleware,
508
+ allow_origins=["*"],
509
+ allow_credentials=True,
510
+ allow_methods=["*"],
511
+ allow_headers=["*"],
512
+ )
513
+
514
+ # ============================================================================
515
+ # Authentication
516
+ # ============================================================================
517
+
518
+ def verify_api_key(authorization: Optional[str] = None) -> bool:
519
+ """Simple API key verification"""
520
+ if not API_KEY or API_KEY == "":
521
+ return True
522
+
523
+ if not authorization:
524
+ return False
525
+
526
+ if authorization.startswith("Bearer "):
527
+ token = authorization[7:]
528
+ else:
529
+ token = authorization
530
+
531
+ return token == API_KEY
532
+
533
+ # ============================================================================
534
+ # OpenAI Compatible Endpoints
535
+ # ============================================================================
536
+
537
+ @app.get("/v1/models")
538
+ async def list_models():
539
+ """List available models (OpenAI compatible)"""
540
+ models = [
541
+ OpenAIModelInfo(id=alias, created=int(time.time()))
542
+ for alias in MODEL_ALIASES.keys()
543
+ ]
544
+ return OpenAIModelsResponse(data=models)
545
+
546
+ @app.get("/v1/models/{model_id}")
547
+ async def get_model(model_id: str):
548
+ """Get model info"""
549
+ if model_id in MODEL_ALIASES or model_id == MODEL_ID:
550
+ return OpenAIModelInfo(id=model_id, created=int(time.time()))
551
+ raise HTTPException(status_code=404, detail="Model not found")
552
+
553
+ @app.post("/v1/chat/completions")
554
+ async def openai_chat_completions(
555
+ request: OpenAIChatRequest,
556
+ authorization: Optional[str] = Header(None),
557
+ ):
558
+ """OpenAI-compatible chat completions endpoint - Full spec compliance"""
559
+
560
+ if not verify_api_key(authorization):
561
+ raise HTTPException(status_code=401, detail="Invalid API key")
562
+
563
+ # Extract messages
564
+ messages = []
565
+ for m in request.messages:
566
+ content = extract_text_from_openai_content(m.content)
567
+ messages.append({"role": m.role, "content": content})
568
+
569
+ # Extract system message if present
570
+ system_prompt = None
571
+ filtered_messages = []
572
+ for msg in messages:
573
+ if msg["role"] == "system":
574
+ system_prompt = msg["content"]
575
+ else:
576
+ filtered_messages.append(msg)
577
+
578
+ prompt = format_messages_for_model(filtered_messages, system_prompt=system_prompt)
579
+
580
+ # Determine max tokens
581
+ max_tokens = request.max_completion_tokens or request.max_tokens or MAX_TOKENS_DEFAULT
582
+
583
+ # Handle stop sequences
584
+ stop_sequences = None
585
+ if request.stop:
586
+ stop_sequences = [request.stop] if isinstance(request.stop, str) else request.stop
587
+
588
+ request_id = f"chatcmpl-{uuid.uuid4().hex[:29]}"
589
+ system_fingerprint = f"fp_{uuid.uuid4().hex[:10]}"
590
+ created_time = int(time.time())
591
+
592
+ if request.stream:
593
+ # OpenAI Streaming format
594
+ async def stream_generator():
595
+ # First chunk with role
596
+ first_chunk = {
597
+ "id": request_id,
598
+ "object": "chat.completion.chunk",
599
+ "created": created_time,
600
+ "model": request.model,
601
+ "system_fingerprint": system_fingerprint,
602
+ "choices": [{
603
+ "index": 0,
604
+ "delta": {"role": "assistant", "content": ""},
605
+ "logprobs": None,
606
+ "finish_reason": None
607
+ }]
608
+ }
609
+ yield f"data: {json.dumps(first_chunk)}\n\n"
610
+
611
+ # Stream content
612
+ async for token in generate_stream(
613
+ prompt,
614
+ max_tokens=max_tokens,
615
+ temperature=request.temperature or 1.0,
616
+ top_p=request.top_p or 1.0,
617
+ ):
618
+ chunk = {
619
+ "id": request_id,
620
+ "object": "chat.completion.chunk",
621
+ "created": created_time,
622
+ "model": request.model,
623
+ "system_fingerprint": system_fingerprint,
624
+ "choices": [{
625
+ "index": 0,
626
+ "delta": {"content": token},
627
+ "logprobs": None,
628
+ "finish_reason": None
629
+ }]
630
+ }
631
+ yield f"data: {json.dumps(chunk)}\n\n"
632
+
633
+ # Final chunk with finish_reason
634
+ final_chunk = {
635
+ "id": request_id,
636
+ "object": "chat.completion.chunk",
637
+ "created": created_time,
638
+ "model": request.model,
639
+ "system_fingerprint": system_fingerprint,
640
+ "choices": [{
641
+ "index": 0,
642
+ "delta": {},
643
+ "logprobs": None,
644
+ "finish_reason": "stop"
645
+ }]
646
+ }
647
+ yield f"data: {json.dumps(final_chunk)}\n\n"
648
+
649
+ # Usage chunk if requested
650
+ if request.stream_options and request.stream_options.get("include_usage"):
651
+ usage_chunk = {
652
+ "id": request_id,
653
+ "object": "chat.completion.chunk",
654
+ "created": created_time,
655
+ "model": request.model,
656
+ "choices": [],
657
+ "usage": {
658
+ "prompt_tokens": 0,
659
+ "completion_tokens": 0,
660
+ "total_tokens": 0
661
+ }
662
+ }
663
+ yield f"data: {json.dumps(usage_chunk)}\n\n"
664
+
665
+ yield "data: [DONE]\n\n"
666
+
667
+ return StreamingResponse(
668
+ stream_generator(),
669
+ media_type="text/event-stream",
670
+ headers={
671
+ "Cache-Control": "no-cache",
672
+ "Connection": "keep-alive",
673
+ "X-Accel-Buffering": "no"
674
+ }
675
+ )
676
+
677
+ # Non-streaming response
678
+ response_text, input_tokens, output_tokens, stop_reason = generate_response(
679
+ prompt,
680
+ max_tokens=max_tokens,
681
+ temperature=request.temperature or 1.0,
682
+ top_p=request.top_p or 1.0,
683
+ stop=stop_sequences,
684
+ )
685
+
686
+ # Map stop reason to OpenAI format
687
+ openai_finish_reason = "stop" if stop_reason == "stop" else "length"
688
+
689
+ return OpenAIChatResponse(
690
+ id=request_id,
691
+ created=created_time,
692
+ model=request.model,
693
+ system_fingerprint=system_fingerprint,
694
+ choices=[
695
+ OpenAIChoice(
696
+ index=0,
697
+ message=OpenAIChoiceMessage(role="assistant", content=response_text),
698
+ finish_reason=openai_finish_reason,
699
+ logprobs=None
700
+ )
701
+ ],
702
+ usage=OpenAIUsage(
703
+ prompt_tokens=input_tokens,
704
+ completion_tokens=output_tokens,
705
+ total_tokens=input_tokens + output_tokens
706
+ )
707
+ )
708
+
709
+ # ============================================================================
710
+ # Anthropic Compatible Endpoints
711
+ # ============================================================================
712
+
713
+ @app.post("/v1/messages")
714
+ async def anthropic_messages(
715
+ request: AnthropicRequest,
716
+ authorization: Optional[str] = Header(None),
717
+ x_api_key: Optional[str] = Header(None, alias="x-api-key"),
718
+ anthropic_version: Optional[str] = Header(None, alias="anthropic-version"),
719
+ ):
720
+ """Anthropic-compatible messages endpoint - Full spec compliance"""
721
+
722
+ # Anthropic uses x-api-key header
723
+ auth_key = x_api_key or authorization
724
+ if not verify_api_key(auth_key):
725
+ raise HTTPException(status_code=401, detail="Invalid API key")
726
+
727
+ # Extract messages
728
+ messages = []
729
+ for m in request.messages:
730
+ content = extract_text_from_anthropic_content(m.content)
731
+ messages.append({"role": m.role, "content": content})
732
+
733
+ # Extract system prompt
734
+ system_prompt = extract_system_prompt_anthropic(request.system)
735
+
736
+ prompt = format_messages_for_model(messages, system_prompt=system_prompt)
737
+
738
+ request_id = f"msg_{uuid.uuid4().hex[:24]}"
739
+
740
+ if request.stream:
741
+ # Anthropic streaming format (Server-Sent Events)
742
+ async def stream_generator():
743
+ input_tokens = 0 # Would be calculated from prompt
744
+
745
+ # 1. message_start event
746
+ message_start = {
747
+ "type": "message_start",
748
+ "message": {
749
+ "id": request_id,
750
+ "type": "message",
751
+ "role": "assistant",
752
+ "model": request.model,
753
+ "content": [],
754
+ "stop_reason": None,
755
+ "stop_sequence": None,
756
+ "usage": {
757
+ "input_tokens": input_tokens,
758
+ "output_tokens": 0
759
+ }
760
+ }
761
+ }
762
+ yield f"event: message_start\ndata: {json.dumps(message_start)}\n\n"
763
+
764
+ # 2. content_block_start event
765
+ content_block_start = {
766
+ "type": "content_block_start",
767
+ "index": 0,
768
+ "content_block": {
769
+ "type": "text",
770
+ "text": ""
771
+ }
772
+ }
773
+ yield f"event: content_block_start\ndata: {json.dumps(content_block_start)}\n\n"
774
+
775
+ # 3. Stream content_block_delta events
776
+ output_tokens = 0
777
+ async for token in generate_stream(
778
+ prompt,
779
+ max_tokens=request.max_tokens,
780
+ temperature=request.temperature or 1.0,
781
+ top_p=request.top_p or 0.999,
782
+ top_k=request.top_k,
783
+ ):
784
+ output_tokens += 1
785
+ delta = {
786
+ "type": "content_block_delta",
787
+ "index": 0,
788
+ "delta": {
789
+ "type": "text_delta",
790
+ "text": token
791
+ }
792
+ }
793
+ yield f"event: content_block_delta\ndata: {json.dumps(delta)}\n\n"
794
+
795
+ # 4. content_block_stop event
796
+ content_block_stop = {
797
+ "type": "content_block_stop",
798
+ "index": 0
799
+ }
800
+ yield f"event: content_block_stop\ndata: {json.dumps(content_block_stop)}\n\n"
801
+
802
+ # 5. message_delta event
803
+ message_delta = {
804
+ "type": "message_delta",
805
+ "delta": {
806
+ "stop_reason": "end_turn",
807
+ "stop_sequence": None
808
+ },
809
+ "usage": {
810
+ "output_tokens": output_tokens
811
+ }
812
+ }
813
+ yield f"event: message_delta\ndata: {json.dumps(message_delta)}\n\n"
814
+
815
+ # 6. message_stop event
816
+ message_stop = {"type": "message_stop"}
817
+ yield f"event: message_stop\ndata: {json.dumps(message_stop)}\n\n"
818
+
819
+ return StreamingResponse(
820
+ stream_generator(),
821
+ media_type="text/event-stream",
822
+ headers={
823
+ "Cache-Control": "no-cache",
824
+ "Connection": "keep-alive",
825
+ "X-Accel-Buffering": "no"
826
+ }
827
+ )
828
+
829
+ # Non-streaming response
830
+ response_text, input_tokens, output_tokens, stop_reason = generate_response(
831
+ prompt,
832
+ max_tokens=request.max_tokens,
833
+ temperature=request.temperature or 1.0,
834
+ top_p=request.top_p or 0.999,
835
+ top_k=request.top_k,
836
+ stop=request.stop_sequences,
837
+ )
838
+
839
+ # Map stop reason to Anthropic format
840
+ anthropic_stop_reason = "end_turn"
841
+ stop_sequence_used = None
842
+ if stop_reason == "length":
843
+ anthropic_stop_reason = "max_tokens"
844
+ elif stop_reason == "stop" and request.stop_sequences:
845
+ for seq in request.stop_sequences:
846
+ if seq in response_text:
847
+ anthropic_stop_reason = "stop_sequence"
848
+ stop_sequence_used = seq
849
+ break
850
+
851
+ return AnthropicResponse(
852
+ id=request_id,
853
+ model=request.model,
854
+ content=[AnthropicResponseContent(type="text", text=response_text)],
855
+ stop_reason=anthropic_stop_reason,
856
+ stop_sequence=stop_sequence_used,
857
+ usage=AnthropicUsage(
858
+ input_tokens=input_tokens,
859
+ output_tokens=output_tokens
860
+ )
861
+ )
862
+
863
+ # ============================================================================
864
+ # Health & Info Endpoints
865
+ # ============================================================================
866
+
867
+ @app.get("/")
868
+ async def root():
869
+ return {
870
+ "name": "Free Coding API",
871
+ "version": "1.0.0",
872
+ "model": MODEL_ID,
873
+ "compatibility": {
874
+ "openai": "v1 Chat Completions API",
875
+ "anthropic": "Messages API (2023-06-01)"
876
+ },
877
+ "endpoints": {
878
+ "openai_chat": "/v1/chat/completions",
879
+ "anthropic_messages": "/v1/messages",
880
+ "models": "/v1/models"
881
+ },
882
+ "docs": "/docs"
883
+ }
884
+
885
+ @app.get("/health")
886
+ async def health():
887
+ return {
888
+ "status": "healthy",
889
+ "model_loaded": model is not None,
890
+ "model_id": MODEL_ID
891
+ }
892
+
893
+ # ============================================================================
894
+ # Main Entry Point
895
+ # ============================================================================
896
+
897
+ if __name__ == "__main__":
898
+ import uvicorn
899
+ uvicorn.run(app, host="0.0.0.0", port=7860)
requirements.txt ADDED
@@ -0,0 +1,15 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # HuggingFace Spaces - Free Coding API
2
+ # Optimized for CPU inference on free tier (2 vCPU, 16GB RAM)
3
+
4
+ # Core dependencies
5
+ fastapi==0.115.6
6
+ uvicorn[standard]==0.34.0
7
+ pydantic>=2.0.0
8
+
9
+ # ML dependencies
10
+ torch==2.1.0
11
+ transformers>=4.45.0
12
+ accelerate>=0.27.0
13
+
14
+ # Utilities
15
+ python-multipart