Jean Lima committed on
Commit 349efd4
0 Parent(s):

Deploy LFM2-8B-A1B local + multilingual models

Files changed (4):
  1. Dockerfile +16 -0
  2. README.md +71 -0
  3. app.py +525 -0
  4. requirements.txt +8 -0
Dockerfile ADDED
@@ -0,0 +1,16 @@
+ FROM python:3.12
+
+ RUN useradd -m -u 1000 user
+ USER user
+ ENV PATH="/home/user/.local/bin:$PATH"
+
+ WORKDIR /app
+
+ COPY --chown=user requirements.txt .
+ RUN pip install --no-cache-dir --upgrade -r requirements.txt
+
+ COPY --chown=user app.py .
+
+ EXPOSE 7860
+
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md ADDED
@@ -0,0 +1,71 @@
+ ---
+ title: Multi-Models
+ emoji: 🤖
+ colorFrom: yellow
+ colorTo: purple
+ sdk: docker
+ app_port: 7860
+ pinned: false
+ short_description: Multi-modal API - chat, vision, embeddings, classification
+ ---
+
+ # 🤖 DGGirl API v4 - Multi-Modal
+
+ OpenAI-compatible API for use with **n8n** and other integrations.
+
+ ## 🎯 Available Endpoints
+
+ | Endpoint | Method | Description |
+ |----------|--------|-------------|
+ | `/v1/chat/completions` | POST | Smart chat + image analysis |
+ | `/v1/embeddings` | POST | Semantic vectors (RAG) |
+ | `/v1/classify` | POST | Zero-shot classification |
+ | `/v1/summarize` | POST | Text summarization |
+ | `/v1/sentiment` | POST | Sentiment analysis |
+ | `/v1/models` | GET | List models |
+ | `/health` | GET | API status |
+
+ ## 🧠 Models Used
+
+ - **Chat**: `LiquidAI/LFM2-8B-A1B` - fast and versatile
+ - **Vision**: `google/gemma-3-27b-it` - image analysis
+ - **Embeddings**: `BAAI/bge-m3` - multilingual vectors
+ - **Classification**: `joeddav/xlm-roberta-large-xnli` - multilingual zero-shot
+ - **Summarization**: `csebuetnlp/mT5_multilingual_XLSum` - 45 languages
+ - **Sentiment**: `lxyuan/distilbert-base-multilingual-cased-sentiments-student`
+
+ ## 📋 Usage Examples
+
+ ### Chat
+ ```bash
+ curl -X POST "https://YOUR-SPACE.hf.space/v1/chat/completions" \
+ -H "Authorization: Bearer YOUR_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{"messages": [{"role": "user", "content": "Olá!"}]}'
+ ```
+
+ ### Classify Intent
+ ```bash
+ curl -X POST "https://YOUR-SPACE.hf.space/v1/classify" \
+ -H "Authorization: Bearer YOUR_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{"text": "Quero cancelar meu pedido", "labels": ["pedido", "cancelamento", "dúvida"]}'
+ ```
+
+ ### Sentiment Analysis
+ ```bash
+ curl -X POST "https://YOUR-SPACE.hf.space/v1/sentiment" \
+ -H "Authorization: Bearer YOUR_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{"text": "Estou muito satisfeito com o atendimento!"}'
+ ```
+
+ ## ⚙️ Configuration
+
+ Set these environment variables on the Space:
+ - `HF_TOKEN`: your Hugging Face token
+ - `API_KEY`: (optional) custom API key
+
+ ## 📚 Documentation
+
+ Visit `/docs` for the interactive Swagger documentation.
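
Since the endpoints mirror the OpenAI response schema, parsing a chat reply in Python looks like the sketch below. The URL and token are placeholders, and the response body is a fabricated sample that only illustrates the shape `app.py` emits; it is not a recorded response.

```python
import json

# Placeholders - substitute your own Space URL and token before sending a real request.
url = "https://YOUR-SPACE.hf.space/v1/chat/completions"
headers = {"Authorization": "Bearer YOUR_TOKEN", "Content-Type": "application/json"}
payload = {"messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 256}

# Sample body matching the OpenAI-style schema the endpoint returns:
body = json.loads("""{
  "object": "chat.completion",
  "model": "LiquidAI/LFM2-8B-A1B",
  "choices": [{"index": 0,
               "message": {"role": "assistant", "content": "Hi there!"},
               "finish_reason": "stop"}]
}""")

# The assistant reply always lives at choices[0].message.content
reply = body["choices"][0]["message"]["content"]
print(reply)  # → Hi there!
```

Because the schema is OpenAI-compatible, any client that can set a base URL (n8n's OpenAI nodes, for instance) should work unchanged.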
app.py ADDED
@@ -0,0 +1,525 @@
+ import os
+ import uuid
+ import time
+ import hashlib
+ import traceback
+ from datetime import datetime
+ from fastapi import FastAPI, Request
+ from fastapi.responses import JSONResponse, HTMLResponse
+ from fastapi.middleware.cors import CORSMiddleware
+ from huggingface_hub import InferenceClient
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # ============ Configuration ============
+
+ HF_TOKEN = os.environ.get("HF_TOKEN")
+ API_KEY = os.environ.get("API_KEY", HF_TOKEN)
+
+ # ============ Local Model - LFM2-8B-A1B (CPU) ============
+
+ print("🔄 Loading LFM2-8B-A1B locally...")
+ LOCAL_MODEL_NAME = "LiquidAI/LFM2-8B-A1B"
+
+ # Load tokenizer and model on CPU
+ chat_tokenizer = AutoTokenizer.from_pretrained(LOCAL_MODEL_NAME, token=HF_TOKEN, trust_remote_code=True)
+ chat_model = AutoModelForCausalLM.from_pretrained(
+     LOCAL_MODEL_NAME,
+     token=HF_TOKEN,
+     trust_remote_code=True,
+     torch_dtype=torch.bfloat16,  # Saves memory; better supported on CPU than float16
+     device_map="cpu",
+     low_cpu_mem_usage=True
+ )
+ print("✅ LFM2-8B-A1B loaded successfully!")
+
+ # ============ Model Clients (Inference API) ============
+
+ # Vision - image analysis (Inference API)
+ vision_client = InferenceClient(token=HF_TOKEN, model="google/gemma-3-27b-it")
+
+ # Embeddings - semantic vectors (Inference API)
+ embed_client = InferenceClient(token=HF_TOKEN, model="BAAI/bge-m3")
+
+ # Zero-shot classification (multilingual - PT/EN/ES...)
+ classify_client = InferenceClient(token=HF_TOKEN, model="joeddav/xlm-roberta-large-xnli")
+
+ # Summarization (multilingual - 45 languages including PT)
+ summarize_client = InferenceClient(token=HF_TOKEN, model="csebuetnlp/mT5_multilingual_XLSum")
+
+ # Sentiment analysis (multilingual - PT/EN/ES...)
+ sentiment_client = InferenceClient(token=HF_TOKEN, model="lxyuan/distilbert-base-multilingual-cased-sentiments-student")
+
+
+ # ============ Local Chat Function ============
+
+ def generate_local_chat(messages, max_tokens=1024, temperature=0.7):
+     """Generate a reply with the local LFM2-8B-A1B model."""
+     # Flatten multimodal content to plain text
+     chat = []
+     for msg in messages:
+         content = msg.get("content", "")
+         if isinstance(content, list):
+             content = " ".join(item.get("text", "") for item in content if item.get("type") == "text")
+         chat.append({"role": msg.get("role", "user"), "content": content})
+
+     # Tokenize with the model's own chat template (avoids hand-rolling ChatML markers)
+     input_ids = chat_tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_tensors="pt")
+
+     # Generate the reply
+     with torch.no_grad():
+         outputs = chat_model.generate(
+             input_ids,
+             max_new_tokens=max_tokens,
+             temperature=temperature,
+             do_sample=temperature > 0,
+             pad_token_id=chat_tokenizer.eos_token_id,
+             eos_token_id=chat_tokenizer.eos_token_id
+         )
+
+     # Decode only the newly generated tokens
+     response = chat_tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
+     return response.strip()
+
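
The multimodal flattening step in the chat path above can be sketched in isolation; this is a standalone reimplementation for illustration, not the deployed function:

```python
def flatten_content(messages):
    """Collapse OpenAI-style multimodal message content to plain text."""
    flat = []
    for msg in messages:
        content = msg.get("content", "")
        if isinstance(content, list):
            # Keep only the text parts; image_url items are dropped here
            content = " ".join(
                item.get("text", "") for item in content if item.get("type") == "text"
            )
        flat.append({"role": msg.get("role", "user"), "content": content})
    return flat

# A message mixing a text part with an image part:
msgs = [{"role": "user", "content": [
    {"type": "text", "text": "Hello"},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
]}]
print(flatten_content(msgs)[0]["content"])  # → Hello
```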
+ # ============ Cache ============
+
+ response_cache = {}
+ CACHE_MAX_SIZE = 500
+ CACHE_TTL_SECONDS = 3600
+
+ def get_cache_key(content, task):
+     data = str(content) + task
+     return hashlib.md5(data.encode()).hexdigest()
+
+ def get_cached_response(key):
+     if key in response_cache:
+         entry = response_cache[key]
+         if time.time() - entry["timestamp"] < CACHE_TTL_SECONDS:
+             return entry["response"]
+         else:
+             del response_cache[key]
+     return None
+
+ def set_cached_response(key, response):
+     if len(response_cache) >= CACHE_MAX_SIZE:
+         oldest_key = min(response_cache.keys(), key=lambda k: response_cache[k]["timestamp"])
+         del response_cache[oldest_key]
+     response_cache[key] = {"response": response, "timestamp": time.time()}
+
+ def verify_api_key(request: Request) -> bool:
+     auth = request.headers.get("Authorization", "")
+     # Reject everything if no key is configured, rather than matching an empty key
+     return bool(API_KEY) and auth.startswith("Bearer ") and auth[7:] == API_KEY
+
+ def has_image_content(messages):
+     for msg in messages:
+         content = msg.get("content", [])
+         if isinstance(content, list):
+             for item in content:
+                 if isinstance(item, dict) and item.get("type") == "image_url":
+                     return True
+     return False
+
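
The cache helpers above implement a size-capped TTL cache. The same eviction and expiry behaviour can be shown with a compact standalone sketch (a reimplementation for demonstration, with an injectable `now` so expiry is testable without waiting):

```python
import time

cache, MAX_SIZE, TTL = {}, 2, 3600  # tiny capacity to force eviction

def put(key, value, now=None):
    # Evict the oldest entry when the cache is full
    if len(cache) >= MAX_SIZE:
        oldest = min(cache, key=lambda k: cache[k]["timestamp"])
        del cache[oldest]
    cache[key] = {"response": value, "timestamp": now if now is not None else time.time()}

def get(key, now=None):
    entry = cache.get(key)
    if entry is None:
        return None
    now = now if now is not None else time.time()
    if now - entry["timestamp"] < TTL:
        return entry["response"]
    del cache[key]  # Expired: drop the entry and report a miss
    return None

put("a", 1, now=0); put("b", 2, now=1); put("c", 3, now=2)  # "a" is evicted
print(get("a", now=2), get("b", now=2), get("b", now=5000))  # → None 2 None
```

Evicting by oldest *insertion* time (as here and in `set_cached_response`) is FIFO rather than LRU; entries are never refreshed on read.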
+ # ============ FastAPI ============
+
+ app = FastAPI(
+     title="DGGirl Multi-Modal API",
+     description="OpenAI-compatible API for chat, vision, embeddings, classification, summarization and sentiment",
+     version="4.0.0"
+ )
+
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+ # ============ Home Page ============
+
+ @app.get("/", response_class=HTMLResponse)
+ async def home():
+     endpoints_html = """
+     <div class="endpoint"><span class="method">POST</span> <code>/v1/chat/completions</code><p>💬 Smart chat (LFM2-8B) + vision (Gemma 3)</p></div>
+     <div class="endpoint"><span class="method">POST</span> <code>/v1/embeddings</code><p>🔢 Semantic vectors for RAG (BGE-M3)</p></div>
+     <div class="endpoint"><span class="method">POST</span> <code>/v1/classify</code><p>🏷️ Zero-shot text classification</p></div>
+     <div class="endpoint"><span class="method">POST</span> <code>/v1/summarize</code><p>📝 Summarize long texts</p></div>
+     <div class="endpoint"><span class="method">POST</span> <code>/v1/sentiment</code><p>😊 Sentiment analysis</p></div>
+     """
+     return f"""
+     <!DOCTYPE html>
+     <html>
+     <head>
+         <title>DGGirl API v4</title>
+         <style>
+             body {{ font-family: 'Segoe UI', Tahoma, sans-serif; max-width: 900px; margin: 40px auto; padding: 20px; background: linear-gradient(135deg, #1a1a2e 0%, #16213e 100%); min-height: 100vh; }}
+             .container {{ background: rgba(255,255,255,0.95); padding: 40px; border-radius: 20px; box-shadow: 0 10px 40px rgba(0,0,0,0.3); }}
+             h1 {{ color: #1a73e8; border-bottom: 3px solid #4285f4; padding-bottom: 15px; margin-bottom: 20px; }}
+             .status {{ background: linear-gradient(135deg, #00c853, #69f0ae); color: white; padding: 8px 16px; border-radius: 25px; font-weight: bold; font-size: 0.9em; display: inline-block; }}
+             .endpoint {{ background: #f8f9fa; padding: 18px; margin: 12px 0; border-radius: 12px; border-left: 6px solid #4285f4; transition: transform 0.2s; }}
+             .endpoint:hover {{ transform: translateX(5px); background: #e8f0fe; }}
+             .method {{ background: #d93025; color: white; padding: 4px 10px; border-radius: 5px; font-weight: bold; font-size: 0.85em; }}
+             code {{ background: #e8eaed; padding: 4px 10px; border-radius: 6px; font-family: 'Consolas', monospace; font-size: 0.95em; }}
+             .models {{ background: #e3f2fd; padding: 20px; border-radius: 12px; margin-top: 20px; }}
+             .models h3 {{ margin-top: 0; color: #1565c0; }}
+             .model-tag {{ display: inline-block; background: #1a73e8; color: white; padding: 5px 12px; border-radius: 15px; margin: 4px; font-size: 0.85em; }}
+             a {{ color: #1a73e8; text-decoration: none; }}
+             a:hover {{ text-decoration: underline; }}
+             .stats {{ display: flex; gap: 20px; margin-top: 20px; }}
+             .stat {{ background: #fff3e0; padding: 15px; border-radius: 10px; flex: 1; text-align: center; }}
+             .stat-value {{ font-size: 1.5em; font-weight: bold; color: #e65100; }}
+         </style>
+     </head>
+     <body>
+         <div class="container">
+             <h1>🤖 DGGirl API v4 - Multi-Modal</h1>
+             <p>Status: <span class="status">● OPERATIONAL</span></p>
+
+             {endpoints_html}
+
+             <div class="models">
+                 <h3>🧠 Active Models</h3>
+                 <span class="model-tag">LiquidAI/LFM2-8B-A1B</span>
+                 <span class="model-tag">Gemma 3 27B Vision</span>
+                 <span class="model-tag">BGE-M3 Embeddings</span>
+                 <span class="model-tag">XLM-RoBERTa Classification</span>
+                 <span class="model-tag">mT5 Summarization</span>
+                 <span class="model-tag">DistilBERT Sentiment</span>
+             </div>
+
+             <div class="stats">
+                 <div class="stat">
+                     <div class="stat-value">{len(response_cache)}</div>
+                     <div>Cache Items</div>
+                 </div>
+                 <div class="stat">
+                     <div class="stat-value">6</div>
+                     <div>Endpoints</div>
+                 </div>
+                 <div class="stat">
+                     <div class="stat-value">6</div>
+                     <div>Models</div>
+                 </div>
+             </div>
+
+             <p style="margin-top: 25px; text-align: center;">
+                 <a href="/docs">📚 Swagger Documentation</a> |
+                 <a href="/health">❤️ Health Check</a>
+             </p>
+         </div>
+     </body>
+     </html>
+     """
+
+ # ============ Chat Completions (Text + Vision) ============
+
+ @app.post("/v1/chat/completions")
+ async def chat_completions(request: Request):
+     if not verify_api_key(request):
+         return JSONResponse(status_code=401, content={"error": "Invalid API key"})
+
+     try:
+         body = await request.json()
+         raw_messages = body.get("messages", [])
+         model = body.get("model", "auto")
+
+         # Detect whether vision is needed
+         has_vision = model == "vision" or has_image_content(raw_messages)
+         model_used = "google/gemma-3-27b-it" if has_vision else "LiquidAI/LFM2-8B-A1B"
+
+         # Cache (text-only requests)
+         cache_key = get_cache_key(raw_messages, model_used)
+         if not has_vision:
+             cached = get_cached_response(cache_key)
+             if cached:
+                 return cached
+
+         # Build vision messages
+         if has_vision:
+             last_user_msg = next((msg for msg in reversed(raw_messages) if msg.get("role") == "user"), None)
+             if not last_user_msg:
+                 return JSONResponse(status_code=400, content={"error": "No user message"})
+
+             content = last_user_msg.get("content", [])
+             vision_content = []
+             text_parts = []
+
+             if isinstance(content, list):
+                 for item in content:
+                     if isinstance(item, dict):
+                         if item.get("type") == "text":
+                             text_parts.append(item.get("text", ""))
+                         elif item.get("type") == "image_url":
+                             url = item.get("image_url", {}).get("url", "")
+                             if url:
+                                 vision_content.append({"type": "image_url", "image_url": {"url": url}})
+
+                 final_text = " ".join(text_parts) if text_parts else "Analyze the image."
+                 vision_content.append({"type": "text", "text": final_text})
+                 messages = [{"role": "user", "content": vision_content}]
+             else:
+                 messages = raw_messages
+         else:
+             messages = raw_messages
+
+         # Generate the reply
+         if has_vision:
+             # Vision goes through the Inference API
+             response = vision_client.chat_completion(
+                 messages=messages,
+                 max_tokens=body.get("max_tokens", 1024),
+                 temperature=body.get("temperature", 0.7)
+             )
+             response_content = response.choices[0].message.content
+         else:
+             # Text uses the local model
+             response_content = generate_local_chat(
+                 messages=messages,
+                 max_tokens=body.get("max_tokens", 1024),
+                 temperature=body.get("temperature", 0.7)
+             )
+
+         result = {
+             "id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
+             "object": "chat.completion",
+             "created": int(time.time()),
+             "model": model_used,
+             "choices": [{
+                 "index": 0,
+                 "message": {
+                     "role": "assistant",
+                     "content": response_content
+                 },
+                 "finish_reason": "stop"
+             }],
+             "usage": {
+                 "prompt_tokens": 0,
+                 "completion_tokens": 0,
+                 "total_tokens": 0
+             }
+         }
+
+         if not has_vision:
+             set_cached_response(cache_key, result)
+         return result
+
+     except Exception as e:
+         return JSONResponse(status_code=500, content={"error": str(e), "detail": traceback.format_exc()})
+
+ # ============ Embeddings ============
+
+ @app.post("/v1/embeddings")
+ async def create_embeddings(request: Request):
+     if not verify_api_key(request):
+         return JSONResponse(status_code=401, content={"error": "Invalid API key"})
+
+     try:
+         body = await request.json()
+         input_text = body.get("input", "")
+         texts = input_text if isinstance(input_text, list) else [input_text]
+
+         embeddings_data = []
+         for idx, text in enumerate(texts):
+             res = embed_client.feature_extraction(text)
+             embedding = res.tolist() if hasattr(res, 'tolist') else res
+             embeddings_data.append({
+                 "object": "embedding",
+                 "index": idx,
+                 "embedding": embedding
+             })
+
+         return {
+             "object": "list",
+             "data": embeddings_data,
+             "model": "bge-m3",
+             "usage": {"prompt_tokens": sum(len(t.split()) for t in texts), "total_tokens": sum(len(t.split()) for t in texts)}
+         }
+     except Exception as e:
+         return JSONResponse(status_code=500, content={"error": str(e), "detail": traceback.format_exc()})
+
+ # ============ Zero-Shot Classification ============
+
+ @app.post("/v1/classify")
+ async def classify_text(request: Request):
+     if not verify_api_key(request):
+         return JSONResponse(status_code=401, content={"error": "Invalid API key"})
+
+     try:
+         body = await request.json()
+         text = body.get("text", "")
+         labels = body.get("labels", ["positive", "negative", "neutral"])
+         multi_label = body.get("multi_label", False)
+
+         if not text:
+             return JSONResponse(status_code=400, content={"error": "Text is required"})
+
+         # Cache
+         cache_key = get_cache_key(text + str(labels), "classify")
+         cached = get_cached_response(cache_key)
+         if cached:
+             return cached
+
+         result = classify_client.zero_shot_classification(
+             text,
+             labels,
+             multi_label=multi_label
+         )
+
+         response = {
+             "object": "classification",
+             "text": text,
+             "labels": result.labels if hasattr(result, 'labels') else labels,
+             "scores": result.scores if hasattr(result, 'scores') else [],
+             "predicted_label": result.labels[0] if hasattr(result, 'labels') and result.labels else None,
+             "model": "xlm-roberta-large-xnli"
+         }
+
+         set_cached_response(cache_key, response)
+         return response
+
+     except Exception as e:
+         return JSONResponse(status_code=500, content={"error": str(e), "detail": traceback.format_exc()})
+
+ # ============ Summarization ============
+
+ @app.post("/v1/summarize")
+ async def summarize_text(request: Request):
+     if not verify_api_key(request):
+         return JSONResponse(status_code=401, content={"error": "Invalid API key"})
+
+     try:
+         body = await request.json()
+         text = body.get("text", "")
+         max_length = body.get("max_length", 150)
+         min_length = body.get("min_length", 30)
+
+         if not text:
+             return JSONResponse(status_code=400, content={"error": "Text is required"})
+
+         # Cache
+         cache_key = get_cache_key(text, "summarize")
+         cached = get_cached_response(cache_key)
+         if cached:
+             return cached
+
+         result = summarize_client.summarization(
+             text,
+             parameters={"max_length": max_length, "min_length": min_length}
+         )
+
+         summary = result.summary_text if hasattr(result, 'summary_text') else str(result)
+
+         response = {
+             "object": "summarization",
+             "original_length": len(text),
+             "summary": summary,
+             "summary_length": len(summary),
+             "compression_ratio": round(len(summary) / len(text) * 100, 2),
+             "model": "mT5_multilingual_XLSum"
+         }
+
+         set_cached_response(cache_key, response)
+         return response
+
+     except Exception as e:
+         return JSONResponse(status_code=500, content={"error": str(e), "detail": traceback.format_exc()})
+
+ # ============ Sentiment Analysis ============
+
+ @app.post("/v1/sentiment")
+ async def analyze_sentiment(request: Request):
+     if not verify_api_key(request):
+         return JSONResponse(status_code=401, content={"error": "Invalid API key"})
+
+     try:
+         body = await request.json()
+         text = body.get("text", "")
+
+         if not text:
+             return JSONResponse(status_code=400, content={"error": "Text is required"})
+
+         # Cache
+         cache_key = get_cache_key(text, "sentiment")
+         cached = get_cached_response(cache_key)
+         if cached:
+             return cached
+
+         result = sentiment_client.text_classification(text)
+
+         # Map labels to Portuguese for the response
+         label_map = {
+             "positive": "positivo",
+             "negative": "negativo",
+             "neutral": "neutro",
+             "POSITIVE": "positivo",
+             "NEGATIVE": "negativo",
+             "NEUTRAL": "neutro"
+         }
+
+         if isinstance(result, list) and len(result) > 0:
+             top_result = result[0]
+             label = top_result.label if hasattr(top_result, 'label') else str(top_result)
+             score = top_result.score if hasattr(top_result, 'score') else 0.0
+         else:
+             label = str(result)
+             score = 1.0
+
+         response = {
+             "object": "sentiment",
+             "text": text,
+             "sentiment": label_map.get(label, label),
+             "sentiment_raw": label,
+             "confidence": round(score, 4),
+             "all_scores": [{"label": r.label, "score": round(r.score, 4)} for r in result] if isinstance(result, list) else [],
+             "model": "distilbert-multilingual-sentiment"
+         }
+
+         set_cached_response(cache_key, response)
+         return response
+
+     except Exception as e:
+         return JSONResponse(status_code=500, content={"error": str(e), "detail": traceback.format_exc()})
+
+ # ============ Auxiliary Endpoints ============
+
+ @app.get("/v1/models")
+ async def list_models():
+     return {
+         "object": "list",
+         "data": [
+             {"id": "lfm2-8b", "object": "model", "owned_by": "liquidai", "description": "Fast, versatile chat"},
+             {"id": "gemma-3-vision", "object": "model", "owned_by": "google", "description": "Image analysis"},
+             {"id": "bge-m3", "object": "model", "owned_by": "baai", "description": "Multilingual embeddings"},
+             {"id": "xlm-roberta-classify", "object": "model", "owned_by": "joeddav", "description": "Multilingual zero-shot classification"},
+             {"id": "mt5-summarize", "object": "model", "owned_by": "csebuetnlp", "description": "Multilingual summarization (45 languages)"},
+             {"id": "distilbert-sentiment", "object": "model", "owned_by": "lxyuan", "description": "Multilingual sentiment analysis"}
+         ]
+     }
+
+ @app.get("/health")
+ async def health():
+     return {
+         "status": "healthy",
+         "timestamp": datetime.now().isoformat(),
+         "cache_size": len(response_cache),
+         "version": "4.0.0",
+         "models": {
+             "chat": "LiquidAI/LFM2-8B-A1B",
+             "vision": "google/gemma-3-27b-it",
+             "embeddings": "BAAI/bge-m3",
+             "classify": "joeddav/xlm-roberta-large-xnli",
+             "summarize": "csebuetnlp/mT5_multilingual_XLSum",
+             "sentiment": "lxyuan/distilbert-base-multilingual-cased-sentiments-student"
+         }
+     }
+
+ @app.delete("/v1/cache/clear")
+ async def clear_cache(request: Request):
+     if not verify_api_key(request):
+         return JSONResponse(status_code=401, content={"error": "Invalid API key"})
+     response_cache.clear()
+     return {"message": "Cache cleared", "timestamp": datetime.now().isoformat()}
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ fastapi==0.109.0
+ uvicorn[standard]==0.27.0
+ huggingface-hub>=0.25.0
+ python-multipart==0.0.6
+ torch>=2.0.0
+ transformers>=4.40.0
+ accelerate>=0.27.0
+ sentencepiece