Guilherme Favaron
Sync: Complete project update (Phase 6) - API, Metadata, Eval, Docs
a686b1b

REST API Guide - RAG Template

Complete REST API for the RAG Template, built with FastAPI.


Overview

The REST API provides programmatic access to the RAG system, with endpoints for:

  • Document ingestion (raw text or file upload)
  • RAG queries
  • Document management
  • System statistics
  • Health checks

Base URL: http://localhost:8000/api/v1

Interactive documentation: http://localhost:8000/api/docs


Authentication

All endpoints except /health require authentication with an API key.

Configuring API Keys

In the .env file, list the accepted keys separated by commas:

API_KEYS=key1,key2,key3
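
The comma-separated format above could be parsed and checked along these lines. This is a minimal sketch, not the project's actual implementation: the function names `load_api_keys` and `is_valid_key` are hypothetical, and ignoring whitespace and empty entries is an assumption.

```python
import hmac
import os
from typing import List, Optional


def load_api_keys(raw: Optional[str] = None) -> List[str]:
    """Split a comma-separated API_KEYS value into a list of non-empty keys."""
    if raw is None:
        raw = os.environ.get("API_KEYS", "")
    return [key.strip() for key in raw.split(",") if key.strip()]


def is_valid_key(candidate: str, valid_keys: List[str]) -> bool:
    """Check a candidate key; each comparison runs in constant time."""
    return any(
        hmac.compare_digest(candidate.encode("utf-8"), key.encode("utf-8"))
        for key in valid_keys
    )


keys = load_api_keys("key1,key2,key3")
print(is_valid_key("key2", keys))
```

`hmac.compare_digest` is used instead of `==` so that the time taken by a comparison does not reveal how many leading characters matched.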

Using an API Key

Include this header in every request:

X-API-Key: your_api_key_here
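
For clients that do not use the SDK, the header can be attached with the Python standard library. The sketch below only builds the request object; the helper name `build_request` is hypothetical, and the base URL is the one given in this guide.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"


def build_request(path, api_key, payload=None):
    """Build a request carrying the X-API-Key header (POST when a JSON body is given)."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    method = "POST" if data is not None else "GET"
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("X-API-Key", api_key)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


req = build_request("/documents", "your_api_key_here")
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen(req)` sends it; that call is left out here so the snippet runs without a server.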

Starting the Server

Development Mode

python api_server.py

Production Mode

uvicorn src.api:app --host 0.0.0.0 --port 8000 --workers 4

With Docker

docker run -p 8000:8000 -e DATABASE_URL=... -e API_KEYS=... rag-template

Endpoints

GET /api/v1/health

System health check.

Authentication: Not required

Response:

{
  "status": "healthy",
  "timestamp": "2026-01-23T10:30:00",
  "database": "healthy",
  "embeddings": "healthy",
  "version": "1.6.0"
}
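
A monitoring script might reduce this payload to a single boolean. The sketch below assumes that "healthy" is the only value indicating a working component; the guide does not list the other possible values, and the function name is hypothetical.

```python
def is_healthy(payload: dict) -> bool:
    """Return True only if the overall status and every listed component report 'healthy'."""
    components = ("status", "database", "embeddings")
    return all(payload.get(name) == "healthy" for name in components)


sample = {
    "status": "healthy",
    "timestamp": "2026-01-23T10:30:00",
    "database": "healthy",
    "embeddings": "healthy",
    "version": "1.6.0",
}
print(is_healthy(sample))
```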

POST /api/v1/ingest

Ingests raw text into the system.

Request Body:

{
  "text": "Document content...",
  "title": "Document Title",
  "chunk_size": 1000,
  "chunk_overlap": 200,
  "strategy": "recursive",
  "metadata": {
    "document_type": "TXT",
    "tags": ["tech", "ai"],
    "security_level": "public"
  }
}

Response:

{
  "document_id": 123,
  "num_chunks": 15,
  "message": "Document ingested successfully",
  "metadata": {...}
}
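
Before sending an ingest request, a client could validate the chunking parameters locally. This is a sketch under stated assumptions: the helper `build_ingest_payload` is hypothetical, its defaults simply copy the example values above (they are not confirmed server defaults), and the rule that the overlap must be smaller than the chunk size is a common-sense client-side check rather than documented server behavior.

```python
from typing import Optional


def build_ingest_payload(
    text: str,
    title: str,
    chunk_size: int = 1000,      # example value from this guide, not a confirmed default
    chunk_overlap: int = 200,    # example value from this guide, not a confirmed default
    strategy: str = "recursive",
    metadata: Optional[dict] = None,
) -> dict:
    """Assemble the JSON body for POST /api/v1/ingest after basic sanity checks."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    return {
        "text": text,
        "title": title,
        "chunk_size": chunk_size,
        "chunk_overlap": chunk_overlap,
        "strategy": strategy,
        "metadata": metadata or {},
    }


payload = build_ingest_payload("Document content...", "Document Title",
                               metadata={"tags": ["tech", "ai"]})
print(payload["chunk_size"], payload["chunk_overlap"])
```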

POST /api/v1/upload

Uploads a file (PDF or TXT) and ingests it.

Request: multipart/form-data

  • file: The file to upload
  • chunk_size: (optional) Chunk size
  • chunk_overlap: (optional) Overlap between chunks
  • strategy: (optional) Chunking strategy

Response: Similar to /ingest

POST /api/v1/query

Runs a RAG query.

Request Body:

{
  "query": "What is RAG?",
  "top_k": 5,
  "temperature": 0.3,
  "max_tokens": 512,
  "model": "huggingface",
  "filters": {
    "document_type": "PDF",
    "tags": ["tech"]
  }
}

Response:

{
  "query": "What is RAG?",
  "response": "RAG stands for Retrieval-Augmented Generation...",
  "contexts": [
    {
      "content": "Relevant context...",
      "similarity": 0.92,
      "document_id": 123
    }
  ],
  "metadata": {
    "num_contexts": 5,
    "model": "huggingface",
    "temperature": 0.3,
    "max_tokens": 512
  }
}
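
Callers often want to inspect which retrieved contexts actually support the answer. The sketch below filters the `contexts` array by its `similarity` field; the 0.8 threshold and the function name are illustrative assumptions, not values defined by the API.

```python
from typing import List


def strong_contexts(response: dict, min_similarity: float = 0.8) -> List[dict]:
    """Return contexts at or above the threshold, most similar first."""
    contexts = response.get("contexts", [])
    kept = [c for c in contexts if c.get("similarity", 0.0) >= min_similarity]
    return sorted(kept, key=lambda c: c["similarity"], reverse=True)


example = {
    "contexts": [
        {"content": "A", "similarity": 0.55, "document_id": 7},
        {"content": "B", "similarity": 0.92, "document_id": 123},
        {"content": "C", "similarity": 0.81, "document_id": 123},
    ]
}
for ctx in strong_contexts(example):
    print(ctx["document_id"], ctx["similarity"])
```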

GET /api/v1/documents

Lists the documents stored in the system.

Query Parameters:

  • limit: (optional) Maximum number of documents to return (default: 100)
  • offset: (optional) Pagination offset (default: 0)
  • session_id: (optional) Filter by session_id

Response:

[
  {
    "id": 123,
    "title": "Document 1",
    "content": "Content...",
    "chunk_count": 15,
    "created_at": "2026-01-23T10:30:00",
    "metadata": {...}
  }
]
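
The `limit` and `offset` parameters can be combined to walk through every document. In the sketch below, `fetch_page` is a hypothetical callable standing in for whatever performs the HTTP call; the loop stops on an empty or short page, which assumes the endpoint returns fewer than `limit` items only on the last page.

```python
from typing import Callable, Iterator, List


def iter_documents(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 100,
) -> Iterator[dict]:
    """Yield documents page by page until the server runs out of results."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return
        offset += page_size


# Stand-in for the real endpoint, so the sketch runs without a server.
fake_store = [{"id": i, "title": f"Document {i}"} for i in range(5)]


def fake_fetch(limit: int, offset: int) -> List[dict]:
    return fake_store[offset:offset + limit]


print([doc["id"] for doc in iter_documents(fake_fetch, page_size=2)])
```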

DELETE /api/v1/documents/{document_id}

Deletes a document from the system.

Path Parameters:

  • document_id: ID of the document to delete

Response:

{
  "message": "Document deleted successfully",
  "document_id": 123
}

GET /api/v1/stats

Returns system statistics.

Response:

{
  "database": {
    "total_documents": 150,
    "total_chunks": 2500,
    "avg_chunks_per_doc": 16.67
  },
  "metadata": {
    "total": 150,
    "by_type": {"PDF": 100, "TXT": 50},
    "by_security": {"public": 120, "internal": 30}
  },
  "timestamp": "2026-01-23T10:30:00"
}
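
The `avg_chunks_per_doc` field in the example is consistent with dividing `total_chunks` by `total_documents` and rounding to two decimals (2500 / 150 ≈ 16.67). The sketch below recomputes it on the client; that the server derives the value this way is an assumption based on the sample numbers.

```python
def avg_chunks_per_doc(stats: dict) -> float:
    """Recompute the average chunk count from the 'database' section of /stats."""
    db = stats["database"]
    if db["total_documents"] == 0:
        return 0.0
    return round(db["total_chunks"] / db["total_documents"], 2)


sample_stats = {"database": {"total_documents": 150, "total_chunks": 2500}}
print(avg_chunks_per_doc(sample_stats))
```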

Using the Python SDK

Installation

pip install -e .  # Install from the local checkout

Basic Usage

from sdk import RAGClient

# Create the client
client = RAGClient(
    base_url="http://localhost:8000",
    api_key="your_api_key"
)

# Health check
health = client.health_check()
print(health)

# Ingest text
result = client.ingest_text(
    text="Document content...",
    title="My Document",
    metadata={"tags": ["tech", "ai"]}
)
print(f"Document ID: {result['document_id']}")

# Upload a file
result = client.upload_file("document.pdf")
print(f"Chunks: {result['num_chunks']}")

# Query
response = client.query(
    query="What is RAG?",
    top_k=5,
    filters={"tags": ["tech"]}
)
print(response['response'])

# List documents
docs = client.list_documents(limit=10)
for doc in docs:
    print(f"{doc['id']}: {doc['title']}")

# Delete a document by ID
client.delete_document(123)

# Statistics
stats = client.get_stats()
print(stats)

Usage Examples

Example 1: Ingestion Pipeline

from sdk import RAGClient
from pathlib import Path

client = RAGClient(api_key="my_key")

# Ingest every PDF in a directory
docs_dir = Path("./documents")
for file in docs_dir.glob("*.pdf"):
    result = client.upload_file(str(file))
    print(f"Ingested {file.name}: {result['num_chunks']} chunks")

Example 2: Simple Chatbot

from sdk import RAGClient

client = RAGClient(api_key="my_key")

# Read questions until the user types "quit" or "exit"
while True:
    query = input("You: ")
    if query.lower() in ["quit", "exit"]:
        break

    response = client.query(query, top_k=5)
    print(f"Bot: {response['response']}\n")

Example 3: Filtered Search

from sdk import RAGClient

client = RAGClient(api_key="my_key")

# Search only public documents tagged tech or ai
response = client.query(
    query="How do embeddings work?",
    filters={
        "security_level": "public",
        "tags": ["tech", "ai"]
    }
)

print(response['response'])
print(f"Contexts used: {response['metadata']['num_contexts']}")

Using cURL

Health Check

curl http://localhost:8000/api/v1/health

Ingest Text

curl -X POST http://localhost:8000/api/v1/ingest \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_key" \
  -d '{
    "text": "Document content",
    "title": "Title"
  }'

Query

curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_key" \
  -d '{
    "query": "What is RAG?",
    "top_k": 5
  }'

List Documents

The URL is quoted so that the shell does not treat the ? as a glob character.

curl "http://localhost:8000/api/v1/documents?limit=10" \
  -H "X-API-Key: your_key"

Rate Limiting

The API does not implement rate limiting by default. For production, consider enforcing it in front of the API with one of:

  • Nginx: limit_req_zone
  • Traefik: rate-limiting middleware
  • Cloudflare: rate limiting at the CDN
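
To illustrate what these tools enforce, here is a small in-process token bucket. It is a generic sketch of the technique, not part of this project and not a substitute for the proxy-level options above; the class name and parameters are hypothetical.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the caller should be throttled."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=5.0, capacity=10)
print(bucket.allow())
```

A server would typically keep one bucket per API key and answer with HTTP 429 when `allow()` returns False.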

Errors

Status Codes

  • 200: Success
  • 400: Bad Request (invalid parameters)
  • 401: Unauthorized (API key missing or invalid)
  • 404: Not Found (resource does not exist)
  • 500: Internal Server Error

Error Format

{
  "detail": "Error message here"
}
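
A client can turn this error body into an exception that carries both the status code and the `detail` message. The names `APIError` and `raise_for_error` below are hypothetical; the sketch assumes every non-2xx response uses the `{"detail": ...}` shape shown above.

```python
class APIError(Exception):
    """Raised when the API answers with a non-success status code."""

    def __init__(self, status: int, detail: str):
        super().__init__(f"{status}: {detail}")
        self.status = status
        self.detail = detail


def raise_for_error(status: int, body: dict) -> None:
    """Do nothing for 2xx responses; otherwise raise APIError with the 'detail' field."""
    if 200 <= status < 300:
        return
    raise APIError(status, str(body.get("detail", "unknown error")))


try:
    raise_for_error(401, {"detail": "Error message here"})
except APIError as exc:
    print(exc.status, exc.detail)
```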

Performance

Benchmarks

Measured on a local machine (M1 Pro, 16 GB RAM):

| Endpoint | Average Time | Notes |
| --- | --- | --- |
| /health | <10 ms | Very fast |
| /ingest | 500-2000 ms | Depends on document size |
| /query | 200-1000 ms | Depends on the chosen LLM |
| /documents | <100 ms | Paginated |

Optimizations

  1. Embedding cache: enabled automatically
  2. Connection pooling: use PgBouncer or Supabase
  3. Workers: run multiple Uvicorn workers in production
  4. Async: endpoints are async by default
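
As an illustration of the first item, an embedding cache can be as simple as memoizing the embedding function on its input text. This sketch does not describe the project's actual cache: `embed_uncached` is a placeholder that returns a fake vector, and using `functools.lru_cache` is an assumption.

```python
from functools import lru_cache
from typing import Tuple


def embed_uncached(text: str) -> Tuple[float, ...]:
    """Placeholder for a real embedding model call (returns a fake 4-value vector)."""
    padded = (text + "    ")[:4]
    return tuple(float(ord(ch)) for ch in padded)


@lru_cache(maxsize=1024)
def embed(text: str) -> Tuple[float, ...]:
    """Return the embedding for `text`, reusing the stored result for repeated inputs."""
    return embed_uncached(text)


embed("What is RAG?")
embed("What is RAG?")  # served from the cache, no second model call
print(embed.cache_info())
```

Vectors are returned as tuples so that cached values cannot be mutated by callers.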

Production Deployment

Docker Compose

version: '3.8'
services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://...
      - HF_TOKEN=...
      - API_KEYS=key1,key2
    command: uvicorn src.api:app --host 0.0.0.0 --port 8000 --workers 4

Environment Variables

# API Config
API_HOST=0.0.0.0
API_PORT=8000
API_WORKERS=4
API_RELOAD=false
API_KEYS=key1,key2,key3

# Database
DATABASE_URL=postgresql://...

# LLM
HF_TOKEN=...
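
The API variables above could be read into a typed settings object at startup. This is a sketch, not the project's configuration code: the `APISettings` class is hypothetical, and the fallback values simply repeat the ones listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class APISettings:
    host: str
    port: int
    workers: int
    reload: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> APISettings:
    """Read the API_* variables, converting numbers and booleans from strings."""
    env = os.environ if env is None else env
    return APISettings(
        host=env.get("API_HOST", "0.0.0.0"),
        port=int(env.get("API_PORT", "8000")),
        workers=int(env.get("API_WORKERS", "4")),
        reload=env.get("API_RELOAD", "false").strip().lower() in ("1", "true", "yes"),
    )


print(load_settings({"API_PORT": "9000", "API_RELOAD": "true"}))
```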

Security

Best Practices

  1. HTTPS: always use HTTPS in production
  2. API keys: generate strong keys and rotate them regularly
  3. Rate limiting: put a rate limiter in front of the API
  4. CORS: restrict allowed origins to the ones you actually serve
  5. Input validation: handled automatically by Pydantic
  6. Logs: monitor access logs
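
For the second item, strong keys can be generated with Python's `secrets` module. The helper name and the 32-byte length below are illustrative choices, not project requirements.

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random key built from `num_bytes` of OS-level randomness."""
    return secrets.token_urlsafe(num_bytes)


# Print a line that can be pasted into the .env file.
print("API_KEYS=" + ",".join(generate_api_key() for _ in range(2)))
```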

Troubleshooting

API does not start

Check that:

  • PostgreSQL is running
  • DATABASE_URL is correct
  • Port 8000 is available
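
The port check can be scripted by trying to bind the port locally. This is a minimal sketch with a hypothetical helper name; it tests only the given host address (127.0.0.1 by default), so a service bound to a different interface may not be detected.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


print("Port 8000 free:", port_is_free(8000))
```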

Authentication errors

Check that:

  • The API key is configured in .env
  • The X-API-Key header is present
  • The key matches one of the configured keys

Slow queries

Check that:

  • Database indexes have been created
  • The embedding cache is active
  • The LLM is not too large

Next Steps

  1. Implement rate limiting
  2. Add OAuth2 authentication
  3. Build a monitoring dashboard
  4. Publish the SDK on PyPI
  5. Add webhooks for events

Resources