# REST API Guide - RAG Template

A complete REST API for the RAG Template, built with FastAPI.

---

## Overview

The REST API provides programmatic access to the RAG system, with endpoints for:
- Document ingestion (raw text or file upload)
- RAG queries
- Document management
- System statistics
- Health checks

**Base URL**: `http://localhost:8000/api/v1`

**Interactive documentation**: `http://localhost:8000/api/docs`

---

## Authentication

All endpoints except `/health` require authentication with an API key.

### Configuring API Keys

In the `.env` file:

```bash
API_KEYS=key1,key2,key3
```
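
A minimal sketch of how a comma-separated `API_KEYS` value could be parsed and checked on the server side. The helper names `parse_api_keys` and `is_valid_key` are hypothetical, and the template's actual implementation may differ.

```python
import hmac


def parse_api_keys(raw: str) -> list[str]:
    """Split a comma-separated API_KEYS value, dropping blanks and whitespace."""
    return [key.strip() for key in raw.split(",") if key.strip()]


def is_valid_key(candidate: str, allowed: list[str]) -> bool:
    """Check the candidate against every allowed key."""
    result = False
    for key in allowed:
        # compare_digest avoids leaking where two strings first differ
        if hmac.compare_digest(candidate.encode(), key.encode()):
            result = True
    return result


keys = parse_api_keys("key1, key2,key3,")
print(keys)                        # ['key1', 'key2', 'key3']
print(is_valid_key("key2", keys))  # True
print(is_valid_key("nope", keys))  # False
```

Every allowed key is compared even after a match, so the time taken does not reveal which key matched.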

### Using an API Key

Include this header in every request:

```
X-API-Key: your_api_key_here
```
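
As an illustration, the header can be attached from Python using only the standard library. `build_request` is a hypothetical helper, not part of the template. It only builds the request, so it runs without a server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"


def build_request(path, api_key, payload=None):
    """Build a urllib Request carrying the X-API-Key header (nothing is sent here)."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(BASE_URL + path, data=data)
    request.add_header("X-API-Key", api_key)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


request = build_request("/query", "your_api_key_here", {"query": "What is RAG?", "top_k": 5})
print(request.get_method(), request.full_url)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```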

---

## Starting the Server

### Development Mode

```bash
python api_server.py
```

### Production Mode

```bash
uvicorn src.api:app --host 0.0.0.0 --port 8000 --workers 4
```

### With Docker

```bash
docker run -p 8000:8000 -e DATABASE_URL=... -e API_KEYS=... rag-template
```

---

## Endpoints

### GET /api/v1/health

System health check.

**Authentication**: Not required

**Response**:
```json
{
  "status": "healthy",
  "timestamp": "2026-01-23T10:30:00",
  "database": "healthy",
  "embeddings": "healthy",
  "version": "1.6.0"
}
```

### POST /api/v1/ingest

Ingests text into the system.

**Request Body**:
```json
{
  "text": "Document content...",
  "title": "Document Title",
  "chunk_size": 1000,
  "chunk_overlap": 200,
  "strategy": "recursive",
  "metadata": {
    "document_type": "TXT",
    "tags": ["tech", "ai"],
    "security_level": "public"
  }
}
```

**Response**:
```json
{
  "document_id": 123,
  "num_chunks": 15,
  "message": "Document ingested successfully",
  "metadata": {...}
}
```
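
To show how `chunk_size` and `chunk_overlap` interact, here is a plain sliding-window splitter. It is a sketch only: it is not the `recursive` strategy named in the request, and it assumes both parameters are measured in characters, which this guide does not state.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows; each starts chunk_size - chunk_overlap after the last."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = sliding_window_chunks("x" * 2600)
print([len(chunk) for chunk in chunks])  # [1000, 1000, 1000]
```

With the values from the request above, consecutive chunks share 200 characters, so a 2,600-character text yields three chunks rather than the 2.6 that a split without overlap would suggest.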

### POST /api/v1/upload

Uploads and ingests a file (PDF or TXT).

**Request**: `multipart/form-data`
- `file`: File to upload
- `chunk_size`: (optional) Chunk size
- `chunk_overlap`: (optional) Overlap between chunks
- `strategy`: (optional) Chunking strategy

**Response**: Similar to the `/ingest` response
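
A minimal sketch of what a `multipart/form-data` body for this endpoint might look like, built with the standard library only. `encode_multipart` is a hypothetical helper. Sending the optional parameters as form fields follows the list above, but the server's exact expectations are an assumption.

```python
import uuid


def encode_multipart(fields: dict, filename: str, file_bytes: bytes) -> tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        # One part per plain form field
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The file part uses the field name "file", as listed above
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
    )
    parts.append(file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


body, content_type = encode_multipart(
    {"chunk_size": 1000, "chunk_overlap": 200}, "document.txt", b"hello"
)
print(content_type)
print(len(body), "bytes")
```

The returned `content_type` goes in the `Content-Type` header, alongside `X-API-Key`, when posting `body` to `/api/v1/upload`.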

### POST /api/v1/query

Runs a RAG query.

**Request Body**:
```json
{
  "query": "What is RAG?",
  "top_k": 5,
  "temperature": 0.3,
  "max_tokens": 512,
  "model": "huggingface",
  "filters": {
    "document_type": "PDF",
    "tags": ["tech"]
  }
}
```

**Response**:
```json
{
  "query": "What is RAG?",
  "response": "RAG stands for Retrieval-Augmented Generation...",
  "contexts": [
    {
      "content": "Relevant context...",
      "similarity": 0.92,
      "document_id": 123
    }
  ],
  "metadata": {
    "num_contexts": 5,
    "model": "huggingface",
    "temperature": 0.3,
    "max_tokens": 512
  }
}
```

### GET /api/v1/documents

Lists documents in the system.

**Query Parameters**:
- `limit`: (optional) Maximum number of documents (default: 100)
- `offset`: (optional) Pagination offset (default: 0)
- `session_id`: (optional) Filter by `session_id`

**Response**:
```json
[
  {
    "id": 123,
    "title": "Document 1",
    "content": "Content...",
    "chunk_count": 15,
    "created_at": "2026-01-23T10:30:00",
    "metadata": {...}
  }
]
```
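
The `limit` and `offset` parameters can be combined to walk the full document list page by page. The sketch below assumes that a page shorter than `limit` means there are no more results. `iter_documents` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real HTTP call so the example runs without a server.

```python
from typing import Callable, Iterator


def iter_documents(fetch_page: Callable[[int, int], list], limit: int = 100) -> Iterator[dict]:
    """Yield every document by requesting consecutive pages of size `limit`."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:
            break  # a short (or empty) page is treated as the last one
        offset += limit


# Stand-in for GET /api/v1/documents?limit=...&offset=...
FAKE_DOCS = [{"id": i, "title": f"Document {i}"} for i in range(1, 8)]


def fake_fetch(limit: int, offset: int) -> list:
    return FAKE_DOCS[offset:offset + limit]


ids = [doc["id"] for doc in iter_documents(fake_fetch, limit=3)]
print(ids)  # [1, 2, 3, 4, 5, 6, 7]
```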

### DELETE /api/v1/documents/{document_id}

Deletes a document from the system.

**Path Parameters**:
- `document_id`: Document ID

**Response**:
```json
{
  "message": "Document deleted successfully",
  "document_id": 123
}
```

### GET /api/v1/stats

Returns system statistics.

**Response**:
```json
{
  "database": {
    "total_documents": 150,
    "total_chunks": 2500,
    "avg_chunks_per_doc": 16.67
  },
  "metadata": {
    "total": 150,
    "by_type": {"PDF": 100, "TXT": 50},
    "by_security": {"public": 120, "internal": 30}
  },
  "timestamp": "2026-01-23T10:30:00"
}
```

---

## Using the Python SDK

### Installation

```bash
pip install -e .  # Install locally
```

### Basic Usage

```python
from sdk import RAGClient

# Create the client
client = RAGClient(
    base_url="http://localhost:8000",
    api_key="your_api_key"
)

# Health check
health = client.health_check()
print(health)

# Ingest text
result = client.ingest_text(
    text="Document content...",
    title="My Document",
    metadata={"tags": ["tech", "ai"]}
)
print(f"Document ID: {result['document_id']}")

# Upload a file
result = client.upload_file("document.pdf")
print(f"Chunks: {result['num_chunks']}")

# Query
response = client.query(
    query="What is RAG?",
    top_k=5,
    filters={"tags": ["tech"]}
)
print(response['response'])

# List documents
docs = client.list_documents(limit=10)
for doc in docs:
    print(f"{doc['id']}: {doc['title']}")

# Delete a document
client.delete_document(123)

# Statistics
stats = client.get_stats()
print(stats)
```

---

## Usage Examples

### Example 1: Ingestion Pipeline

```python
from sdk import RAGClient
from pathlib import Path

client = RAGClient(api_key="my_key")

# Ingest multiple files
docs_dir = Path("./documents")
for file in docs_dir.glob("*.pdf"):
    result = client.upload_file(str(file))
    print(f"Ingested {file.name}: {result['num_chunks']} chunks")
```

### Example 2: Simple Chatbot

```python
from sdk import RAGClient

client = RAGClient(api_key="my_key")

while True:
    query = input("You: ")
    if query.lower() in ["quit", "exit"]:
        break

    response = client.query(query, top_k=5)
    print(f"Bot: {response['response']}\n")
```

### Example 3: Filtered Search

```python
from sdk import RAGClient

client = RAGClient(api_key="my_key")

# Search only public tech documents
response = client.query(
    query="How do embeddings work?",
    filters={
        "security_level": "public",
        "tags": ["tech", "ai"]
    }
)

print(response['response'])
print(f"Contexts used: {response['metadata']['num_contexts']}")
```

---

## Using cURL

### Health Check

```bash
curl http://localhost:8000/api/v1/health
```

### Ingest Text

```bash
curl -X POST http://localhost:8000/api/v1/ingest \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_key" \
  -d '{
    "text": "Document content",
    "title": "Title"
  }'
```

### Query

```bash
curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_key" \
  -d '{
    "query": "What is RAG?",
    "top_k": 5
  }'
```

### List Documents

```bash
curl "http://localhost:8000/api/v1/documents?limit=10" \
  -H "X-API-Key: your_key"
```

---

## Rate Limiting

The API does not implement rate limiting by default. For production, consider placing one of the following in front of it:

- **Nginx**: with `limit_req_zone`
- **Traefik**: with its rate-limiting middleware
- **Cloudflare**: rate limiting at the CDN
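
For readers unfamiliar with how such limiters behave, here is a small in-process token bucket. It only illustrates the general technique. It is not part of the template and is not a replacement for the proxy-level options above. The `TokenBucket` class and its parameters are hypothetical.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the bucket can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=5, capacity=10)
print([bucket.allow() for _ in range(12)])
```

In a multi-worker deployment each process would hold its own bucket, which is one reason to prefer limiting at the proxy.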

---

## Errors

### Status Codes

- `200`: Success
- `400`: Bad Request (invalid parameters)
- `401`: Unauthorized (invalid or missing API key)
- `404`: Not Found (resource not found)
- `500`: Internal Server Error

### Error Format

```json
{
  "detail": "Error message here"
}
```
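
On the client side, the status code and the `detail` field can be turned into an exception. This is a sketch under the assumption that every error body follows the format above. `APIError` and `raise_for_error` are hypothetical names, not part of the SDK.

```python
import json

STATUS_MEANINGS = {
    400: "Bad Request",
    401: "Unauthorized",
    404: "Not Found",
    500: "Internal Server Error",
}


class APIError(Exception):
    def __init__(self, status: int, detail: str):
        super().__init__(f"{status} {STATUS_MEANINGS.get(status, 'Error')}: {detail}")
        self.status = status
        self.detail = detail


def raise_for_error(status: int, body: bytes) -> None:
    """Raise APIError for non-2xx responses, extracting `detail` when present."""
    if 200 <= status < 300:
        return
    try:
        detail = json.loads(body).get("detail", "")
    except (ValueError, AttributeError):
        # Body was not a JSON object; fall back to the raw text
        detail = body.decode("utf-8", errors="replace")
    raise APIError(status, str(detail))


try:
    raise_for_error(401, b'{"detail": "Invalid API key"}')
except APIError as exc:
    print(exc)  # 401 Unauthorized: Invalid API key
```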

---

## Performance

### Benchmarks

Measured on a local machine (M1 Pro, 16 GB RAM):

| Endpoint | Average Time | Notes |
|----------|--------------|-------|
| /health | <10 ms | Very fast |
| /ingest | 500-2000 ms | Depends on document size |
| /query | 200-1000 ms | Depends on the chosen LLM |
| /documents | <100 ms | Paginated |

### Optimizations

1. **Embedding cache**: Enabled automatically
2. **Connection pooling**: Use pgBouncer or Supabase
3. **Workers**: Run multiple Uvicorn workers in production
4. **Async**: Endpoints are async by default

---

## Production Deployment

### Docker Compose

```yaml
version: '3.8'
services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://...
      - HF_TOKEN=...
      - API_KEYS=key1,key2
    command: uvicorn src.api:app --host 0.0.0.0 --port 8000 --workers 4
```

### Environment Variables

```bash
# API Config
API_HOST=0.0.0.0
API_PORT=8000
API_WORKERS=4
API_RELOAD=false
API_KEYS=key1,key2,key3

# Database
DATABASE_URL=postgresql://...

# LLM
HF_TOKEN=...
```
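
A sketch of how these variables might be read into typed settings. The `APISettings` dataclass and `load_settings` function are hypothetical, and the fallback defaults simply mirror the example values above rather than anything the template guarantees.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class APISettings:
    host: str
    port: int
    workers: int
    reload: bool
    api_keys: tuple


def load_settings(env=os.environ) -> APISettings:
    """Read the API_* variables from a mapping (defaults are assumptions)."""
    raw_keys = env.get("API_KEYS", "")
    return APISettings(
        host=env.get("API_HOST", "0.0.0.0"),
        port=int(env.get("API_PORT", "8000")),
        workers=int(env.get("API_WORKERS", "4")),
        # Treat common truthy spellings as True; everything else is False
        reload=env.get("API_RELOAD", "false").strip().lower() in {"1", "true", "yes"},
        api_keys=tuple(k.strip() for k in raw_keys.split(",") if k.strip()),
    )


settings = load_settings({"API_PORT": "9000", "API_RELOAD": "true", "API_KEYS": "key1,key2"})
print(settings)
```

Passing the environment as a mapping keeps the function easy to test. Calling `load_settings()` with no argument reads the real process environment.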

---

## Security

### Best Practices

1. **HTTPS**: Always use HTTPS in production
2. **API keys**: Generate strong keys and rotate them regularly
3. **Rate limiting**: Add rate limiting in front of the API
4. **CORS**: Restrict CORS to the origins that need access
5. **Input validation**: Handled automatically by Pydantic
6. **Logs**: Monitor access logs
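
For the API key recommendation, one way to generate strong keys is Python's `secrets` module. `generate_api_keys` is a hypothetical helper, and the 32-byte length is an assumption rather than a requirement of the template.

```python
import secrets


def generate_api_keys(count: int = 3, nbytes: int = 32) -> str:
    """Return `count` random URL-safe keys joined by commas, ready for API_KEYS."""
    keys = [secrets.token_urlsafe(nbytes) for _ in range(count)]
    return ",".join(keys)


print(f"API_KEYS={generate_api_keys()}")
```

The URL-safe alphabet contains no commas, so the generated keys are safe to store in the comma-separated `API_KEYS` value.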

---

## Troubleshooting

### API does not start

Check that:
- PostgreSQL is running
- `DATABASE_URL` is correct
- Port 8000 is available

### Authentication errors

Check that:
- The API key is configured in `.env`
- The `X-API-Key` header is present
- The key is correct

### Slow queries

Check that:
- The database indexes have been created
- The embedding cache is active
- The LLM is not too large

---

## Next Steps

1. Implement rate limiting
2. Add OAuth2 authentication
3. Build a monitoring dashboard
4. Publish the SDK on PyPI
5. Add webhooks for events

---

## Resources

- [FastAPI documentation](https://fastapi.tiangolo.com/)
- [Uvicorn documentation](https://www.uvicorn.org/)
- [OpenAPI/Swagger](http://localhost:8000/api/docs)
- [ReDoc](http://localhost:8000/api/redoc)