Guilherme Favaron
# REST API Guide - RAG Template
REST API for the RAG Template, built with FastAPI.
---
## Overview
The REST API gives programmatic access to the RAG system, with endpoints for:
- Document ingestion (raw text or file upload)
- RAG queries
- Document management
- System statistics
- Health checks
**Base URL**: `http://localhost:8000/api/v1`
**Interactive documentation**: `http://localhost:8000/api/docs`
---
## Authentication
All endpoints except `/health` require an API key.
### Configuring API Keys
In the `.env` file:
```bash
API_KEYS=key1,key2,key3
```
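The guide only states that `API_KEYS` is a comma-separated list. The sketch below shows one plausible way a server could parse that value and check a presented `X-API-Key` against it; `load_api_keys` and `is_valid_key` are hypothetical names, and the template's own implementation may differ.

```python
import hmac
import os
from typing import Optional


def load_api_keys(raw: Optional[str]) -> set[str]:
    """Split a comma-separated API_KEYS value, dropping blanks and surrounding spaces."""
    if not raw:
        return set()
    return {key.strip() for key in raw.split(",") if key.strip()}


def is_valid_key(candidate: Optional[str], allowed: set[str]) -> bool:
    """Check a presented key against the configured set (hypothetical helper)."""
    if not candidate:
        return False
    presented = candidate.encode("utf-8")
    # compare_digest runs in constant time per comparison, so response timing
    # does not reveal how many leading characters of a key were guessed right.
    return any(hmac.compare_digest(presented, key.encode("utf-8")) for key in allowed)


# Hypothetical: a server would load this once at startup.
allowed_keys = load_api_keys(os.environ.get("API_KEYS"))
```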
### Using an API Key
Include this header in every request:
```
X-API-Key: your_api_key_here
```
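Clients that do not use the SDK can attach the header with Python's standard library. This minimal sketch only builds `urllib.request.Request` objects (nothing is sent); `build_request` is a hypothetical helper, and the base URL is the one documented above. `urllib` normalizes header names (for example to `X-api-key`), which is harmless because HTTP header names are case-insensitive.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"


def build_request(path, api_key, payload=None, method=None):
    """Build (without sending) a request authenticated with X-API-Key."""
    headers = {"X-API-Key": api_key}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    # With method=None, urllib uses POST when data is set and GET otherwise.
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


query_req = build_request("/query", "your_api_key", {"query": "What is RAG?", "top_k": 5})
stats_req = build_request("/stats", "your_api_key")
delete_req = build_request("/documents/123", "your_api_key", method="DELETE")
```

Pass any of these to `urllib.request.urlopen(req)` to actually send it.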
---
## Starting the Server
### Development Mode
```bash
python api_server.py
```
### Production Mode
```bash
uvicorn src.api:app --host 0.0.0.0 --port 8000 --workers 4
```
### With Docker
```bash
docker run -p 8000:8000 -e DATABASE_URL=... -e API_KEYS=... rag-template
```
---
## Endpoints
### GET /api/v1/health
System health check.
**Authentication**: Not required
**Response**:
```json
{
  "status": "healthy",
  "timestamp": "2026-01-23T10:30:00",
  "database": "healthy",
  "embeddings": "healthy",
  "version": "1.6.0"
}
```
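A deployment script or readiness probe can check more than the top-level `status`. This sketch assumes that any value other than `"healthy"` signals a problem; the guide only documents the healthy case, so the failure values are an assumption.

```python
import json

COMPONENTS = ("database", "embeddings")


def unhealthy_components(health: dict) -> list[str]:
    """Names of fields not reporting "healthy" (an empty list means all good)."""
    problems = [] if health.get("status") == "healthy" else ["status"]
    problems += [name for name in COMPONENTS if health.get(name) != "healthy"]
    return problems


body = (
    '{"status": "healthy", "timestamp": "2026-01-23T10:30:00", '
    '"database": "healthy", "embeddings": "healthy", "version": "1.6.0"}'
)
print(unhealthy_components(json.loads(body)))  # → []
```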
### POST /api/v1/ingest
Ingests text into the system.
**Request Body**:
```json
{
  "text": "Document content...",
  "title": "Document Title",
  "chunk_size": 1000,
  "chunk_overlap": 200,
  "strategy": "recursive",
  "metadata": {
    "document_type": "TXT",
    "tags": ["tech", "ai"],
    "security_level": "public"
  }
}
```
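The request body combines the text with chunking parameters. Below is a hedged client-side helper that assembles it and rejects an overlap that is not smaller than the chunk size. With fixed-size windows the stride is `chunk_size - chunk_overlap`, so an overlap that large would stop the window from advancing. The defaults copy the example values above rather than confirmed server defaults. Whether the server performs the same validation is an assumption.

```python
from typing import Optional


def build_ingest_payload(text: str, title: str, chunk_size: int = 1000,
                         chunk_overlap: int = 200, strategy: str = "recursive",
                         metadata: Optional[dict] = None) -> dict:
    """Assemble a /ingest request body after basic sanity checks (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # For fixed-size windows the stride is chunk_size - chunk_overlap,
    # so it must stay positive for chunking to make progress.
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    payload = {
        "text": text,
        "title": title,
        "chunk_size": chunk_size,
        "chunk_overlap": chunk_overlap,
        "strategy": strategy,
    }
    if metadata:
        payload["metadata"] = metadata
    return payload


payload = build_ingest_payload(
    "Document content...",
    "Document Title",
    metadata={"document_type": "TXT", "tags": ["tech", "ai"], "security_level": "public"},
)
```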
**Response**:
```json
{
  "document_id": 123,
  "num_chunks": 15,
  "message": "Document ingested successfully",
  "metadata": {...}
}
```
### POST /api/v1/upload
Uploads and ingests a file (PDF or TXT).
**Request**: `multipart/form-data`
- `file`: the file to upload
- `chunk_size`: (optional) chunk size
- `chunk_overlap`: (optional) overlap between consecutive chunks
- `strategy`: (optional) chunking strategy
**Response**: Same shape as the `/ingest` response
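`/upload` expects `multipart/form-data`, which `urllib` does not encode for you. This standard-library sketch builds the body using the field names listed above (`file`, `chunk_size`, `chunk_overlap`, `strategy`). The helper name is hypothetical. In practice the SDK's `upload_file` or the `files=` argument of the `requests` library handles this.

```python
import uuid


def build_multipart(fields: dict, filename: str, file_bytes: bytes,
                    file_type: str = "application/pdf") -> tuple[bytes, str]:
    """Encode form fields plus one `file` part; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    lines: list[bytes] = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",  # blank line separates part headers from the value
            str(value).encode(),
        ]
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        f"Content-Type: {file_type}".encode(),
        b"",
        file_bytes,
        f"--{boundary}--".encode(),  # closing delimiter
        b"",
    ]
    # Lines are joined with CRLF, so the body also ends with CRLF.
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


body, content_type = build_multipart(
    {"chunk_size": 1000, "chunk_overlap": 200, "strategy": "recursive"},
    "document.pdf",
    b"%PDF-1.4 ...",
)
```

To send it, build `urllib.request.Request(url, data=body, headers={"X-API-Key": key, "Content-Type": content_type}, method="POST")`.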
### POST /api/v1/query
Runs a RAG query.
**Request Body**:
```json
{
  "query": "What is RAG?",
  "top_k": 5,
  "temperature": 0.3,
  "max_tokens": 512,
  "model": "huggingface",
  "filters": {
    "document_type": "PDF",
    "tags": ["tech"]
  }
}
```
**Response**:
```json
{
  "query": "What is RAG?",
  "response": "RAG is Retrieval-Augmented Generation...",
  "contexts": [
    {
      "content": "Relevant context...",
      "similarity": 0.92,
      "document_id": 123
    }
  ],
  "metadata": {
    "num_contexts": 5,
    "model": "huggingface",
    "temperature": 0.3,
    "max_tokens": 512
  }
}
```
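Clients often want only the strongest contexts, or the list of source documents to cite. This sketch post-processes the `/query` response shape shown above. It assumes a higher `similarity` means a closer match, as the `0.92` example suggests. The `0.8` threshold and the extra sample contexts are illustrative and were not produced by the API.

```python
def relevant_contexts(response: dict, min_similarity: float = 0.8) -> list[dict]:
    """Contexts at or above min_similarity, highest similarity first."""
    kept = [
        ctx for ctx in response.get("contexts", [])
        if ctx.get("similarity", 0.0) >= min_similarity
    ]
    return sorted(kept, key=lambda ctx: ctx.get("similarity", 0.0), reverse=True)


def source_document_ids(response: dict) -> list[int]:
    """Distinct document_id values, in the order the contexts were returned."""
    ids = []
    for ctx in response.get("contexts", []):
        doc_id = ctx.get("document_id")
        if doc_id is not None and doc_id not in ids:
            ids.append(doc_id)
    return ids


# Illustrative response; the second and third contexts are made-up values.
sample = {
    "query": "What is RAG?",
    "response": "RAG is Retrieval-Augmented Generation...",
    "contexts": [
        {"content": "Relevant context...", "similarity": 0.92, "document_id": 123},
        {"content": "Weaker match...", "similarity": 0.61, "document_id": 123},
        {"content": "Another source...", "similarity": 0.85, "document_id": 7},
    ],
}
print([c["document_id"] for c in relevant_contexts(sample)])  # → [123, 7]
print(source_document_ids(sample))  # → [123, 7]
```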
### GET /api/v1/documents
Lists documents in the system.
**Query Parameters**:
- `limit`: (optional) maximum number of documents to return (default: 100)
- `offset`: (optional) pagination offset (default: 0)
- `session_id`: (optional) return only documents with this `session_id`
**Response**:
```json
[
  {
    "id": 123,
    "title": "Document 1",
    "content": "Content...",
    "chunk_count": 15,
    "created_at": "2026-01-23T10:30:00",
    "metadata": {...}
  }
]
```
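Using `limit` and `offset`, a client can walk the entire collection one page at a time. The generator below stops at the first page shorter than `limit`. `fetch_page` can be any callable that takes `(limit, offset)`. Wiring it to the SDK, for example with `lambda limit, offset: client.list_documents(limit=limit, offset=offset)`, assumes `list_documents` accepts `offset`, which the SDK example in this guide does not show.

```python
from typing import Callable, Iterator


def iter_documents(fetch_page: Callable[[int, int], list[dict]],
                   page_size: int = 100) -> Iterator[dict]:
    """Yield every document by requesting successive limit/offset pages."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        if len(page) < page_size:
            break  # a short or empty page means there is nothing further
        offset += page_size
```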
### DELETE /api/v1/documents/{document_id}
Deletes a document from the system.
**Path Parameters**:
- `document_id`: ID of the document
**Response**:
```json
{
  "message": "Document deleted successfully",
  "document_id": 123
}
```
### GET /api/v1/stats
Returns system statistics.
**Response**:
```json
{
  "database": {
    "total_documents": 150,
    "total_chunks": 2500,
    "avg_chunks_per_doc": 16.67
  },
  "metadata": {
    "total": 150,
    "by_type": {"PDF": 100, "TXT": 50},
    "by_security": {"public": 120, "internal": 30}
  },
  "timestamp": "2026-01-23T10:30:00"
}
```
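The sample numbers suggest `avg_chunks_per_doc` is `total_chunks / total_documents` (2500 / 150 ≈ 16.67). That relationship is inferred from the example rather than documented. Here is a sketch that recomputes the average and converts `by_type` into percentages:

```python
def summarize_stats(stats: dict) -> dict:
    """Recompute the average chunk count and express by_type counts as percentages."""
    db, meta = stats["database"], stats["metadata"]
    n_docs = db["total_documents"]
    avg = db["total_chunks"] / n_docs if n_docs else 0.0
    total = meta["total"]
    pct_by_type = {}
    if total:
        pct_by_type = {t: round(100 * count / total, 1) for t, count in meta["by_type"].items()}
    return {"avg_chunks_per_doc": round(avg, 2), "pct_by_type": pct_by_type}


sample = {
    "database": {"total_documents": 150, "total_chunks": 2500, "avg_chunks_per_doc": 16.67},
    "metadata": {"total": 150, "by_type": {"PDF": 100, "TXT": 50},
                 "by_security": {"public": 120, "internal": 30}},
}
print(summarize_stats(sample))
# → {'avg_chunks_per_doc': 16.67, 'pct_by_type': {'PDF': 66.7, 'TXT': 33.3}}
```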
---
## Using the Python SDK
### Installation
```bash
pip install -e .  # install the project locally in editable mode
```
### Basic Usage
```python
from sdk import RAGClient

# Create the client
client = RAGClient(
    base_url="http://localhost:8000",
    api_key="your_api_key",
)

# Health check
health = client.health_check()
print(health)

# Ingest text
result = client.ingest_text(
    text="Document content...",
    title="My Document",
    metadata={"tags": ["tech", "ai"]},
)
print(f"Document ID: {result['document_id']}")

# Upload a file
result = client.upload_file("document.pdf")
print(f"Chunks: {result['num_chunks']}")

# Query
response = client.query(
    query="What is RAG?",
    top_k=5,
    filters={"tags": ["tech"]},
)
print(response["response"])

# List documents
docs = client.list_documents(limit=10)
for doc in docs:
    print(f"{doc['id']}: {doc['title']}")

# Delete a document
client.delete_document(123)

# Statistics
stats = client.get_stats()
print(stats)
```
---
## Usage Examples
### Example 1: Ingestion Pipeline
```python
from pathlib import Path

from sdk import RAGClient

client = RAGClient(api_key="my_key")

# Ingest every PDF in a directory
docs_dir = Path("./documents")
for file in docs_dir.glob("*.pdf"):
    result = client.upload_file(str(file))
    print(f"Ingested {file.name}: {result['num_chunks']} chunks")
```
### Example 2: Simple Chatbot
```python
from sdk import RAGClient

client = RAGClient(api_key="my_key")

while True:
    query = input("You: ")
    if query.strip().lower() in ["quit", "exit"]:
        break
    response = client.query(query, top_k=5)
    print(f"Bot: {response['response']}\n")
```
### Example 3: Filtered Search
```python
from sdk import RAGClient

client = RAGClient(api_key="my_key")

# Search only public documents tagged tech/ai
response = client.query(
    query="How do embeddings work?",
    filters={
        "security_level": "public",
        "tags": ["tech", "ai"],
    },
)
print(response["response"])
print(f"Contexts used: {response['metadata']['num_contexts']}")
```
---
## Using cURL
### Health Check
```bash
curl http://localhost:8000/api/v1/health
```
### Ingest Text
```bash
curl -X POST http://localhost:8000/api/v1/ingest \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_key" \
  -d '{
    "text": "Document content",
    "title": "Title"
  }'
```
### Query
```bash
curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_key" \
  -d '{
    "query": "What is RAG?",
    "top_k": 5
  }'
```
### List Documents
```bash
curl "http://localhost:8000/api/v1/documents?limit=10" \
  -H "X-API-Key: your_key"
```
---
## Rate Limiting
The API does not implement rate limiting out of the box. For production, consider:
- **Nginx**: `limit_req_zone` to define a zone and `limit_req` to apply it
- **Traefik**: its rate-limit middleware
- **Cloudflare**: rate limiting at the CDN edge
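Until rate limiting exists on the server side, a client can throttle itself. A token bucket permits short bursts of up to `capacity` requests and then refills at `rate` requests per second. This is a standard technique and is not something the template or its SDK provides. The class name and the limits below are illustrative. The clock is injectable, so the logic can be tested without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Permit bursts of up to `capacity` calls, refilled at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; never blocks."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def acquire(self) -> None:
        """Sleep until a token is available (intended for the real monotonic clock)."""
        while not self.try_acquire():
            time.sleep((1 - self.tokens) / self.rate)


bucket = TokenBucket(rate=5, capacity=10)  # illustrative limits
# Before each API call: bucket.acquire()
```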
---
## Errors
### Status Codes
- `200`: Success
- `400`: Bad Request (invalid parameters)
- `401`: Unauthorized (missing or invalid API key)
- `404`: Not Found (resource does not exist)
- `422`: Unprocessable Entity (request failed Pydantic validation; FastAPI's default for validation errors)
- `500`: Internal Server Error
### Error Format
```json
{
  "detail": "Error message here"
}
```
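Error bodies include a `detail` field. With FastAPI's default validation handler (status `422`), `detail` is a list of objects containing `loc`, `msg` and `type`, not a string. A client should therefore handle both shapes. Here is a minimal sketch; the sample messages are illustrative.

```python
def error_message(body: dict) -> str:
    """Flatten an error body's `detail` into one readable string."""
    detail = body.get("detail")
    if isinstance(detail, str):
        return detail
    if isinstance(detail, list):
        # Validation errors: one entry per failing field, located by `loc`.
        parts = []
        for err in detail:
            location = ".".join(str(part) for part in err.get("loc", []))
            parts.append(f"{location}: {err.get('msg', 'invalid value')}")
        return "; ".join(parts)
    return "Unknown error"


print(error_message({"detail": [{"loc": ["body", "query"], "msg": "Field required", "type": "missing"}]}))
# → body.query: Field required
```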
---
## Performance
### Benchmarks
Measured on a local machine (M1 Pro, 16 GB RAM):
| Endpoint | Average Latency | Notes |
|----------|-----------------|-------|
| /health | <10 ms | Very fast |
| /ingest | 500-2000 ms | Depends on document size |
| /query | 200-1000 ms | Depends on the chosen LLM |
| /documents | <100 ms | Paginated |
### Optimizations
1. **Embedding cache**: enabled automatically
2. **Connection pooling**: use PgBouncer or Supabase's connection pooler
3. **Workers**: run multiple Uvicorn workers in production
4. **Async**: endpoints are async by default
---
## Production Deployment
### Docker Compose
```yaml
version: '3.8'
services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://...
      - HF_TOKEN=...
      - API_KEYS=key1,key2
    command: uvicorn src.api:app --host 0.0.0.0 --port 8000 --workers 4
```
### Environment Variables
```bash
# API Config
API_HOST=0.0.0.0
API_PORT=8000
API_WORKERS=4
API_RELOAD=false
API_KEYS=key1,key2,key3
# Database
DATABASE_URL=postgresql://...
# LLM
HF_TOKEN=...
```
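Here is a sketch that loads these variables into a typed settings object. The fallback values are copied from the example above and are assumptions, not documented defaults. The loader also rejects `API_RELOAD=true` combined with more than one worker, because Uvicorn's `--reload` and `--workers` options are mutually exclusive. `ApiSettings` and `from_env` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings such as "true" or "1"."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class ApiSettings:
    host: str
    port: int
    workers: int
    reload: bool
    api_keys: tuple[str, ...]
    database_url: Optional[str]


def from_env(env: Mapping[str, str]) -> ApiSettings:
    """Read the variables above; fallbacks mirror the example values (assumed)."""
    settings = ApiSettings(
        host=env.get("API_HOST", "0.0.0.0"),
        port=int(env.get("API_PORT", "8000")),
        workers=int(env.get("API_WORKERS", "4")),
        reload=_as_bool(env.get("API_RELOAD", "false")),
        api_keys=tuple(k.strip() for k in env.get("API_KEYS", "").split(",") if k.strip()),
        database_url=env.get("DATABASE_URL"),
    )
    # Uvicorn's --reload and --workers options cannot be combined.
    if settings.reload and settings.workers > 1:
        raise ValueError("API_RELOAD=true requires API_WORKERS=1")
    return settings


settings = from_env(os.environ)
```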
---
## Security
### Best Practices
1. **HTTPS**: always use HTTPS in production
2. **API keys**: generate strong keys and rotate them regularly
3. **Rate limiting**: put rate limiting in front of the API
4. **CORS**: configure CORS to allow only the origins that need access
5. **Input validation**: handled automatically by Pydantic
6. **Logs**: monitor access logs
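Python's `secrets` module can generate keys suitable for `API_KEYS`. The output of `token_urlsafe` contains only letters, digits, `-` and `_`, so it never includes the comma used as a separator. The `rag_` prefix and the `rotate` helper are hypothetical. `rotate` replaces a single key and leaves the other configured keys valid, so clients using those keys are not affected.

```python
import secrets


def generate_api_key(prefix: str = "rag") -> str:
    """Random key: 32 bytes from the OS CSPRNG, base64url-encoded (43 chars)."""
    return f"{prefix}_{secrets.token_urlsafe(32)}"


def rotate(keys: list[str], retire: str) -> list[str]:
    """Replace `retire` with a fresh key; every other configured key stays valid."""
    return [key for key in keys if key != retire] + [generate_api_key()]


new_keys = rotate(["key1", "key2"], retire="key1")
print("API_KEYS=" + ",".join(new_keys))
```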
---
## Troubleshooting
### API does not start
Check that:
- PostgreSQL is running
- `DATABASE_URL` is correct
- Port 8000 is free
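To check that the port is free, a quick standard-library probe is to try binding it and treat failure as "in use". This is a heuristic sketch, since a bind can also fail for permission reasons. The helper name is hypothetical.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Heuristic: if the port can be bound, nothing else is listening on it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:  # typically EADDRINUSE; may also be a permission error
            return False
    return True


if not port_is_free(8000):
    print("Port 8000 is busy: stop the other process or set a different API_PORT.")
```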
### Authentication errors
Check that:
- API keys are configured in `.env` (`API_KEYS`)
- The `X-API-Key` header is present
- The key matches one of the configured values
### Slow queries
Check that:
- The database indexes have been created
- The embedding cache is enabled
- The LLM is not too large for the available hardware
---
## Next Steps
1. Implement rate limiting
2. Add OAuth2 authentication
3. Build a monitoring dashboard
4. Publish the SDK to PyPI
5. Add webhooks for events
---
## Resources
- [FastAPI documentation](https://fastapi.tiangolo.com/)
- [Uvicorn documentation](https://www.uvicorn.org/)
- [OpenAPI/Swagger](http://localhost:8000/api/docs)
- [ReDoc](http://localhost:8000/api/redoc)