Guilherme Favaron

	# REST API Guide - RAG Template

	A complete REST API for the RAG Template, built with FastAPI.

	---

	## Overview

	The REST API provides programmatic access to the RAG system, with endpoints for:
	- Document ingestion (raw text or file upload)
	- RAG queries
	- Document management
	- System statistics
	- Health checks

	Base URL: `http://localhost:8000/api/v1`

	Interactive documentation: `http://localhost:8000/api/docs`

	---

	## Authentication

	All endpoints except `/health` require authentication with an API key.

	### Configuring API Keys

	In the `.env` file:

	```bash
	API_KEYS=key1,key2,key3
	```
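
How the server reads `API_KEYS` is not shown in this guide. As a minimal sketch, assuming the value is a plain comma-separated list and that keys are compared in constant time, validation could look roughly like this (`parse_api_keys` and `is_valid_key` are hypothetical names, not part of the project):

```python
import hmac


def parse_api_keys(raw: str) -> list[str]:
    """Split a comma-separated API_KEYS value, dropping blanks and whitespace."""
    return [key.strip() for key in raw.split(",") if key.strip()]


def is_valid_key(candidate: str, allowed: list[str]) -> bool:
    """Compare the candidate against every allowed key in constant time."""
    matches = [hmac.compare_digest(candidate.encode(), key.encode()) for key in allowed]
    return any(matches)


keys = parse_api_keys("key1, key2,,key3")
print(keys)
print(is_valid_key("key2", keys), is_valid_key("nope", keys))
```

Comparing against every key before answering avoids leaking, through timing, which entry matched.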

	### Using an API Key

	Include this header in every request:

	```
	X-API-Key: your_api_key_here
	```
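
For clients that do not use the SDK, the header can be attached with the standard library alone. The sketch below only builds the request and does not send it; `build_request` is a hypothetical helper, and the base URL is the one given in this guide.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # base URL from this guide


def build_request(path, api_key, payload=None):
    """Build (but do not send) a request carrying the X-API-Key header."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(BASE_URL + path, data=data)
    request.add_header("X-API-Key", api_key)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


request = build_request("/query", "your_api_key_here", {"query": "What is RAG?"})
print(request.get_method(), request.full_url)
# urllib stores the header name as "X-api-key"; HTTP header names are case-insensitive.
print(request.header_items())
```

Passing the returned object to `urllib.request.urlopen` would send it to a running server.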

	---

	## Starting the Server

	### Development Mode

	```bash
	python api_server.py
	```

	### Production Mode

	```bash
	uvicorn src.api:app --host 0.0.0.0 --port 8000 --workers 4
	```

	### With Docker

	```bash
	docker run -p 8000:8000 -e DATABASE_URL=... -e API_KEYS=... rag-template
	```

	---

	## Endpoints

	### GET /api/v1/health

	System health check.

	Authentication: not required

	Response:
	```json
	{
	  "status": "healthy",
	  "timestamp": "2026-01-23T10:30:00",
	  "database": "healthy",
	  "embeddings": "healthy",
	  "version": "1.6.0"
	}
	```

	### POST /api/v1/ingest

	Ingests raw text into the system.

	Request Body:
	```json
	{
	  "text": "Document content...",
	  "title": "Document Title",
	  "chunk_size": 1000,
	  "chunk_overlap": 200,
	  "strategy": "recursive",
	  "metadata": {
	    "document_type": "TXT",
	    "tags": ["tech", "ai"],
	    "security_level": "public"
	  }
	}
	```

	Response:
	```json
	{
	  "document_id": 123,
	  "num_chunks": 15,
	  "message": "Document ingested successfully",
	  "metadata": {...}
	}
	```
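
To illustrate how `chunk_size` and `chunk_overlap` interact, here is a fixed-size sliding-window chunker. It is deliberately simpler than the `recursive` strategy named above, whose implementation is not shown in this guide; `sliding_window_chunks` is a hypothetical name and it counts characters, which is an assumption about the unit.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into chunk_size-character pieces, each sharing chunk_overlap characters with the previous one."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


chunks = sliding_window_chunks("x" * 2600)
print(len(chunks), [len(chunk) for chunk in chunks])
```

With the defaults, each chunk starts 800 characters after the previous one, so a larger overlap means more chunks for the same document.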

	### POST /api/v1/upload

	Uploads and ingests a file (PDF or TXT).

	Request: `multipart/form-data`
	- `file`: File to upload
	- `chunk_size`: (optional) Chunk size
	- `chunk_overlap`: (optional) Overlap between chunks
	- `strategy`: (optional) Chunking strategy

	Response: similar to `/ingest`
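
The SDK's `upload_file` presumably builds the multipart body for you. For clients without the SDK, the following sketch encodes the fields listed above by hand using only the standard library; `encode_multipart` is a hypothetical helper and the generic `application/octet-stream` part type is an assumption.

```python
import uuid


def encode_multipart(fields: dict, filename: str, file_bytes: bytes):
    """Return (body, content_type) for a multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    lines = []
    # Plain form fields such as chunk_size or strategy.
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode(),
        ]
    # The file part, followed by the closing boundary.
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


body, content_type = encode_multipart({"chunk_size": 500}, "notes.txt", b"hello")
print(content_type)
print(len(body), "bytes")
```

The returned `content_type` goes in the `Content-Type` header, alongside `X-API-Key`.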

	### POST /api/v1/query

	Runs a RAG query.

	Request Body:
	```json
	{
	  "query": "What is RAG?",
	  "top_k": 5,
	  "temperature": 0.3,
	  "max_tokens": 512,
	  "model": "huggingface",
	  "filters": {
	    "document_type": "PDF",
	    "tags": ["tech"]
	  }
	}
	```

	Response:
	```json
	{
	  "query": "What is RAG?",
	  "response": "RAG is Retrieval-Augmented Generation...",
	  "contexts": [
	    {
	      "content": "Relevant context...",
	      "similarity": 0.92,
	      "document_id": 123
	    }
	  ],
	  "metadata": {
	    "num_contexts": 5,
	    "model": "huggingface",
	    "temperature": 0.3,
	    "max_tokens": 512
	  }
	}
	```
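
A client will often want to show which sources backed an answer. The sketch below works on a response shaped like the example above; `summarize_contexts` and the `min_similarity` cutoff are hypothetical, client-side additions rather than API features.

```python
def summarize_contexts(result: dict, min_similarity: float = 0.0) -> list[str]:
    """Return one line per retrieved context at or above min_similarity, best match first."""
    contexts = [c for c in result.get("contexts", []) if c["similarity"] >= min_similarity]
    contexts.sort(key=lambda c: c["similarity"], reverse=True)
    return [f"doc {c['document_id']} ({c['similarity']:.2f}): {c['content']}" for c in contexts]


# Sample data in the shape documented above.
result = {
    "response": "RAG is Retrieval-Augmented Generation...",
    "contexts": [
        {"content": "Loosely related...", "similarity": 0.41, "document_id": 7},
        {"content": "Relevant context...", "similarity": 0.92, "document_id": 123},
    ],
}
for line in summarize_contexts(result, min_similarity=0.5):
    print(line)
```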

	### GET /api/v1/documents

	Lists the documents in the system.

	Query Parameters:
	- `limit`: (optional) Maximum number of documents (default: 100)
	- `offset`: (optional) Pagination offset (default: 0)
	- `session_id`: (optional) Filter by session_id

	Response:
	```json
	[
	  {
	    "id": 123,
	    "title": "Document 1",
	    "content": "Content...",
	    "chunk_count": 15,
	    "created_at": "2026-01-23T10:30:00",
	    "metadata": {...}
	  }
	]
	```
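
Since the response is a bare list with no total count, a client has to keep requesting pages until one comes back short. A minimal sketch of that loop, with the page fetch injected so it can be tried without a server (`iter_documents` and `fetch_page` are hypothetical names):

```python
from typing import Callable, Iterator


def iter_documents(fetch_page: Callable[[int, int], list], page_size: int = 100) -> Iterator[dict]:
    """Yield every document, advancing offset until a short or empty page comes back."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        if len(page) < page_size:
            return
        offset += page_size


# Stand-in for the API: 250 fake documents served through limit/offset slicing.
fake_db = [{"id": i, "title": f"Document {i}"} for i in range(250)]
docs = list(iter_documents(lambda limit, offset: fake_db[offset:offset + limit]))
print(len(docs))
```

Against the real API, `fetch_page` would wrap a `GET /api/v1/documents` call with the `limit` and `offset` query parameters.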

	### DELETE /api/v1/documents/{document_id}

	Deletes a document from the system.

	Path Parameters:
	- `document_id`: Document ID

	Response:
	```json
	{
	  "message": "Document deleted successfully",
	  "document_id": 123
	}
	```

	### GET /api/v1/stats

	Returns system statistics.

	Response:
	```json
	{
	  "database": {
	    "total_documents": 150,
	    "total_chunks": 2500,
	    "avg_chunks_per_doc": 16.67
	  },
	  "metadata": {
	    "total": 150,
	    "by_type": {"PDF": 100, "TXT": 50},
	    "by_security": {"public": 120, "internal": 30}
	  },
	  "timestamp": "2026-01-23T10:30:00"
	}
	```
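
The sample values are consistent with `avg_chunks_per_doc` being `total_chunks / total_documents` rounded to two decimals. That is an inference from the example numbers, not a documented formula, and the zero-document behavior below is an assumption.

```python
def avg_chunks_per_doc(total_chunks: int, total_documents: int) -> float:
    """Average chunks per document, rounded to two decimals; 0.0 when there are no documents."""
    if total_documents == 0:
        return 0.0
    return round(total_chunks / total_documents, 2)


print(avg_chunks_per_doc(2500, 150))  # 16.67, matching the sample response
```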

	---

	## Using the Python SDK

	### Installation

	```bash
	pip install -e .  # Local (editable) install
	```

	### Basic Usage

	```python
	from sdk import RAGClient

	# Create the client
	client = RAGClient(
	    base_url="http://localhost:8000",
	    api_key="your_api_key"
	)

	# Health check
	health = client.health_check()
	print(health)

	# Ingest text
	result = client.ingest_text(
	    text="Document content...",
	    title="My Document",
	    metadata={"tags": ["tech", "ai"]}
	)
	print(f"Document ID: {result['document_id']}")

	# Upload a file
	result = client.upload_file("documento.pdf")
	print(f"Chunks: {result['num_chunks']}")

	# Query
	response = client.query(
	    query="What is RAG?",
	    top_k=5,
	    filters={"tags": ["tech"]}
	)
	print(response['response'])

	# List documents
	docs = client.list_documents(limit=10)
	for doc in docs:
	    print(f"{doc['id']}: {doc['title']}")

	# Delete a document
	client.delete_document(123)

	# Statistics
	stats = client.get_stats()
	print(stats)
	```

	---

	## Usage Examples

	### Example 1: Ingestion Pipeline

	```python
	from sdk import RAGClient
	from pathlib import Path

	client = RAGClient(api_key="my_key")

	# Ingest every PDF in a directory
	docs_dir = Path("./documents")
	for file in docs_dir.glob("*.pdf"):
	    result = client.upload_file(str(file))
	    print(f"Ingested {file.name}: {result['num_chunks']} chunks")
	```

	### Example 2: Simple Chatbot

	```python
	from sdk import RAGClient

	client = RAGClient(api_key="my_key")

	while True:
	    query = input("You: ")
	    if query.lower() in ["quit", "exit"]:
	        break

	    response = client.query(query, top_k=5)
	    print(f"Bot: {response['response']}\n")
	```

	### Example 3: Filtered Search

	```python
	from sdk import RAGClient

	client = RAGClient(api_key="my_key")

	# Search only public documents tagged tech or ai
	response = client.query(
	    query="How do embeddings work?",
	    filters={
	        "security_level": "public",
	        "tags": ["tech", "ai"]
	    }
	)

	print(response['response'])
	print(f"Contexts used: {response['metadata']['num_contexts']}")
	```

	---

	## Using cURL

	### Health Check

	```bash
	curl http://localhost:8000/api/v1/health
	```

	### Ingest Text

	```bash
	curl -X POST http://localhost:8000/api/v1/ingest \
	  -H "Content-Type: application/json" \
	  -H "X-API-Key: your_key" \
	  -d '{
	    "text": "Document content",
	    "title": "Title"
	  }'
	```

	### Query

	```bash
	curl -X POST http://localhost:8000/api/v1/query \
	  -H "Content-Type: application/json" \
	  -H "X-API-Key: your_key" \
	  -d '{
	    "query": "What is RAG?",
	    "top_k": 5
	  }'
	```

	### List Documents

	```bash
	# Quote the URL so the shell does not treat "?" as a glob character
	curl "http://localhost:8000/api/v1/documents?limit=10" \
	  -H "X-API-Key: your_key"
	```

	---

	## Rate Limiting

	The API does not implement rate limiting by default. For production, consider one of:

	- Nginx: with `limit_req_zone`
	- Traefik: with its rate-limiting middleware
	- Cloudflare: rate limiting at the CDN
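
To make concrete what these tools enforce, here is a minimal in-process token bucket. It is an illustration only and not part of the project: its state lives in a single process, so it would not be shared across multiple Uvicorn workers, which is one reason limiting at the proxy or CDN is usually the better choice.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=5, capacity=10)
print([bucket.allow() for _ in range(12)])
```

A real deployment would keep one bucket per API key or client IP and answer rejected requests with HTTP 429.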

	---

	## Errors

	### Status Codes

	- `200`: Success
	- `400`: Bad Request (invalid parameters)
	- `401`: Unauthorized (invalid or missing API key)
	- `404`: Not Found (resource not found)
	- `500`: Internal Server Error

	### Error Format

	```json
	{
	  "detail": "Error message here"
	}
	```
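
On the client side, the status code and the `detail` field can be folded into a single exception. The sketch below is one possible shape; `APIError` and `raise_for_error` are hypothetical names and are not claimed to match what the SDK does.

```python
import json

STATUS_MEANINGS = {
    400: "Bad Request",
    401: "Unauthorized",
    404: "Not Found",
    500: "Internal Server Error",
}


class APIError(Exception):
    """Hypothetical exception carrying the status code and the `detail` message."""

    def __init__(self, status: int, detail: str):
        super().__init__(f"{status} {STATUS_MEANINGS.get(status, 'Error')}: {detail}")
        self.status = status
        self.detail = detail


def raise_for_error(status: int, body: bytes) -> None:
    """Raise APIError for statuses of 400 and above, reading `detail` when the body is JSON."""
    if status < 400:
        return
    try:
        detail = json.loads(body).get("detail", "")
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: fall back to the raw text.
        detail = body.decode("utf-8", errors="replace")
    raise APIError(status, str(detail))


try:
    raise_for_error(401, b'{"detail": "Invalid API key"}')
except APIError as error:
    print(error)
```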

	---

	## Performance

	### Benchmarks

	Measured on a local machine (M1 Pro, 16 GB RAM):

	| Endpoint | Average Time | Notes |
	|----------|--------------|-------|
	| /health | <10ms | Very fast |
	| /ingest | 500-2000ms | Depends on document size |
	| /query | 200-1000ms | Depends on the chosen LLM |
	| /documents | <100ms | Paginated |
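
The guide does not say how these figures were collected. To get comparable numbers on your own hardware, a small timing harness such as the following sketch may be enough; `measure_ms` is a hypothetical helper and the workload shown is a stand-in for a real API call.

```python
import statistics
import time


def measure_ms(call, repeats: int = 20) -> dict:
    """Time `call` repeatedly and return mean and max latency in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000)
    return {"mean_ms": statistics.mean(samples), "max_ms": max(samples)}


# Stand-in workload; swap in a function that calls the endpoint under test.
print(measure_ms(lambda: sum(range(10_000))))
```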

	### Optimizations

	1. Embedding cache: enabled automatically
	2. Connection pooling: use pgBouncer or Supabase
	3. Workers: run multiple Uvicorn workers in production
	4. Async: endpoints are async by default

	---

	## Production Deployment

	### Docker Compose

	```yaml
	version: '3.8'
	services:
	  api:
	    build: .
	    ports:
	      - "8000:8000"
	    environment:
	      - DATABASE_URL=postgresql://...
	      - HF_TOKEN=...
	      - API_KEYS=key1,key2
	    command: uvicorn src.api:app --host 0.0.0.0 --port 8000 --workers 4
	```

	### Environment Variables

	```bash
	# API Config
	API_HOST=0.0.0.0
	API_PORT=8000
	API_WORKERS=4
	API_RELOAD=false
	API_KEYS=key1,key2,key3

	# Database
	DATABASE_URL=postgresql://...

	# LLM
	HF_TOKEN=...
	```
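
How the server loads these variables is not shown here. As a sketch, assuming the `API_*` values are read once at startup and that the sample values above double as defaults (both assumptions), the parsing could look like this; `APISettings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class APISettings:
    host: str
    port: int
    workers: int
    reload: bool
    api_keys: tuple


def load_settings(env=os.environ) -> APISettings:
    """Read the API_* variables, falling back to the sample values from this guide."""
    return APISettings(
        host=env.get("API_HOST", "0.0.0.0"),
        port=int(env.get("API_PORT", "8000")),
        workers=int(env.get("API_WORKERS", "4")),
        reload=env.get("API_RELOAD", "false").strip().lower() in {"1", "true", "yes"},
        api_keys=tuple(k.strip() for k in env.get("API_KEYS", "").split(",") if k.strip()),
    )


print(load_settings({"API_PORT": "9000", "API_KEYS": "key1,key2"}))
```

Accepting the environment as a parameter keeps the function easy to test with a plain dictionary.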

	---

	## Security

	### Best Practices

	1. HTTPS: always use HTTPS in production
	2. API keys: generate strong keys and rotate them regularly
	3. Rate limiting: put rate limiting in place
	4. CORS: configure CORS appropriately
	5. Input validation: handled automatically by Pydantic
	6. Logs: monitor access logs
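
For item 2, Python's `secrets` module is a reasonable way to produce strong keys. The sketch below prints a line that can be pasted into `.env`; the 32-byte length is a common choice, not a project requirement.

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random key built from num_bytes of OS-level randomness."""
    return secrets.token_urlsafe(num_bytes)


keys = [generate_api_key() for _ in range(3)]
print("API_KEYS=" + ",".join(keys))
```

The URL-safe alphabet contains no commas, so the keys are safe to join into the comma-separated `API_KEYS` value.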

	---

	## Troubleshooting

	### API does not start

	Check that:
	- PostgreSQL is running
	- `DATABASE_URL` is correct
	- Port 8000 is available
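
The port check can be scripted. This sketch tries to connect to the port on the local machine and reports whether something is already listening; `port_in_use` is a hypothetical helper.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0


print("port 8000 in use:", port_in_use(8000))
```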

	### Authentication errors

	Check that:
	- The API key is configured in `.env`
	- The `X-API-Key` header is present
	- The key is correct

	### Slow queries

	Check that:
	- The database indexes have been created
	- The embedding cache is active
	- The LLM is not too large

	---

	## Next Steps

	1. Implement rate limiting
	2. Add OAuth2 authentication
	3. Build a monitoring dashboard
	4. Publish the SDK to PyPI
	5. Add webhooks for events

	---

	## Resources

	- [FastAPI documentation](https://fastapi.tiangolo.com/)
	- [Uvicorn documentation](https://www.uvicorn.org/)
	- [OpenAPI/Swagger](http://localhost:8000/api/docs)
	- [ReDoc](http://localhost:8000/api/redoc)