# REST API Guide - RAG Template

Complete REST API for the RAG Template, built with FastAPI.

---

## Overview

The REST API provides programmatic integration with the RAG system, with endpoints for:

- Document ingestion (raw text or file upload)
- RAG queries
- Document management
- System statistics
- Health checks

**Base URL**: `http://localhost:8000/api/v1`

**Interactive Documentation**: `http://localhost:8000/api/docs`

---

## Authentication

All endpoints (except `/health`) require authentication via API key.

### Configuring API Keys

In the `.env` file:

```bash
API_KEYS=key1,key2,key3
```

### Using an API Key

Include this header in every request:

```
X-API-Key: your_api_key_here
```

---

## Starting the Server

### Development Mode

```bash
python api_server.py
```

### Production Mode

```bash
uvicorn src.api:app --host 0.0.0.0 --port 8000 --workers 4
```

### With Docker

```bash
docker run -p 8000:8000 -e DATABASE_URL=... -e API_KEYS=... rag-template
```

---

## Endpoints

### GET /api/v1/health

System health check.

**Authentication**: Not required

**Response**:

```json
{
  "status": "healthy",
  "timestamp": "2026-01-23T10:30:00",
  "database": "healthy",
  "embeddings": "healthy",
  "version": "1.6.0"
}
```

### POST /api/v1/ingest

Ingests text into the system.

**Request Body**:

```json
{
  "text": "Document content...",
  "title": "Document Title",
  "chunk_size": 1000,
  "chunk_overlap": 200,
  "strategy": "recursive",
  "metadata": {
    "document_type": "TXT",
    "tags": ["tech", "ai"],
    "security_level": "public"
  }
}
```

**Response**:

```json
{
  "document_id": 123,
  "num_chunks": 15,
  "message": "Document ingested successfully",
  "metadata": {...}
}
```

### POST /api/v1/upload

Uploads and ingests a file (PDF or TXT).

**Request**: `multipart/form-data`

- `file`: File to upload
- `chunk_size`: (optional) Chunk size
- `chunk_overlap`: (optional) Overlap between chunks
- `strategy`: (optional) Chunking strategy

**Response**: Similar to `/ingest`

### POST /api/v1/query

Runs a RAG query.

**Request Body**:

```json
{
  "query": "What is RAG?",
  "top_k": 5,
  "temperature": 0.3,
  "max_tokens": 512,
  "model": "huggingface",
  "filters": {
    "document_type": "PDF",
    "tags": ["tech"]
  }
}
```

**Response**:

```json
{
  "query": "What is RAG?",
  "response": "RAG is Retrieval-Augmented Generation...",
  "contexts": [
    {
      "content": "Relevant context...",
      "similarity": 0.92,
      "document_id": 123
    }
  ],
  "metadata": {
    "num_contexts": 5,
    "model": "huggingface",
    "temperature": 0.3,
    "max_tokens": 512
  }
}
```

### GET /api/v1/documents

Lists the documents in the system.

**Query Parameters**:

- `limit`: (optional) Maximum number of documents (default: 100)
- `offset`: (optional) Pagination offset (default: 0)
- `session_id`: (optional) Filter by session_id

**Response**:

```json
[
  {
    "id": 123,
    "title": "Document 1",
    "content": "Content...",
    "chunk_count": 15,
    "created_at": "2026-01-23T10:30:00",
    "metadata": {...}
  }
]
```

### DELETE /api/v1/documents/{document_id}

Deletes a document from the system.

**Path Parameters**:

- `document_id`: Document ID

**Response**:

```json
{
  "message": "Document deleted successfully",
  "document_id": 123
}
```

### GET /api/v1/stats

Returns system statistics.

**Response**:

```json
{
  "database": {
    "total_documents": 150,
    "total_chunks": 2500,
    "avg_chunks_per_doc": 16.67
  },
  "metadata": {
    "total": 150,
    "by_type": {"PDF": 100, "TXT": 50},
    "by_security": {"public": 120, "internal": 30}
  },
  "timestamp": "2026-01-23T10:30:00"
}
```

---

## Using the Python SDK

### Installation

```bash
pip install -e .  # Install locally
```

### Basic Usage

```python
from sdk import RAGClient

# Create the client
client = RAGClient(
    base_url="http://localhost:8000",
    api_key="your_api_key"
)

# Health check
health = client.health_check()
print(health)

# Ingest text
result = client.ingest_text(
    text="Document content...",
    title="My Document",
    metadata={"tags": ["tech", "ai"]}
)
print(f"Document ID: {result['document_id']}")

# Upload a file
result = client.upload_file("documento.pdf")
print(f"Chunks: {result['num_chunks']}")

# Query
response = client.query(
    query="What is RAG?",
    top_k=5,
    filters={"tags": ["tech"]}
)
print(response['response'])

# List documents
docs = client.list_documents(limit=10)
for doc in docs:
    print(f"{doc['id']}: {doc['title']}")

# Delete a document
client.delete_document(123)

# Statistics
stats = client.get_stats()
print(stats)
```

---

## Usage Examples

### Example 1: Ingestion Pipeline

```python
from sdk import RAGClient
from pathlib import Path

client = RAGClient(api_key="my_key")

# Ingest every PDF found in the documents directory
docs_dir = Path("./documents")
for file in docs_dir.glob("*.pdf"):
    result = client.upload_file(str(file))
    print(f"Ingested {file.name}: {result['num_chunks']} chunks")
```

### Example 2: Simple Chatbot

```python
from sdk import RAGClient

client = RAGClient(api_key="my_key")

# Read questions until the user types an exit word
while True:
    query = input("You: ")
    if query.lower() in ["sair", "exit"]:
        break

    response = client.query(query, top_k=5)
    print(f"Bot: {response['response']}\n")
```

### Example 3: Filtered Search

```python
from sdk import RAGClient

client = RAGClient(api_key="my_key")

# Search only public tech documents
response = client.query(
    query="How do embeddings work?",
    filters={
        "security_level": "public",
        "tags": ["tech", "ai"]
    }
)

print(response['response'])
print(f"Contexts used: {response['metadata']['num_contexts']}")
```

---

## Using cURL

### Health Check

```bash
curl http://localhost:8000/api/v1/health
```

### Ingest Text

```bash
curl -X POST http://localhost:8000/api/v1/ingest \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_key" \
  -d '{
    "text": "Document content",
    "title": "Title"
  }'
```

### Query

```bash
curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_key" \
  -d '{
    "query": "What is RAG?",
    "top_k": 5
  }'
```

### List Documents

```bash
# Quote the URL so the shell does not interpret the "?"
curl "http://localhost:8000/api/v1/documents?limit=10" \
  -H "X-API-Key: your_key"
```

---

## Rate Limiting

The API does not implement rate limiting by default. For production, consider using:

- **Nginx**: With `limit_req_zone`
- **Traefik**: With the rate limiting middleware
- **Cloudflare**: Rate limiting at the CDN

---

## Errors

### Status Codes

- `200`: Success
- `400`: Bad Request (invalid parameters)
- `401`: Unauthorized (invalid or missing API key)
- `404`: Not Found (resource not found)
- `500`: Internal Server Error

### Error Format

```json
{
  "detail": "Error message here"
}
```

---

## Performance

### Benchmarks

Measured on a local machine (M1 Pro, 16 GB RAM):

| Endpoint | Average Time | Notes |
|----------|--------------|-------|
| /health | <10ms | Very fast |
| /ingest | 500-2000ms | Depends on document size |
| /query | 200-1000ms | Depends on the chosen LLM |
| /documents | <100ms | Paginated |

### Optimizations

1. **Embedding Cache**: Enabled automatically
2. **Connection Pooling**: Use PgBouncer or Supabase
3. **Workers**: Run multiple Uvicorn workers in production
4. **Async**: Endpoints are async by default

---

## Production Deployment

### Docker Compose

```yaml
version: '3.8'

services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://...
      - HF_TOKEN=...
      - API_KEYS=key1,key2
    command: uvicorn src.api:app --host 0.0.0.0 --port 8000 --workers 4
```

### Environment Variables

```bash
# API Config
API_HOST=0.0.0.0
API_PORT=8000
API_WORKERS=4
API_RELOAD=false
API_KEYS=key1,key2,key3

# Database
DATABASE_URL=postgresql://...

# LLM
HF_TOKEN=...
```

---

## Security

### Best Practices

1. **HTTPS**: Always use HTTPS in production
2. **API Keys**: Generate strong keys and rotate them regularly
3. **Rate Limiting**: Implement rate limiting
4. **CORS**: Configure CORS appropriately
5. **Input Validation**: Automatic validation via Pydantic
6. **Logs**: Monitor access logs

---

## Troubleshooting

### API does not start

Check that:

- PostgreSQL is running
- `DATABASE_URL` is correct
- Port 8000 is available

### Authentication errors

Check that:

- The API key is configured in `.env`
- The `X-API-Key` header is present
- The key is correct

### Slow queries

Check that:

- The database indexes have been created
- The embedding cache is active
- The LLM model is not too large

---

## Next Steps

1. Implement rate limiting
2. Add OAuth2 authentication
3. Create a monitoring dashboard
4. Publish the SDK on PyPI
5. Add webhooks for events

---

## Resources

- [FastAPI Documentation](https://fastapi.tiangolo.com/)
- [Uvicorn Documentation](https://www.uvicorn.org/)
- [OpenAPI/Swagger](http://localhost:8000/api/docs)
- [ReDoc](http://localhost:8000/api/redoc)
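
---

## Appendix: Calling the API Without the SDK

For scripts that cannot install the SDK, the `X-API-Key` header and the `{"detail": ...}` error format described in this guide can be exercised with the Python standard library alone. The block below is a minimal sketch under stated assumptions, not part of the RAG Template: the helper names `build_query_request`, `error_detail`, and `run_query` are hypothetical, and the base URL is the local default used throughout this guide.

```python
import json
import urllib.error
import urllib.request

# Local default base URL from this guide; change it for other deployments.
BASE_URL = "http://localhost:8000/api/v1"


def build_query_request(api_key: str, query: str, top_k: int = 5,
                        base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request for /query."""
    payload = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,  # required on every endpoint except /health
        },
        method="POST",
    )


def error_detail(body: bytes) -> str:
    """Return the `detail` field of an error body, or the raw text as a fallback."""
    try:
        return str(json.loads(body)["detail"])
    except (ValueError, KeyError, TypeError):
        # Body was not JSON, or not an object with a "detail" key.
        return body.decode("utf-8", errors="replace")


def run_query(api_key: str, query: str, top_k: int = 5) -> dict:
    """Send the query and return the parsed JSON response.

    Raises RuntimeError with the status code and `detail` message on HTTP errors.
    """
    request = build_query_request(api_key, query, top_k)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)
    except urllib.error.HTTPError as exc:
        raise RuntimeError(f"HTTP {exc.code}: {error_detail(exc.read())}") from exc
```

Against a running server, `run_query("your_api_key", "What is RAG?")` should return the JSON object documented under `POST /api/v1/query`. Keeping request construction separate from sending makes the header and payload easy to inspect or unit test without a network connection.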