# Text Summarizer Backend - Development Plan

## Overview

A minimal FastAPI backend for text summarization using local Ollama, designed to be callable from an Android app and extensible for cloud hosting.

## Architecture Goals

- **Local-first**: Use Ollama running locally for privacy and cost control
- **Cloud-ready**: Structure code to easily deploy to cloud later
- **Minimal v1**: Focus on core summarization functionality
- **Android-friendly**: RESTful API optimized for mobile app consumption

## Technology Stack

- **Backend**: FastAPI + Python
- **LLM**: Ollama (local)
- **Server**: Uvicorn
- **Validation**: Pydantic
- **Testing**: Pytest + pytest-asyncio + httpx (for async testing)
- **Containerization**: Docker (for cloud deployment)
## Project Structure

```
app/
├── main.py                 # FastAPI app entry point
├── api/
│   └── v1/
│       ├── routes.py       # API route definitions
│       └── schemas.py      # Pydantic models
├── services/
│   └── summarizer.py       # Ollama integration
├── core/
│   ├── config.py           # Configuration management
│   └── logging.py          # Logging setup
tests/
├── test_api.py             # API endpoint tests
├── test_services.py        # Service layer tests
├── test_schemas.py         # Pydantic model tests
├── test_config.py          # Configuration tests
└── conftest.py             # Test configuration and fixtures
requirements.txt
Dockerfile
docker-compose.yml
README.md
```
## API Contract (v1)

### POST /api/v1/summarize

**Request:**

```json
{
  "text": "string (required)",
  "max_tokens": 256,
  "prompt": "Summarize concisely."
}
```

**Response:**

```json
{
  "summary": "string",
  "model": "llama3.1:8b",
  "tokens_used": 512,
  "latency_ms": 1234
}
```
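
The contract above could be expressed as Pydantic models along the following lines. This is a minimal sketch, not the project's actual `schemas.py`: the class names and the constraints (`min_length=1` on `text`, `gt=0` on `max_tokens`) are assumptions, since the contract only states that `text` is required and gives the two defaults.

```python
from pydantic import BaseModel, Field, ValidationError


class SummarizeRequest(BaseModel):
    """Request body for POST /api/v1/summarize."""

    # Required by the contract; rejecting empty strings is an assumption.
    text: str = Field(..., min_length=1)
    # Default from the contract; the positive-only constraint is an assumption.
    max_tokens: int = Field(256, gt=0)
    # Default from the contract.
    prompt: str = "Summarize concisely."


class SummarizeResponse(BaseModel):
    """Response body for POST /api/v1/summarize."""

    summary: str
    model: str
    tokens_used: int
    latency_ms: int


def is_valid_request(payload: dict) -> bool:
    """Return True if the payload satisfies SummarizeRequest."""
    try:
        SummarizeRequest(**payload)
        return True
    except ValidationError:
        return False
```

With these models, a request that omits `max_tokens` and `prompt` picks up the documented defaults, while a missing or empty `text` fails validation before any call to Ollama is made.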
### GET /health

**Response:**

```json
{
  "status": "ok",
  "ollama": "reachable"
}
```
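
One way to produce the health response is to probe the Ollama base URL and map the result onto the `ollama` field. The sketch below uses only the standard library. The function names, the `"unreachable"` value, and the choice to keep `status` at `"ok"` when Ollama is down are all assumptions; the contract only shows the healthy case.

```python
import urllib.error
import urllib.request
from typing import Callable


def ollama_reachable(host: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET on the Ollama base URL answers with 200."""
    try:
        with urllib.request.urlopen(host, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, or a malformed URL.
        return False


def health_payload(probe: Callable[[], bool]) -> dict:
    """Build the /health body; `probe` reports whether Ollama answered."""
    # "unreachable" is a hypothetical value for the failure case.
    return {"status": "ok", "ollama": "reachable" if probe() else "unreachable"}
```

Passing the probe in as a callable keeps the handler testable without a running Ollama instance, for example `health_payload(lambda: ollama_reachable("http://127.0.0.1:11434"))` in the route and `health_payload(lambda: False)` in a unit test.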
## Development Phases

### Phase 1: Foundation

- [ ] Project scaffold and directory structure
- [ ] Core dependencies and requirements.txt (including test dependencies)
- [ ] Basic FastAPI app setup
- [ ] Configuration management with environment variables
- [ ] Logging setup
- [ ] Health check endpoint
- [ ] Basic test setup and configuration

### Phase 2: Core Feature

- [ ] Pydantic schemas for request/response
- [ ] Unit tests for schemas (validation, serialization)
- [ ] Ollama service integration
- [ ] Unit tests for Ollama service (mocked)
- [ ] Summarization endpoint implementation
- [ ] Integration tests for API endpoints
- [ ] Input validation and error handling
- [ ] Basic request/response logging

### Phase 3: Quality & DX

- [ ] Error handling middleware
- [ ] Request ID middleware
- [ ] Input size limits and validation
- [ ] Rate limiting (optional for v1)
- [ ] Test coverage analysis and improvement
- [ ] Performance tests for summarization endpoint
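
For the optional rate limiting item, a fixed-window counter per client is probably the simplest starting point. The sketch below is an in-memory illustration under stated assumptions: the class name is hypothetical, the plan does not say which algorithm will be used, and a single-process dictionary will not work across multiple replicas.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """Allow at most `max_requests` per client in each fixed time window."""

    def __init__(
        self,
        max_requests: int = 60,
        window_seconds: int = 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        # client id -> (window index, requests seen in that window)
        self._counters: Dict[str, Tuple[int, int]] = {}

    def allow(self, client_id: str) -> bool:
        """Record one request and report whether it is within the limit."""
        window = int(self._clock() // self.window_seconds)
        seen_window, count = self._counters.get(client_id, (window, 0))
        if seen_window != window:
            count = 0  # a new window has started, so reset the counter
        if count >= self.max_requests:
            return False
        self._counters[client_id] = (window, count + 1)
        return True
```

The injectable `clock` makes the window boundary testable without sleeping. A fixed window can admit up to twice the limit across a boundary, which is usually acceptable for a v1 but is one reason to revisit the algorithm later.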
### Phase 4: Cloud-Ready Structure

- [ ] Dockerfile for containerization
- [ ] docker-compose.yml for local development
- [ ] Environment-based configuration
- [ ] CORS configuration for Android app
- [ ] Security headers and API key support (optional)
- [ ] Metrics endpoint (optional)
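
If API key support is enabled, the comparison itself is worth doing in constant time. The following sketch shows only that check; the function name is hypothetical, and how the key reaches the server (for example a request header) is left open because the plan does not specify it.

```python
import hmac
from typing import Optional


def api_key_valid(provided: Optional[str], expected: str, enabled: bool = True) -> bool:
    """Check a client-supplied key against the configured one."""
    if not enabled:
        # Mirrors API_KEY_ENABLED=false: every request is accepted.
        return True
    if not provided or not expected:
        # A missing key, or an enabled check with no configured key, fails closed.
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))
```

Failing closed when no key is configured is a design choice in this sketch; it prevents an empty `API_KEY` from silently matching an empty client value.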
### Phase 5: Documentation & Examples

- [ ] Comprehensive README with setup instructions
- [ ] API documentation (FastAPI auto-docs)
- [ ] Example curl commands
- [ ] Android client integration examples
- [ ] Deployment guide for cloud hosting
## Configuration

### Environment Variables

```bash
# Ollama Configuration
OLLAMA_MODEL=llama3.1:8b
OLLAMA_HOST=http://127.0.0.1:11434
OLLAMA_TIMEOUT=30

# Server Configuration
SERVER_HOST=127.0.0.1
SERVER_PORT=8000
LOG_LEVEL=INFO

# Optional: API Security
API_KEY_ENABLED=false
API_KEY=your-secret-key

# Optional: Rate Limiting
RATE_LIMIT_ENABLED=false
RATE_LIMIT_REQUESTS=60
RATE_LIMIT_WINDOW=60
```
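
These variables could be loaded into a typed settings object roughly as follows. This is a standard-library stand-in, offered as a sketch: the `Settings` and `load_settings` names are hypothetical, the defaults simply repeat the values listed above, and the project may well use Pydantic for this instead.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    ollama_model: str
    ollama_host: str
    ollama_timeout: int
    server_host: str
    server_port: int
    log_level: str
    api_key_enabled: bool
    api_key: str
    rate_limit_enabled: bool
    rate_limit_requests: int
    rate_limit_window: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from `env`, falling back to the documented defaults."""
    return Settings(
        ollama_model=env.get("OLLAMA_MODEL", "llama3.1:8b"),
        ollama_host=env.get("OLLAMA_HOST", "http://127.0.0.1:11434"),
        ollama_timeout=int(env.get("OLLAMA_TIMEOUT", "30")),
        server_host=env.get("SERVER_HOST", "127.0.0.1"),
        server_port=int(env.get("SERVER_PORT", "8000")),
        log_level=env.get("LOG_LEVEL", "INFO"),
        api_key_enabled=_as_bool(env.get("API_KEY_ENABLED", "false")),
        api_key=env.get("API_KEY", ""),
        rate_limit_enabled=_as_bool(env.get("RATE_LIMIT_ENABLED", "false")),
        rate_limit_requests=int(env.get("RATE_LIMIT_REQUESTS", "60")),
        rate_limit_window=int(env.get("RATE_LIMIT_WINDOW", "60")),
    )
```

Accepting any mapping rather than reading `os.environ` directly lets configuration tests pass a plain dictionary, for example `load_settings({"SERVER_PORT": "9000"})`.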
## Local Development Setup

### Prerequisites

1. Install Ollama:

   ```bash
   # macOS
   brew install ollama
   # Or download from https://ollama.ai
   ```

2. Start Ollama service:

   ```bash
   ollama serve
   ```

3. Pull a model:

   ```bash
   ollama pull llama3.1:8b
   # or
   ollama pull mistral
   ```

### Running the API

```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export OLLAMA_MODEL=llama3.1:8b

# Run the server
uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload
```
### Testing the API

```bash
# Health check
curl http://127.0.0.1:8000/health

# Summarize text
curl -X POST http://127.0.0.1:8000/api/v1/summarize \
  -H "Content-Type: application/json" \
  -d '{"text": "Your long text to summarize here..."}'
```
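
The same summarize call can be scripted from Python, which is handy for quick smoke tests. The sketch below uses only the standard library; the helper names are hypothetical and it assumes the server is listening on the address used in the curl examples.

```python
import json
import urllib.request


def build_summarize_request(
    base_url: str, text: str, max_tokens: int = 256
) -> urllib.request.Request:
    """Build (but do not send) the POST request for /api/v1/summarize."""
    body = json.dumps({"text": text, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/summarize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(base_url: str, text: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_summarize_request(base_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `summarize("http://127.0.0.1:8000", "Your long text to summarize here...")` should return a dictionary shaped like the response in the API contract.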
### Running Tests

```bash
# Run all tests
pytest

# Run tests with coverage
pytest --cov=app --cov-report=html --cov-report=term

# Run specific test file
pytest tests/test_api.py

# Run tests with verbose output
pytest -v

# Run tests and stop on first failure
pytest -x
```
## Testing Strategy

### Test Types

1. **Unit Tests**
   - Pydantic model validation
   - Service layer logic (with mocked Ollama)
   - Configuration loading
   - Utility functions
2. **Integration Tests**
   - API endpoint testing with TestClient
   - End-to-end summarization flow
   - Error handling scenarios
   - Health check functionality
3. **Mock Strategy**
   - Mock Ollama HTTP calls using `httpx.MockTransport`, or the `responses` library if the service uses `requests`
   - Mock external dependencies
   - Use fixtures for common test data

### Test Coverage Goals

- **Minimum 90% code coverage**
- **100% coverage for critical paths** (API endpoints, error handling)
- **All edge cases tested** (empty input, large input, network failures)

### Test Data

```python
# Example test fixtures
SAMPLE_TEXT = "This is a long text that needs to be summarized..."
SAMPLE_SUMMARY = "This text discusses summarization."
MOCK_OLLAMA_RESPONSE = {
    "model": "llama3.1:8b",
    "response": SAMPLE_SUMMARY,
    "done": True,
}
```
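
To show how a fixture like `MOCK_OLLAMA_RESPONSE` can drive a service-layer test, here is a sketch of a summarizer function whose HTTP transport is injectable. Several parts are assumptions rather than the project's design: the function names, joining prompt and text with a blank line, mapping `max_tokens` to Ollama's `num_predict` option, and computing `tokens_used` as `prompt_eval_count + eval_count`. The default transport uses `urllib` as a stand-in for httpx.

```python
import json
import time
import urllib.request
from typing import Callable

PostJson = Callable[[str, dict, float], dict]


def http_post_json(url: str, payload: dict, timeout: float) -> dict:
    """Default transport: POST a JSON body and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(
    text: str,
    *,
    model: str = "llama3.1:8b",
    prompt: str = "Summarize concisely.",
    max_tokens: int = 256,
    host: str = "http://127.0.0.1:11434",
    timeout: float = 30.0,
    post_json: PostJson = http_post_json,
) -> dict:
    """Call Ollama's /api/generate and shape the result like the API contract."""
    payload = {
        "model": model,
        "prompt": f"{prompt}\n\n{text}",  # assumed way of combining the two
        "stream": False,  # ask for one JSON object instead of a stream
        "options": {"num_predict": max_tokens},  # assumed mapping of max_tokens
    }
    started = time.perf_counter()
    data = post_json(f"{host}/api/generate", payload, timeout)
    latency_ms = int((time.perf_counter() - started) * 1000)
    # Token counts are absent from the mock fixture, so default to 0.
    tokens_used = data.get("prompt_eval_count", 0) + data.get("eval_count", 0)
    return {
        "summary": data["response"].strip(),
        "model": data.get("model", model),
        "tokens_used": tokens_used,
        "latency_ms": latency_ms,
    }
```

In a unit test, passing `post_json=lambda url, payload, timeout: MOCK_OLLAMA_RESPONSE` exercises the payload construction and response mapping without a running Ollama instance.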
### Continuous Testing

- Tests run on every code change
- Pre-commit hooks for test execution
- CI/CD pipeline integration ready
## Android Integration

### Example Android HTTP Client

```kotlin
// Using Retrofit (which runs on top of OkHttp)
import retrofit2.http.Body
import retrofit2.http.POST

// Property names mirror the JSON keys of the API contract.
data class SummarizeRequest(
    val text: String,
    val max_tokens: Int = 256,
    val prompt: String = "Summarize concisely."
)

data class SummarizeResponse(
    val summary: String,
    val model: String,
    val tokens_used: Int,
    val latency_ms: Int
)

// API call, declared inside a Retrofit service interface
// (the interface name SummarizerApi is illustrative)
interface SummarizerApi {
    @POST("api/v1/summarize")
    suspend fun summarize(@Body request: SummarizeRequest): SummarizeResponse
}
```
## Cloud Deployment Considerations

### Future Extensions

- **Authentication**: API key or OAuth2
- **Rate Limiting**: Redis-based distributed rate limiting
- **Monitoring**: Prometheus metrics, health checks
- **Scaling**: Multiple replicas, load balancing
- **Database**: Usage tracking, user management
- **Caching**: Redis for response caching
- **Security**: HTTPS, input sanitization, CORS policies

### Deployment Options

- **Docker**: Containerized deployment
- **Cloud Platforms**: AWS, GCP, Azure, Railway, Render
- **Serverless**: AWS Lambda, Vercel Functions (calling a remotely hosted Ollama API)
- **VPS**: DigitalOcean, Linode with Docker
## Success Criteria

- [ ] API responds to health checks
- [ ] Successfully summarizes text via Ollama
- [ ] Handles errors gracefully
- [ ] Works with Android app
- [ ] Can be containerized
- [ ] **All tests pass with at least 90% coverage**
- [ ] Documentation is complete

## Future Enhancements (Post-v1)

- [ ] Streaming responses
- [ ] Batch summarization
- [ ] Multiple model support
- [ ] Prompt templates and presets
- [ ] Usage analytics
- [ ] Multi-language support
- [ ] Advanced rate limiting
- [ ] User authentication and authorization