# Text Summarizer Backend - Development Plan

## Overview
A minimal FastAPI backend for text summarization using local Ollama, designed to be callable from an Android app and extensible for cloud hosting.

## Architecture Goals
- **Local-first**: Use Ollama running locally for privacy and cost control
- **Cloud-ready**: Structure code to easily deploy to cloud later
- **Minimal v1**: Focus on core summarization functionality
- **Android-friendly**: RESTful API optimized for mobile app consumption

## Technology Stack
- **Backend**: FastAPI + Python
- **LLM**: Ollama (local)
- **Server**: Uvicorn
- **Validation**: Pydantic
- **Testing**: Pytest + pytest-asyncio + httpx (for async testing)
- **Containerization**: Docker (for cloud deployment)

## Project Structure
```
app/
├── main.py                 # FastAPI app entry point
├── api/
│   └── v1/
│       ├── routes.py       # API route definitions
│       └── schemas.py      # Pydantic models
├── services/
│   └── summarizer.py       # Ollama integration
├── core/
│   ├── config.py          # Configuration management
│   └── logging.py         # Logging setup
tests/
├── test_api.py            # API endpoint tests
├── test_services.py       # Service layer tests
├── test_schemas.py        # Pydantic model tests
├── test_config.py         # Configuration tests
└── conftest.py           # Test configuration and fixtures
requirements.txt
Dockerfile
docker-compose.yml
README.md
```

## API Contract (v1)

### POST /api/v1/summarize
**Request:**
```json
{
  "text": "string (required)",
  "max_tokens": 256,
  "prompt": "Summarize concisely."
}
```

**Response:**
```json
{
  "summary": "string",
  "model": "llama3.1:8b",
  "tokens_used": 512,
  "latency_ms": 1234
}
```

### GET /health
**Response:**
```json
{
  "status": "ok",
  "ollama": "reachable"
}
```

## Development Phases

### Phase 1: Foundation
- [ ] Project scaffold and directory structure
- [ ] Core dependencies and requirements.txt (including test dependencies)
- [ ] Basic FastAPI app setup
- [ ] Configuration management with environment variables
- [ ] Logging setup
- [ ] Health check endpoint
- [ ] Basic test setup and configuration

### Phase 2: Core Feature
- [ ] Pydantic schemas for request/response
- [ ] Unit tests for schemas (validation, serialization)
- [ ] Ollama service integration
- [ ] Unit tests for Ollama service (mocked)
- [ ] Summarization endpoint implementation
- [ ] Integration tests for API endpoints
- [ ] Input validation and error handling
- [ ] Basic request/response logging

### Phase 3: Quality & DX
- [ ] Error handling middleware
- [ ] Request ID middleware
- [ ] Input size limits and validation
- [ ] Rate limiting (optional for v1)
- [ ] Test coverage analysis and improvement
- [ ] Performance tests for summarization endpoint

### Phase 4: Cloud-Ready Structure
- [ ] Dockerfile for containerization
- [ ] docker-compose.yml for local development
- [ ] Environment-based configuration
- [ ] CORS configuration for Android app
- [ ] Security headers and API key support (optional)
- [ ] Metrics endpoint (optional)

### Phase 5: Documentation & Examples
- [ ] Comprehensive README with setup instructions
- [ ] API documentation (FastAPI auto-docs)
- [ ] Example curl commands
- [ ] Android client integration examples
- [ ] Deployment guide for cloud hosting

## Configuration

### Environment Variables
```bash
# Ollama Configuration
OLLAMA_MODEL=llama3.1:8b
OLLAMA_HOST=http://127.0.0.1:11434
OLLAMA_TIMEOUT=30

# Server Configuration
SERVER_HOST=127.0.0.1
SERVER_PORT=8000
LOG_LEVEL=INFO

# Optional: API Security
API_KEY_ENABLED=false
API_KEY=your-secret-key

# Optional: Rate Limiting
RATE_LIMIT_ENABLED=false
RATE_LIMIT_REQUESTS=60
RATE_LIMIT_WINDOW=60
```

## Local Development Setup

### Prerequisites
1. Install Ollama:
   ```bash
   # macOS
   brew install ollama
   
   # Or download from https://ollama.ai
   ```

2. Start Ollama service:
   ```bash
   ollama serve
   ```

3. Pull a model:
   ```bash
   ollama pull llama3.1:8b
   # or
   ollama pull mistral
   ```

### Running the API
```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export OLLAMA_MODEL=llama3.1:8b

# Run the server
uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload
```

### Testing the API
```bash
# Health check
curl http://127.0.0.1:8000/health

# Summarize text
curl -X POST http://127.0.0.1:8000/api/v1/summarize \
  -H "Content-Type: application/json" \
  -d '{"text": "Your long text to summarize here..."}'
```

### Running Tests
```bash
# Run all tests
pytest

# Run tests with coverage
pytest --cov=app --cov-report=html --cov-report=term

# Run specific test file
pytest tests/test_api.py

# Run tests with verbose output
pytest -v

# Run tests and stop on first failure
pytest -x
```

## Testing Strategy

### Test Types
1. **Unit Tests**
   - Pydantic model validation
   - Service layer logic (with mocked Ollama)
   - Configuration loading
   - Utility functions

2. **Integration Tests**
   - API endpoint testing with TestClient
   - End-to-end summarization flow
   - Error handling scenarios
   - Health check functionality

3. **Mock Strategy**
   - Mock Ollama HTTP calls using `httpx` or `responses`
   - Mock external dependencies
   - Use fixtures for common test data

### Test Coverage Goals
- **Minimum 90% code coverage**
- **100% coverage for critical paths** (API endpoints, error handling)
- **All edge cases tested** (empty input, large input, network failures)

### Test Data
```python
# Example test fixtures
SAMPLE_TEXT = "This is a long text that needs to be summarized..."
SAMPLE_SUMMARY = "This text discusses summarization."
MOCK_OLLAMA_RESPONSE = {
    "model": "llama3.1:8b",
    "response": SAMPLE_SUMMARY,
    "done": True
}
```

### Continuous Testing
- Tests run on every code change
- Pre-commit hooks for test execution
- CI/CD pipeline integration ready

## Android Integration

### Example Android HTTP Client
```kotlin
// Using Retrofit or OkHttp
data class SummarizeRequest(
    val text: String,
    val max_tokens: Int = 256,
    val prompt: String = "Summarize concisely."
)

data class SummarizeResponse(
    val summary: String,
    val model: String,
    val tokens_used: Int,
    val latency_ms: Int
)

// API call
@POST("api/v1/summarize")
suspend fun summarize(@Body request: SummarizeRequest): SummarizeResponse
```

## Cloud Deployment Considerations

### Future Extensions
- **Authentication**: API key or OAuth2
- **Rate Limiting**: Redis-based distributed rate limiting
- **Monitoring**: Prometheus metrics, health checks
- **Scaling**: Multiple replicas, load balancing
- **Database**: Usage tracking, user management
- **Caching**: Redis for response caching
- **Security**: HTTPS, input sanitization, CORS policies

### Deployment Options
- **Docker**: Containerized deployment
- **Cloud Platforms**: AWS, GCP, Azure, Railway, Render
- **Serverless**: AWS Lambda, Vercel Functions (with Ollama API)
- **VPS**: DigitalOcean, Linode with Docker

## Success Criteria
- [ ] API responds to health checks
- [ ] Successfully summarizes text via Ollama
- [ ] Handles errors gracefully
- [ ] Works with Android app
- [ ] Can be containerized
- [ ] **All tests pass with >90% coverage**
- [ ] Documentation is complete

## Future Enhancements (Post-v1)
- [ ] Streaming responses
- [ ] Batch summarization
- [ ] Multiple model support
- [ ] Prompt templates and presets
- [ ] Usage analytics
- [ ] Multi-language support
- [ ] Advanced rate limiting
- [ ] User authentication and authorization