# Text Summarizer Backend - Development Plan

## Overview
A minimal FastAPI backend for text summarization using local Ollama, designed to be callable from an Android app and extensible for cloud hosting.

## Architecture Goals
- **Local-first**: Use Ollama running locally for privacy and cost control
- **Cloud-ready**: Structure code to easily deploy to cloud later
- **Minimal v1**: Focus on core summarization functionality
- **Android-friendly**: RESTful API optimized for mobile app consumption

## Technology Stack
- **Backend**: FastAPI + Python
- **LLM**: Ollama (local)
- **Server**: Uvicorn
- **Validation**: Pydantic
- **Testing**: Pytest + pytest-asyncio + httpx (for async testing)
- **Containerization**: Docker (for cloud deployment)

## Project Structure
```
app/
β”œβ”€β”€ main.py                 # FastAPI app entry point
β”œβ”€β”€ api/
β”‚   └── v1/
β”‚       β”œβ”€β”€ routes.py       # API route definitions
β”‚       └── schemas.py      # Pydantic models
β”œβ”€β”€ services/
β”‚   └── summarizer.py       # Ollama integration
β”œβ”€β”€ core/
β”‚   β”œβ”€β”€ config.py          # Configuration management
β”‚   └── logging.py         # Logging setup
tests/
β”œβ”€β”€ test_api.py            # API endpoint tests
β”œβ”€β”€ test_services.py       # Service layer tests
β”œβ”€β”€ test_schemas.py        # Pydantic model tests
β”œβ”€β”€ test_config.py         # Configuration tests
└── conftest.py           # Test configuration and fixtures
requirements.txt
Dockerfile
docker-compose.yml
README.md
```
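The `services/summarizer.py` module in the layout above could start as a thin `httpx` wrapper around Ollama's `/api/generate` endpoint. A minimal sketch, assuming this module shape (the helper name `build_payload` and the hard-coded constants, which would really come from `core/config.py`, are illustrative):

```python
# Hypothetical sketch of app/services/summarizer.py; constants would come
# from core/config.py in the real layout, and all names are assumptions.
OLLAMA_HOST = "http://127.0.0.1:11434"
OLLAMA_MODEL = "llama3.1:8b"
OLLAMA_TIMEOUT = 30

def build_payload(text: str, prompt: str = "Summarize concisely.",
                  max_tokens: int = 256) -> dict:
    """Assemble the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": OLLAMA_MODEL,
        "prompt": f"{prompt}\n\n{text}",
        "stream": False,                       # one JSON object, not a stream
        "options": {"num_predict": max_tokens},
    }

async def summarize(text: str, **kwargs) -> str:
    """POST to the local Ollama server and return the generated summary."""
    import httpx  # imported lazily so the payload helper stays dependency-free
    async with httpx.AsyncClient(timeout=OLLAMA_TIMEOUT) as client:
        resp = await client.post(f"{OLLAMA_HOST}/api/generate",
                                 json=build_payload(text, **kwargs))
        resp.raise_for_status()
        return resp.json()["response"]
```

Keeping payload construction in a pure function makes the service easy to unit-test without a running Ollama instance.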

## API Contract (v1)

### POST /api/v1/summarize
**Request:**
```json
{
  "text": "string (required)",
  "max_tokens": 256,
  "prompt": "Summarize concisely."
}
```

**Response:**
```json
{
  "summary": "string",
  "model": "llama3.1:8b",
  "tokens_used": 512,
  "latency_ms": 1234
}
```

### GET /health
**Response:**
```json
{
  "status": "ok",
  "ollama": "reachable"
}
```
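The contract above maps directly onto Pydantic models in `api/v1/schemas.py`. A sketch, with field names taken from the JSON examples and the validation constraints as assumptions:

```python
# Hypothetical sketch of app/api/v1/schemas.py mirroring the contract above;
# defaults come from the request example, constraints are assumptions.
from pydantic import BaseModel, Field

class SummarizeRequest(BaseModel):
    text: str = Field(min_length=1, description="Text to summarize (required)")
    max_tokens: int = Field(default=256, gt=0)
    prompt: str = "Summarize concisely."

class SummarizeResponse(BaseModel):
    summary: str
    model: str
    tokens_used: int
    latency_ms: int
```

With these models, FastAPI rejects an empty `text` field with a 422 before the request ever reaches the Ollama service.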

## Development Phases

### Phase 1: Foundation
- [ ] Project scaffold and directory structure
- [ ] Core dependencies and requirements.txt (including test dependencies)
- [ ] Basic FastAPI app setup
- [ ] Configuration management with environment variables
- [ ] Logging setup
- [ ] Health check endpoint
- [ ] Basic test setup and configuration

### Phase 2: Core Feature
- [ ] Pydantic schemas for request/response
- [ ] Unit tests for schemas (validation, serialization)
- [ ] Ollama service integration
- [ ] Unit tests for Ollama service (mocked)
- [ ] Summarization endpoint implementation
- [ ] Integration tests for API endpoints
- [ ] Input validation and error handling
- [ ] Basic request/response logging

### Phase 3: Quality & DX
- [ ] Error handling middleware
- [ ] Request ID middleware
- [ ] Input size limits and validation
- [ ] Rate limiting (optional for v1)
- [ ] Test coverage analysis and improvement
- [ ] Performance tests for summarization endpoint

### Phase 4: Cloud-Ready Structure
- [ ] Dockerfile for containerization
- [ ] docker-compose.yml for local development
- [ ] Environment-based configuration
- [ ] CORS configuration for Android app
- [ ] Security headers and API key support (optional)
- [ ] Metrics endpoint (optional)

### Phase 5: Documentation & Examples
- [ ] Comprehensive README with setup instructions
- [ ] API documentation (FastAPI auto-docs)
- [ ] Example curl commands
- [ ] Android client integration examples
- [ ] Deployment guide for cloud hosting

## Configuration

### Environment Variables
```bash
# Ollama Configuration
OLLAMA_MODEL=llama3.1:8b
OLLAMA_HOST=http://127.0.0.1:11434
OLLAMA_TIMEOUT=30

# Server Configuration
SERVER_HOST=127.0.0.1
SERVER_PORT=8000
LOG_LEVEL=INFO

# Optional: API Security
API_KEY_ENABLED=false
API_KEY=your-secret-key

# Optional: Rate Limiting
RATE_LIMIT_ENABLED=false
RATE_LIMIT_REQUESTS=60
RATE_LIMIT_WINDOW=60
```
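In `core/config.py` these variables could be read once at startup. `pydantic-settings` would fit the stack; shown here instead is a dependency-free sketch using only the stdlib (the `Settings` shape and loader name are assumptions):

```python
# Hypothetical sketch of app/core/config.py; names are assumptions.
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    ollama_model: str
    ollama_host: str
    ollama_timeout: int
    log_level: str

def load_settings() -> Settings:
    """Read configuration from the environment, falling back to the
    defaults documented in the Environment Variables section above."""
    return Settings(
        ollama_model=os.getenv("OLLAMA_MODEL", "llama3.1:8b"),
        ollama_host=os.getenv("OLLAMA_HOST", "http://127.0.0.1:11434"),
        ollama_timeout=int(os.getenv("OLLAMA_TIMEOUT", "30")),
        log_level=os.getenv("LOG_LEVEL", "INFO"),
    )
```

A frozen dataclass keeps the settings immutable after startup, which also makes them trivial to override in test fixtures.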

## Local Development Setup

### Prerequisites
1. Install Ollama:
   ```bash
   # macOS
   brew install ollama
   
   # Or download from https://ollama.ai
   ```

2. Start Ollama service:
   ```bash
   ollama serve
   ```

3. Pull a model:
   ```bash
   ollama pull llama3.1:8b
   # or
   ollama pull mistral
   ```

### Running the API
```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export OLLAMA_MODEL=llama3.1:8b

# Run the server
uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload
```

### Testing the API
```bash
# Health check
curl http://127.0.0.1:8000/health

# Summarize text
curl -X POST http://127.0.0.1:8000/api/v1/summarize \
  -H "Content-Type: application/json" \
  -d '{"text": "Your long text to summarize here..."}'
```

### Running Tests
```bash
# Run all tests
pytest

# Run tests with coverage
pytest --cov=app --cov-report=html --cov-report=term

# Run specific test file
pytest tests/test_api.py

# Run tests with verbose output
pytest -v

# Run tests and stop on first failure
pytest -x
```

## Testing Strategy

### Test Types
1. **Unit Tests**
   - Pydantic model validation
   - Service layer logic (with mocked Ollama)
   - Configuration loading
   - Utility functions

2. **Integration Tests**
   - API endpoint testing with TestClient
   - End-to-end summarization flow
   - Error handling scenarios
   - Health check functionality

3. **Mock Strategy**
   - Mock Ollama HTTP calls with `respx` (the mock library for `httpx`) or `unittest.mock`
   - Mock external dependencies
   - Use fixtures for common test data

### Test Coverage Goals
- **Minimum 90% code coverage**
- **100% coverage for critical paths** (API endpoints, error handling)
- **All edge cases tested** (empty input, large input, network failures)

### Test Data
```python
# Example test fixtures
SAMPLE_TEXT = "This is a long text that needs to be summarized..."
SAMPLE_SUMMARY = "This text discusses summarization."
MOCK_OLLAMA_RESPONSE = {
    "model": "llama3.1:8b",
    "response": SAMPLE_SUMMARY,
    "done": True
}
```
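One way the mock strategy could look in practice is to inject the HTTP call into the service so tests swap in a fake and never touch a real Ollama server. A self-contained sketch using the fixtures above (the injected-`post` signature is an assumption, not the project's actual service API):

```python
# Hypothetical sketch of a mocked service test; the injectable `post`
# parameter stands in for the real httpx call and is an assumption.
import asyncio
from typing import Awaitable, Callable

MOCK_OLLAMA_RESPONSE = {
    "model": "llama3.1:8b",
    "response": "This text discusses summarization.",
    "done": True,
}

async def summarize(text: str, post: Callable[..., Awaitable[dict]]) -> str:
    """Service function with the HTTP call injected for testability."""
    data = await post("/api/generate", json={"model": "llama3.1:8b", "prompt": text})
    return data["response"]

async def fake_post(url: str, json: dict) -> dict:
    """Test double standing in for the real httpx POST to Ollama."""
    return MOCK_OLLAMA_RESPONSE

def test_summarize_returns_mocked_summary():
    result = asyncio.run(summarize("long text...", post=fake_post))
    assert result == MOCK_OLLAMA_RESPONSE["response"]
```

The same effect is achievable without dependency injection by patching `httpx.AsyncClient` with `respx` or `unittest.mock`; injection just keeps the test free of mocking machinery.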

### Continuous Testing
- Tests run on every code change
- Pre-commit hooks for test execution
- CI/CD pipeline integration ready

## Android Integration

### Example Android HTTP Client
```kotlin
// Using Retrofit or OkHttp
data class SummarizeRequest(
    val text: String,
    val max_tokens: Int = 256,
    val prompt: String = "Summarize concisely."
)

data class SummarizeResponse(
    val summary: String,
    val model: String,
    val tokens_used: Int,
    val latency_ms: Int
)

// API call
@POST("api/v1/summarize")
suspend fun summarize(@Body request: SummarizeRequest): SummarizeResponse
```

## Cloud Deployment Considerations

### Future Extensions
- **Authentication**: API key or OAuth2
- **Rate Limiting**: Redis-based distributed rate limiting
- **Monitoring**: Prometheus metrics, health checks
- **Scaling**: Multiple replicas, load balancing
- **Database**: Usage tracking, user management
- **Caching**: Redis for response caching
- **Security**: HTTPS, input sanitization, CORS policies

### Deployment Options
- **Docker**: Containerized deployment
- **Cloud Platforms**: AWS, GCP, Azure, Railway, Render
- **Serverless**: AWS Lambda, Vercel Functions (calling a hosted model API, since Ollama itself needs a long-running host)
- **VPS**: DigitalOcean, Linode with Docker

## Success Criteria
- [ ] API responds to health checks
- [ ] Successfully summarizes text via Ollama
- [ ] Handles errors gracefully
- [ ] Works with Android app
- [ ] Can be containerized
- [ ] **All tests pass with >90% coverage**
- [ ] Documentation is complete

## Future Enhancements (Post-v1)
- [ ] Streaming responses
- [ ] Batch summarization
- [ ] Multiple model support
- [ ] Prompt templates and presets
- [ ] Usage analytics
- [ ] Multi-language support
- [ ] Advanced rate limiting
- [ ] User authentication and authorization