# Architecture Documentation
## Overview
The NLP Analysis API follows a clean architecture pattern with clear separation of concerns. This document explains the structure and design decisions.
## Directory Structure
```
sentimant/
├── main.py                 # Application entry point
├── run_server.py           # Server startup script
├── requirements.txt        # Dependencies
├── README.md               # User documentation
├── ARCHITECTURE.md         # This file
└── lib/                    # Core application code
    ├── __init__.py
    ├── models.py           # Data models/schemas
    ├── services.py         # Business logic
    ├── routes.py           # API routes
    └── providers/          # Model management
        ├── __init__.py
        └── model_providers.py  # Model providers
```
## Architecture Layers
### 1. Models Layer (`lib/models.py`)
**Responsibility**: Define data structures using Pydantic for:
- Request validation
- Response serialization
- Type safety
**Key Models**:
- `TextInput`: Input for text-based operations
- `BatchTextInput`: Input for batch processing
- `SentimentResponse`: Sentiment analysis output
- `NERResponse`: Named Entity Recognition output
- `TranslationResponse`: Translation output
- `Entity`: Individual entity structure
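The models listed above might be sketched as follows. This is a minimal sketch only: the field names shown here are assumptions, since the document does not spell out the actual schemas.

```python
from typing import List
from pydantic import BaseModel


class TextInput(BaseModel):
    """Input for single-text operations; field name assumed."""
    text: str


class BatchTextInput(BaseModel):
    """Input for batch processing."""
    texts: List[str]


class Entity(BaseModel):
    """One recognized entity; fields mirror typical NER pipeline output."""
    word: str
    entity_group: str
    score: float


class SentimentResponse(BaseModel):
    label: str
    score: float


class NERResponse(BaseModel):
    entities: List[Entity]
```

Because every response is a Pydantic model, FastAPI can validate and serialize it automatically.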
### 2. Providers Layer (`lib/providers/model_providers.py`)
**Responsibility**: Model loading, initialization, and prediction
**Design Pattern**: Provider pattern
**Key Components**:
#### `ModelProvider` (Base Class)
- Abstract base for all model providers
- Defines interface: `load_model()`, `predict()`, `is_loaded()`
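The base-class contract can be sketched with `abc` (a sketch; the real class may differ in detail):

```python
from abc import ABC, abstractmethod


class ModelProvider(ABC):
    """Abstract base for all model providers."""

    def __init__(self):
        self.model = None  # populated by load_model()

    @abstractmethod
    def load_model(self) -> None:
        """Load the underlying model into memory."""

    @abstractmethod
    def predict(self, text: str):
        """Run inference on a single text."""

    def is_loaded(self) -> bool:
        return self.model is not None
```

Concrete providers only need to implement `load_model()` and `predict()`; `is_loaded()` comes for free.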
#### `SentimentModelProvider`
- Manages sentiment analysis models
- Default: `cardiffnlp/twitter-roberta-base-sentiment-latest`
- Handles model loading errors with fallback
#### `NERModelProvider`
- Manages Named Entity Recognition models
- Default: `dslim/bert-base-NER`
- Returns aggregated entities
#### `TranslationModelProvider`
- Manages translation models
- Lazy loads models per language pair
- Caches loaded models in memory
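The lazy-load-and-cache behavior can be sketched like this. The `loader` callable is a stand-in for the real Hugging Face pipeline factory, which this document does not show:

```python
class TranslationModelCache:
    """Lazy per-language-pair loading with an in-memory cache (sketch)."""

    def __init__(self, loader):
        self._loader = loader
        self._models = {}  # (src, tgt) -> loaded model

    def get(self, src: str, tgt: str):
        key = (src, tgt)
        if key not in self._models:  # load only on first use
            self._models[key] = self._loader(src, tgt)
        return self._models[key]
```

The first request for a language pair pays the load cost; every later request for the same pair is served from memory.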
### 3. Services Layer (`lib/services.py`)
**Responsibility**: Business logic and data transformation
**Key Services**:
#### `SentimentService`
- Analyzes sentiment using `SentimentModelProvider`
- Formats results into `SentimentResponse`
- Maps model labels to user-friendly format
- Handles batch processing
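The label mapping and batch handling might look like the sketch below. The raw labels (`LABEL_0`, …) and the mapping values are assumptions; the actual labels depend on the model checkpoint:

```python
# Hypothetical raw-label mapping; actual labels depend on the model.
LABEL_MAP = {
    "LABEL_0": "negative",
    "LABEL_1": "neutral",
    "LABEL_2": "positive",
}


def to_friendly(raw: dict) -> dict:
    """Map one raw pipeline prediction to a user-friendly result."""
    return {
        "label": LABEL_MAP.get(raw["label"], raw["label"]),
        "score": round(raw["score"], 4),
    }


def analyze_batch(predictions: list) -> list:
    """Batch processing is a per-item transformation of pipeline output."""
    return [to_friendly(p) for p in predictions]
```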
#### `NERService`
- Extracts entities using `NERModelProvider`
- Converts raw predictions to `Entity` objects
- Returns structured `NERResponse`
#### `TranslationService`
- Translates text using `TranslationModelProvider`
- Manages language pair selection
- Returns clean translation text
### 4. Routes Layer (`lib/routes.py`)
**Responsibility**: API endpoint definitions and HTTP handling
**Features**:
- FastAPI dependency injection for services
- Error handling and HTTP exceptions
- Request/response model validation
**Endpoints**:
- `GET /`: Basic status
- `GET /health`: Health check with model status
- `POST /analyze`: Sentiment analysis
- `POST /analyze-batch`: Batch sentiment analysis
- `POST /ner`: Named Entity Recognition
- `POST /translate`: Translation
### 5. Application Layer (`main.py`)
**Responsibility**: Application initialization and configuration
**Key Responsibilities**:
- FastAPI app creation
- CORS configuration
- Model provider initialization
- Service initialization
- Model loading on startup
- Router registration
## Data Flow
```
Client Request
↓
FastAPI Routes (lib/routes.py)
↓
Service Layer (lib/services.py)
↓
Model Provider (lib/providers/model_providers.py)
↓
Hugging Face Transformers
↓
Raw Prediction
↓
Service Layer (data transformation)
↓
Pydantic Model (validation)
↓
JSON Response to Client
```
## Design Principles
### 1. Separation of Concerns
- Each layer has a single, well-defined responsibility
- Models don't contain business logic
- Providers don't know about services
- Routes don't contain business logic
### 2. Dependency Injection
- Services injected into routes via FastAPI dependencies
- Enables easy testing and mocking
- Loose coupling between components
### 3. Clean Interfaces
- Abstract base classes define contracts
- Consistent method signatures
- Type hints throughout
### 4. Error Handling
- Comprehensive exception handling at each layer
- User-friendly error messages
- Proper HTTP status codes
### 5. Model Management
- Lazy loading for translation models
- Eager loading for core models (sentiment, NER)
- Caching to avoid redundant loads
## Extension Points
### Adding a New Model Type
1. **Create Provider** (`lib/providers/model_providers.py`):
```python
class NewModelProvider(ModelProvider):
    def __init__(self, model_name: str = "model/path"):
        super().__init__()
        self.model_name = model_name

    def load_model(self):
        # Load model logic
        pass

    def predict(self, text: str):
        # Prediction logic
        pass
```
2. **Create Service** (`lib/services.py`):
```python
class NewModelService:
    def __init__(self, model_provider: NewModelProvider):
        self.model_provider = model_provider

    def process(self, text: str) -> ResponseModel:
        # Business logic
        pass
```
3. **Add Route** (`lib/routes.py`):
```python
@router.post("/new-endpoint", response_model=ResponseModel)
async def new_endpoint(
    input_data: InputModel,
    service: NewModelService = Depends(get_new_model_service),
):
    return service.process(input_data.text)
```
4. **Register in main.py**:
```python
new_model = NewModelProvider()
new_service = NewModelService(new_model)
# Add to routes
```
### Adding a New Endpoint
1. Create route in `lib/routes.py`
2. Use dependency injection for services
3. Define request/response models in `lib/models.py`
4. Router automatically picks it up
## Testing Strategy
### Unit Tests
- Test each service independently
- Mock model providers
- Test data transformations
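A unit test with a mocked provider might look like this sketch; the service shown here is a simplified stand-in mirroring the `SentimentService` shape:

```python
from unittest.mock import MagicMock


# Simplified stand-in for the real SentimentService.
class SentimentService:
    def __init__(self, provider):
        self.provider = provider

    def analyze(self, text: str) -> dict:
        raw = self.provider.predict(text)
        return {"label": raw["label"].lower(), "score": raw["score"]}


# The provider is mocked, so no model is loaded during the test.
provider = MagicMock()
provider.predict.return_value = {"label": "POSITIVE", "score": 0.91}

service = SentimentService(provider)
result = service.analyze("great library")
```

Because the provider is injected, the test exercises only the service's data transformation, never the model itself.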
### Integration Tests
- Test full request/response cycle
- Use test fixtures
- Verify model outputs
### Load Tests
- Test batch processing
- Test concurrent requests
- Measure response times
## Deployment Considerations
### Model Loading
- First request may be slow (cold start)
- Consider warming up models on startup
- Monitor memory usage
### Caching
- Translation models cached in memory
- Consider Redis for distributed caching
- Cache predictions for frequently used texts
### Scaling
- Stateless design enables horizontal scaling
- Consider model server separation
- Use load balancing
## Future Enhancements
1. **Model Registry**: Centralized model management
2. **Async Processing**: Background task queue for long operations
3. **Model Versioning**: Support multiple model versions
4. **Metrics**: Prometheus metrics integration
5. **Auth**: API key authentication
6. **Rate Limiting**: Request rate limiting
7. **Batch Processing**: Async batch job processing
8. **Model A/B Testing**: Compare model performance
## Performance Optimizations
1. **Model Quantization**: Reduce model size and speed up inference
2. **TensorRT/ONNX**: Faster inference
3. **Batching**: Process multiple texts together
4. **GPU Support**: CUDA acceleration
5. **Connection Pooling**: Efficient database connections
6. **Response Caching**: Cache frequent requests
## Security Considerations
1. **Input Validation**: All inputs validated via Pydantic
2. **Rate Limiting**: Prevent abuse
3. **CORS**: Configured for Flutter app
4. **Logging**: Comprehensive logging for audit
5. **Error Messages**: Don't expose internal details