# Architecture Documentation

## Overview

The NLP Analysis API follows a clean architecture pattern with clear separation of concerns. This document explains the structure and design decisions.

## Directory Structure
```
sentimant/
├── main.py              # Application entry point
├── run_server.py        # Server startup script
├── requirements.txt     # Dependencies
├── README.md            # User documentation
├── ARCHITECTURE.md      # This file
└── lib/                 # Core application code
    ├── __init__.py
    ├── models.py        # Data models/schemas
    ├── services.py      # Business logic
    ├── routes.py        # API routes
    └── providers/       # Model management
        ├── __init__.py
        └── model_providers.py  # Model providers
```
## Architecture Layers

### 1. Models Layer (`lib/models.py`)

**Responsibility:** Define data structures using Pydantic for:

- Request validation
- Response serialization
- Type safety

**Key Models:**

- `TextInput`: Input for text-based operations
- `BatchTextInput`: Input for batch processing
- `SentimentResponse`: Sentiment analysis output
- `NERResponse`: Named Entity Recognition output
- `TranslationResponse`: Translation output
- `Entity`: Individual entity structure
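As a rough sketch, the models above might look like the following. The exact field names are assumptions for illustration, not the project's actual schema:

```python
from pydantic import BaseModel


class TextInput(BaseModel):
    text: str


class BatchTextInput(BaseModel):
    texts: list[str]


class SentimentResponse(BaseModel):
    text: str
    sentiment: str
    confidence: float


class Entity(BaseModel):
    text: str
    label: str
    score: float


class NERResponse(BaseModel):
    text: str
    entities: list[Entity]


class TranslationResponse(BaseModel):
    translated_text: str
```

Because these are Pydantic models, FastAPI validates incoming JSON against them automatically and rejects malformed requests with a 422 before any business logic runs.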
### 2. Providers Layer (`lib/providers/model_providers.py`)

**Responsibility:** Model loading, initialization, and prediction

**Design Pattern:** Provider pattern

**Key Components:**

**`ModelProvider` (Base Class)**

- Abstract base for all model providers
- Defines the interface: `load_model()`, `predict()`, `is_loaded()`

**`SentimentModelProvider`**

- Manages sentiment analysis models
- Default: `cardiffnlp/twitter-roberta-base-sentiment-latest`
- Handles model loading errors with a fallback

**`NERModelProvider`**

- Manages Named Entity Recognition models
- Default: `dslim/bert-base-NER`
- Returns aggregated entities

**`TranslationModelProvider`**

- Manages translation models
- Lazy-loads models per language pair
- Caches loaded models in memory
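A minimal sketch of the provider pattern and the per-language-pair lazy cache, assuming the interface described above. The string "model" built here is a stand-in for a real Hugging Face pipeline:

```python
from abc import ABC, abstractmethod


class ModelProvider(ABC):
    """Abstract base: every provider exposes load_model/predict/is_loaded."""

    def __init__(self):
        self._model = None

    @abstractmethod
    def load_model(self):
        ...

    @abstractmethod
    def predict(self, text: str):
        ...

    def is_loaded(self) -> bool:
        return self._model is not None


class TranslationModelProvider(ModelProvider):
    """Lazy-loads one model per language pair and caches it in memory."""

    def __init__(self):
        super().__init__()
        self._models = {}   # (src, tgt) -> loaded model
        self.load_count = 0  # tracks actual (expensive) loads

    def load_model(self, src: str = "en", tgt: str = "fr"):
        pair = (src, tgt)
        if pair not in self._models:  # only load on first use of the pair
            self.load_count += 1
            # Stand-in for e.g. transformers.pipeline("translation", ...)
            self._models[pair] = f"model:{src}-{tgt}"
        self._model = self._models[pair]
        return self._model

    def predict(self, text: str, src: str = "en", tgt: str = "fr"):
        model = self.load_model(src, tgt)
        return f"[{model}] {text}"
```

The cache means the second request for the same language pair skips the expensive load entirely, while unused pairs never consume memory.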
### 3. Services Layer (`lib/services.py`)

**Responsibility:** Business logic and data transformation

**Key Services:**

**`SentimentService`**

- Analyzes sentiment using `SentimentModelProvider`
- Formats results into `SentimentResponse`
- Maps model labels to a user-friendly format
- Handles batch processing

**`NERService`**

- Extracts entities using `NERModelProvider`
- Converts raw predictions to `Entity` objects
- Returns a structured `NERResponse`

**`TranslationService`**

- Translates text using `TranslationModelProvider`
- Manages language pair selection
- Returns clean translation text
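The label mapping and batch path in `SentimentService` could be sketched as below. The raw labels are an assumption (some cardiffnlp checkpoints emit `LABEL_0`/`LABEL_1`/`LABEL_2`); adjust the map to whatever the loaded model actually returns:

```python
# Assumed raw-label -> user-friendly mapping; not the project's exact table.
LABEL_MAP = {
    "LABEL_0": "negative",
    "LABEL_1": "neutral",
    "LABEL_2": "positive",
}


class SentimentService:
    def __init__(self, provider):
        self.provider = provider  # any object with predict(text)

    def analyze(self, text: str) -> dict:
        # Provider returns e.g. {"label": "LABEL_2", "score": 0.98}
        raw = self.provider.predict(text)
        return {
            "text": text,
            "sentiment": LABEL_MAP.get(raw["label"], raw["label"]),
            "confidence": round(raw["score"], 4),
        }

    def analyze_batch(self, texts: list[str]) -> list[dict]:
        return [self.analyze(t) for t in texts]
```

Keeping the mapping in the service (not the provider) means swapping the underlying model only requires updating one table, and the API response format never changes.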
### 4. Routes Layer (`lib/routes.py`)

**Responsibility:** API endpoint definitions and HTTP handling

**Features:**

- FastAPI dependency injection for services
- Error handling and HTTP exceptions
- Request/response model validation

**Endpoints:**

- `GET /`: Basic status
- `GET /health`: Health check with model status
- `POST /analyze`: Sentiment analysis
- `POST /analyze-batch`: Batch sentiment analysis
- `POST /ner`: Named Entity Recognition
- `POST /translate`: Translation
### 5. Application Layer (`main.py`)

**Responsibility:** Application initialization and configuration

**Key Responsibilities:**

- FastAPI app creation
- CORS configuration
- Model provider initialization
- Service initialization
- Model loading on startup
- Router registration
## Data Flow

```
Client Request
      ↓
FastAPI Routes (lib/routes.py)
      ↓
Service Layer (lib/services.py)
      ↓
Model Provider (lib/providers/model_providers.py)
      ↓
Hugging Face Transformers
      ↓
Raw Prediction
      ↓
Service Layer (data transformation)
      ↓
Pydantic Model (validation)
      ↓
JSON Response to Client
```
## Design Principles

### 1. Separation of Concerns

- Each layer has a single, well-defined responsibility
- Models don't contain business logic
- Providers don't know about services
- Routes don't contain business logic

### 2. Dependency Injection

- Services are injected into routes via FastAPI dependencies
- Enables easy testing and mocking
- Loose coupling between components
### 3. Clean Interfaces

- Abstract base classes define contracts
- Consistent method signatures
- Type hints throughout

### 4. Error Handling

- Comprehensive exception handling at each layer
- User-friendly error messages
- Proper HTTP status codes

### 5. Model Management

- Lazy loading for translation models
- Eager loading for core models (sentiment, NER)
- Caching to avoid redundant loads
## Extension Points

### Adding a New Model Type

1. **Create a provider** (`lib/providers/model_providers.py`):

```python
class NewModelProvider(ModelProvider):
    def __init__(self, model_name: str = "model/path"):
        super().__init__()
        self.model_name = model_name

    def load_model(self):
        # Load model logic
        pass

    def predict(self, text: str):
        # Prediction logic
        pass
```

2. **Create a service** (`lib/services.py`):

```python
class NewModelService:
    def __init__(self, model_provider: NewModelProvider):
        self.model_provider = model_provider

    def process(self, text: str) -> ResponseModel:
        # Business logic
        pass
```

3. **Add a route** (`lib/routes.py`):

```python
@router.post("/new-endpoint", response_model=ResponseModel)
async def new_endpoint(
    input_data: InputModel,
    service: NewModelService = Depends(get_new_model_service),
):
    return service.process(input_data.text)
```

4. **Register in `main.py`**:

```python
new_model = NewModelProvider()
new_service = NewModelService(new_model)
# Add to routes
```
### Adding a New Endpoint

1. Create the route in `lib/routes.py`
2. Use dependency injection for services
3. Define request/response models in `lib/models.py`
4. The router picks it up automatically
## Testing Strategy

### Unit Tests

- Test each service independently
- Mock model providers
- Test data transformations
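Mocking a provider in a unit test could look like the following sketch; the minimal `NERService` and the raw-prediction shape (`word`/`entity_group`, as produced by aggregated Hugging Face NER pipelines) are assumptions:

```python
from unittest.mock import Mock


class NERService:
    """Minimal stand-in for the real service in lib/services.py."""

    def __init__(self, provider):
        self.provider = provider

    def extract(self, text: str) -> list[dict]:
        raw = self.provider.predict(text)
        # Transform raw pipeline output into the API's entity shape
        return [{"text": e["word"], "label": e["entity_group"]} for e in raw]


def test_ner_service_transforms_predictions():
    provider = Mock()
    provider.predict.return_value = [{"word": "Paris", "entity_group": "LOC"}]
    service = NERService(provider)

    entities = service.extract("I love Paris")

    provider.predict.assert_called_once_with("I love Paris")
    assert entities == [{"text": "Paris", "label": "LOC"}]
```

No model is loaded here, so the test runs in milliseconds and only verifies the data transformation the service owns.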
### Integration Tests

- Test the full request/response cycle
- Use test fixtures
- Verify model outputs

### Load Tests

- Test batch processing
- Test concurrent requests
- Measure response times
## Deployment Considerations

### Model Loading

- The first request may be slow (cold start)
- Consider warming up models on startup
- Monitor memory usage

### Caching

- Translation models are cached in memory
- Consider Redis for distributed caching
- Cache predictions for frequently repeated texts
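For a single-process deployment, prediction caching can be as simple as the sketch below; the predict body is a stand-in for a real model call, and a shared store like Redis would replace `lru_cache` once there are multiple workers:

```python
from functools import lru_cache

CALLS = {"n": 0}  # counts actual (cache-miss) model invocations


@lru_cache(maxsize=1024)
def cached_predict(text: str) -> str:
    """Cache predictions keyed on the exact input text.

    Identical repeated requests skip inference entirely; the body here
    is a placeholder for the real model call.
    """
    CALLS["n"] += 1
    return "positive" if "good" in text.lower() else "neutral"
```

`cached_predict.cache_info()` exposes hit/miss counts, which is useful for deciding whether the cache size is paying off.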
### Scaling

- Stateless design enables horizontal scaling
- Consider separating the model server from the API
- Use load balancing
## Future Enhancements

- **Model Registry**: Centralized model management
- **Async Processing**: Background task queue for long-running operations
- **Model Versioning**: Support multiple model versions
- **Metrics**: Prometheus metrics integration
- **Auth**: API key authentication
- **Rate Limiting**: Request rate limiting
- **Batch Processing**: Async batch job processing
- **Model A/B Testing**: Compare model performance
## Performance Optimizations

- **Model Quantization**: Reduce model size and speed up inference
- **TensorRT/ONNX**: Faster inference runtimes
- **Batching**: Process multiple texts together
- **GPU Support**: CUDA acceleration
- **Connection Pooling**: Efficient database connections
- **Response Caching**: Cache frequent requests

## Security Considerations

- **Input Validation**: All inputs are validated via Pydantic
- **Rate Limiting**: Prevent abuse
- **CORS**: Configured for the Flutter app
- **Logging**: Comprehensive logging for auditing
- **Error Messages**: Don't expose internal details