Architecture Documentation
System Architecture
High-Level Overview
┌─────────────────────────────────────────────────────────────┐
│  Client Layer                                               │
│  (Web Apps, Mobile Apps, Other Services)                    │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        │ HTTP/REST API
                        │
┌───────────────────────▼─────────────────────────────────────┐
│  API Gateway Layer                                          │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  FastAPI Application                                  │  │
│  │  - Request Validation                                 │  │
│  │  - Authentication (optional)                          │  │
│  │  - Rate Limiting                                      │  │
│  └───────────────────────────────────────────────────────┘  │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        │
┌───────────────────────▼─────────────────────────────────────┐
│  Application Layer                                          │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Monitoring Middleware                                │  │
│  │  - Prediction Logging                                 │  │
│  │  - Data Drift Detection                               │  │
│  │  - Performance Tracking                               │  │
│  └───────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Inference Engine                                     │  │
│  │  - Async Processing                                   │  │
│  │  - Batch Handling                                     │  │
│  │  - Error Handling                                     │  │
│  └───────────────────────────────────────────────────────┘  │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        │ Model Calls
                        │
┌───────────────────────▼─────────────────────────────────────┐
│  Model Layer                                                │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Transformer Models                                   │  │
│  │  - Russian BERT                                       │  │
│  │  - RoBERTa                                            │  │
│  │  - DistilBERT                                         │  │
│  │  - Ensemble Models                                    │  │
│  └───────────────────────────────────────────────────────┘  │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        │
┌───────────────────────▼─────────────────────────────────────┐
│  Data Layer                                                 │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Tokenization                                         │  │
│  │  - HuggingFace Tokenizers                             │  │
│  │  - Subword Tokenization                               │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
Model Architecture Details
Transformer Model Flow
Input Text: "Путин объявил о новых мерах поддержки экономики"
("Putin announced new measures to support the economy")
│
├──► Text Preprocessing
│   └──► Normalize: "путин объявил о новых мерах поддержки экономики"
│
├──► Tokenization (HuggingFace)
│   ├──► Tokens: ["[CLS]", "путин", "объявил", "о", "новых", "мерах", ...]
│   └──► Token IDs: [101, 1234, 5678, ...]
│
├──► Embedding Layer
│   └──► [batch, seq_len, 768]
│
├──► BERT Encoder (12 layers)
│   ├──► Multi-Head Self-Attention (12 heads)
│   ├──► Feed-Forward Network
│   ├──► Layer Normalization
│   ├──► Residual Connections
│   └──► Output: [batch, seq_len, 768]
│
├──► Pooling
│   ├──► [CLS] token or Attention Pooling
│   └──► [batch, 768]
│
├──► Classification Head
│   ├──► Dropout(0.3)
│   ├──► Linear(768 → 768) + ReLU
│   ├──► Dropout(0.3)
│   ├──► Linear(768 → num_labels)
│   └──► Output: [batch, num_labels]
│
├──► Sigmoid Activation
│   └──► Probabilities: [batch, num_labels]
│
└──► Threshold Filtering (0.5)
    └──► Final Tags: ["политика", "экономика"] ("politics", "economy")
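Written out as equations, the classification head and output stage of this flow look as follows. This is a sketch under assumptions: it uses [CLS] pooling (one of the two pooling options listed), writes $n$ for seq_len and $L$ for num_labels, and treats a probability exactly equal to the 0.5 threshold as a positive tag, which the diagram does not specify.

```latex
\begin{aligned}
H &= \mathrm{BERT}(x) \in \mathbb{R}^{n \times 768}, \qquad h = H_{[\mathrm{CLS}]} \in \mathbb{R}^{768} \\
z &= W_2\,\mathrm{Dropout}_{0.3}\big(\mathrm{ReLU}(W_1\,\mathrm{Dropout}_{0.3}(h) + b_1)\big) + b_2,
  \qquad W_1 \in \mathbb{R}^{768 \times 768},\; W_2 \in \mathbb{R}^{L \times 768} \\
p_k &= \sigma(z_k) = \frac{1}{1 + e^{-z_k}}, \qquad k = 1, \dots, L \\
\hat{y}_k &= \mathbb{1}\left[\, p_k \ge 0.5 \,\right]
\end{aligned}
```

Each label gets its own sigmoid rather than sharing a softmax, so the probabilities need not sum to 1 and one article can receive several tags at once (here both "politics" and "economy"). Dropout is active only during training; at inference it is the identity.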
Ensemble Architecture
Input: Title + Snippet
│
├──► Model 1 (Russian BERT)
│   └──► Predictions: [0.9, 0.7, 0.3, ...]
│
├──► Model 2 (RoBERTa)
│   └──► Predictions: [0.85, 0.75, 0.4, ...]
│
├──► Model 3 (DistilBERT)
│   └──► Predictions: [0.88, 0.72, 0.35, ...]
│
└──► Ensemble Combination
    ├──► Weighted Average (weights: [0.4, 0.3, 0.3])
    └──► Final Predictions: [0.879, 0.721, 0.345, ...]
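A minimal sketch of the weighted-average step in plain Python. The function name `weighted_ensemble` is hypothetical, and normalizing the weights by their sum is an added safeguard that the diagram does not mention. The usage lines feed in the first three label scores of each model from the diagram.

```python
from typing import Sequence


def weighted_ensemble(
    predictions: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> list[float]:
    """Combine per-label probabilities from several models by weighted average."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    num_labels = len(predictions[0])
    if any(len(p) != num_labels for p in predictions):
        raise ValueError("all models must score the same label set")
    # Normalize so the result stays a probability even if weights don't sum to 1.
    norm = [w / total for w in weights]
    return [
        sum(w * model_probs[k] for w, model_probs in zip(norm, predictions))
        for k in range(num_labels)
    ]


# First three labels of each model, taken from the ensemble diagram.
bert = [0.9, 0.7, 0.3]
roberta = [0.85, 0.75, 0.4]
distilbert = [0.88, 0.72, 0.35]
combined = weighted_ensemble([bert, roberta, distilbert], [0.4, 0.3, 0.3])
print([round(p, 3) for p in combined])  # [0.879, 0.721, 0.345]
```

A weighted average keeps the output on the same probability scale as each model, so the same 0.5 threshold can be applied after ensembling.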
Data Flow
Training Data Flow
Raw TSV Files
│
├──► Load Data (pandas)
│   └──► Filter nulls
│
├──► Text Preprocessing
│   ├──► Normalize text
│   ├──► Lowercase
│   └──► Remove special characters
│
├──► Tag Processing
│   ├──► Split tags
│   ├──► Filter by frequency
│   └──► Create label mapping
│
├──► Data Splitting (by date)
│   ├──► Train: date < 2018-10-01
│   ├──► Validation: 2018-10-01 ≤ date < 2018-12-01
│   └──► Test: date ≥ 2018-12-01
│
├──► Dataset Creation
│   ├──► Tokenization
│   ├──► Padding/Truncation
│   └──► Multi-hot encoding
│
└──► DataLoader
    └──► Batches for training
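A stdlib-only sketch of the text, tag, and date-splitting steps. It assumes comma-separated tags and one publication date per row; the helper names (`normalize`, `split_tags`, `build_label_mapping`, `multi_hot`, `split_name`), the comma delimiter, and the `min_freq` cut-off are hypothetical. The real pipeline loads the TSV with pandas, and tokenization and padding are left to the HuggingFace tokenizer.

```python
import re
from collections import Counter
from datetime import date
from typing import Iterable

TAG_SEPARATOR = ","  # assumption: tags are comma-separated in the TSV
VAL_START = date(2018, 10, 1)
TEST_START = date(2018, 12, 1)


def normalize(text: str) -> str:
    """Lowercase, replace non-word characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # \w is Unicode-aware, so Cyrillic letters are kept
    return re.sub(r"\s+", " ", text).strip()


def split_tags(raw: str) -> list[str]:
    return [tag.strip() for tag in raw.split(TAG_SEPARATOR) if tag.strip()]


def build_label_mapping(tag_lists: Iterable[list[str]], min_freq: int) -> dict[str, int]:
    """Keep tags seen at least min_freq times and give each a stable integer id."""
    counts = Counter(tag for tags in tag_lists for tag in tags)
    kept = sorted(tag for tag, n in counts.items() if n >= min_freq)
    return {tag: i for i, tag in enumerate(kept)}


def multi_hot(tags: list[str], label_to_id: dict[str, int]) -> list[int]:
    vector = [0] * len(label_to_id)
    for tag in tags:
        if tag in label_to_id:  # tags removed by the frequency filter are ignored
            vector[label_to_id[tag]] = 1
    return vector


def split_name(published: date) -> str:
    """Time-based split: train < 2018-10-01 <= val < 2018-12-01 <= test."""
    if published < VAL_START:
        return "train"
    if published < TEST_START:
        return "val"
    return "test"


# Toy rows (date, text, raw_tags). Texts 2-4 are placeholders; "спорт" means "sport".
rows = [
    (date(2018, 9, 14), "Путин объявил о новых мерах поддержки экономики!", "политика, экономика"),
    (date(2018, 10, 15), "placeholder text 2", "экономика"),
    (date(2018, 11, 20), "placeholder text 3", "политика"),
    (date(2018, 12, 5), "placeholder text 4", "спорт"),
]
label_to_id = build_label_mapping([split_tags(t) for _, _, t in rows], min_freq=2)
for published, text, raw in rows:
    print(split_name(published), normalize(text), multi_hot(split_tags(raw), label_to_id))
```

Tag counting here runs over all rows, matching the diagram's order (tag processing before splitting); counting on the training split only would avoid leaking information from later dates into the label set.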
Inference Data Flow
API Request
│
├──► Request Validation (Pydantic)
│   └──► Validate title, snippet, threshold
│
├──► Text Preprocessing
│   └──► Normalize and clean
│
├──► Tokenization
│   └──► Convert to token IDs
│
├──► Model Inference
│   └──► Forward pass through BERT
│
├──► Post-processing
│   ├──► Sigmoid activation
│   ├──► Threshold filtering
│   └──► Top-K selection
│
├──► Monitoring
│   ├──► Log prediction
│   ├──► Record for drift detection
│   └──► Track performance
│
└──► Response
    └──► JSON with predictions
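A stdlib-only sketch of the validation and post-processing steps. The service itself validates with Pydantic; here a dataclass with `__post_init__` checks stands in so the example has no dependencies. `PredictRequest`, `postprocess`, the `top_k` parameter, and the response keys `tag`/`probability` are hypothetical names, and the title-not-empty and threshold-in-[0, 1] rules are assumed constraints.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PredictRequest:
    """Stand-in for the Pydantic request model; fields follow the diagram."""
    title: str
    snippet: str = ""
    threshold: float = 0.5

    def __post_init__(self) -> None:
        if not self.title.strip():
            raise ValueError("title must not be empty")
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be in [0, 1]")


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def postprocess(
    logits: Sequence[float],
    labels: Sequence[str],
    threshold: float = 0.5,
    top_k: Optional[int] = None,
) -> list[dict]:
    """Sigmoid -> threshold filtering -> optional top-k, highest probability first."""
    scored = [(label, sigmoid(z)) for label, z in zip(labels, logits)]
    kept = [(label, p) for label, p in scored if p >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    if top_k is not None:
        kept = kept[:top_k]
    return [{"tag": label, "probability": round(p, 4)} for label, p in kept]


request = PredictRequest(title="Путин объявил о новых мерах поддержки экономики")
labels = ["политика", "экономика", "спорт"]  # "спорт" = "sport"
logits = [2.0, 1.0, -1.5]                    # placeholder model outputs
print(postprocess(logits, labels, threshold=request.threshold, top_k=2))
```

Applying top-K after the threshold means K is an upper bound: a request can return fewer than K tags, or none, if few labels clear the threshold.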
Component Interactions
Training Pipeline
Config (Hydra)
│
├──► Data Loading
│   └──► Dataset Creation
│
├──► Model Initialization
│   └──► Load Pre-trained BERT
│
├──► Training Loop
│   ├──► Forward Pass
│   ├──► Loss Calculation
│   ├──► Backward Pass
│   └──► Optimizer Step
│
├──► Validation
│   └──► Metrics Calculation
│
├──► Experiment Tracking
│   ├──► WandB Logging
│   ├──► MLflow Tracking
│   └──► DVC Versioning
│
└──► Model Checkpointing
    └──► Save Best Model
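To make the loop stages concrete without depending on PyTorch, here is a toy, stdlib-only stand-in: a single logistic layer replaces BERT, full-batch gradient descent replaces the real optimizer, and an in-memory copy replaces the checkpoint file. It follows the diagram's order (forward pass, BCE loss, backward pass, optimizer step, validation, keep the best model). Selecting the best checkpoint by validation loss is an assumption, and experiment tracking is omitted; all names are hypothetical.

```python
import copy
import math
import random


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(weights, features):
    """One logistic unit per label: p[i][k] = sigmoid(w_k . x_i)."""
    return [[sigmoid(sum(w * x for w, x in zip(w_k, row))) for w_k in weights]
            for row in features]


def bce_loss(probs, targets):
    """Mean binary cross-entropy over every (example, label) pair."""
    eps = 1e-7
    total = 0.0
    for p_row, y_row in zip(probs, targets):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / (len(targets) * len(targets[0]))


def train(train_x, train_y, val_x, val_y, epochs=100, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_feat, n_labels = len(train_x[0]), len(train_y[0])
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_feat)] for _ in range(n_labels)]
    scale = 1.0 / (len(train_x) * n_labels)  # matches the mean taken in bce_loss
    best_val, best_weights, history = math.inf, None, []
    for _ in range(epochs):
        probs = forward(weights, train_x)                    # forward pass
        history.append(bce_loss(probs, train_y))             # loss calculation
        grads = [[0.0] * n_feat for _ in range(n_labels)]    # backward pass:
        for p_row, y_row, x_row in zip(probs, train_y, train_x):
            for k in range(n_labels):                        # dL/dw_kj = (p - y) * x_j * scale
                err = (p_row[k] - y_row[k]) * scale
                for j in range(n_feat):
                    grads[k][j] += err * x_row[j]
        for k in range(n_labels):                            # optimizer step (plain gradient descent)
            for j in range(n_feat):
                weights[k][j] -= lr * grads[k][j]
        val_loss = bce_loss(forward(weights, val_x), val_y)  # validation
        if val_loss < best_val:                              # keep the best model so far
            best_val, best_weights = val_loss, copy.deepcopy(weights)
    return best_weights, best_val, history


# Toy data: feature 0 is a bias term; label 0 fires when feature 1 is set, label 1 otherwise.
train_x = [[1.0, 1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
train_y = [[1, 0], [0, 1], [1, 0], [0, 1]]
val_x = [[1.0, 1.0], [1.0, 0.0]]
val_y = [[1, 0], [0, 1]]
best_weights, best_val, history = train(train_x, train_y, val_x, val_y)
print(f"train loss {history[0]:.3f} -> {history[-1]:.3f}, best val loss {best_val:.3f}")
```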
API Request Flow
HTTP Request
│
├──► CORS Middleware
│
├──► Monitoring Middleware
│   └──► Start timer
│
├──► Request Validation
│   └──► Pydantic validation
│
├──► Inference
│   ├──► Text preprocessing
│   ├──► Tokenization
│   ├──► Model forward pass
│   └──► Post-processing
│
├──► Monitoring
│   ├──► Log prediction
│   ├──► Check drift
│   └──► Update metrics
│
└──► HTTP Response
    └──► JSON with predictions
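A stdlib-only sketch of what the monitoring middleware does around each request: start a timer, call the handler, then log the prediction, record a value for drift detection, and update latency metrics. The class and field names are hypothetical. The drift method is also an assumption, since the document does not say how drift is measured: this uses the Population Stability Index (PSI) on each response's top probability, compared against a reference sample such as validation-set scores. The 0.2 alert level is a common rule of thumb, not a project setting. In the FastAPI service this logic would wrap the request handler as middleware rather than being called directly.

```python
import math
import time
from collections import deque


def psi(reference, current, bins=10, eps=1e-4):
    """Population Stability Index between two samples of values in [0, 1]."""
    def histogram(values):
        counts = [0] * bins
        for v in values:
            counts[min(int(v * bins), bins - 1)] += 1
        total = len(values)
        return [max(c / total, eps) for c in counts]  # floor empty bins to avoid log(0)

    ref, cur = histogram(reference), histogram(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


class PredictionMonitor:
    """Hypothetical in-process monitor: latency, prediction log, drift check."""

    def __init__(self, reference_scores, window=500, psi_alert=0.2):
        self.reference = list(reference_scores)
        self.recent = deque(maxlen=window)        # recent top probabilities
        self.latencies_ms = deque(maxlen=window)  # recent request latencies
        self.log = []
        self.psi_alert = psi_alert

    def observe(self, handler, request):
        start = time.perf_counter()                      # start timer
        response = handler(request)
        elapsed_ms = (time.perf_counter() - start) * 1000
        self.latencies_ms.append(elapsed_ms)             # update metrics
        top = max((p["probability"] for p in response["predictions"]), default=0.0)
        self.recent.append(top)                          # record for drift detection
        self.log.append({"title": request["title"],      # log prediction
                         "n_tags": len(response["predictions"]),
                         "latency_ms": elapsed_ms})
        return response

    def drift_score(self):
        return psi(self.reference, self.recent) if self.recent else 0.0

    def drift_detected(self):
        return self.drift_score() > self.psi_alert


def fake_handler(request):
    # Placeholder for the real inference step; always returns one low-confidence tag.
    return {"predictions": [{"tag": "экономика", "probability": 0.15}]}


monitor = PredictionMonitor(reference_scores=[0.85, 0.95] * 50)
for _ in range(100):
    monitor.observe(fake_handler, {"title": "placeholder", "snippet": ""})
print(round(monitor.drift_score(), 2), monitor.drift_detected())
```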
Technology Stack
Core ML
- PyTorch: Deep learning framework
- PyTorch Lightning: Training framework
- Transformers: HuggingFace transformers library
- Russian BERT: DeepPavlov/rubert-base-cased
API & Web
- FastAPI: Modern Python web framework
- Uvicorn: ASGI server
- Pydantic: Data validation
MLOps
- WandB: Experiment tracking
- MLflow: Model registry
- DVC: Data versioning
- Optuna: Hyperparameter tuning
- Hydra: Configuration management
Infrastructure
- Docker: Containerization
- GitHub Actions: CI/CD
- Nginx: Reverse proxy (optional)
Monitoring
- Custom Monitoring: Performance, drift, logging
- Prometheus (optional): Metrics collection
- Grafana (optional): Visualization
Scalability Considerations
Horizontal Scaling
- Stateless API design
- Load balancer support
- Multiple worker processes
- Container orchestration (Kubernetes)
Performance Optimization
- Async inference
- Batch processing
- Model quantization (future)
- GPU acceleration
- Caching (future)
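One way to combine the async-inference and batch-processing items is a micro-batcher: concurrent requests are queued and a background task runs them through the model together. The `MicroBatcher` class, its parameters, and the fake length-returning model are hypothetical; the document does not describe how batching is implemented. The model call runs in a worker thread via `asyncio.to_thread` so a slow forward pass does not block the event loop.

```python
import asyncio


class MicroBatcher:
    """Hypothetical helper that groups concurrent requests into one model call."""

    def __init__(self, predict_batch, max_batch=8, max_wait=0.01):
        self.predict_batch = predict_batch  # sync function: list[str] -> list[result]
        self.max_batch = max_batch
        self.max_wait = max_wait
        self.queue = asyncio.Queue()
        self.batch_sizes = []

    async def predict(self, text):
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((text, future))
        return await future  # resolved by run() once the batch is processed

    async def run(self):
        while True:
            items = [await self.queue.get()]  # wait for the first request
            await asyncio.sleep(self.max_wait)  # give other requests a short window to arrive
            while len(items) < self.max_batch and not self.queue.empty():
                items.append(self.queue.get_nowait())
            texts = [text for text, _ in items]
            self.batch_sizes.append(len(texts))
            try:
                # One model call for the whole batch, off the event loop.
                results = await asyncio.to_thread(self.predict_batch, texts)
            except Exception as exc:  # propagate failures to every waiting caller
                for _, fut in items:
                    if not fut.done():
                        fut.set_exception(exc)
                continue
            for (_, fut), result in zip(items, results):
                if not fut.done():
                    fut.set_result(result)


async def demo():
    # Fake model: "predicts" the length of each text, so results are easy to check.
    batcher = MicroBatcher(lambda texts: [len(t) for t in texts], max_batch=4, max_wait=0.01)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.predict("x" * n) for n in range(1, 11)))
    worker.cancel()
    return results, batcher.batch_sizes


results, batch_sizes = asyncio.run(demo())
print(results, batch_sizes)
```

Larger `max_wait` values produce fuller batches, which helps GPU utilization, at the cost of extra latency for each request.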
High Availability
- Health checks
- Graceful degradation
- Circuit breakers (future)
- Retry mechanisms
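A minimal sketch of the retry-mechanism item, assuming exponential backoff with jitter; the document does not specify the retry policy, and the `retry` helper and its parameters are hypothetical. Retries suit idempotent, transient failures such as a model server restarting; for persistent failures, the planned circuit breakers would stop calling the failing dependency instead of retrying.

```python
import random
import time


def retry(func, attempts=3, base_delay=0.1, max_delay=2.0,
          retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception, back off exponentially (with jitter) and retry."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids synchronized retries


calls = {"n": 0}


def flaky_model_call():
    # Placeholder for a call that fails transiently, e.g. while a model server restarts.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("model server unavailable")
    return {"status": "ok"}


result = retry(flaky_model_call, attempts=5, base_delay=0.0)
print(result, calls["n"])
```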