# HorizonSurge Ensemble
HorizonSurge Ensemble is a multimodal, "horizon-aware" machine learning ensemble designed to accurately forecast global migration volumes and trigger reliable early-warning alerts for mass-migration surges up to 6 months in advance.
It tracks data across 15 high-volume origin countries, ingesting monthly sequences of Legal Visa Issuances, Macroeconomic Exchange Rates, and NLP-Extracted News Sentiment Clusters.
## Model Architecture
What makes this model unique is its Dynamic Horizon Weighting. Predicting a crisis 1 month away requires entirely different mathematical strengths than predicting a crisis 6 months away. The ensemble dynamically blends three underlying architectures:
- Tree-Ensemble (Random Forest): Exceptionally robust at broad surge envelope thresholding. Highly weighted for near-term (Lead 1-2) forecasting.
- PyTorch LSTM (with custom SurgeJointLoss): Fused with categorical country embeddings (`nn.Embedding`), this recurrent network is trained on a custom Huber + BCE objective. It acts as the "Precision Guard," heavily penalizing false alarms.
- PyTorch Multi-Head Transformer: Superior at maintaining long-term sequential recall. Highly weighted for long-term (Lead 5-6) predictions to capture slow-moving crisis patterns that short-term architectures forget.
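The lead-dependent blending described above can be sketched as a weighted sum over the three backbones. Note that the weight matrix and the `blend_forecasts` helper below are illustrative assumptions for exposition, not the ensemble's actual learned or configured weights:

```python
import numpy as np

# Illustrative per-lead blend weights (rows: Lead 1..6, cols: [RF, LSTM, Transformer]).
# These values only demonstrate the near-term -> long-term shift described above;
# the real weights are internal to the ensemble.
BLEND_WEIGHTS = np.array([
    [0.6, 0.3, 0.1],  # Lead 1: tree ensemble dominates
    [0.5, 0.3, 0.2],  # Lead 2
    [0.3, 0.4, 0.3],  # Lead 3
    [0.2, 0.4, 0.4],  # Lead 4
    [0.1, 0.3, 0.6],  # Lead 5: transformer dominates
    [0.1, 0.2, 0.7],  # Lead 6
])

def blend_forecasts(rf_preds, lstm_preds, transformer_preds):
    """Blend three length-6 prediction vectors with lead-dependent weights."""
    stacked = np.stack([rf_preds, lstm_preds, transformer_preds], axis=1)  # shape (6, 3)
    return (stacked * BLEND_WEIGHTS).sum(axis=1)
```

Each row of the weight matrix sums to 1, so every lead's output stays a convex combination of the three backbone forecasts.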
## Performance Metrics (Out-of-Time Walk-Forward Validation)
Evaluated specifically on its ability to classify operational crisis surges, defined as volumes more than 1.5 standard deviations above the rolling mean:
| Predictive Horizon | Precision (False Alarm Guard) | Recall (Crisis Capture) | F1-Score |
|---|---|---|---|
| Lead 1 (Next Month) | 0.96 | 0.96 | 0.96 |
| Lead 2 (2 Months Out) | 0.93 | 0.96 | 0.95 |
| Lead 3 (3 Months Out) | 0.92 | 0.94 | 0.93 |
| Lead 4 (4 Months Out) | 0.88 | 0.94 | 0.91 |
| Lead 5 (5 Months Out) | 0.83 | 0.94 | 0.88 |
| Lead 6 (6 Months Out) | 0.80 | 0.92 | 0.86 |
Notice that even 6 months into the future, the Transformer-weighted backbone allows the ensemble to capture 92% of all major crises with an 80% precision rate.
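The surge definition behind these metrics (volume more than 1.5 standard deviations above the rolling mean) can be sketched as a labeling rule. The 12-month rolling window length below is an assumption for illustration; the evaluation's actual window is not specified here:

```python
import numpy as np

def surge_labels(volumes, window=12, k=1.5):
    """Label month t a surge if its volume exceeds the rolling mean
    plus k standard deviations of the preceding `window` months.
    (Window length is an assumption, not taken from the evaluation.)"""
    volumes = np.asarray(volumes, dtype=float)
    labels = np.zeros(len(volumes), dtype=bool)
    for t in range(window, len(volumes)):
        past = volumes[t - window:t]  # trailing window, excludes month t
        labels[t] = volumes[t] > past.mean() + k * past.std()
    return labels
```

Precision and recall at each lead are then computed by comparing the ensemble's predicted surge flags against these ground-truth labels.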
## How to Use
First, clone the repository and ensure you have `torch`, `scikit-learn`, `numpy`, and `joblib` installed.

Load the files using the `MigrationSurgeEnsemble` inference wrapper:
```python
from inference import MigrationSurgeEnsemble

# 1. Initialize the ensemble (points to the directory containing the .pth and .joblib files)
predictor = MigrationSurgeEnsemble(models_dir=".")

# 2. Provide the rolling 6-month historical data for a specific country
# Format per month: [visa_volume, exchange_rate, news_sentiment_count]
# Array structure: [T-6, T-5, T-4, T-3, T-2, T-1 (current month)]
historical_scenario = [
    [15000, 19.5, 45],   # T-6
    [16000, 19.8, 52],   # T-5
    [18500, 19.9, 70],   # T-4
    [22000, 20.3, 85],   # T-3
    [24000, 20.5, 110],  # T-2
    [31000, 21.0, 140],  # T-1 (current)
]

# 3. Generate 6-month forward projections
results = predictor.predict(country_name="Mexico", recent_6_months_data=historical_scenario)
print(results['Ensemble Prediction Volume'])
# Output: [36051.0, 38024.0, 41200.0, 43156.0, 44800.0, 41200.0]
```
## Repository Structure
- `rf_lead_1.joblib` through `rf_lead_6.joblib`: The 6 independent time-horizon Random Forest models.
- `lstm.pth`: PyTorch weights for the recurrent architecture targeting extreme spikes.
- `transformer.pth`: PyTorch weights for the multi-head attention architecture.
- `scaler_x.joblib`, `scaler_y.joblib`: StandardScaler fits that ensure incoming inference data matches the normalized training bounds.
- `country_map.json`: Required dictionary mapping country names to categorical embedding IDs.
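A minimal sketch of how these artifacts could be loaded, assuming the filenames above; the real `MigrationSurgeEnsemble` wrapper in `inference.py` may organize this differently:

```python
import json
from pathlib import Path

import joblib
import torch

def load_artifacts(models_dir="."):
    """Illustrative loader for the repository artifacts listed above.
    (A sketch only; the actual wrapper's internals are not shown in this README.)"""
    d = Path(models_dir)
    # Six per-horizon Random Forests, one per lead month
    rf_models = [joblib.load(d / f"rf_lead_{lead}.joblib") for lead in range(1, 7)]
    # Feature/target scalers fit on the training data
    scaler_x = joblib.load(d / "scaler_x.joblib")
    scaler_y = joblib.load(d / "scaler_y.joblib")
    # Neural network weights (state dicts) for the two PyTorch backbones
    lstm_state = torch.load(d / "lstm.pth", map_location="cpu")
    transformer_state = torch.load(d / "transformer.pth", map_location="cpu")
    # Country name -> embedding ID, consumed by nn.Embedding
    with open(d / "country_map.json") as f:
        country_map = json.load(f)
    return rf_models, (scaler_x, scaler_y), (lstm_state, transformer_state), country_map
```

Loading the `.pth` state dicts onto CPU keeps inference portable to machines without a GPU; the scalers must be applied to user input before prediction so it matches the normalized training bounds.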