Business Risk Prediction Model
XGBoost binary classifier predicting business closure risk using Yelp business data and FRED economic indicators from 2018.
Model Description
Task: Binary classification (open vs. closed)
Algorithm: XGBoost Classifier
Training Data: 16,000 businesses with 2018 economic data
Features: 61 total (Yelp engagement metrics, county economic indicators, business categories)
Features
Yelp Engagement (6):
rating_x_reviews, review_count, num_categories, years_in_business, num_checkins, has_checkin
Economic Indicators (5):
pcpi, poverty_rate, median_household_income, unemployment_rate, avg_weekly_wages
Geographic (2):
latitude, longitude
Business Categories (48):
- One-hot encoded top categories (Restaurants, Food, Shopping, Beauty & Spas, Nightlife, etc.)
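As a rough illustration of how the engagement and category features listed above could be derived from a single Yelp business record, here is a minimal sketch. The input keys (`stars`, `review_count`, `categories` as a comma-separated string, `num_checkins`), the helper name `engineer_features`, and the rule `has_checkin = num_checkins > 0` are assumptions for illustration; only the output names and the `cat_` prefix come from this card. `years_in_business` is omitted because the card does not say how it is computed.

```python
def engineer_features(business, top_categories):
    """Build engagement and category features for one business record (illustrative)."""
    # Assumption: categories arrive as one comma-separated string, possibly None.
    raw = business.get("categories") or ""
    categories = [c.strip() for c in raw.split(",") if c.strip()]

    num_checkins = business.get("num_checkins", 0)
    features = {
        # Interaction term: star rating multiplied by review volume.
        "rating_x_reviews": business["stars"] * business["review_count"],
        "review_count": business["review_count"],
        "num_categories": len(categories),
        "num_checkins": num_checkins,
        # Assumption: the flag is 1 whenever at least one check-in exists.
        "has_checkin": int(num_checkins > 0),
    }
    # One-hot encode only the retained top categories; all others are ignored.
    for name in top_categories:
        features[f"cat_{name}"] = int(name in categories)
    return features


example = engineer_features(
    {"stars": 4.5, "review_count": 100, "num_checkins": 250,
     "categories": "Restaurants, Mexican, Nightlife"},
    top_categories=["Restaurants", "Food", "Nightlife"],  # hypothetical subset of the 48
)
print(example)
```

Categories outside the retained list (here, "Mexican") still count toward `num_categories` but get no one-hot column of their own.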
Performance
- See evaluation_metrics.json for the reported metrics
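This card names ROC AUC and F1 as its evaluation metrics but does not show how they were computed. The sketch below is a dependency-free illustration of what each metric measures, run on made-up labels and scores; it is not the project's evaluation code. In practice scikit-learn's `f1_score` and `roc_auc_score` would normally be used instead of the hypothetical helpers `f1` and `roc_auc_pairwise`.

```python
def f1(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc_pairwise(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("Both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2]
y_pred = [int(s > 0.5) for s in scores]  # same 0.5 threshold as the usage example

f1_value = f1(y_true, y_pred)
auc_value = roc_auc_pairwise(y_true, scores)
print(f"F1: {f1_value:.3f}, ROC AUC: {auc_value:.3f}")
```

F1 depends on the chosen threshold, while ROC AUC depends only on how the scores rank the two classes.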
Usage
import joblib
import pandas as pd
from huggingface_hub import hf_hub_download
# Download artifacts
model_path = hf_hub_download(repo_id="yourusername/business-risk-xgboost", filename="xgboost_model.pkl")
scaler_path = hf_hub_download(repo_id="yourusername/business-risk-xgboost", filename="scaler.pkl")
config_path = hf_hub_download(repo_id="yourusername/business-risk-xgboost", filename="model_config.json")
# Load
model = joblib.load(model_path)
scaler = joblib.load(scaler_path)
# Prepare input (must match training features exactly)
input_data = pd.DataFrame([{
    'rating_x_reviews': 450.0,
    'review_count': 100,
    'num_categories': 3,
    'years_in_business': 5,
    'num_checkins': 250,
    'has_checkin': 1,
    'pcpi': 55000,
    'poverty_rate': 10.5,
    'median_household_income': 65000,
    'unemployment_rate': 4.2,
    'avg_weekly_wages': 1100,
    'latitude': 33.4484,
    'longitude': -112.0740,
    'cat_Restaurants': 1,
    'cat_Food': 0,
    # ... (include all 48 category features)
}])
# Predict
X_scaled = scaler.transform(input_data)
prob = model.predict_proba(X_scaled)[0, 1]  # Probability of class 1 (staying open)
print(f"Probability of staying open: {prob:.3f}")
print(f"Closure risk: {1 - prob:.3f}")
print(f"Prediction: {'Open' if prob > 0.5 else 'Closed'}")
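Because the input must match the training features exactly, it can help to align a partial record against the stored feature list before scaling. The following is a minimal sketch, assuming the ordered names have already been read into a Python list (for example from `features.txt`, whose exact format is not documented here). The helper `align_features` and its rule of defaulting absent `cat_` columns to 0 are illustrative choices, not part of the released artifacts.

```python
def align_features(row, feature_names):
    """Return row values in training order; absent cat_* columns default to 0."""
    unknown = set(row) - set(feature_names)
    if unknown:
        raise KeyError(f"Unexpected features: {sorted(unknown)}")
    aligned = {}
    for name in feature_names:
        if name in row:
            aligned[name] = row[name]
        elif name.startswith("cat_"):
            aligned[name] = 0  # business does not belong to this category
        else:
            # Numeric features have no safe default, so fail loudly.
            raise KeyError(f"Missing required feature: {name}")
    return aligned


# Hypothetical three-column subset; the real list has 61 entries.
feature_names = ["review_count", "cat_Restaurants", "cat_Food"]
aligned = align_features({"cat_Restaurants": 1, "review_count": 100}, feature_names)
print(list(aligned.items()))
```

The aligned dictionary can then be wrapped as `pd.DataFrame([aligned], columns=feature_names)` and passed to `scaler.transform`, so the column order no longer depends on how the input record was written.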
Training Details
- Base Economic Year: 2018 (reflects pre-pandemic economic conditions)
- Hyperparameters: See model_config.json
- Class Imbalance Handling: scale_pos_weight parameter
- Feature Engineering: Interaction terms (rating_x_reviews), one-hot encoding for categories
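The card does not state which value of `scale_pos_weight` was used (the actual hyperparameters are in model_config.json). A common starting point suggested in the XGBoost documentation is the ratio of negative to positive training examples. The sketch below computes that ratio on made-up labels; the helper name `compute_scale_pos_weight` and the class counts are hypothetical.

```python
def compute_scale_pos_weight(labels):
    """Ratio of negative (0) to positive (1) examples, a common starting value."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("No positive examples in labels")
    return negatives / positives


# Made-up labels: 8 positive (class 1) and 2 negative (class 0) examples.
labels = [1] * 8 + [0] * 2
weight = compute_scale_pos_weight(labels)
print(weight)
```

The result would be passed as `XGBClassifier(scale_pos_weight=weight)`. A value below 1 down-weights the positive class when it is the majority, and a value above 1 up-weights it when it is the minority.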
Limitations
- Trained on 2018 economic data; may not reflect post-pandemic market conditions
- Geographic bias toward areas with dense Yelp coverage
- Does not account for external shocks (COVID-19, policy changes)
- Category features limited to top 48 most common business types
Intended Use
Portfolio project demonstrating end-to-end ML pipeline with economic data integration. Not for production financial decisions.
Files
- xgboost_model.pkl - Trained XGBoost model
- scaler.pkl - StandardScaler for feature preprocessing
- model_config.json - Feature names, hyperparameters, metrics
- features.txt - List of all 61 features in order
Citation
GitHub: https://github.com/PatZhang0214/business-risk-prediction
Evaluation results
- ROC AUC on yelp-fred-combined (self-reported): 0.XXX
- F1 on yelp-fred-combined (self-reported): 0.XXX