Business Risk Prediction Model

XGBoost binary classifier that predicts business closure risk from Yelp business data combined with 2018 FRED economic indicators.

Model Description

Task: Binary classification (open vs. closed)
Algorithm: XGBoost Classifier
Training Data: 16,000 businesses joined with 2018 county-level economic data
Features: 61 total (Yelp engagement metrics, county economic indicators, business categories)

Features

Yelp Engagement (6):

  • rating_x_reviews, review_count, num_categories, years_in_business, num_checkins, has_checkin
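A minimal sketch of how some of the engagement features could be derived from raw Yelp-style fields. It assumes rating_x_reviews is star rating multiplied by review count (consistent with the usage example, where 450.0 corresponds to a 4.5 rating with 100 reviews) and that categories arrive as one comma-separated string, as in the Yelp Open Dataset. The function name and input fields are hypothetical; years_in_business is omitted because its derivation is not documented here.

```python
from typing import Optional

def engagement_features(stars: float, review_count: int,
                        categories: Optional[str], num_checkins: int) -> dict:
    """Derive engagement features from raw Yelp-style fields (hypothetical schema)."""
    # Assumed interaction term: star rating weighted by review volume
    rating_x_reviews = stars * review_count
    # Categories are assumed to be one comma-separated string; None means none listed
    cats = [c.strip() for c in (categories or "").split(",") if c.strip()]
    return {
        "rating_x_reviews": rating_x_reviews,
        "review_count": review_count,
        "num_categories": len(cats),
        "num_checkins": num_checkins,
        # Binary flag: 1 if the business has any recorded check-ins
        "has_checkin": int(num_checkins > 0),
    }

row = engagement_features(4.5, 100, "Restaurants, Food, Nightlife", 250)
print(row)
```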

Economic Indicators (5):

  • pcpi, poverty_rate, median_household_income, unemployment_rate, avg_weekly_wages
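The economic indicators are county-level, so each business has to be matched to its county before training. The sketch below assumes every business record already carries a county identifier; the key name county_fips and the function are hypothetical, and the real pipeline's join may differ. It performs an inner join, dropping businesses whose county has no economic data; keeping them and imputing values would be an alternative design.

```python
from typing import Dict, List

# The five county-level indicators used as features
ECON_FEATURES = ["pcpi", "poverty_rate", "median_household_income",
                 "unemployment_rate", "avg_weekly_wages"]

def attach_county_indicators(businesses: List[dict],
                             county_econ: Dict[str, dict]) -> List[dict]:
    """Inner-join county indicators onto businesses via a county key."""
    joined = []
    for biz in businesses:
        # 'county_fips' is a hypothetical join key, not a confirmed column name
        econ = county_econ.get(biz["county_fips"])
        if econ is None:
            continue  # skip businesses whose county has no economic data
        row = dict(biz)
        row.update({name: econ[name] for name in ECON_FEATURES})
        joined.append(row)
    return joined

# Illustrative values only (taken from the usage example, not real FRED data)
county_econ = {"04013": {"pcpi": 55000, "poverty_rate": 10.5,
                         "median_household_income": 65000,
                         "unemployment_rate": 4.2, "avg_weekly_wages": 1100}}
businesses = [{"business_id": "a", "county_fips": "04013"},
              {"business_id": "b", "county_fips": "99999"}]
rows = attach_county_indicators(businesses, county_econ)
print(len(rows), rows[0]["unemployment_rate"])
```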

Geographic (2):

  • latitude, longitude

Business Categories (48):

  • One-hot encoded top 48 categories (Restaurants, Food, Shopping, Beauty & Spas, Nightlife, etc.)
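One way to build the category columns is to count category frequencies across all businesses, keep the most common ones (48 for this model), and emit one indicator column per kept category. This is a sketch under assumptions: the function names are hypothetical, categories are assumed to be comma-separated strings, and the cat_ prefix follows the column names in the usage example, though the real pipeline may sanitize names such as "Beauty & Spas" differently.

```python
from collections import Counter
from typing import List, Optional

def top_categories(category_strings: List[Optional[str]], n: int) -> List[str]:
    """Return the n most frequent categories across all businesses."""
    counts = Counter(
        c.strip() for s in category_strings for c in (s or "").split(",") if c.strip()
    )
    return [name for name, _ in counts.most_common(n)]

def one_hot_categories(categories: Optional[str], vocab: List[str]) -> dict:
    """Encode one business's categories as cat_<name> indicator columns."""
    present = {c.strip() for c in (categories or "").split(",")}
    return {f"cat_{name}": int(name in present) for name in vocab}

vocab = top_categories(
    ["Restaurants, Food", "Restaurants, Food, Nightlife", "Shopping", None], n=2
)
enc = one_hot_categories("Food, Shopping", vocab)
print(vocab, enc)
```

Categories outside the kept vocabulary (here Shopping) simply produce no column, which matches the limitation that only the most common business types are represented.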

Performance

  • See evaluation_metrics.json (metrics are also stored in model_config.json)

Usage

import joblib
import pandas as pd
from huggingface_hub import hf_hub_download

REPO_ID = "yourusername/business-risk-xgboost"

# Download artifacts
model_path = hf_hub_download(repo_id=REPO_ID, filename="xgboost_model.pkl")
scaler_path = hf_hub_download(repo_id=REPO_ID, filename="scaler.pkl")
features_path = hf_hub_download(repo_id=REPO_ID, filename="features.txt")

# Load the model, the fitted scaler, and the training feature order
model = joblib.load(model_path)
scaler = joblib.load(scaler_path)
with open(features_path) as f:
    # Assumes features.txt lists one feature name per line
    feature_names = [line.strip() for line in f if line.strip()]

# Prepare input (must match training features exactly)
input_data = pd.DataFrame([{
    'rating_x_reviews': 450.0,
    'review_count': 100,
    'num_categories': 3,
    'years_in_business': 5,
    'num_checkins': 250,
    'has_checkin': 1,
    'pcpi': 55000,
    'poverty_rate': 10.5,
    'median_household_income': 65000,
    'unemployment_rate': 4.2,
    'avg_weekly_wages': 1100,
    'latitude': 33.4484,
    'longitude': -112.0740,
    'cat_Restaurants': 1,
    'cat_Food': 0,
    # ... (remaining cat_* category columns; any omitted are set to 0 below)
}])

# Put columns in training order; omitted category columns default to 0.
# This also zero-fills missing numeric features, so supply all 13 non-category features.
input_data = input_data.reindex(columns=feature_names, fill_value=0)

# Predict (class 1 = business stays open)
X_scaled = scaler.transform(input_data)
prob_open = model.predict_proba(X_scaled)[0, 1]  # Probability of staying open
risk_score = 1 - prob_open  # Closure risk

print(f"Risk Score (probability of closure): {risk_score:.3f}")
print(f"Prediction: {'Open' if prob_open > 0.5 else 'Closed'}")
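Because the scaler and model expect exactly the 61 features in training order, a small validation step can catch missing or misspelled inputs before prediction. The helper below is hypothetical: it assumes feature_names comes from features.txt, treats any omitted cat_* column as 0 (the business lacks that category), and refuses to guess values for missing numeric features.

```python
from typing import Dict, List

def build_feature_row(values: Dict[str, float], feature_names: List[str]) -> List[float]:
    """Order inputs to match training; default omitted cat_* columns to 0."""
    unknown = set(values) - set(feature_names)
    if unknown:
        raise ValueError(f"Unknown features: {sorted(unknown)}")
    row = []
    for name in feature_names:
        if name in values:
            row.append(float(values[name]))
        elif name.startswith("cat_"):
            row.append(0.0)  # category not listed means the business lacks it
        else:
            raise ValueError(f"Missing required feature: {name}")
    return row

# Toy feature list; the real one has 61 entries
names = ["review_count", "latitude", "cat_Restaurants", "cat_Food"]
row = build_feature_row({"latitude": 33.4484, "review_count": 100, "cat_Restaurants": 1}, names)
print(row)
```

The resulting list can be wrapped as pd.DataFrame([row], columns=feature_names) before calling scaler.transform, so column names and order both match training.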

Training Details

  • Base Economic Year: 2018 (reflects pre-pandemic economic conditions)
  • Hyperparameters: See model_config.json
  • Class Imbalance Handling: scale_pos_weight parameter
  • Feature Engineering: Interaction terms (rating_x_reviews), one-hot encoding for categories
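For scale_pos_weight, the XGBoost documentation suggests sum(negative instances) / sum(positive instances) as a typical starting value, where "positive" is class 1. The sketch below computes that heuristic; the value actually used for this model is recorded in model_config.json and may differ. Since class 1 here means the business stays open, a majority-open dataset yields a value below 1, which relatively upweights closed businesses.

```python
from typing import Sequence

def compute_scale_pos_weight(labels: Sequence[int]) -> float:
    """Heuristic from the XGBoost docs: (# negative examples) / (# positive examples)."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("No positive examples; scale_pos_weight is undefined")
    return negatives / positives

# Toy labels: 8 open (1), 2 closed (0)
labels = [1] * 8 + [0] * 2
print(compute_scale_pos_weight(labels))  # 0.25
```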

Limitations

  • Trained on 2018 economic data; may not reflect post-pandemic market conditions
  • Geographic bias toward areas with dense Yelp coverage
  • Does not account for external shocks (COVID-19, policy changes)
  • Category features limited to top 48 most common business types

Intended Use

Portfolio project demonstrating an end-to-end ML pipeline with economic data integration. Not intended for production financial decisions.

Files

  • xgboost_model.pkl - Trained XGBoost model
  • scaler.pkl - StandardScaler for feature preprocessing
  • model_config.json - Feature names, hyperparameters, metrics
  • features.txt - List of all 61 features in order
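scikit-learn's StandardScaler stores a per-feature mean_ and scale_ and transforms each value as z = (x - mean_) / scale_, where scale_ is the population standard deviation (ddof=0) and zero-variance features get a scale of 1.0. The pure-Python sketch below reproduces that arithmetic to show what scaler.pkl does at inference time; it is illustrative, not a replacement for the fitted scaler. Tree splits do not depend on this rescaling, but because the model was trained on scaled inputs, the same scaler must be applied before predicting.

```python
import math
from typing import List, Tuple

def fit_standardizer(columns: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and population std (ddof=0), mirroring StandardScaler."""
    means, scales = [], []
    for col in columns:
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((x - mu) ** 2 for x in col) / len(col))
        means.append(mu)
        # StandardScaler substitutes 1.0 for zero-variance features
        scales.append(sd if sd > 0 else 1.0)
    return means, scales

def standardize(row: List[float], means: List[float], scales: List[float]) -> List[float]:
    """Apply z = (x - mean) / scale to one sample, feature by feature."""
    return [(x - m) / s for x, m, s in zip(row, means, scales)]

# Two toy features stored column-wise: one varying, one constant
means, scales = fit_standardizer([[2.0, 4.0, 6.0], [5.0, 5.0, 5.0]])
z = standardize([6.0, 5.0], means, scales)
print(z)
```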

Citation

GitHub: https://github.com/PatZhang0214/business-risk-prediction
