Hotel Bookings — ADR (Average Daily Rate) Prediction
Predicts the Average Daily Rate (adr, in €) for a hotel booking using
booking metadata, customer profile, and stay details.
Model Details
- Model type: sklearn.ensemble.GradientBoostingRegressor
- Training data: mathsian/hotel-bookings, the Hotel Bookings Demand dataset (Portugal, 2015-2017)
- Sample size used: 12,520 bookings (random sample of 15K, after cleaning)
- Train/Test split: 80/20, simple random sampling, random_state=42
Hyperparameters
- n_estimators: 200
- max_depth: 6
- learning_rate: 0.05
- subsample: 0.85
- min_samples_split: 5
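For readers who want to reproduce the configuration, the following is a minimal sketch of how an estimator with these hyperparameters and the 80/20 split could be set up in scikit-learn. The helper names `build_model` and `split_80_20` are hypothetical. Passing `random_state=42` to the estimator is an assumption, since the card states that seed only for the split.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Hyperparameters exactly as listed in this card.
HYPERPARAMETERS = {
    "n_estimators": 200,
    "max_depth": 6,
    "learning_rate": 0.05,
    "subsample": 0.85,
    "min_samples_split": 5,
}


def build_model(random_state=42):
    """Return an unfitted regressor configured with the card's hyperparameters.

    Assumption: the card gives random_state=42 for the split only; reusing it
    for the estimator is a guess made here for reproducibility.
    """
    return GradientBoostingRegressor(random_state=random_state, **HYPERPARAMETERS)


def split_80_20(X, y, random_state=42):
    """80/20 simple random split, as described under Model Details."""
    return train_test_split(X, y, test_size=0.2, random_state=random_state)
```

A typical call would be `X_train, X_test, y_train, y_test = split_80_20(X, y)` followed by `build_model().fit(X_train, y_train)`, where `X` is the preprocessed feature matrix and `y` the adr column.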
Test-set Performance
| Metric | Value |
|---|---|
| RMSE | 19.79 € |
| MAE | 13.18 € |
| R² | 0.830 |
(Improvement over Linear Regression baseline: RMSE −32%, MAE −39%, R² +31%.)
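The three metrics in the table follow their standard definitions. As a rough sketch, they can be computed from true and predicted adr values as shown below. The function name `regression_metrics` is hypothetical and the sample numbers are toy values, not results from this model.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R² for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = math.sqrt(ss_res / n)                   # same unit as adr (€)
    mae = sum(abs(e) for e in errors) / n          # same unit as adr (€)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot                     # undefined if y_true is constant
    return {"rmse": rmse, "mae": mae, "r2": r2}


# Toy values for illustration only; they are not taken from the dataset.
print(regression_metrics([100.0, 120.0, 80.0, 150.0], [110.0, 115.0, 85.0, 140.0]))
```

RMSE penalises large errors more heavily than MAE, which is why the two values in the table differ.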
Features Required (86 features after preprocessing)
Categorical (One-Hot):
hotel, meal, market_segment, distribution_channel,
reserved_room_type, assigned_room_type, deposit_type, customer_type,
arrival_date_month
High-cardinality (Target Encoding with KFold smoothing):
country
Numeric (Standard Scaled):
17 original features (lead_time, arrival_date_year, stays_in_*_nights,
adults, children, babies, is_repeated_guest, previous_*,
booking_changes, days_in_waiting_list, required_car_parking_spaces,
total_of_special_requests, has_agent)
Plus 13 engineered features:
total_nights, total_guests, is_family, is_solo, nights_per_guest,
had_previous_cancellation, had_previous_booking, room_type_changed,
lead_time_log, month_sin, month_cos, is_summer, is_high_season
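The exact definitions of the engineered features live in the accompanying notebook and are not reproduced in this card. The sketch below shows one plausible way to derive a subset of them with pandas. Every formula here is an assumption inferred from the feature names, as are the expanded column names `stays_in_weekend_nights`, `stays_in_week_nights`, and `previous_cancellations` and the use of `log1p`. Features whose definitions cannot be guessed from the name (for example is_high_season) are left out.

```python
import numpy as np
import pandas as pd

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def add_engineered_features(df):
    """Add a subset of the engineered columns named in the card.

    All formulas are assumptions; check them against the training notebook.
    """
    out = df.copy()
    out["total_nights"] = out["stays_in_weekend_nights"] + out["stays_in_week_nights"]
    out["total_guests"] = out["adults"] + out["children"].fillna(0) + out["babies"]
    out["had_previous_cancellation"] = (out["previous_cancellations"] > 0).astype(int)
    out["room_type_changed"] = (
        out["reserved_room_type"] != out["assigned_room_type"]
    ).astype(int)
    out["lead_time_log"] = np.log1p(out["lead_time"])  # assumed log(1 + x)
    # Cyclic encoding so that December and January end up close together.
    month_number = out["arrival_date_month"].map({m: i + 1 for i, m in enumerate(MONTHS)})
    angle = 2 * np.pi * month_number / 12
    out["month_sin"] = np.sin(angle)
    out["month_cos"] = np.cos(angle)
    return out
```

The function returns a copy, so the raw booking frame stays untouched and can be reused when comparing against the notebook's output.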
Usage
import pickle
import numpy as np
with open("hotel_bookings_adr_model.pkl", "rb") as f:
    model = pickle.load(f)
# X_new must be a (n_rows, 86) matrix produced by the same preprocessing
# pipeline as the training notebook (see Part 3.2 + Part 4.2 of the
# accompanying notebook for the exact transformations).
predictions = model.predict(X_new) # → array of predicted adr in €
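Because the pickle contains only the regressor, a shape mistake in X_new is easy to make. The sketch below wraps the prediction call with a check against the 86-feature layout stated in this card. The wrapper name `predict_adr` and the extra finiteness check are illustrative additions, not part of the published model.

```python
import numpy as np

EXPECTED_N_FEATURES = 86  # feature count stated in this card


def predict_adr(model, X_new):
    """Validate the input shape, then return the model's predictions as an array."""
    X = np.asarray(X_new, dtype=float)
    if X.ndim != 2 or X.shape[1] != EXPECTED_N_FEATURES:
        raise ValueError(
            f"expected shape (n_rows, {EXPECTED_N_FEATURES}), got {X.shape}"
        )
    # Precaution added here: reject NaN/inf left over from incomplete preprocessing.
    if not np.isfinite(X).all():
        raise ValueError("input contains NaN or infinite values")
    return np.asarray(model.predict(X))
```

A shape check only catches the wrong number of columns. It cannot detect columns in the wrong order or unscaled values, so the preprocessing caveats still apply.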
Important Caveats
- Pre-processing is NOT included in this pickle. You must apply the same StandardScaler, One-Hot Encoding, and Target Encoding steps that were applied during training. Without them the predictions will be nonsense.
- Data leakage columns must be removed before applying preprocessing: reservation_status, reservation_status_date, is_canceled. These are post-event labels.
- Date validity: the model was trained on bookings with arrival dates 2015-2017. Predictions for arrivals outside this window are extrapolations and should be interpreted with caution.
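The caveats above can be sketched in code. The block below drops the three leakage columns, flags arrivals outside 2015-2017, and shows one common form of smoothed K-fold target encoding for country. The smoothing formula, `n_splits=5`, `smoothing=10.0`, and all function names are assumptions made for illustration. The notebook's actual encoder may differ, and at inference time the mapping fitted on the training data must be reused rather than refitted.

```python
import pandas as pd
from sklearn.model_selection import KFold

LEAKAGE_COLUMNS = ["reservation_status", "reservation_status_date", "is_canceled"]
TRAIN_YEARS = (2015, 2017)  # arrival years covered by the training data


def drop_leakage_columns(df):
    """Remove the post-event columns named in the caveats, if present."""
    return df.drop(columns=LEAKAGE_COLUMNS, errors="ignore")


def outside_training_window(df):
    """Boolean mask of rows whose arrival year falls outside 2015-2017."""
    return ~df["arrival_date_year"].between(*TRAIN_YEARS)


def smoothed_means(keys, target, prior, smoothing):
    """Per-category target mean, shrunk toward `prior` by `smoothing` pseudo-counts."""
    stats = target.groupby(keys).agg(["sum", "count"])
    return (stats["sum"] + smoothing * prior) / (stats["count"] + smoothing)


def kfold_target_encode(train, column, target, n_splits=5, smoothing=10.0, seed=42):
    """Return (out-of-fold encoding for train rows, mapping for new rows, prior)."""
    prior = train[target].mean()
    encoded = pd.Series(prior, index=train.index, dtype=float)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, enc_idx in folds.split(train):
        fit_part = train.iloc[fit_idx]
        fold_prior = fit_part[target].mean()
        means = smoothed_means(fit_part[column], fit_part[target], fold_prior, smoothing)
        # Each row is encoded with statistics that exclude its own target value.
        values = train.iloc[enc_idx][column].map(means).fillna(fold_prior)
        encoded.iloc[enc_idx] = values.to_numpy()
    mapping = smoothed_means(train[column], train[target], prior, smoothing)
    return encoded, mapping, prior
```

For new bookings, the encoded value would then be `new_df["country"].map(mapping).fillna(prior)`, so countries unseen during training fall back to the overall training mean.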
Citation
If you use this model in academic work, please cite the original dataset and the accompanying notebook.