# HyperOpt-GBT

HyperOptimized Gradient Boosted Trees: a scikit-learn compatible library that combines the best innovations from XGBoost, LightGBM, CatBoost, and YDF into one implementation.
## Key Innovations
| Innovation | Source | Effect |
|---|---|---|
| GOSS (Gradient-based One-Side Sampling) | LightGBM | 2-5× faster training, often better accuracy |
| Weighted Quantile Sketch | XGBoost | +15-19% AUC on skewed distributions |
| Ordered Boosting | CatBoost | Eliminates prediction shift → unbiased residuals |
| Ordered Target Statistics | CatBoost | Handles categoricals without target leakage |
| Histogram-based Splits | LightGBM | O(k) split finding vs O(n log n) |
| Compiled Inference Engines | YDF | 5-100× faster prediction |
| Oblivious Trees | CatBoost | Regularization + SIMD-friendly structure |
| Cache-aware Column Blocks | XGBoost | Cache-friendly memory access |
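
For intuition, here is a minimal NumPy sketch of the GOSS sampling step (the function name and exact bookkeeping are illustrative, not this library's internals): keep the top `a` fraction of rows by gradient magnitude, sample a `b` fraction of the remainder, and up-weight the sampled rows by `(1 - a) / b` so the summed gradients stay approximately unbiased.

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=None):
    """Illustrative GOSS: keep the top-|g| rows, subsample the rest with reweighting."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(gradients)
    n_top = int(a * n)
    n_rest = int(b * n)
    order = np.argsort(-np.abs(gradients))       # rows sorted by |gradient|, descending
    top = order[:n_top]                          # always keep the large-gradient rows
    rest = rng.choice(order[n_top:], size=n_rest, replace=False)
    idx = np.concatenate([top, rest])
    weights = np.ones(n)                         # kept rows keep weight 1
    weights[rest] = (1.0 - a) / b                # amplify the sampled small-gradient rows
    return idx, weights[idx]
```

With the defaults `a=0.2, b=0.1`, each boosting round trains on only 30% of the rows, which is where the 2-5× speedup in the table comes from.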
## Quick Start

```python
from hyperopt_gbt import HyperOptGradientBoostedClassifier

clf = HyperOptGradientBoostedClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=6,
    use_goss=True,               # LightGBM: gradient-based sampling
    binning='quantile_sketch',   # XGBoost: adaptive bin boundaries
    n_bins=255,
)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)
```
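
Since the estimators follow the scikit-learn interface, they should also compose with standard model-selection tooling; a small sketch assuming only the import above:

```python
from sklearn.model_selection import cross_val_score
from hyperopt_gbt import HyperOptGradientBoostedClassifier

clf = HyperOptGradientBoostedClassifier(n_estimators=100, use_goss=True)
# 5-fold cross-validated AUC; X_train / y_train as in the snippet above
scores = cross_val_score(clf, X_train, y_train, cv=5, scoring='roc_auc')
print(scores.mean(), scores.std())
```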
## Installation

```bash
# From source
pip install -e .

# With benchmark dependencies
pip install -e ".[benchmark]"

# Build Rust backend (optional, for maximum speed)
cd rust_gbt && pip install maturin && maturin develop --release
```
## Benchmark Results

### Binary Classification (80K train, 20K test, 50 trees)
| Library | AUC | Train Time |
|---|---|---|
| HyperOpt-GBT (GOSS) | 0.9691 | 2.5s |
| XGBoost (hist) | 0.9661 | 1.3s |
| LightGBM | 0.9659 | 1.0s |
| CatBoost | 0.9756 | 1.5s |
### GOSS: Faster and More Accurate than Full-Data Training
| Rows Used per Round | AUC | Speedup |
|---|---|---|
| 100% (no GOSS) | 0.9659 | 1.0× |
| 40% (GOSS) | 0.9717 | 2.4× |
| 15% (GOSS) | 0.9740 | 5.3× |
### Quantile Sketch vs. Uniform Binning (Skewed Data)
| Bins | Uniform AUC | Quantile AUC | Gain (AUC points) |
|---|---|---|---|
| 63 | 0.6426 | 0.8306 | +18.8% |
| 255 | 0.6775 | 0.8295 | +15.2% |
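
The gap comes from where the bin edges land. Uniform binning spaces edges evenly across the value range, so on a heavy-tailed feature most bins cover nearly empty regions; a quantile sketch places edges at empirical quantiles so every bin carries roughly equal mass. A minimal NumPy illustration of the two strategies (this shows the idea only; the library's weighted streaming sketch is more involved):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.lognormal(sigma=2.0, size=100_000)       # heavily right-skewed feature

# Uniform: equal-width edges; nearly all mass falls into the first few bins.
uniform_edges = np.linspace(x.min(), x.max(), 64)

# Quantile: equal-mass edges; resolution concentrates where the data is.
quantile_edges = np.quantile(x, np.linspace(0, 1, 64))

print(np.histogram(x, uniform_edges)[0][:4])     # first bin holds almost every row
print(np.histogram(x, quantile_edges)[0][:4])    # each bin holds roughly 1/63 of rows
```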
## API Reference

### Classifier

```python
HyperOptGradientBoostedClassifier(
    # Core
    n_estimators=100,        # Number of boosting rounds
    learning_rate=0.1,       # Shrinkage
    max_depth=6,             # Maximum tree depth

    # Accuracy innovations
    ordered_boosting=False,  # CatBoost: unbiased boosting
    ordered_ts=True,         # CatBoost: ordered target statistics
    oblivious_trees=False,   # CatBoost: balanced trees

    # Speed innovations
    use_goss=True,           # LightGBM: gradient sampling
    goss_a=0.2,              # Keep top 20% by gradient magnitude
    goss_b=0.1,              # Sample 10% from the rest
    n_bins=255,              # Histogram bins
    binning='uniform',       # 'uniform' or 'quantile_sketch'

    # Regularization
    l2_reg=1.0,              # L2 penalty on leaf weights
    min_child_weight=1.0,    # Minimum hessian sum in a leaf
    subsample=1.0,           # Row subsampling
    colsample_bytree=1.0,    # Column subsampling
)
```
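
The `ordered_ts=True` option refers to CatBoost-style ordered target statistics: each categorical value is encoded using only the target values of rows that precede it in a random permutation, so no row ever sees its own label. A minimal sketch of the idea (the smoothing prior and single-permutation handling here are illustrative assumptions, not this library's exact scheme):

```python
import numpy as np

def ordered_target_stats(cats, y, prior=0.5, smoothing=1.0, rng=None):
    """Encode each row's category from the targets of earlier rows only."""
    rng = np.random.default_rng() if rng is None else rng
    perm = rng.permutation(len(y))
    sums, counts = {}, {}
    enc = np.empty(len(y))
    for i in perm:                                # process rows in random order
        c = cats[i]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        enc[i] = (s + smoothing * prior) / (n + smoothing)  # history only
        sums[c] = s + y[i]                        # row i joins the history afterwards
        counts[c] = n + 1
    return enc
```

Because row `i` is encoded before its own label enters the running statistics, the encoding carries no target leakage, which is the "Ordered Target Statistics" entry in the table above.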
### Regressor

```python
HyperOptGradientBoostedRegressor(
    # Same parameters as the classifier
)
```
### Inference Engines

```python
from hyperopt_gbt import compile_inference_engine

engine = compile_inference_engine(model, engine_type='auto')
# Options: 'naive', 'flat', 'simd', 'quickscorer', 'auto'
predictions = engine.predict(X_binned)
```
## Rust Backend

The optional Rust backend provides the fastest training via:

- Rayon parallelism for histogram building across features
- Flat tree arrays (`Vec<TreeNode>`) with no pointer chasing (see the sketch below)
- Zero-copy NumPy interop via PyO3
- LTO + native CPU targeting in release mode
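
For intuition on the flat layout, here is a small Python sketch of walking a tree stored as parallel arrays rather than linked node objects (the field names are illustrative, not the actual `rust_gbt` structs):

```python
import numpy as np

# One tree as flat arrays: node i's children live at left[i] / right[i];
# a negative feature index marks a leaf. Contiguous arrays keep traversal
# cache-friendly instead of chasing pointers through heap-allocated nodes.
feature   = np.array([0,  1, -1, -1, -1])        # -1 marks a leaf
threshold = np.array([0.5, 0.3, 0.0, 0.0, 0.0])
left      = np.array([1, 3, 0, 0, 0])
right     = np.array([2, 4, 0, 0, 0])
value     = np.array([0.0, 0.0, 0.7, -0.2, 0.4])

def predict_one(x):
    i = 0
    while feature[i] >= 0:                       # internal node: branch on the split
        i = left[i] if x[feature[i]] < threshold[i] else right[i]
    return value[i]                              # leaf payload

print(predict_one(np.array([0.9, 0.0])))        # root goes right, leaf value 0.7
```

The same idea, with the arrays packed into a flat `Vec<TreeNode>`, is what keeps the Rust backend's traversal free of pointer chasing.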
Usage from Python:

```python
import rust_gbt

model = rust_gbt.PyRustGBT()
model.fit(X_train, y_train,
          n_estimators=50, learning_rate=0.1, max_depth=6,
          use_goss=True, goss_a=0.2, goss_b=0.1,
          binning="quantile", task="classification")
proba = model.predict_proba(X_test)
```
## Run Benchmarks

```bash
python benchmark_quick.py
```
## Architecture

- See ARCHITECTURE.md for the full technical design.
- See RESULTS.md for detailed benchmark results.
- See WHY_HYPEROPT_GBT.md for the motivation.
## License

Apache 2.0