---
title: ModelMatrix
emoji: π
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
license: mit
---
# SAP RPT-1 Benchmarking
## Setup
### Option 1: Docker (Recommended for Reproducibility)
```bash
# Clone the repo
git clone <repo-url>
cd "MINI proj SAP"
# Copy .env.example to .env and paste your HuggingFace token
cp .env.example .env
# Build containers
docker-compose build
# Run SAP RPT-1 experiment
docker-compose run sap-rpt1 -m runners.run_experiment --dataset analcatdata_authorship --model sap-rpt1-hf
# Run baselines batch
docker-compose run baselines -m runners.run_batch --datasets config/datasets.yaml --models config/models.yaml
```
### Option 2: Local Install (Python >= 3.11 required)
```bash
# Clone the repo
git clone <repo-url>
cd "MINI proj SAP"
# Install everything in one command
pip install -e ".[models,baselines]"
# Download datasets (19 datasets from OpenML)
cd code
python -m datasets.download_tabarena
cd ..
```
## Hugging Face Token Setup (Required for SAP RPT-1 OSS)
The SAP RPT-1 OSS model weights are **gated** on Hugging Face:
1. Create an account at [huggingface.co/join](https://huggingface.co/join)
2. Accept the license at [huggingface.co/SAP/sap-rpt-1-oss](https://huggingface.co/SAP/sap-rpt-1-oss)
3. Generate a token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
4. Set the token:
**Windows (PowerShell):**
```powershell
$env:HUGGING_FACE_HUB_TOKEN = "hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
```
**Linux/Mac:**
```bash
export HUGGING_FACE_HUB_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
**Or using a `.env` file** (recommended):
```bash
cp .env.example .env
# Edit .env and paste your token
```
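Before starting a long run, it can help to confirm that the token is actually visible to Python. The sketch below is not part of the repo: `find_hf_token` is a hypothetical helper that looks for `HUGGING_FACE_HUB_TOKEN` in the process environment first and then in a `.env` file, assuming the file uses plain `KEY=value` lines.

```python
import os
from pathlib import Path

TOKEN_VAR = "HUGGING_FACE_HUB_TOKEN"


def find_hf_token(env_file=".env", environ=None):
    """Return the token from the environment or a .env file, else None.

    Hypothetical helper for illustration; assumes simple KEY=value lines.
    """
    environ = os.environ if environ is None else environ
    token = environ.get(TOKEN_VAR)
    if token:
        return token.strip()

    path = Path(env_file)
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == TOKEN_VAR:
            # Drop optional surrounding quotes; treat an empty value as missing.
            return value.strip().strip("'\"") or None
    return None


if __name__ == "__main__":
    token = find_hf_token()
    if token is None:
        print(f"{TOKEN_VAR} is not set")
    else:
        # Never print the full token; show only a short prefix.
        print(f"{TOKEN_VAR} found ({token[:5]}...)")
```

Run it from the repository root so that the relative `.env` path resolves.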
## Quick Test
```bash
cd code
python ../scripts/test_sap_rpt1.py
```
This verifies HF token authentication, model download, and prediction accuracy.
## Run Experiments
### Single Experiment
```bash
cd code
# SAP RPT-1 OSS
python -m runners.run_experiment --dataset analcatdata_authorship --model sap-rpt1-hf
# XGBoost baseline
python -m runners.run_experiment --dataset analcatdata_authorship --model xgboost
```
### Baseline Models Only (XGBoost, CatBoost, LightGBM)
```bash
cd code
# Run on ALL datasets
python -m runners.run_baselines
# Run on specific datasets
python -m runners.run_baselines --dataset analcatdata_authorship diabetes
```
### Full Batch (All Models × All Datasets)
```bash
cd code
python -m runners.run_batch --datasets config/datasets.yaml --models config/models.yaml
```
### Available Models
| Model Name | Type | Description |
|---|---|---|
| `sap-rpt1-hf` | Pretrained (OSS) | SAP RPT-1 OSS via HuggingFace |
| `xgboost` | Baseline | XGBoost |
| `catboost` | Baseline | CatBoost |
| `lightgbm` | Baseline | LightGBM |
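
If you only need part of the model × dataset grid, one option is to drive the single-experiment runner from a small script. This is a minimal sketch and makes no claim about how `runners.run_batch` works internally: `build_commands` and `run_all` are hypothetical helpers, and the only repo interface they rely on is the `--dataset` / `--model` command line shown above.

```python
import itertools
import subprocess
import sys

BASELINES = ["xgboost", "catboost", "lightgbm"]


def build_commands(datasets, models, python=sys.executable):
    """Return one argv list per (dataset, model) pair."""
    return [
        [python, "-m", "runners.run_experiment", "--dataset", d, "--model", m]
        for d, m in itertools.product(datasets, models)
    ]


def run_all(commands, cwd="code"):
    """Run each command in turn and collect failures instead of stopping."""
    failed = []
    for cmd in commands:
        result = subprocess.run(cmd, cwd=cwd)
        if result.returncode != 0:
            failed.append(cmd)
    return failed


if __name__ == "__main__":
    cmds = build_commands(
        ["analcatdata_authorship", "diabetes"],
        ["sap-rpt1-hf"] + BASELINES,
    )
    for cmd in cmds:
        print(" ".join(cmd))
    # Uncomment to execute from the repository root:
    # failed = run_all(cmds)
```

Collecting failures rather than raising on the first one lets a long sweep finish even if a single model fails on a single dataset.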
## View Results
Results are saved to `results/raw/[dataset]_[model].json`
Example output:
```json
{
  "dataset": "analcatdata_authorship",
  "model": "sap-rpt1-hf",
  "task_type": "classification",
  "n_samples": 841,
  "n_features": 70,
  "mean_metrics": {
    "accuracy": 1.0,
    "roc_auc": 1.0,
    "f1_macro": 1.0
  }
}
```
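To inspect several result files at once without the aggregation step, a short script can read them directly. This is a sketch that assumes every file under `results/raw/` has the `dataset`, `model`, and `mean_metrics` keys shown in the example above; `load_results` and `summarize` are hypothetical names, not functions from the repo.

```python
import json
from pathlib import Path


def load_results(raw_dir="results/raw"):
    """Read every *.json result file under raw_dir into a list of dicts."""
    results = []
    for path in sorted(Path(raw_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            results.append(json.load(fh))
    return results


def summarize(results, metric="accuracy"):
    """Return (dataset, model, value) rows, best score first within a dataset."""
    rows = [
        (r["dataset"], r["model"], r["mean_metrics"][metric])
        for r in results
        if metric in r.get("mean_metrics", {})
    ]
    return sorted(rows, key=lambda row: (row[0], -row[2]))


if __name__ == "__main__":
    for dataset, model, value in summarize(load_results()):
        print(f"{dataset:30s} {model:15s} {value:.4f}")
```

Results that lack the requested metric are skipped rather than raising an error. This is a guess at what is convenient when result files may carry different metrics, for example if other task types are benchmarked.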
## Aggregate Results
```bash
cd code
python -m analysis.aggregate_results
```
## Web Interface (Advanced Version)
The interactive web application has been overhauled to run benchmarks, compare models statistically, and try live predictions directly in your browser.
**Tech Stack & Architecture:**
- **Frontend**: Pure HTML/CSS/Vanilla JS. Built with a custom "Midnight Precision" design system featuring glassmorphism, dynamic data-aware input generation, and theme-aware custom scrollbars.
- **Backend**: Python with FastAPI and Scikit-Learn/Scipy.
- **Visualizations**: Chart.js for rendering dynamic metric comparisons.
**Key Features Built:**
- **Midnight Precision Aesthetics**: A premium, ultra-modern UI featuring animated liquid gradients, responsive design, and seamless user interaction flows.
- **Advanced Ensemble Engine**: Automatically builds and benchmarks meta-models on the fly:
  - *Voting Ensembles*: Soft voting over the predicted probabilities of the top models.
  - *Stacking Ensembles*: Sklearn-native meta-learning (LogisticRegression/Ridge) layered on top of base models.
- **Statistical Rigor & Ranking**: Moves beyond simple average scores to statistical analysis:
  - *Cross-Fold Ranking*: Olympic-style "min" ranking across all CV folds.
  - *Friedman Significance Testing*: Computes p-values to formally test whether the differences in model rankings across folds are statistically significant.
  - *Stability Badges*: Automatically tags models as 'Dominant', 'Competitive', or 'Volatile' based on their consistency in winning folds.
- **Interactive Live Playground**: Once the benchmark finishes, a live interface is generated.
  - *Stateful Pipeline*: The backend caches the exact `LabelEncoder` states from the training phase, so live playground inputs are encoded with the same mappings as the original dataset.
  - *Data-Aware UI*: Input fields dynamically adapt to numeric or categorical columns based on backend typing.
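
The cross-fold ranking and stability badges described above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `webapp/benchmark.py`: the function names are hypothetical, the badge rule (share of folds won) is one plausible reading of "consistency in winning folds", and the 0.8 / 0.4 cut-offs are placeholders.

```python
def min_rank(scores):
    """Rank scores so the highest gets rank 1; ties share the lowest rank.

    Example: [0.9, 0.9, 0.8] -> [1, 1, 3] ("min" / competition ranking).
    """
    return [1 + sum(other > s for other in scores) for s in scores]


def rank_across_folds(fold_scores):
    """Map model name -> per-fold ranks.

    fold_scores maps model name -> list of per-fold scores (higher is better);
    every model must have the same number of folds.
    """
    models = list(fold_scores)
    n_folds = len(next(iter(fold_scores.values())))
    ranks = {m: [] for m in models}
    for fold in range(n_folds):
        fold_ranks = min_rank([fold_scores[m][fold] for m in models])
        for m, r in zip(models, fold_ranks):
            ranks[m].append(r)
    return ranks


def stability_badge(ranks, dominant=0.8, competitive=0.4):
    """Label a model by the fraction of folds it wins (rank 1).

    The default cut-offs are placeholders, not the web app's actual values.
    """
    win_rate = sum(r == 1 for r in ranks) / len(ranks)
    if win_rate >= dominant:
        return "Dominant"
    if win_rate >= competitive:
        return "Competitive"
    return "Volatile"


if __name__ == "__main__":
    # Made-up scores for three placeholder models over three folds.
    scores = {
        "model_a": [0.95, 0.93, 0.96],
        "model_b": [0.94, 0.93, 0.90],
        "model_c": [0.90, 0.91, 0.92],
    }
    for name, ranks in rank_across_folds(scores).items():
        print(name, ranks, stability_badge(ranks))
```

The same per-model score lists are the input shape that `scipy.stats.friedmanchisquare` takes (one sequence per model, at least three models), which is presumably how the significance step is fed.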
**How to start the Web App:**
```bash
cd webapp
pip install -r requirements.txt
python -m uvicorn main:app --port 8000
```
Then open your browser and navigate to `http://localhost:8000`.
## Project Structure
```text
MINI proj SAP/
├── code/
│   ├── docker/                      # Docker environments
│   ├── models/                      # Model wrappers (sklearn-compatible)
│   │   ├── sap_rpt1_hf_wrapper.py   # SAP RPT-1 OSS via HuggingFace
│   │   ├── base_wrapper.py          # Abstract base class
│   │   └── ...
│   ├── evaluation/                  # Metrics, cross-validation, compute tracking
│   ├── runners/                     # Experiment execution
│   │   ├── run_experiment.py        # Single experiment
│   │   ├── run_batch.py             # Batch experiments
│   │   └── run_baselines.py         # Baseline models only
│   ├── analysis/                    # Results aggregation
│   └── config/                      # YAML configurations
├── webapp/                          # Interactive Web Application
│   ├── main.py                      # FastAPI Backend Server
│   ├── benchmark.py                 # Advanced Benchmarking Engine
│   ├── ensemble.py                  # Meta-Model Generators
│   ├── requirements.txt             # Web-specific dependencies
│   └── static/                      # Frontend Assets
│       ├── landing.html             # Animated Landing Page
│       ├── uploader.html            # Drag & Drop Interface
│       ├── arena.html               # Results & Statistical Rigor UI
│       ├── app.js                   # Client-side Logic
│       └── style.css                # Midnight Precision Styles
├── results/                         # Experiment outputs
├── scripts/
│   └── test_sap_rpt1.py             # Quick-start validation test
├── requirements.txt                 # Pinned dependencies
├── setup.py                         # Package configuration
├── docker-compose.yml               # Docker orchestration
└── .env.example                     # HF token template
```
## Reproducibility
This repo follows NeurIPS/ICML reproducibility standards:
- **Pinned dependencies**: All packages have exact versions in `requirements.txt`
- **Fixed random seeds**: `random_state=42` across all experiments
- **Docker containers**: Isolated environments for incompatible dependencies
- **Gated model weights**: SAP RPT-1 OSS uses a fixed checkpoint (`v1.1.2`)
- **5-fold cross-validation**: Stratified splits, combined with the fixed seed, produce identical data partitions on every run
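
To show why a fixed seed makes the partitions repeatable, here is a dependency-free sketch of seeded, stratified fold assignment. It illustrates the principle only and is not the repo's splitter, which presumably relies on scikit-learn (for example `StratifiedKFold(n_splits=5, shuffle=True, random_state=42)`). `stratified_folds` is a hypothetical name.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=42):
    """Assign each sample index to a fold, keeping class proportions balanced.

    Returns a list of n_splits lists of indices (the test indices per fold).
    """
    rng = random.Random(seed)  # local RNG: same seed -> same shuffle
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):  # sorted for a deterministic class order
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return [sorted(fold) for fold in folds]


if __name__ == "__main__":
    labels = ["a"] * 10 + ["b"] * 5
    first = stratified_folds(labels)
    second = stratified_folds(labels)
    print(first == second)  # same seed, same partitions
```

Using a local `random.Random(seed)` instead of the module-level generator keeps the split independent of any other random calls made earlier in the program.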
## Troubleshooting
**Python version error:**
SAP RPT-1 OSS requires Python >= 3.11. Check with `python --version`.
**Missing TabPFN Error (ModuleNotFoundError):**
If you encounter an error stating that `tabpfn` is missing when running the benchmark, install it manually:
```bash
pip install tabpfn
```
**HF Token not working:**
```bash
huggingface-cli whoami
huggingface-cli login
```
**Docker build fails:**
```bash
docker-compose build --no-cache
```
**Out of memory:**
Edit `code/config/experiments.yaml` and reduce:
```yaml
sap_rpt1_hf:
  max_context_size: 2048  # Lower from 4096
  bagging: 1              # Lower from 4
```