---
title: ModelMatrix
emoji: πŸ“š
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
license: mit
---

# SAP RPT-1 Benchmarking

Benchmarks the SAP RPT-1 OSS model against XGBoost, CatBoost, and LightGBM baselines on 19 OpenML datasets.

## πŸš€ Setup

### Option 1: Docker (Recommended for Reproducibility)

```bash
# Clone the repo
git clone <repo-url>
cd "MINI proj SAP"

# Copy .env.example to .env and paste your HuggingFace token
cp .env.example .env

# Build containers
docker-compose build

# Run SAP RPT-1 experiment
docker-compose run sap-rpt1 -m runners.run_experiment --dataset analcatdata_authorship --model sap-rpt1-hf

# Run baselines batch
docker-compose run baselines -m runners.run_batch --datasets config/datasets.yaml --models config/models.yaml
```
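
For orientation, the `docker-compose.yml` implied by these commands might look roughly like the sketch below. The service names come from the run commands above; everything else (build context, entrypoint) is an assumption, shown mainly to explain why `docker-compose run sap-rpt1 -m runners.run_experiment ...` works: a `python` entrypoint turns the trailing arguments into a `python -m ...` invocation.

```yaml
# Hypothetical sketch only -- consult the repo's actual docker-compose.yml.
services:
  sap-rpt1:
    build: code/docker
    env_file: .env          # supplies HUGGING_FACE_HUB_TOKEN
    entrypoint: ["python"]  # so `run sap-rpt1 -m ...` becomes `python -m ...`
  baselines:
    build: code/docker
    entrypoint: ["python"]
```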

### Option 2: Local Install (Python >= 3.11 required)

```bash
# Clone the repo
git clone <repo-url>
cd "MINI proj SAP"

# Install everything in one command
pip install -e ".[models,baselines]"

# Download datasets (19 datasets from OpenML)
cd code
python -m datasets.download_tabarena
cd ..
```

## πŸ”‘ Hugging Face Token Setup (Required for SAP RPT-1 OSS)

The SAP RPT-1 OSS model weights are **gated** on Hugging Face:

1. Create account at [huggingface.co/join](https://huggingface.co/join)
2. Accept the license at [huggingface.co/SAP/sap-rpt-1-oss](https://huggingface.co/SAP/sap-rpt-1-oss)
3. Generate a token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
4. Set the token:

**Windows (PowerShell):**
```powershell
$env:HUGGING_FACE_HUB_TOKEN = "hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
```

**Linux/Mac:**
```bash
export HUGGING_FACE_HUB_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

**Or using .env file** (recommended):
```bash
cp .env.example .env
# Edit .env and paste your token
```

## πŸ§ͺ Quick Test

```bash
cd code
python ../scripts/test_sap_rpt1.py
```

This verifies HF token authentication, model download, and prediction accuracy.
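
The first of those checks can be illustrated with a small stand-in (hypothetical code, not the actual `scripts/test_sap_rpt1.py`): verify that a plausible Hugging Face token is visible to the process before attempting the gated download.

```python
# Hypothetical sketch of the token check; the real test script also
# downloads the model and verifies prediction accuracy.
import os

def token_present(environ=None):
    """Return True if HUGGING_FACE_HUB_TOKEN looks like a real HF token."""
    environ = os.environ if environ is None else environ
    return environ.get("HUGGING_FACE_HUB_TOKEN", "").startswith("hf_")

if __name__ == "__main__":
    if token_present():
        print("HF token found; proceeding to model download.")
    else:
        print("Set HUGGING_FACE_HUB_TOKEN first (see token setup above).")
```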

## πŸ“Š Run Experiments

### Single Experiment
```bash
cd code

# SAP RPT-1 OSS
python -m runners.run_experiment --dataset analcatdata_authorship --model sap-rpt1-hf

# XGBoost baseline
python -m runners.run_experiment --dataset analcatdata_authorship --model xgboost
```

### Baseline Models Only (XGBoost, CatBoost, LightGBM)
```bash
cd code

# Run on ALL datasets
python -m runners.run_baselines

# Run on specific datasets
python -m runners.run_baselines --dataset analcatdata_authorship diabetes
```

### Full Batch (All Models Γ— All Datasets)
```bash
cd code
python -m runners.run_batch --datasets config/datasets.yaml --models config/models.yaml
```

### Available Models

| Model Name | Type | Description |
|---|---|---|
| `sap-rpt1-hf` | Pretrained (OSS) | SAP RPT-1 OSS via HuggingFace |
| `xgboost` | Baseline | XGBoost |
| `catboost` | Baseline | CatBoost |
| `lightgbm` | Baseline | LightGBM |

## πŸ“ˆ View Results

Results are saved to `results/raw/[dataset]_[model].json`

Example output:
```json
{
  "dataset": "analcatdata_authorship",
  "model": "sap-rpt1-hf",
  "task_type": "classification",
  "n_samples": 841,
  "n_features": 70,
  "mean_metrics": {
    "accuracy": 1.0,
    "roc_auc": 1.0,
    "f1_macro": 1.0
  }
}
```

## πŸ“Š Aggregate Results
```bash
cd code
python -m analysis.aggregate_results
```
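
If you want to post-process the raw JSON files yourself, the aggregation step can be sketched as follows (a hedged illustration based on the result format shown above; the actual `analysis.aggregate_results` module may differ):

```python
# Average each metric per model across all files in results/raw/.
# Assumes the JSON schema shown in the "View Results" example above.
import json
from collections import defaultdict
from pathlib import Path

def aggregate(results_dir):
    """Return {model: {metric: mean value across result files}}."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for path in Path(results_dir).glob("*.json"):
        record = json.loads(path.read_text())
        model = record["model"]
        counts[model] += 1
        for metric, value in record["mean_metrics"].items():
            sums[model][metric] += value
    return {
        model: {metric: total / counts[model] for metric, total in metrics.items()}
        for model, metrics in sums.items()
    }
```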

## 🌐 Web Interface (Advanced Version)

We've completely overhauled the interactive web application into a production-grade, scientific benchmarking experience that runs directly in your browser.

**Tech Stack & Architecture:**
- **Frontend**: Plain HTML/CSS/vanilla JS, built with a custom "Midnight Precision" design system featuring glassmorphism, dynamic data-aware input generation, and theme-aware custom scrollbars.
- **Backend**: Python with FastAPI, scikit-learn, and SciPy.
- **Visualizations**: Chart.js for rendering dynamic metric comparisons.

**Key Features Built:**
- **Midnight Precision Aesthetics**: A premium, ultra-modern UI featuring animated liquid gradients, responsive design, and seamless user interaction flows.
- **Advanced Ensemble Engine**: Automatically builds and benchmarks Meta-Models on the fly:
  - *Voting Ensembles*: Soft-voting probabilities across top models.
  - *Stacking Ensembles*: Sklearn-native meta-learning (LogisticRegression/Ridge) layered on top of base models.
- **Statistical Rigor & Ranking**: Goes beyond simple average scores to proper statistical analysis:
  - *Cross-Fold Ranking*: Olympic-style "min" ranking across all CV folds.
  - *Friedman Significance Testing*: Computes p-values to test whether the champion model's lead is statistically significant.
  - *Stability Badges*: Automatically tags models as 'Dominant', 'Competitive', or 'Volatile' based on how consistently they win folds.
- **Interactive Live Playground**: Once the benchmark finishes, a live prediction interface is generated.
  - *Stateful Pipeline*: The backend caches the exact `LabelEncoder` states from the training phase, so live playground inputs are encoded consistently with the original dataset.
  - *Data-Aware UI*: Input fields dynamically adapt to numeric or categorical columns based on backend typing.
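
The Friedman test used for the significance check can be sketched with SciPy (the fold scores here are made up for illustration, and the webapp's actual implementation may differ):

```python
# Friedman test over per-fold accuracies of three hypothetical models.
# Assumes SciPy; scores are illustrative, not real benchmark results.
from scipy.stats import friedmanchisquare

model_a = [0.91, 0.93, 0.90, 0.92, 0.94]  # accuracy per CV fold
model_b = [0.88, 0.90, 0.87, 0.89, 0.91]
model_c = [0.85, 0.84, 0.86, 0.83, 0.85]

stat, p_value = friedmanchisquare(model_a, model_b, model_c)
print(f"Friedman chi-square={stat:.3f}, p={p_value:.4f}")
```

A small p-value (conventionally below 0.05) means the ranking differences across folds are unlikely to be chance, supporting the champion badge.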

**How to start the Web App:**
```bash
cd webapp
pip install -r requirements.txt
python -m uvicorn main:app --port 8000
```
Then open your browser and navigate to `http://localhost:8000`.

## πŸ—οΈ Project Structure

```text
MINI proj SAP/
β”œβ”€β”€ code/
β”‚   β”œβ”€β”€ docker/              # Docker environments
β”‚   β”œβ”€β”€ models/              # Model wrappers (sklearn-compatible)
β”‚   β”‚   β”œβ”€β”€ sap_rpt1_hf_wrapper.py  # SAP RPT-1 OSS via HuggingFace
β”‚   β”‚   β”œβ”€β”€ base_wrapper.py         # Abstract base class
β”‚   β”‚   └── ...
β”‚   β”œβ”€β”€ evaluation/          # Metrics, cross-validation, compute tracking
β”‚   β”œβ”€β”€ runners/             # Experiment execution
β”‚   β”‚   β”œβ”€β”€ run_experiment.py    # Single experiment
β”‚   β”‚   β”œβ”€β”€ run_batch.py         # Batch experiments
β”‚   β”‚   └── run_baselines.py     # Baseline models only
β”‚   β”œβ”€β”€ analysis/            # Results aggregation
β”‚   └── config/              # YAML configurations
β”œβ”€β”€ webapp/                  # Interactive Web Application
β”‚   β”œβ”€β”€ main.py              # FastAPI Backend Server
β”‚   β”œβ”€β”€ benchmark.py         # Advanced Benchmarking Engine
β”‚   β”œβ”€β”€ ensemble.py          # Meta-Model Generators
β”‚   β”œβ”€β”€ requirements.txt     # Web-specific dependencies
β”‚   └── static/              # Frontend Assets
β”‚       β”œβ”€β”€ landing.html     # Animated Landing Page
β”‚       β”œβ”€β”€ uploader.html    # Drag & Drop Interface
β”‚       β”œβ”€β”€ arena.html       # Results & Statistical Rigor UI
β”‚       β”œβ”€β”€ app.js           # Client-side Logic
β”‚       └── style.css        # Midnight Precision Styles
β”œβ”€β”€ results/                 # Experiment outputs
β”œβ”€β”€ scripts/
β”‚   └── test_sap_rpt1.py     # Quick-start validation test
β”œβ”€β”€ requirements.txt         # Pinned dependencies
β”œβ”€β”€ setup.py                 # Package configuration
β”œβ”€β”€ docker-compose.yml       # Docker orchestration
└── .env.example             # HF token template
```
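
The `base_wrapper.py` abstract base class listed above might expose an interface like the following (a hypothetical sketch; only the file name comes from the tree, everything else is assumed):

```python
# Hypothetical sketch of the sklearn-compatible wrapper contract.
from abc import ABC, abstractmethod

class BaseWrapper(ABC):
    """Minimal sklearn-style interface every model wrapper implements."""

    @abstractmethod
    def fit(self, X, y):
        """Train on features X and labels y; return self."""

    @abstractmethod
    def predict(self, X):
        """Return predicted labels for X."""

class MajorityClassWrapper(BaseWrapper):
    """Toy wrapper that always predicts the most frequent training label."""

    def fit(self, X, y):
        self.majority_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.majority_] * len(X)
```

A uniform `fit`/`predict` contract is what lets the runners treat `sap-rpt1-hf` and the gradient-boosting baselines interchangeably.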

## πŸ”„ Reproducibility

This repo follows NeurIPS/ICML reproducibility standards:

- **Pinned dependencies**: All packages have exact versions in `requirements.txt`
- **Fixed random seeds**: `random_state=42` across all experiments
- **Docker containers**: Isolated environments for incompatible dependencies
- **Gated model weights**: SAP RPT-1 OSS uses a fixed checkpoint (`v1.1.2`)
- **5-fold cross-validation**: Stratified splits ensure identical data partitions
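
The seed and stratification points above combine into deterministic data partitions, which a few lines of scikit-learn can demonstrate (an illustrative sketch, not the repo's exact evaluation code):

```python
# With a fixed random_state, stratified 5-fold splits are identical
# across runs, so every model sees the same partitions.
from sklearn.model_selection import StratifiedKFold

X = [[i] for i in range(10)]
y = [0, 1] * 5  # two balanced classes

def fold_indices():
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    return [tuple(test) for _, test in skf.split(X, y)]

# Same seed, same partitions, run after run.
assert fold_indices() == fold_indices()
```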


## πŸ†˜ Troubleshooting

**Python version error:**
SAP RPT-1 OSS requires Python >= 3.11. Check with `python --version`.

**Missing TabPFN Error (ModuleNotFoundError):**
If you encounter an error stating that `tabpfn` is missing when running the benchmark, install it manually:
```bash
pip install tabpfn
```

**HF Token not working:**
```bash
huggingface-cli whoami
huggingface-cli login
```

**Docker build fails:**
```bash
docker-compose build --no-cache
```

**Out of memory:**
Edit `code/config/experiments.yaml` and reduce:
```yaml
sap_rpt1_hf:
  max_context_size: 2048  # Lower from 4096
  bagging: 1              # Lower from 4
```