---
title: ModelMatrix
emoji: 📚
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
license: mit
---
# SAP RPT-1 Benchmarking
## 🚀 Setup
### Option 1: Docker (Recommended for Reproducibility)
```bash
# Clone the repo
git clone <repo-url>
cd "MINI proj SAP"
# Copy .env.example to .env and paste your HuggingFace token
cp .env.example .env
# Build containers
docker-compose build
# Run SAP RPT-1 experiment
docker-compose run sap-rpt1 -m runners.run_experiment --dataset analcatdata_authorship --model sap-rpt1-hf
# Run baselines batch
docker-compose run baselines -m runners.run_batch --datasets config/datasets.yaml --models config/models.yaml
```
### Option 2: Local Install (Python >= 3.11 required)
```bash
# Clone the repo
git clone <repo-url>
cd "MINI proj SAP"
# Install everything in one command
pip install -e ".[models,baselines]"
# Download datasets (19 datasets from OpenML)
cd code
python -m datasets.download_tabarena
cd ..
```
## 🔑 Hugging Face Token Setup (Required for SAP RPT-1 OSS)
The SAP RPT-1 OSS model weights are **gated** on Hugging Face:
1. Create account at [huggingface.co/join](https://huggingface.co/join)
2. Accept the license at [huggingface.co/SAP/sap-rpt-1-oss](https://huggingface.co/SAP/sap-rpt-1-oss)
3. Generate a token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
4. Set the token:
**Windows (PowerShell):**
```powershell
$env:HUGGING_FACE_HUB_TOKEN = "hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
```
**Linux/Mac:**
```bash
export HUGGING_FACE_HUB_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
**Or using .env file** (recommended):
```bash
cp .env.example .env
# Edit .env and paste your token
```
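If you prefer not to pull in `python-dotenv`, a minimal standard-library sketch (assuming a plain `KEY=value` file format, as in `.env.example`) can load the token into the environment:

```python
import os
from pathlib import Path

def load_dotenv_token(path=".env", key="HUGGING_FACE_HUB_TOKEN"):
    """Parse KEY=value lines from a .env file and export the HF token.

    Minimal stand-in for python-dotenv: skips blank lines and comments,
    sets os.environ[key] when the key is found, and returns the value.
    """
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        k, _, v = line.partition("=")
        if k.strip() == key:
            os.environ[key] = v.strip().strip('"')
            return os.environ[key]
    return None
```

`load_dotenv_token` is a hypothetical helper, not part of this repo; the Hugging Face libraries themselves read `HUGGING_FACE_HUB_TOKEN` from the environment once it is set.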
## 🧪 Quick Test
```bash
cd code
python ../scripts/test_sap_rpt1.py
```
This verifies HF token authentication, model download, and prediction accuracy.
## 📊 Run Experiments
### Single Experiment
```bash
cd code
# SAP RPT-1 OSS
python -m runners.run_experiment --dataset analcatdata_authorship --model sap-rpt1-hf
# XGBoost baseline
python -m runners.run_experiment --dataset analcatdata_authorship --model xgboost
```
### Baseline Models Only (XGBoost, CatBoost, LightGBM)
```bash
cd code
# Run on ALL datasets
python -m runners.run_baselines
# Run on specific datasets
python -m runners.run_baselines --dataset analcatdata_authorship diabetes
```
### Full Batch (All Models × All Datasets)
```bash
cd code
python -m runners.run_batch --datasets config/datasets.yaml --models config/models.yaml
```
### Available Models
| Model Name | Type | Description |
|---|---|---|
| `sap-rpt1-hf` | Pretrained (OSS) | SAP RPT-1 OSS via HuggingFace |
| `xgboost` | Baseline | XGBoost |
| `catboost` | Baseline | CatBoost |
| `lightgbm` | Baseline | LightGBM |
## 📈 View Results
Results are saved to `results/raw/[dataset]_[model].json`.
Example output:
```json
{
"dataset": "analcatdata_authorship",
"model": "sap-rpt1-hf",
"task_type": "classification",
"n_samples": 841,
"n_features": 70,
"mean_metrics": {
"accuracy": 1.0,
"roc_auc": 1.0,
"f1_macro": 1.0
}
}
```
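To consume these result files programmatically (e.g. in a notebook), a small standard-library sketch works, assuming the `results/raw/[dataset]_[model].json` layout shown above; `load_result` is a hypothetical helper, not a function shipped in this repo:

```python
import json
from pathlib import Path

def load_result(results_dir, dataset, model):
    """Load one results/raw/[dataset]_[model].json file and return its metrics."""
    path = Path(results_dir) / "raw" / f"{dataset}_{model}.json"
    record = json.loads(path.read_text())
    return record["mean_metrics"]
```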
## 📊 Aggregate Results
```bash
cd code
python -m analysis.aggregate_results
```
## 🌐 Web Interface (Advanced Version)
We've completely overhauled the interactive web application into a production-grade, scientific benchmarking experience that runs directly in your browser.
**Tech Stack & Architecture:**
- **Frontend**: Pure HTML/CSS/Vanilla JS. Built with a custom "Midnight Precision" design system featuring glassmorphism, dynamic data-aware input generation, and theme-aware custom scrollbars.
- **Backend**: Python with FastAPI and Scikit-Learn/Scipy.
- **Visualizations**: Chart.js for rendering dynamic metric comparisons.
**Key Features Built:**
- **Midnight Precision Aesthetics**: A premium, ultra-modern UI featuring animated liquid gradients, responsive design, and seamless user interaction flows.
- **Advanced Ensemble Engine**: Automatically builds and benchmarks Meta-Models on the fly:
- *Voting Ensembles*: Soft-voting probabilities across top models.
- *Stacking Ensembles*: Sklearn-native meta-learning (LogisticRegression/Ridge) layered on top of base models.
- **Statistical Rigor & Ranking**: Moves beyond simple average scores to actual scientific analysis:
- *Cross-Fold Ranking*: Olympic-style "min" ranking across all CV folds.
- *Friedman Significance Testing*: Computes P-Values to formally test if the champion model's lead is statistically significant.
- *Stability Badges*: Automatically tags models as 'Dominant', 'Competitive', or 'Volatile' based on their consistency in winning folds.
- **Interactive Live Playground**: Once the benchmark finishes, a live interface is generated.
- *Stateful Pipeline*: The backend caches the exact `LabelEncoder` states from the training phase, ensuring the live playground data is mathematically aligned with the original dataset.
- *Data-Aware UI*: Input fields dynamically adapt to numeric or categorical columns based on backend typing.
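The soft-voting ensemble above boils down to averaging per-class probabilities across models and taking the argmax. A dependency-free sketch of that idea (a hypothetical helper, not the webapp's actual `ensemble.py` code):

```python
def soft_vote(model_probs):
    """Soft voting for one sample: average per-class probabilities
    across models, then return (predicted class, averaged distribution).

    model_probs: one probability row per model, e.g.
    [[0.7, 0.3], [0.6, 0.4], [0.4, 0.6]] for three binary classifiers.
    """
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    avg = [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]
    return avg.index(max(avg)), avg
```

With the example input above, two of three models favor class 0 and their averaged confidence carries the vote, which is exactly why soft voting can outperform hard majority voting when model confidences differ.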
**How to start the Web App:**
```bash
cd webapp
pip install -r requirements.txt
python -m uvicorn main:app --port 8000
```
Then open your browser and navigate to `http://localhost:8000`.
## πŸ—οΈ Project Structure
```text
MINI proj SAP/
├── code/
│   ├── docker/                    # Docker environments
│   ├── models/                    # Model wrappers (sklearn-compatible)
│   │   ├── sap_rpt1_hf_wrapper.py # SAP RPT-1 OSS via HuggingFace
│   │   ├── base_wrapper.py        # Abstract base class
│   │   └── ...
│   ├── evaluation/                # Metrics, cross-validation, compute tracking
│   ├── runners/                   # Experiment execution
│   │   ├── run_experiment.py      # Single experiment
│   │   ├── run_batch.py           # Batch experiments
│   │   └── run_baselines.py       # Baseline models only
│   ├── analysis/                  # Results aggregation
│   └── config/                    # YAML configurations
├── webapp/                        # Interactive Web Application
│   ├── main.py                    # FastAPI Backend Server
│   ├── benchmark.py               # Advanced Benchmarking Engine
│   ├── ensemble.py                # Meta-Model Generators
│   ├── requirements.txt           # Web-specific dependencies
│   └── static/                    # Frontend Assets
│       ├── landing.html           # Animated Landing Page
│       ├── uploader.html          # Drag & Drop Interface
│       ├── arena.html             # Results & Statistical Rigor UI
│       ├── app.js                 # Client-side Logic
│       └── style.css              # Midnight Precision Styles
├── results/                       # Experiment outputs
├── scripts/
│   └── test_sap_rpt1.py           # Quick-start validation test
├── requirements.txt               # Pinned dependencies
├── setup.py                       # Package configuration
├── docker-compose.yml             # Docker orchestration
└── .env.example                   # HF token template
```
## 🔄 Reproducibility
This repo follows NeurIPS/ICML reproducibility standards:
- **Pinned dependencies**: All packages have exact versions in `requirements.txt`
- **Fixed random seeds**: `random_state=42` across all experiments
- **Docker containers**: Isolated environments for incompatible dependencies
- **Gated model weights**: SAP RPT-1 OSS uses a fixed checkpoint (`v1.1.2`)
- **5-fold cross-validation**: Stratified splits ensure identical data partitions
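The fixed-seed guarantee is easy to see in isolation: seeding the shuffle makes the fold assignment a pure function of the data size. A standard-library sketch (the repo itself uses stratified sklearn splits, which additionally balance class labels per fold):

```python
import random

def split_indices(n_samples, n_folds=5, seed=42):
    """Deterministically shuffle sample indices and deal them into folds
    round-robin. With a fixed seed, every run produces identical folds,
    which is what makes cross-validation results reproducible.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]
```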
## 🆘 Troubleshooting
**Python version error:**
SAP RPT-1 OSS requires Python >= 3.11. Check with `python --version`.
**Missing TabPFN Error (ModuleNotFoundError):**
If you encounter an error stating that `tabpfn` is missing when running the benchmark, install it manually:
```bash
pip install tabpfn
```
**HF Token not working:**
```bash
huggingface-cli whoami
huggingface-cli login
```
**Docker build fails:**
```bash
docker-compose build --no-cache
```
**Out of memory:**
Edit `code/config/experiments.yaml` and reduce:
```yaml
sap_rpt1_hf:
  max_context_size: 2048  # Lower from 4096
  bagging: 1              # Lower from 4
```