---
title: ModelMatrix
emoji: π
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
license: mit
---
# SAP RPT-1 Benchmarking

## Setup

### Option 1: Docker (Recommended for Reproducibility)

```bash
# Clone the repo
git clone <repo-url>
cd "MINI proj SAP"

# Copy .env.example to .env and paste your HuggingFace token
cp .env.example .env

# Build containers
docker-compose build

# Run SAP RPT-1 experiment
docker-compose run sap-rpt1 -m runners.run_experiment --dataset analcatdata_authorship --model sap-rpt1-hf

# Run baselines batch
docker-compose run baselines -m runners.run_batch --datasets config/datasets.yaml --models config/models.yaml
```
### Option 2: Local Install (Python >= 3.11 required)

```bash
# Clone the repo
git clone <repo-url>
cd "MINI proj SAP"

# Install everything in one command
pip install -e ".[models,baselines]"

# Download datasets (19 datasets from OpenML)
cd code
python -m datasets.download_tabarena
cd ..
```
## Hugging Face Token Setup (Required for SAP RPT-1 OSS)

The SAP RPT-1 OSS model weights are gated on Hugging Face:

- Create an account at huggingface.co/join
- Accept the license at huggingface.co/SAP/sap-rpt-1-oss
- Generate a token at huggingface.co/settings/tokens
- Set the token:

Windows (PowerShell):

```powershell
$env:HUGGING_FACE_HUB_TOKEN = "hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
```

Linux/Mac:

```bash
export HUGGING_FACE_HUB_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

Or use a .env file (recommended):

```bash
cp .env.example .env
# Edit .env and paste your token
```
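Before kicking off a long run, it can be worth checking that the token is actually visible to Python. A minimal sketch (the `load_hf_token` helper below is illustrative, not part of this repo):

```python
import os

def load_hf_token(env_file=".env"):
    """Return the HF token from the environment, falling back to a .env file."""
    token = os.environ.get("HUGGING_FACE_HUB_TOKEN")
    if token:
        return token
    # Fall back to parsing KEY=VALUE lines from the .env file
    if os.path.exists(env_file):
        with open(env_file) as fh:
            for line in fh:
                line = line.strip()
                if line.startswith("HUGGING_FACE_HUB_TOKEN="):
                    return line.split("=", 1)[1].strip().strip('"')
    return None

token = load_hf_token()
print("token found" if token else "no token configured")
```

If this prints `no token configured`, re-check the steps above.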
## Quick Test

```bash
cd code
python ../scripts/test_sap_rpt1.py
```
This verifies HF token authentication, model download, and prediction accuracy.
## Run Experiments

### Single Experiment

```bash
cd code

# SAP RPT-1 OSS
python -m runners.run_experiment --dataset analcatdata_authorship --model sap-rpt1-hf

# XGBoost baseline
python -m runners.run_experiment --dataset analcatdata_authorship --model xgboost
```

### Baseline Models Only (XGBoost, CatBoost, LightGBM)

```bash
cd code

# Run on ALL datasets
python -m runners.run_baselines

# Run on specific datasets
python -m runners.run_baselines --dataset analcatdata_authorship diabetes
```

### Full Batch (All Models × All Datasets)

```bash
cd code
python -m runners.run_batch --datasets config/datasets.yaml --models config/models.yaml
```
### Available Models

| Model Name | Type | Description |
|---|---|---|
| `sap-rpt1-hf` | Pretrained (OSS) | SAP RPT-1 OSS via HuggingFace |
| `xgboost` | Baseline | XGBoost |
| `catboost` | Baseline | CatBoost |
| `lightgbm` | Baseline | LightGBM |
## View Results

Results are saved to `results/raw/[dataset]_[model].json`.

Example output:

```json
{
  "dataset": "analcatdata_authorship",
  "model": "sap-rpt1-hf",
  "task_type": "classification",
  "n_samples": 841,
  "n_features": 70,
  "mean_metrics": {
    "accuracy": 1.0,
    "roc_auc": 1.0,
    "f1_macro": 1.0
  }
}
```
## Aggregate Results

```bash
cd code
python -m analysis.aggregate_results
```
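Conceptually, aggregation reduces the per-run JSON files under `results/raw/` into one comparison table. A minimal sketch of that idea, assuming the field names from the example output above (the `aggregate` helper is hypothetical, not the repo's actual implementation):

```python
import json
from collections import defaultdict
from pathlib import Path

def aggregate(raw_dir="results/raw"):
    """Collect mean accuracy per model across all result files in raw_dir."""
    scores = defaultdict(list)
    for path in Path(raw_dir).glob("*.json"):
        run = json.loads(path.read_text())
        acc = run.get("mean_metrics", {}).get("accuracy")
        if acc is not None:
            scores[run["model"]].append(acc)
    # Average each model's accuracy over all datasets it was run on
    return {model: sum(v) / len(v) for model, v in scores.items()}
```

The actual `analysis.aggregate_results` module covers more metrics; this only illustrates the reduction step.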
## Web Interface (Advanced Version)

We've completely overhauled the interactive web application to provide a production-grade, scientific benchmarking experience directly in your browser.

Tech Stack & Architecture:

- Frontend: Pure HTML/CSS/vanilla JS, built with a custom "Midnight Precision" design system featuring glassmorphism, dynamic data-aware input generation, and theme-aware custom scrollbars.
- Backend: Python with FastAPI and scikit-learn/SciPy.
- Visualizations: Chart.js for rendering dynamic metric comparisons.
Key Features:

- Midnight Precision Aesthetics: A premium, ultra-modern UI featuring animated liquid gradients, responsive design, and seamless user interaction flows.
- Advanced Ensemble Engine: Automatically builds and benchmarks meta-models on the fly:
  - Voting Ensembles: Soft-voting probabilities across top models.
  - Stacking Ensembles: Sklearn-native meta-learning (LogisticRegression/Ridge) layered on top of base models.
- Statistical Rigor & Ranking: Moves beyond simple average scores to actual scientific analysis:
  - Cross-Fold Ranking: Olympic-style "min" ranking across all CV folds.
  - Friedman Significance Testing: Computes p-values to formally test whether the champion model's lead is statistically significant.
  - Stability Badges: Automatically tags models as 'Dominant', 'Competitive', or 'Volatile' based on their consistency in winning folds.
- Interactive Live Playground: Once the benchmark finishes, a live interface is generated.
  - Stateful Pipeline: The backend caches the exact `LabelEncoder` states from the training phase, ensuring the live playground data is mathematically aligned with the original dataset.
  - Data-Aware UI: Input fields dynamically adapt to numeric or categorical columns based on backend typing.
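The ensemble engine described above maps directly onto scikit-learn primitives. A minimal sketch on synthetic data (base models and hyperparameters here are illustrative, not the webapp's actual configuration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an uploaded dataset
X, y = make_classification(n_samples=400, n_features=12, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42, stratify=y)

base = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
]

# Soft voting: average predicted class probabilities across base models
voting = VotingClassifier(estimators=base, voting="soft").fit(X_tr, y_tr)

# Stacking: a LogisticRegression meta-learner on top of base-model predictions
stacking = StackingClassifier(
    estimators=base, final_estimator=LogisticRegression(max_iter=1000)
).fit(X_tr, y_tr)

voting_acc = voting.score(X_te, y_te)
stacking_acc = stacking.score(X_te, y_te)
```

Soft voting averages predicted class probabilities, while `StackingClassifier` fits its meta-learner on out-of-fold predictions of the base models.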
How to start the Web App:

```bash
cd webapp
pip install -r requirements.txt
python -m uvicorn main:app --port 8000
```
Then open your browser and navigate to http://localhost:8000.
## Project Structure

```
MINI proj SAP/
├── code/
│   ├── docker/                    # Docker environments
│   ├── models/                    # Model wrappers (sklearn-compatible)
│   │   ├── sap_rpt1_hf_wrapper.py # SAP RPT-1 OSS via HuggingFace
│   │   ├── base_wrapper.py        # Abstract base class
│   │   └── ...
│   ├── evaluation/                # Metrics, cross-validation, compute tracking
│   ├── runners/                   # Experiment execution
│   │   ├── run_experiment.py      # Single experiment
│   │   ├── run_batch.py           # Batch experiments
│   │   └── run_baselines.py       # Baseline models only
│   ├── analysis/                  # Results aggregation
│   └── config/                    # YAML configurations
├── webapp/                        # Interactive Web Application
│   ├── main.py                    # FastAPI Backend Server
│   ├── benchmark.py               # Advanced Benchmarking Engine
│   ├── ensemble.py                # Meta-Model Generators
│   ├── requirements.txt           # Web-specific dependencies
│   └── static/                    # Frontend Assets
│       ├── landing.html           # Animated Landing Page
│       ├── uploader.html          # Drag & Drop Interface
│       ├── arena.html             # Results & Statistical Rigor UI
│       ├── app.js                 # Client-side Logic
│       └── style.css              # Midnight Precision Styles
├── results/                       # Experiment outputs
├── scripts/
│   └── test_sap_rpt1.py           # Quick-start validation test
├── requirements.txt               # Pinned dependencies
├── setup.py                       # Package configuration
├── docker-compose.yml             # Docker orchestration
└── .env.example                   # HF token template
```
## Reproducibility

This repo follows NeurIPS/ICML reproducibility standards:

- Pinned dependencies: all packages have exact versions in `requirements.txt`
- Fixed random seeds: `random_state=42` across all experiments
- Docker containers: isolated environments for incompatible dependencies
- Gated model weights: SAP RPT-1 OSS uses a fixed checkpoint (`v1.1.2`)
- 5-fold cross-validation: stratified splits ensure identical data partitions
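Because the seed and fold strategy are fixed, the data partitions are bit-identical across runs. A small sketch illustrating this with stratified 5-fold CV and `random_state=42` (toy data, not one of the benchmark datasets):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy balanced binary dataset
X = np.arange(100).reshape(-1, 1)
y = np.array([0, 1] * 50)

def fold_indices():
    """Return the test-fold index lists for a fresh 5-fold stratified split."""
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    return [test.tolist() for _, test in skf.split(X, y)]

# Two independent instantiations produce identical partitions
assert fold_indices() == fold_indices()
```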
## Troubleshooting

**Python version error:** SAP RPT-1 OSS requires Python >= 3.11. Check with `python --version`.

**Missing TabPFN error (`ModuleNotFoundError`):** If `tabpfn` is reported missing when running the benchmark, install it manually:

```bash
pip install tabpfn
```

**HF token not working:**

```bash
huggingface-cli whoami
huggingface-cli login
```

**Docker build fails:**

```bash
docker-compose build --no-cache
```

**Out of memory:** Edit `code/config/experiments.yaml` and reduce:

```yaml
sap_rpt1_hf:
  max_context_size: 2048  # Lower from 4096
  bagging: 1              # Lower from 4
```