---
title: ModelMatrix
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
license: mit
---
# SAP RPT-1 Benchmarking

## Setup

### Option 1: Docker (Recommended for Reproducibility)

```bash
# Clone the repo
git clone <repo-url>
cd "MINI proj SAP"

# Copy .env.example to .env and paste your Hugging Face token
cp .env.example .env

# Build containers
docker-compose build

# Run SAP RPT-1 experiment
docker-compose run sap-rpt1 -m runners.run_experiment --dataset analcatdata_authorship --model sap-rpt1-hf

# Run baselines batch
docker-compose run baselines -m runners.run_batch --datasets config/datasets.yaml --models config/models.yaml
```

### Option 2: Local Install (Python >= 3.11 required)

```bash
# Clone the repo
git clone <repo-url>
cd "MINI proj SAP"

# Install everything in one command
pip install -e ".[models,baselines]"

# Download datasets (19 datasets from OpenML)
cd code
python -m datasets.download_tabarena
cd ..
```

## Hugging Face Token Setup (Required for SAP RPT-1 OSS)

The SAP RPT-1 OSS model weights are **gated** on Hugging Face, so you need an account and an access token before the model can be downloaded:

1. Create an account at [huggingface.co/join](https://huggingface.co/join)
2. Accept the license at [huggingface.co/SAP/sap-rpt-1-oss](https://huggingface.co/SAP/sap-rpt-1-oss)
3. Generate a token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
4. Set the token:

**Windows (PowerShell):**

```powershell
$env:HUGGING_FACE_HUB_TOKEN = "hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
```

**Linux/Mac:**

```bash
export HUGGING_FACE_HUB_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

**Or using a `.env` file** (recommended):

```bash
cp .env.example .env
# Edit .env and paste your token
```
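
Before launching a long run, it can help to confirm that a token is visible to Python at all. The sketch below is a minimal stdlib-only check, not part of the repo: `read_env_file` and `find_token` are hypothetical helpers, and it assumes the `.env` file uses the same `HUGGING_FACE_HUB_TOKEN` key as the shell examples above. It only checks that a value with the `hf_` prefix is present; it does not contact Hugging Face.

```python
import os
from pathlib import Path

TOKEN_VAR = "HUGGING_FACE_HUB_TOKEN"  # assumed to match the key used in .env


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_token(env_path=".env", environ=None):
    """Return the token from the environment or the .env file, else None."""
    environ = os.environ if environ is None else environ
    token = environ.get(TOKEN_VAR) or read_env_file(env_path).get(TOKEN_VAR, "")
    return token if token.startswith("hf_") else None


if __name__ == "__main__":
    print("token found" if find_token() else f"{TOKEN_VAR} is not set")
```

The environment variable takes precedence over the file here; that ordering is a choice made for this sketch, not something the repo documents.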

## Quick Test

```bash
cd code
python ../scripts/test_sap_rpt1.py
```

This verifies HF token authentication, model download, and prediction accuracy.

## Run Experiments

### Single Experiment

```bash
cd code

# SAP RPT-1 OSS
python -m runners.run_experiment --dataset analcatdata_authorship --model sap-rpt1-hf

# XGBoost baseline
python -m runners.run_experiment --dataset analcatdata_authorship --model xgboost
```

### Baseline Models Only (XGBoost, CatBoost, LightGBM)

```bash
cd code

# Run on ALL datasets
python -m runners.run_baselines

# Run on specific datasets
python -m runners.run_baselines --dataset analcatdata_authorship diabetes
```

### Full Batch (All Models × All Datasets)

```bash
cd code
python -m runners.run_batch --datasets config/datasets.yaml --models config/models.yaml
```
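
To make the "all models × all datasets" grid concrete, the sketch below expands a list of datasets and models into one `runners.run_experiment` command per pair, using only the flags shown above. It is an illustration of the grid, not how `run_batch` is implemented; `build_commands` is a hypothetical helper, and the dataset and model lists are hard-coded because the schema of the YAML config files is not shown here.

```python
import itertools
import sys


def build_commands(datasets, models, python=sys.executable):
    """Return one run_experiment invocation per (dataset, model) pair."""
    return [
        [python, "-m", "runners.run_experiment", "--dataset", d, "--model", m]
        for d, m in itertools.product(datasets, models)
    ]


if __name__ == "__main__":
    datasets = ["analcatdata_authorship", "diabetes"]
    models = ["sap-rpt1-hf", "xgboost", "catboost", "lightgbm"]
    for cmd in build_commands(datasets, models):
        # Dry run: print only. To execute, pass cmd to
        # subprocess.run(cmd, cwd="code", check=True).
        print(" ".join(cmd))
```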

### Available Models

| Model Name | Type | Description |
|---|---|---|
| `sap-rpt1-hf` | Pretrained (OSS) | SAP RPT-1 OSS via Hugging Face |
| `xgboost` | Baseline | XGBoost |
| `catboost` | Baseline | CatBoost |
| `lightgbm` | Baseline | LightGBM |

## View Results

Results are saved to `results/raw/[dataset]_[model].json`.

Example output:

```json
{
  "dataset": "analcatdata_authorship",
  "model": "sap-rpt1-hf",
  "task_type": "classification",
  "n_samples": 841,
  "n_features": 70,
  "mean_metrics": {
    "accuracy": 1.0,
    "roc_auc": 1.0,
    "f1_macro": 1.0
  }
}
```
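
Because each run writes one JSON file with the fields shown above, the raw results can be inspected with a few lines of Python. This is a minimal sketch, separate from the repo's own `analysis.aggregate_results` module: `load_results` and `mean_metric_by_model` are hypothetical helpers, and the code assumes every file has the `model` and `mean_metrics` keys from the example output.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_results(raw_dir="results/raw"):
    """Load every result file in raw_dir as a dict."""
    return [json.loads(p.read_text()) for p in sorted(Path(raw_dir).glob("*.json"))]


def mean_metric_by_model(results, metric="accuracy"):
    """Average one metric per model across all datasets that report it."""
    per_model = defaultdict(list)
    for result in results:
        value = result.get("mean_metrics", {}).get(metric)
        if value is not None:
            per_model[result["model"]].append(value)
    return {model: sum(vals) / len(vals) for model, vals in per_model.items()}


if __name__ == "__main__":
    summary = mean_metric_by_model(load_results())
    for model, score in sorted(summary.items(), key=lambda kv: -kv[1]):
        print(f"{model:15s} {score:.4f}")
```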

## Aggregate Results

```bash
cd code
python -m analysis.aggregate_results
```

## Web Interface (Advanced Version)

The interactive web application has been overhauled to run the full benchmark, including ensembles and statistical tests, directly in your browser.

**Tech Stack & Architecture:**

- **Frontend**: Pure HTML/CSS/Vanilla JS. Built with a custom "Midnight Precision" design system featuring glassmorphism, dynamic data-aware input generation, and theme-aware custom scrollbars.
- **Backend**: Python with FastAPI and Scikit-Learn/SciPy.
- **Visualizations**: Chart.js for rendering dynamic metric comparisons.

**Key Features Built:**

- **Midnight Precision Aesthetics**: A modern UI with animated liquid gradients, responsive design, and smooth interaction flows.
- **Advanced Ensemble Engine**: Automatically builds and benchmarks meta-models on the fly:
  - *Voting Ensembles*: Soft voting over the predicted probabilities of the top models.
  - *Stacking Ensembles*: Sklearn-native meta-learning (LogisticRegression/Ridge) layered on top of base models.
- **Statistical Rigor & Ranking**: Goes beyond simple average scores:
  - *Cross-Fold Ranking*: Competition-style "min" ranking across all CV folds, where tied models share the best rank.
  - *Friedman Significance Testing*: Computes p-values to formally test whether the differences between models, including the champion's lead, are statistically significant.
  - *Stability Badges*: Automatically tags models as 'Dominant', 'Competitive', or 'Volatile' based on how consistently they win folds.
- **Interactive Live Playground**: Once the benchmark finishes, a live prediction interface is generated.
  - *Stateful Pipeline*: The backend caches the exact `LabelEncoder` states from the training phase, so playground inputs are encoded exactly as the original dataset was.
  - *Data-Aware UI*: Input fields dynamically adapt to numeric or categorical columns based on backend typing.
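
The ranking and significance steps listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `webapp/benchmark.py`: `min_ranks`, `chi2_sf`, and `friedman` are hypothetical helpers, the scores are made up, and the Friedman statistic omits the tie correction, so it is only valid when no two models tie within a fold. A SciPy-based backend would more likely call `scipy.stats.rankdata(..., method="min")` and `scipy.stats.friedmanchisquare`.

```python
import math


def min_ranks(scores):
    """Competition ("min") ranks for one fold: 1 = best, ties share the best rank."""
    return [1 + sum(other > s for other in scores) for s in scores]


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the lower incomplete gamma series."""
    if x <= 0:
        return 1.0
    a, half = df / 2, x / 2
    term = total = 1 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= half / (a + n)
        total += term
    lower = total * math.exp(-half + a * math.log(half) - math.lgamma(a))
    return max(0.0, 1 - lower)


def friedman(fold_scores):
    """fold_scores holds one list of per-model scores per fold (no ties assumed)."""
    n, k = len(fold_scores), len(fold_scores[0])
    ranks = [min_ranks(fold) for fold in fold_scores]
    rank_sums = [sum(r[j] for r in ranks) for j in range(k)]
    stat = 12 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3 * n * (k + 1)
    return [s / n for s in rank_sums], stat, chi2_sf(stat, k - 1)


if __name__ == "__main__":
    # Made-up accuracies for three models over five folds (illustration only).
    fold_scores = [
        [0.95, 0.90, 0.85],
        [0.94, 0.91, 0.86],
        [0.96, 0.89, 0.84],
        [0.93, 0.92, 0.87],
        [0.95, 0.90, 0.88],
    ]
    mean_ranks, stat, p_value = friedman(fold_scores)
    print(mean_ranks, round(stat, 3), round(p_value, 4))
```

With these made-up scores every fold produces the same ordering, so the mean ranks are 1, 2, and 3, the statistic is 10, and the p-value is about 0.0067. With only a handful of folds the chi-square approximation is rough, which is one reason to treat such p-values as indicative.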

**How to start the Web App:**

```bash
cd webapp
pip install -r requirements.txt
python -m uvicorn main:app --port 8000
```

Then open your browser and navigate to `http://localhost:8000`.

## Project Structure

```text
MINI proj SAP/
├── code/
│   ├── docker/                      # Docker environments
│   ├── models/                      # Model wrappers (sklearn-compatible)
│   │   ├── sap_rpt1_hf_wrapper.py   # SAP RPT-1 OSS via HuggingFace
│   │   ├── base_wrapper.py          # Abstract base class
│   │   └── ...
│   ├── evaluation/                  # Metrics, cross-validation, compute tracking
│   ├── runners/                     # Experiment execution
│   │   ├── run_experiment.py        # Single experiment
│   │   ├── run_batch.py             # Batch experiments
│   │   └── run_baselines.py         # Baseline models only
│   ├── analysis/                    # Results aggregation
│   └── config/                      # YAML configurations
├── webapp/                          # Interactive Web Application
│   ├── main.py                      # FastAPI Backend Server
│   ├── benchmark.py                 # Advanced Benchmarking Engine
│   ├── ensemble.py                  # Meta-Model Generators
│   ├── requirements.txt             # Web-specific dependencies
│   └── static/                      # Frontend Assets
│       ├── landing.html             # Animated Landing Page
│       ├── uploader.html            # Drag & Drop Interface
│       ├── arena.html               # Results & Statistical Rigor UI
│       ├── app.js                   # Client-side Logic
│       └── style.css                # Midnight Precision Styles
├── results/                         # Experiment outputs
├── scripts/
│   └── test_sap_rpt1.py             # Quick-start validation test
├── requirements.txt                 # Pinned dependencies
├── setup.py                         # Package configuration
├── docker-compose.yml               # Docker orchestration
└── .env.example                     # HF token template
```

## Reproducibility

This repo follows NeurIPS/ICML reproducibility standards:

- **Pinned dependencies**: All packages have exact versions in `requirements.txt`
- **Fixed random seeds**: `random_state=42` across all experiments
- **Docker containers**: Isolated environments for incompatible dependencies
- **Gated model weights**: SAP RPT-1 OSS uses a fixed checkpoint (`v1.1.2`)
- **5-fold cross-validation**: Stratified splits, combined with the fixed seed, yield identical data partitions on every run
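
The link between a fixed seed and identical partitions can be shown with a small stand-in. The sketch below is not the repo's cross-validation code (which lives under `code/evaluation/` and is not shown here) and it is not scikit-learn's `StratifiedKFold`; `stratified_folds` is a hypothetical helper that shuffles each class with a seeded generator and deals the indices round-robin into folds.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=42):
    """Assign sample indices to folds while keeping class proportions balanced."""
    rng = random.Random(seed)  # seeded generator: same seed, same shuffle
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):  # fixed class order keeps runs comparable
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)
    return [sorted(fold) for fold in folds]


if __name__ == "__main__":
    labels = [0] * 10 + [1] * 5
    first = stratified_folds(labels)
    second = stratified_folds(labels)
    print(first == second)  # same seed and data give the same partition
```

Stratification alone only balances classes across folds; it is the seed that makes two runs produce the same split, which is why both appear in the list above.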

## Troubleshooting

**Python version error:**

SAP RPT-1 OSS requires Python >= 3.11. Check with `python --version`.

**Missing TabPFN Error (ModuleNotFoundError):**

If you encounter an error stating that `tabpfn` is missing when running the benchmark, install it manually:

```bash
pip install tabpfn
```

**HF Token not working:**

```bash
huggingface-cli whoami
huggingface-cli login
```

**Docker build fails:**

```bash
docker-compose build --no-cache
```

**Out of memory:**

Edit `code/config/experiments.yaml` and reduce:

```yaml
sap_rpt1_hf:
  max_context_size: 2048  # Lower from 4096
  bagging: 1              # Lower from 4
```