---
title: ModelMatrix
emoji: 📚
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
license: mit
---

# SAP RPT-1 Benchmarking

## 🚀 Setup

### Option 1: Docker (Recommended for Reproducibility)

```bash
# Clone the repo
git clone
cd "MINI proj SAP"

# Copy .env.example to .env and paste your HuggingFace token
cp .env.example .env

# Build containers
docker-compose build

# Run SAP RPT-1 experiment
docker-compose run sap-rpt1 -m runners.run_experiment --dataset analcatdata_authorship --model sap-rpt1-hf

# Run baselines batch
docker-compose run baselines -m runners.run_batch --datasets config/datasets.yaml --models config/models.yaml
```

### Option 2: Local Install (Python >= 3.11 required)

```bash
# Clone the repo
git clone
cd "MINI proj SAP"

# Install everything in one command
pip install -e ".[models,baselines]"

# Download datasets (19 datasets from OpenML)
cd code
python -m datasets.download_tabarena
cd ..
```

## 🔑 Hugging Face Token Setup (Required for SAP RPT-1 OSS)

The SAP RPT-1 OSS model weights are **gated** on Hugging Face:

1. Create an account at [huggingface.co/join](https://huggingface.co/join)
2. Accept the license at [huggingface.co/SAP/sap-rpt-1-oss](https://huggingface.co/SAP/sap-rpt-1-oss)
3. Generate a token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
4. Set the token:

**Windows (PowerShell):**

```powershell
$env:HUGGING_FACE_HUB_TOKEN = "hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
```

**Linux/Mac:**

```bash
export HUGGING_FACE_HUB_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

**Or using a .env file** (recommended):

```bash
cp .env.example .env
# Edit .env and paste your token
```

## 🧪 Quick Test

```bash
cd code
python ../scripts/test_sap_rpt1.py
```

This verifies HF token authentication, model download, and prediction accuracy.

## 📊 Run Experiments

### Single Experiment

```bash
cd code

# SAP RPT-1 OSS
python -m runners.run_experiment --dataset analcatdata_authorship --model sap-rpt1-hf

# XGBoost baseline
python -m runners.run_experiment --dataset analcatdata_authorship --model xgboost
```

### Baseline Models Only (XGBoost, CatBoost, LightGBM)

```bash
cd code

# Run on ALL datasets
python -m runners.run_baselines

# Run on specific datasets
python -m runners.run_baselines --dataset analcatdata_authorship diabetes
```

### Full Batch (All Models × All Datasets)

```bash
cd code
python -m runners.run_batch --datasets config/datasets.yaml --models config/models.yaml
```

### Available Models

| Model Name | Type | Description |
|---|---|---|
| `sap-rpt1-hf` | Pretrained (OSS) | SAP RPT-1 OSS via HuggingFace |
| `xgboost` | Baseline | XGBoost |
| `catboost` | Baseline | CatBoost |
| `lightgbm` | Baseline | LightGBM |

## 📈 View Results

Results are saved to `results/raw/[dataset]_[model].json`.

Example output:

```json
{
  "dataset": "analcatdata_authorship",
  "model": "sap-rpt1-hf",
  "task_type": "classification",
  "n_samples": 841,
  "n_features": 70,
  "mean_metrics": {
    "accuracy": 1.0,
    "roc_auc": 1.0,
    "f1_macro": 1.0
  }
}
```

## 📊 Aggregate Results

```bash
cd code
python -m analysis.aggregate_results
```
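The per-run JSON files under `results/raw/` can also be loaded directly if you want a custom comparison table (for plotting, say). A minimal sketch, assuming the per-run schema matches the example output shown above and that `pandas` is installed:

```python
# Minimal sketch: collect results/raw/*.json into one comparison table.
# Assumes the per-run JSON schema shown in the example output above.
import json
from pathlib import Path

import pandas as pd

rows = []
for path in Path("results/raw").glob("*.json"):
    run = json.loads(path.read_text())
    rows.append({"dataset": run["dataset"], "model": run["model"], **run["mean_metrics"]})

# One row per dataset, one column per model, cells = mean accuracy across folds.
table = pd.DataFrame(rows).pivot(index="dataset", columns="model", values="accuracy")
print(table.round(3))
```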
## 🌐 Web Interface (Advanced Version)

We've completely overhauled the interactive web application to provide a production-grade, scientific benchmarking experience directly in your browser.

**Tech Stack & Architecture:**

- **Frontend**: Pure HTML/CSS/Vanilla JS. Built with a custom "Midnight Precision" design system featuring glassmorphism, dynamic data-aware input generation, and theme-aware custom scrollbars.
- **Backend**: Python with FastAPI and scikit-learn/SciPy.
- **Visualizations**: Chart.js for rendering dynamic metric comparisons.

**Key Features Built:**

- **Midnight Precision Aesthetics**: A premium, ultra-modern UI featuring animated liquid gradients, responsive design, and seamless user interaction flows.
- **Advanced Ensemble Engine**: Automatically builds and benchmarks meta-models on the fly:
  - *Voting Ensembles*: Soft-voting of probabilities across the top models.
  - *Stacking Ensembles*: Sklearn-native meta-learning (LogisticRegression/Ridge) layered on top of the base models.
- **Statistical Rigor & Ranking**: Moves beyond simple average scores to actual scientific analysis (see the sketch after this list):
  - *Cross-Fold Ranking*: Olympic-style "min" ranking across all CV folds.
  - *Friedman Significance Testing*: Computes p-values to formally test whether the champion model's lead is statistically significant.
  - *Stability Badges*: Automatically tags models as 'Dominant', 'Competitive', or 'Volatile' based on how consistently they win folds.
- **Interactive Live Playground**: Once the benchmark finishes, a live prediction interface is generated.
  - *Stateful Pipeline*: The backend caches the exact `LabelEncoder` states from the training phase, so live playground inputs are encoded consistently with the original dataset.
  - *Data-Aware UI*: Input fields dynamically adapt to numeric or categorical columns based on backend typing.
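For reference, the fold-wise ranking and Friedman test described above come down to standard SciPy calls. A minimal offline sketch (the fold scores below are illustrative placeholders, not real benchmark output):

```python
# Minimal sketch of fold-wise "min" ranking plus a Friedman significance test.
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

fold_scores = {                                  # accuracy per CV fold (illustrative)
    "xgboost":     [0.91, 0.93, 0.90, 0.92, 0.94],
    "catboost":    [0.92, 0.92, 0.91, 0.93, 0.93],
    "sap-rpt1-hf": [0.95, 0.96, 0.94, 0.95, 0.97],
}
scores = np.array(list(fold_scores.values()))    # shape: (n_models, n_folds)

# Olympic-style "min" ranks per fold: rank 1 = best score in that fold.
ranks = np.array([rankdata(-scores[:, f], method="min")
                  for f in range(scores.shape[1])]).T
for name, mean_rank in zip(fold_scores, ranks.mean(axis=1)):
    print(f"{name}: mean rank {mean_rank:.2f}")

# Friedman test: do the models' fold-wise scores differ significantly?
stat, p_value = friedmanchisquare(*scores)
print(f"Friedman chi2 = {stat:.3f}, p = {p_value:.4f}")
```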
**How to start the Web App:**

```bash
cd webapp
pip install -r requirements.txt
python -m uvicorn main:app --port 8000
```

Then open your browser and navigate to `http://localhost:8000`.

## 🏗️ Project Structure

```text
MINI proj SAP/
├── code/
│   ├── docker/                    # Docker environments
│   ├── models/                    # Model wrappers (sklearn-compatible)
│   │   ├── sap_rpt1_hf_wrapper.py # SAP RPT-1 OSS via HuggingFace
│   │   ├── base_wrapper.py        # Abstract base class
│   │   └── ...
│   ├── evaluation/                # Metrics, cross-validation, compute tracking
│   ├── runners/                   # Experiment execution
│   │   ├── run_experiment.py      # Single experiment
│   │   ├── run_batch.py           # Batch experiments
│   │   └── run_baselines.py       # Baseline models only
│   ├── analysis/                  # Results aggregation
│   └── config/                    # YAML configurations
├── webapp/                        # Interactive Web Application
│   ├── main.py                    # FastAPI Backend Server
│   ├── benchmark.py               # Advanced Benchmarking Engine
│   ├── ensemble.py                # Meta-Model Generators
│   ├── requirements.txt           # Web-specific dependencies
│   └── static/                    # Frontend Assets
│       ├── landing.html           # Animated Landing Page
│       ├── uploader.html          # Drag & Drop Interface
│       ├── arena.html             # Results & Statistical Rigor UI
│       ├── app.js                 # Client-side Logic
│       └── style.css              # Midnight Precision Styles
├── results/                       # Experiment outputs
├── scripts/
│   └── test_sap_rpt1.py           # Quick-start validation test
├── requirements.txt               # Pinned dependencies
├── setup.py                       # Package configuration
├── docker-compose.yml             # Docker orchestration
└── .env.example                   # HF token template
```

## 🔄 Reproducibility

This repo follows NeurIPS/ICML reproducibility standards:

- **Pinned dependencies**: All packages have exact versions in `requirements.txt`
- **Fixed random seeds**: `random_state=42` across all experiments
- **Docker containers**: Isolated environments for incompatible dependencies
- **Gated model weights**: SAP RPT-1 OSS uses a fixed checkpoint (`v1.1.2`)
- **5-fold cross-validation**: Stratified splits ensure identical data partitions

## 🆘 Troubleshooting

**Python version error:** SAP RPT-1 OSS requires Python >= 3.11. Check with `python --version`.

**Missing TabPFN error (`ModuleNotFoundError`):** If you encounter an error stating that `tabpfn` is missing when running the benchmark, install it manually:

```bash
pip install tabpfn
```

**HF token not working:**

```bash
huggingface-cli whoami
huggingface-cli login
```

**Docker build fails:**

```bash
docker-compose build --no-cache
```

**Out of memory:** Edit `code/config/experiments.yaml` and reduce:

```yaml
sap_rpt1_hf:
  max_context_size: 2048  # Lower from 4096
  bagging: 1              # Lower from 4
```
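As an additional check for the HF token issue above, the token and gated-repo access can also be verified from Python. A minimal sketch, assuming `huggingface_hub` is installed (it is typically pulled in by the HuggingFace model dependencies):

```python
# Minimal sketch: verify the token and gated-repo access from Python.
# The repo id is taken from the license link in the token setup section.
from huggingface_hub import HfApi

api = HfApi()  # resolves HF_TOKEN / HUGGING_FACE_HUB_TOKEN from the environment
print("Logged in as:", api.whoami()["name"])                  # fails if the token is invalid
print("Can access:", api.model_info("SAP/sap-rpt-1-oss").id)  # fails if the license was not accepted
```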