---
title: README
emoji: 🐨
colorFrom: yellow
colorTo: green
sdk: static
pinned: false
---

# Datasets for NeurIPS Submission

This page serves as the central index for all datasets associated with our NeurIPS 2026 submission.

## Overview

We release 95 datasets covering 8 biomedical domains. All datasets are hosted on Hugging Face and are publicly accessible.

---

## 📦 Datasets

### Dataset 1
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/BNG_breast-w
- **Task:** Binary Classification
- **Domain:** Clinical & Healthcare

### Dataset 2
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/CDC_Diabetes_Health_Indicators
- **Task:** Binary Classification
- **Domain:** Clinical & Healthcare

### Dataset 3
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/Cardiovascular-Disease-dataset
- **Task:** Binary Classification
- **Domain:** Clinical & Healthcare

### Dataset 4
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/Diabetes_UCI
- **Task:** Binary Classification
- **Domain:** Clinical & Healthcare

### Dataset 5
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/Diabetic_Retinopathy_Debrecen
- **Task:** Binary Classification
- **Domain:** Clinical & Healthcare

### Dataset 6
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/Heart-Disease-Dataset-_Comprehensive
- **Task:** Regression
- **Domain:** Clinical & Healthcare

### Dataset 7
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/HeartDisease_UCI
- **Task:** Binary Classification
- **Domain:** Clinical & Healthcare

### Dataset 8
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/MIMIC_II
- **Task:** Binary Classification
- **Domain:** Clinical & Healthcare

### Dataset 9
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/MUSIC
- **Task:** MultiClass Classification
- **Domain:** Clinical & Healthcare

### Dataset 10
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/National_Health_and_Nutrition_Health_Survey
- **Task:** Binary Classification
- **Domain:** Clinical & Healthcare

### Dataset 11
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/Parkinson_Speech
- **Task:** Binary Classification
- **Domain:** Clinical & Healthcare

### Dataset 12
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/PatientCare
- **Task:** Binary Classification
- **Domain:** Clinical & Healthcare

### Dataset 13
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/VitalDB
- **Task:** Binary Classification
- **Domain:** Clinical & Healthcare

### Dataset 14
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/analcatdata-dmft
- **Task:** MultiClass Classification
- **Domain:** Clinical & Healthcare

### Dataset 15
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/audiology
- **Task:** Binary Classification
- **Domain:** Clinical & Healthcare

### Dataset 16
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/blood-transfusion-service
- **Task:** Binary Classification
- **Domain:** Clinical & Healthcare

### Dataset 17
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/ilpd_patient_data
- **Task:** Binary Classification
- **Domain:** Clinical & Healthcare

### Dataset 18
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/lymphography
- **Task:** MultiClass Classification
- **Domain:** Clinical & Healthcare

### Dataset 19
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/maternal_health_risk
- **Task:** MultiClass Classification
- **Domain:** Clinical & Healthcare

### Dataset 20
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/pima_diabetes
- **Task:** Binary Classification
- **Domain:** Clinical & Healthcare

### Dataset 21
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/sick
- **Task:** Binary Classification
- **Domain:** Clinical & Healthcare

### Dataset 22
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/thyroid
- **Task:** MultiClass Classification
- **Domain:** Clinical & Healthcare

### Dataset 23
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/thyroid-ann
- **Task:** MultiClass Classification
- **Domain:** Clinical & Healthcare

### Dataset 24
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/thyroid-dis
- **Task:** MultiClass Classification
- **Domain:** Clinical & Healthcare

### Dataset 25
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/wisconsin-breast-cancer
- **Task:** Binary Classification
- **Domain:** Clinical & Healthcare

### Dataset 26
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/ESOL
- **Task:** Regression
- **Domain:** Drug Discovery

### Dataset 27
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/HIV
- **Task:** Binary Classification
- **Domain:** Drug Discovery

### Dataset 28
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/QM8_E1-CC2
- **Task:** Regression
- **Domain:** Drug Discovery

### Dataset 29
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/QM8_f1-CC2
- **Task:** Regression
- **Domain:** Drug Discovery

### Dataset 30
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/QM9_g298
- **Task:** Regression
- **Domain:** Drug Discovery

### Dataset 31
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/QM9_gap
- **Task:** Regression
- **Domain:** Drug Discovery

### Dataset 32
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/QSAR_biodegradation
- **Task:** Binary Classification
- **Domain:** Drug Discovery

### Dataset 33
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/SIDER_gastro
- **Task:** Binary Classification
- **Domain:** Drug Discovery

### Dataset 34
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/SIDER_nervous
- **Task:** Binary Classification
- **Domain:** Drug Discovery

### Dataset 35
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/Tox21_NRAhR
- **Task:** Binary Classification
- **Domain:** Drug Discovery

### Dataset 36
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/Tox21_NRER
- **Task:** Binary Classification
- **Domain:** Drug Discovery

### Dataset 37
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/Tox21_SRMMP
- **Task:** Binary Classification
- **Domain:** Drug Discovery

### Dataset 38
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/bioresponse
- **Task:** Binary Classification
- **Domain:** Drug Discovery

### Dataset 39
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/vep_pathogenic_coding
- **Task:** Binary Classification
- **Domain:** Drug Discovery

### Dataset 40
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/continental_1k
- **Task:** MultiClass Classification
- **Domain:** Genomics

### Dataset 41
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/continental_50k
- **Task:** MultiClass Classification
- **Domain:** Genomics

### Dataset 42
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/continental_50k_missing
- **Task:** MultiClass Classification
- **Domain:** Genomics

### Dataset 43
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/coords_1k
- **Task:** Multi-Target Regression
- **Domain:** Genomics

### Dataset 44
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/coords_50k
- **Task:** Multi-Target Regression
- **Domain:** Genomics

### Dataset 45
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/dna
- **Task:** MultiClass Classification
- **Domain:** Genomics

### Dataset 46
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/pca_1k
- **Task:** Multi-Target Regression
- **Domain:** Genomics

### Dataset 47
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/pca_50k
- **Task:** Multi-Target Regression
- **Domain:** Genomics

### Dataset 48
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/sexcheck_chrX_50
- **Task:** Binary Classification
- **Domain:** Genomics

### Dataset 49
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/sexcheck_chrX_500
- **Task:** Binary Classification
- **Domain:** Genomics

### Dataset 50
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/sexcheck_chrX_50k
- **Task:** Binary Classification
- **Domain:** Genomics

### Dataset 51
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/subpop_1000_consec
- **Task:** MultiClass Classification
- **Domain:** Genomics

### Dataset 52
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/subpop_50k
- **Task:** MultiClass Classification
- **Domain:** Genomics

### Dataset 53
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/Gum1_s
- **Task:** Regression
- **Domain:** Metabolomics

### Dataset 54
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/MTBLS136
- **Task:** Binary Classification
- **Domain:** Metabolomics

### Dataset 55
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/MTBLS161
- **Task:** Binary Classification
- **Domain:** Metabolomics

### Dataset 56
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/MTBLS404
- **Task:** Binary Classification
- **Domain:** Metabolomics

### Dataset 57
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/Noc_s
- **Task:** Regression
- **Domain:** Metabolomics

### Dataset 58
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/PT-Mush21
- **Task:** MultiClass Classification
- **Domain:** Metabolomics

### Dataset 59
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/ST000369
- **Task:** Binary Classification
- **Domain:** Metabolomics

### Dataset 60
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/ST000496
- **Task:** Binary Classification
- **Domain:** Metabolomics

### Dataset 61
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/ST001000
- **Task:** Binary Classification
- **Domain:** Metabolomics

### Dataset 62
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/ST001047
- **Task:** Binary Classification
- **Domain:** Metabolomics

### Dataset 63
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/avida_hil6_onehot
- **Task:** Binary Classification
- **Domain:** Proteomics

### Dataset 64
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/avida_htnfa_3mer
- **Task:** Binary Classification
- **Domain:** Proteomics

### Dataset 65
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/avida_htnfa_esm2
- **Task:** Binary Classification
- **Domain:** Proteomics

### Dataset 66
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/avida_htnfa_onehot
- **Task:** Binary Classification
- **Domain:** Proteomics

### Dataset 67
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/avida_sarscov2_3mer
- **Task:** Binary Classification
- **Domain:** Proteomics

### Dataset 68
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/cptac_survival
- **Task:** Binary Classification
- **Domain:** Proteomics

### Dataset 69
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/proteinea_solubility
- **Task:** Binary Classification
- **Domain:** Proteomics

### Dataset 70
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/true_betalactamase_complete
- **Task:** Regression
- **Domain:** Proteomics

### Dataset 71
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/true_fluorescence
- **Task:** Regression
- **Domain:** Proteomics

### Dataset 72
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/true_melting_point
- **Task:** Regression
- **Domain:** Proteomics

### Dataset 73
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/hnoca_cells30
- **Task:** MultiClass Classification
- **Domain:** Single-cell

### Dataset 74
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/sc_bonemarrow_velocity_umap
- **Task:** Multi-Target Regression
- **Domain:** Single-cell

### Dataset 75
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/sc_dentategyrus_latenttime
- **Task:** Regression
- **Domain:** Single-cell

### Dataset 76
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/sc_dentategyrus_transition
- **Task:** Regression
- **Domain:** Single-cell

### Dataset 77
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/sc_dentategyrus_velocity_umap
- **Task:** Multi-Target Regression
- **Domain:** Single-cell

### Dataset 78
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/tahoe_batch_integration_14plates
- **Task:** MultiClass Classification
- **Domain:** Single-cell

### Dataset 79
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/tahoe_cell_name_1000
- **Task:** MultiClass Classification
- **Domain:** Single-cell

### Dataset 80
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/tahoe_cell_name_500
- **Task:** MultiClass Classification
- **Domain:** Single-cell

### Dataset 81
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/tahoe_cell_name_5000
- **Task:** MultiClass Classification
- **Domain:** Single-cell

### Dataset 82
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/tahoe_g2m_score
- **Task:** Regression
- **Domain:** Single-cell

### Dataset 83
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/MultiOmics_GS-BRCA
- **Task:** MultiClass Classification
- **Domain:** Systems Biology

### Dataset 84
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/MultiOmics_GS-COAD
- **Task:** MultiClass Classification
- **Domain:** Systems Biology

### Dataset 85
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/MultiOmics_GS-GBM
- **Task:** MultiClass Classification
- **Domain:** Systems Biology

### Dataset 86
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/MultiOmics_GS-LGG
- **Task:** MultiClass Classification
- **Domain:** Systems Biology

### Dataset 87
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/MultiOmics_GS-OV
- **Task:** MultiClass Classification
- **Domain:** Systems Biology

### Dataset 88
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/ALLAML
- **Task:** Binary Classification
- **Domain:** Transcriptomics

### Dataset 89
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/CLL_SUB_111
- **Task:** MultiClass Classification
- **Domain:** Transcriptomics

### Dataset 90
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/GLIOMA
- **Task:** MultiClass Classification
- **Domain:** Transcriptomics

### Dataset 91
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/GLI_85
- **Task:** Binary Classification
- **Domain:** Transcriptomics

### Dataset 92
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/Prostate_GE
- **Task:** Binary Classification
- **Domain:** Transcriptomics

### Dataset 93
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/SMK_CAN_187
- **Task:** Gene Expression Classification
- **Domain:** Transcriptomics

### Dataset 94
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/TOX_171
- **Task:** MultiClass Classification
- **Domain:** Transcriptomics

### Dataset 95
- **Link:** https://huggingface.co/datasets/BenchmarkDatasets/lung
- **Task:** MultiClass Classification
- **Domain:** Transcriptomics
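
---

## Loading a Dataset

Since every entry is a public Hugging Face dataset link, one possible way to fetch a dataset programmatically is sketched below. This is an illustration, not an official loader: the helper names `repo_id_from_link` and `load_benchmark_dataset` are hypothetical, and the sketch assumes each repository can be read by `datasets.load_dataset` without a configuration name. This page does not state file formats, split names, or column names, so check the individual dataset page before relying on them.

```python
from urllib.parse import urlparse


def repo_id_from_link(link: str) -> str:
    """Turn a dataset page URL from this index into an 'owner/name' repo id."""
    parts = urlparse(link).path.strip("/").split("/")
    # Expected path shape: datasets/<owner>/<name>
    if len(parts) != 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset link: {link}")
    return f"{parts[1]}/{parts[2]}"


def load_benchmark_dataset(link: str):
    """Download one dataset. Needs `pip install datasets` and network access.

    Assumption: the repository loads without a config name; this is not
    confirmed by the index page.
    """
    from datasets import load_dataset  # imported lazily so URL parsing works offline

    return load_dataset(repo_id_from_link(link))


if __name__ == "__main__":
    link = "https://huggingface.co/datasets/BenchmarkDatasets/pima_diabetes"
    print(repo_id_from_link(link))  # BenchmarkDatasets/pima_diabetes
```

Calling `load_benchmark_dataset(link)` with any link from the list would then attempt the download. If a repository stores raw files that `load_dataset` cannot parse automatically, the files can instead be browsed and downloaded from the dataset page.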