# NB-Transformer Validation Examples

This directory contains three validation scripts that reproduce the key results from the NB-Transformer paper.

## Scripts Overview

### 1. `validate_accuracy.py` - Parameter Accuracy Validation

Compares parameter estimation accuracy and speed across three methods:

- **NB-Transformer**: Fast neural network approach
- **Classical NB GLM**: Maximum likelihood via statsmodels
- **Method of Moments**: Fastest baseline method

**Usage:**

```bash
python validate_accuracy.py --n_tests 1000 --output_dir accuracy_results/
```

**Expected Results:**

- NB-Transformer: 14.8x faster than classical GLM
- 47% lower error on log fold change (β) than classical GLM
- 100% success rate vs 98.7% for classical GLM
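
For orientation, the method-of-moments baseline can be sketched in a few lines. This is a minimal illustration, not the code in `validate_accuracy.py`. It assumes the common NB parameterization Var = μ + αμ² on the natural scale, ignores library-size normalization, and takes β to be the log ratio of the group means. The function name `nb_method_of_moments` is hypothetical.

```python
import math
from statistics import mean, variance


def nb_method_of_moments(control, treated):
    """Illustrative method-of-moments estimates for a two-group NB model.

    Assumes Var = mu + alpha * mu**2 (hypothetical helper, not the package API).
    Both groups need a nonzero mean, otherwise the log ratio is undefined.
    """
    m0, m1 = mean(control), mean(treated)
    # Log fold change between the two group means.
    beta = math.log(m1 / m0)
    # Per-group dispersion from the excess of variance over the mean,
    # clipped at zero because sampling noise can make the excess negative.
    alphas = [
        max((variance(group) - m) / m**2, 0.0)
        for group, m in ((control, m0), (treated, m1))
    ]
    return {"mu": m0, "beta": beta, "alpha": mean(alphas)}


# Toy counts, made up for illustration only.
control = [4, 7, 5, 9, 6]
treated = [12, 15, 9, 20, 14]
est = nb_method_of_moments(control, treated)
print(est)
```

With only a handful of replicates the variance estimate is noisy, which is one plausible reason a moment-based baseline is fast but less accurate on dispersion.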

### 2. `validate_calibration.py` - P-value Calibration Validation

Validates that p-values are properly calibrated under the null hypothesis (β = 0).

**Usage:**

```bash
python validate_calibration.py --n_tests 10000 --output_dir calibration_results/
```

**Expected Results:**

- QQ plot should follow the diagonal line
- Kolmogorov-Smirnov test against the uniform distribution gives p > 0.05 (no evidence of miscalibration)
- False positive rate ~5% at α = 0.05
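
The two numeric checks above can be illustrated with the standard library alone. The sketch below is not taken from `validate_calibration.py`; it computes a one-sample Kolmogorov-Smirnov statistic against Uniform(0, 1) with the asymptotic p-value approximation, plus the empirical false positive rate. The helper names `ks_uniform` and `false_positive_rate` are hypothetical, and the p-values are simulated stand-ins for real model output.

```python
import math
import random


def ks_uniform(pvalues):
    """One-sample KS test against Uniform(0, 1).

    Returns (D, approximate p-value) using the asymptotic Kolmogorov series.
    """
    x = sorted(pvalues)
    n = len(x)
    # Largest gap between the empirical CDF and the uniform CDF F(p) = p.
    d = max(max((i + 1) / n - p, p - i / n) for i, p in enumerate(x))
    # Small-sample correction to the asymptotic distribution.
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    # Survival function: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2).
    q = 2 * sum(
        (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 101)
    )
    return d, min(max(q, 0.0), 1.0)


def false_positive_rate(pvalues, alpha=0.05):
    """Share of null tests that would be called significant at level alpha."""
    return sum(p < alpha for p in pvalues) / len(pvalues)


rng = random.Random(0)
null_pvalues = [rng.random() for _ in range(10_000)]  # simulated, well calibrated
skewed_pvalues = [p * p for p in null_pvalues]        # simulated, anti-conservative

d_null, p_null = ks_uniform(null_pvalues)
d_skew, p_skew = ks_uniform(skewed_pvalues)
fpr = false_positive_rate(null_pvalues)
print(f"calibrated: D={d_null:.4f}, FPR={fpr:.3f}")
print(f"skewed:     D={d_skew:.4f}, KS p-value={p_skew:.3g}")
```

In practice `scipy.stats.kstest(pvalues, "uniform")` performs the same test; the hand-rolled version only makes the computation visible.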

### 3. `validate_power.py` - Statistical Power Analysis

Evaluates statistical power across experimental designs and effect sizes.

**Usage:**

```bash
python validate_power.py --n_tests 1000 --output_dir power_results/
```

**Expected Results:**

- Power increases with effect size and sample size
- Competitive performance across all designs (3v3, 5v5, 7v7, 9v9)
- Power curves are faceted by experimental design
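
Power here means the fraction of tests with a true effect that reach significance. A minimal sketch of that tally, grouped by design and effect size, is shown below. The record layout, the `power_table` helper, and the p-values are invented for illustration and do not reflect the script's actual output format.

```python
from collections import defaultdict


def power_table(records, alpha=0.05):
    """Fraction of tests with p < alpha for each (design, effect size) cell."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for design, effect, pvalue in records:
        key = (design, effect)
        totals[key] += 1
        hits[key] += pvalue < alpha  # True counts as 1
    return {key: hits[key] / totals[key] for key in totals}


# Made-up records: (design, true effect size, p-value).
records = [
    ("3v3", 1.0, 0.20), ("3v3", 1.0, 0.03), ("3v3", 1.0, 0.40), ("3v3", 1.0, 0.01),
    ("9v9", 1.0, 0.001), ("9v9", 1.0, 0.04), ("9v9", 1.0, 0.30), ("9v9", 1.0, 0.002),
]
power = power_table(records)
for (design, effect), value in sorted(power.items()):
    print(f"{design} effect={effect}: power={value:.2f}")
```

Each cell of the resulting dictionary corresponds to one point on a power curve, with one facet per design.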

## Requirements

The validation scripts require these additional dependencies:

```bash
pip install statsmodels pandas matplotlib scikit-learn
```

For enhanced plotting (optional):

```bash
pip install plotnine theme-nxn
```

## Output Files

Each script generates:

- **Plots**: Visualization of validation results
- **CSV files**: Detailed numerical results
- **Summary reports**: Text summaries of key findings

## Performance Expectations

Approximate runtimes for the validation scripts:

- **Accuracy validation**: ~2-5 minutes for 1000 tests
- **Calibration validation**: ~10-15 minutes for 10000 tests
- **Power analysis**: ~15-20 minutes for 1000 tests per design

## Troubleshooting

### Common Issues

1. **statsmodels not available**: Install with `pip install statsmodels`
2. **Memory errors**: Reduce the `--n_tests` parameter
3. **Slow performance**: Ensure PyTorch is using GPU/MPS if available
4. **Plot display errors**: Plots are saved to files even if display fails
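
To check which accelerator PyTorch can see, a small standalone probe like the one below may help. How the validation scripts select their device is not documented here, so `pick_device` is a hypothetical helper and not part of the package. It only uses `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu" depending on what PyTorch can use."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so there is nothing to accelerate
    if torch.cuda.is_available():
        return "cuda"
    # The MPS backend is missing from older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"PyTorch device: {device}")
```

If this prints `cpu` on a machine with a GPU, the installed PyTorch build most likely lacks CUDA or MPS support.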

### Expected Performance Metrics

Based on v13 model validation:

| Metric | NB-Transformer | Classical GLM | Method of Moments |
|--------|----------------|---------------|-------------------|
| Success Rate | 100.0% | 98.7% | 100.0% |
| Time (ms) | 0.076 | 1.128 | 0.021 |
| μ MAE | 0.202 | 0.212 | 0.213 |
| β MAE | **0.152** | 0.284 | 0.289 |
| α MAE | **0.477** | 0.854 | 0.852 |
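
The headline speedup and accuracy figures can be re-derived from the table above. The snippet is plain arithmetic on the tabulated (rounded) values; the variable names are chosen for the example.

```python
# Values copied from the table above.
time_ms = {"nb_transformer": 0.076, "classical_glm": 1.128}
beta_mae = {"nb_transformer": 0.152, "classical_glm": 0.284}

# Ratio of per-test runtimes.
speedup = time_ms["classical_glm"] / time_ms["nb_transformer"]
# Relative reduction in beta MAE compared with the classical GLM.
beta_gain = 1 - beta_mae["nb_transformer"] / beta_mae["classical_glm"]

print(f"Speedup over classical GLM: {speedup:.1f}x")  # 14.8x
print(f"Reduction in beta MAE: {beta_gain:.1%}")      # 46.5%
```

The 46.5% obtained from the rounded table entries sits just under the quoted 47%; the small gap is presumably due to rounding in the table.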

## Citation

If you use these validation scripts in your research, please cite:

```bibtex
@software{svensson2025nbtransformer,
  title={NB-Transformer: Fast Negative Binomial GLM Parameter Estimation using Transformers},
  author={Svensson, Valentine},
  year={2025},
  url={https://huggingface.co/valsv/nb-transformer}
}
```