phess2/lipschitz-transformers

phess2 committed on Jul 14, 2025

Commit 6afb1d8 (verified) · 1 parent: 0624b71

Upload README.md with huggingface_hub

Files changed (1): README.md (+77 -3)

@@ -1,3 +1,77 @@
----
-license: mit
----

+# Lipschitz Bounded Neural Networks: Figure Data and Model Checkpoints
+This repository contains the figure data and model checkpoints supporting our paper on training Lipschitz-bounded neural networks. It provides the CSV data and scripts needed to reproduce key figures, along with pre-trained MLP and transformer checkpoints.
+## Repository Structure
+```
+├── figures/              # Figure data and reproduction scripts
+│   ├── figure_2/        # Data for Figure 2
+│   ├── figure_3/        # Data for Figure 3
+│   ├── figure_4/        # Data for Figure 4
+│   ├── reproduce_figures.py   # Script to reproduce figures from CSV data
+│   ├── requirements.txt       # Python dependencies
+│   └── README.md             # Detailed usage instructions
+└── models/              # Pre-trained model checkpoints
+    ├── MLPs/            # MLP models trained on ImageNet 1K
+    └── transformers/    # Transformer models trained on Shakespeare
+```
+## Figure Data
+The `figures/` directory contains CSV files with the processed data used to create each figure in our paper, along with a script to reproduce the plots. This lets readers regenerate the figures from the processed data without rerunning the experiments.
+## Model Checkpoints
+### MLP Models (`models/MLPs/`)
+This directory contains two MLP models trained on ImageNet 1K, which were used to generate **Figure 3** in our paper.
+**Training Details:**
+- Dataset: ImageNet 1K
+- Purpose: Figure 3 generation and analysis
+- Architecture: Multi-layer perceptrons with spectral constraints
+### Transformer Models (`models/transformers/`)
+This directory contains the best-performing transformer model trained on the word-level Shakespeare dataset, saved at the checkpoint with the highest validation accuracy.
+**Training Details:**
+- Dataset: Shakespeare word-level dataset
+- Selection: Best validation accuracy checkpoint
+- Architecture: Transformer with Lipschitz constraints
+## Usage
+### Reproducing Figures
+1. Install dependencies:
+```bash
+pip install -r figures/requirements.txt
+```
+2. Run the reproduction script:
+```bash
+python figures/reproduce_figures.py
+```
+### Loading Model Checkpoints
+The model checkpoints can be loaded using the main codebase. For detailed instructions on model loading and usage, please refer to the main project repository: [https://github.com/Arongil/lipschitz-transformers](https://github.com/Arongil/lipschitz-transformers)
+## Paper Abstract
+Neural networks are often highly sensitive to input and weight perturbations. This sensitivity has been linked to pathologies such as vulnerability to adversarial examples, divergent training, and overfitting. To combat these problems, past research has looked at building neural networks entirely from Lipschitz components. However, these techniques have not matured to the point where researchers have trained a modern architecture such as a transformer with a Lipschitz certificate enforced beyond initialization. To explore this gap, we begin by developing and benchmarking novel, computationally-efficient tools for maintaining norm-constrained weight matrices. Applying these tools, we are able to train transformer models with Lipschitz bounds enforced throughout training. We find that optimizer dynamics matter: switching from AdamW to Muon improves standard methods—weight decay and spectral normalization—allowing models to reach equal performance with a lower Lipschitz bound. Inspired by Muon’s update having a fixed spectral norm, we co-design a weight constraint method that improves the Lipschitz vs. performance tradeoff on MLPs and 2M parameter transformers. Our 2-Lipschitz transformer on Shakespeare text reaches validation accuracy 60%. Scaling to 145M parameters, our 10-Lipschitz transformer reaches 21% accuracy on internet text. However, to match the NanoGPT baseline validation accuracy of 39.4%, our Lipschitz upper bound increases to 10^264. Nonetheless, our Lipschitz transformers train without stability measures such as layer norm, QK norm, and logit tanh softcapping.
+## Related Resources
+- **Main Codebase**: [https://github.com/Arongil/lipschitz-transformers](https://github.com/Arongil/lipschitz-transformers) - Full implementation and training scripts
+- **Paper**: [TO BE ADDED] - Detailed methodology and theoretical analysis
+## Citation
+If you use this data or these models in your research, please cite our paper:
+```bibtex
+[CITATION TO BE ADDED]
+```
+## License
+This repository is released under the MIT License. The data and models are provided for research purposes.
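
Note on the "Loading Model Checkpoints" section of the README above: it defers loading to the main codebase, so the following is only a minimal sketch of fetching the checkpoint files first, not the authors' procedure. It assumes the repository id `phess2/lipschitz-transformers` (taken from this commit page's header), the folder names from the repository structure listing, and an installed `huggingface_hub` package. The helper names `download_checkpoints` and `list_checkpoint_files` are hypothetical, and the checkpoint file format is not stated here, so the sketch lists files rather than loading them.

```python
from pathlib import Path

# Assumption: repository id as shown in the header of this commit page.
REPO_ID = "phess2/lipschitz-transformers"

# Glob patterns for the two checkpoint folders named in the README.
CHECKPOINT_PATTERNS = ["models/MLPs/*", "models/transformers/*"]


def download_checkpoints(local_dir="lipschitz-checkpoints", patterns=CHECKPOINT_PATTERNS):
    """Download only the checkpoint folders and return the local snapshot path.

    Hypothetical helper: requires network access and `pip install huggingface_hub`.
    """
    # Imported lazily so list_checkpoint_files works without huggingface_hub.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=list(patterns),  # skip figures/ and other files
        local_dir=local_dir,
    )
    return Path(path)


def list_checkpoint_files(root):
    """Return the files under <root>/models as sorted POSIX-style relative paths."""
    root = Path(root)
    models_dir = root / "models"
    return sorted(
        p.relative_to(root).as_posix()
        for p in models_dir.rglob("*")
        if p.is_file()
    )
```

Calling `list_checkpoint_files(download_checkpoints())` would show which checkpoint files were fetched. Loading them into models is left to the main codebase at https://github.com/Arongil/lipschitz-transformers, as the README says.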