phess2 committed · Commit 6afb1d8 · verified · 1 Parent(s): 0624b71

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +77 -3
README.md CHANGED
@@ -1,3 +1,77 @@
- ---
- license: mit
- ---
+ # Lipschitz-Bounded Neural Networks: Figure Data and Model Checkpoints
+
+ This repository contains the figure data and model checkpoints supporting our paper on training Lipschitz-bounded neural networks. It provides the data needed to reproduce key figures, along with pre-trained models that demonstrate the effectiveness of various constraint methods.
+
+ ## Repository Structure
+
+ ```
+ ├── figures/                    # Figure data and reproduction scripts
+ │   ├── figure_2/               # Data for Figure 2
+ │   ├── figure_3/               # Data for Figure 3
+ │   ├── figure_4/               # Data for Figure 4
+ │   ├── reproduce_figures.py    # Script to reproduce figures from CSV data
+ │   ├── requirements.txt        # Python dependencies
+ │   └── README.md               # Detailed usage instructions
+ └── models/                     # Pre-trained model checkpoints
+     ├── MLPs/                   # MLP models trained on ImageNet 1K
+     └── transformers/           # Transformer models trained on Shakespeare
+ ```
+
+ ## Figure Data
+
+ The `figures/` directory contains CSV files with the processed data behind Figures 2, 3, and 4 of our paper, along with a script (`reproduce_figures.py`) that regenerates the plots from that data.
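As a rough illustration of inspecting this data outside the provided script, the sketch below reads every CSV file in one figure directory using only the Python standard library. The helper name `load_figure_csvs` is hypothetical, and the sketch assumes the CSV files sit directly inside a directory such as `figures/figure_2/` and start with a header row; the actual file names and columns are not listed in this README.

```python
import csv
from pathlib import Path


def load_figure_csvs(figure_dir):
    """Read every CSV file found directly inside one figure directory.

    Returns a dict mapping each file name to a list of row dicts
    (column header -> cell text). Cell values stay as strings because
    the column layout of the real files is an unknown here.
    """
    tables = {}
    for csv_path in sorted(Path(figure_dir).glob("*.csv")):
        with csv_path.open(newline="") as handle:
            tables[csv_path.name] = list(csv.DictReader(handle))
    return tables
```

Calling `load_figure_csvs("figures/figure_2")` from the repository root would return one entry per CSV file found there. `figures/reproduce_figures.py` remains the reference way to regenerate the plots.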
+
+ ## Model Checkpoints
+
+ ### MLP Models (`models/MLPs/`)
+ Two MLP models trained on ImageNet 1K that were used to generate **Figure 3** in our paper.
+
+ **Training Details:**
+ - Dataset: ImageNet 1K
+ - Purpose: Figure 3 generation and analysis
+ - Architecture: Multi-layer perceptrons with spectral constraints
+
+ ### Transformer Models (`models/transformers/`)
+ The best-performing transformer trained on the word-level Shakespeare dataset, selected as the checkpoint with the highest validation accuracy.
+
+ **Training Details:**
+ - Dataset: Shakespeare word-level dataset
+ - Selection: Best validation accuracy checkpoint
+ - Architecture: Transformer with Lipschitz constraints
+
+ ## Usage
+
+ ### Reproducing Figures
+ 1. Install dependencies:
+    ```bash
+    pip install -r figures/requirements.txt
+    ```
+
+ 2. Run the reproduction script:
+    ```bash
+    python figures/reproduce_figures.py
+    ```
+
+ ### Loading Model Checkpoints
+ The model checkpoints can be loaded using the main codebase. For detailed instructions on model loading and usage, please refer to the main project repository: [https://github.com/Arongil/lipschitz-transformers](https://github.com/Arongil/lipschitz-transformers)
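Before loading a checkpoint with the main codebase, the files have to be fetched from the Hub. The following is a minimal sketch of one way to download only a single checkpoint family, using `snapshot_download` from `huggingface_hub` with its `allow_patterns` filter. The helper names `checkpoint_patterns` and `download_checkpoints` are hypothetical, and `repo_id` must be supplied by the caller because this README does not state the repository id. Whether the repository is a model or dataset repo is also an assumption left to the caller through `repo_type`.

```python
def checkpoint_patterns(family):
    """Return Hub file patterns for one checkpoint family.

    `family` is a subdirectory of models/ from the repository tree:
    "MLPs" or "transformers".
    """
    if family not in ("MLPs", "transformers"):
        raise ValueError(f"unknown checkpoint family: {family!r}")
    return [f"models/{family}/*"]


def download_checkpoints(repo_id, family, local_dir="checkpoints", repo_type=None):
    """Download only the chosen checkpoint files and return the local path.

    `repo_id` ("<user>/<repo>") is not given in this README, so it is a
    caller-supplied assumption. Pass repo_type="dataset" if the
    repository turns out to be hosted as a dataset.
    """
    # Imported lazily so checkpoint_patterns works without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        repo_type=repo_type,
        allow_patterns=checkpoint_patterns(family),
        local_dir=local_dir,
    )
```

The returned directory only contains the raw checkpoint files. Turning them into usable models still depends on the loading code in the main project repository.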
+
+ ## Paper Abstract
+
+ Neural networks are often highly sensitive to input and weight perturbations. This sensitivity has been linked to pathologies such as vulnerability to adversarial examples, divergent training, and overfitting. To combat these problems, past research has looked at building neural networks entirely from Lipschitz components. However, these techniques have not matured to the point where researchers have trained a modern architecture such as a transformer with a Lipschitz certificate enforced beyond initialization. To explore this gap, we begin by developing and benchmarking novel, computationally-efficient tools for maintaining norm-constrained weight matrices. Applying these tools, we are able to train transformer models with Lipschitz bounds enforced throughout training. We find that optimizer dynamics matter: switching from AdamW to Muon improves standard methods—weight decay and spectral normalization—allowing models to reach equal performance with a lower Lipschitz bound. Inspired by Muon’s update having a fixed spectral norm, we co-design a weight constraint method that improves the Lipschitz vs. performance tradeoff on MLPs and 2M parameter transformers. Our 2-Lipschitz transformer on Shakespeare text reaches validation accuracy 60%. Scaling to 145M parameters, our 10-Lipschitz transformer reaches 21% accuracy on internet text. However, to match the NanoGPT baseline validation accuracy of 39.4%, our Lipschitz upper bound increases to 10^264. Nonetheless, our Lipschitz transformers train without stability measures such as layer norm, QK norm, and logit tanh softcapping.
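To make the notion of a Lipschitz bound concrete, here is a small sketch of the standard bound for a plain MLP: with 1-Lipschitz activations such as ReLU, the network's Lipschitz constant in the L2 norm is at most the product of the spectral norms of its weight matrices. This is a textbook bound under those assumptions, not the certificate used in the paper, and the function names are hypothetical. Power iteration approaches the spectral norm from below, so the result is an estimate rather than a strict guarantee.

```python
import math


def spectral_norm(matrix, iterations=200):
    """Estimate the largest singular value of `matrix` by power iteration.

    `matrix` is a list of rows. The iteration is applied to W^T W and
    starts from the all-ones vector, which is assumed not to be
    orthogonal to the top singular direction.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0] * n_cols
    for _ in range(iterations):
        # u = W v
        u = [sum(w * x for w, x in zip(row, v)) for row in matrix]
        # v = W^T u, then renormalize to unit length
        v = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in v]
    # For a unit vector v, ||W v|| approximates the top singular value.
    u = [sum(w * x for w, x in zip(row, v)) for row in matrix]
    return math.sqrt(sum(x * x for x in u))


def mlp_lipschitz_bound(weights):
    """Multiply per-layer spectral norms (assumes 1-Lipschitz activations)."""
    bound = 1.0
    for w in weights:
        bound *= spectral_norm(w)
    return bound
```

Because the bound is a product over layers, it compounds multiplicatively with depth, which is why keeping each weight matrix norm-constrained matters for the overall bound.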
+
+ ## Related Resources
+
+ - **Main Codebase**: [https://github.com/Arongil/lipschitz-transformers](https://github.com/Arongil/lipschitz-transformers) - Full implementation and training scripts
+ - **Paper**: [TO BE ADDED] - Detailed methodology and theoretical analysis
+
+ ## Citation
+
+ If you use this data or these models in your research, please cite our paper:
+
+ ```bibtex
+ [CITATION TO BE ADDED]
+ ```
+
+ ## License
+
+ This repository is released under the MIT License. The data and models are provided for research purposes.