Implementation of the paper "How Many Layers and Why? An Analysis of the Model D…"
## Model architecture

We augment a multi-layer transformer encoder with a halting mechanism, which dynamically adjusts the number of layers applied to each token.
We directly adapted this mechanism from Graves ([2016](#graves-2016)): at each iteration, we compute a probability for each token to stop updating its state.
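The halting idea above can be sketched in a few lines. The snippet below is a toy illustration in the spirit of Graves (2016), not the repository's implementation: the mocked "layer", the halting head `w`, `b`, and all shapes are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, hidden_size, max_layers = 6, 8, 12
hidden = rng.normal(size=(seq_len, hidden_size))   # one state per token
w = rng.normal(size=hidden_size)                   # hypothetical halting head
b = 0.0
eps = 0.01

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

cumulative = np.zeros(seq_len)                # accumulated halting probability
n_updates = np.zeros(seq_len, dtype=int)      # layers applied to each token
still_running = np.ones(seq_len, dtype=bool)

for _ in range(max_layers):
    # Stand-in for a transformer layer: only non-halted tokens are updated.
    hidden[still_running] += 0.1 * rng.normal(
        size=(int(still_running.sum()), hidden_size))
    n_updates[still_running] += 1
    # At each iteration, compute a probability for each token to stop.
    p = sigmoid(hidden @ w + b)
    cumulative[still_running] += p[still_running]
    # A token halts once its cumulative probability reaches 1 - eps.
    still_running &= cumulative < 1.0 - eps
    if not still_running.any():
        break

# n_updates plays the role of the model's `updates` output:
# the effective depth chosen for each token.
print(n_updates)
```

Tokens whose cumulative halting probability crosses the threshold early stop receiving updates, which is what makes the effective depth vary per token.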

## Model use

The architecture is not yet directly included in the Transformers library. The code used for pre-training is available in this [GitHub repository](https://github.com/AntoineSimoulin/adaptive-depth-transformers), so you should install the implementation first:

```bash
pip install git+https://github.com/AntoineSimoulin/adaptive-depth-transformers
```

Then you can use the model directly:

```python
import sys
sys.path.append('adaptative-depth-transformers')

# ...

outputs.updates
# tensor([[[[15., 9., 10., 7., 3., 8., 5., 7., 12., 10., 6., 8., 8., 9., 5., 8.]]]])
```
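The `updates` output reports how many layer updates each token received, so the average effective depth of a sentence can be read off directly. A quick illustration in plain NumPy, using the values from the example output above:

```python
import numpy as np

# Per-token update counts copied from the example output above.
updates = np.array([15., 9., 10., 7., 3., 8., 5., 7.,
                    12., 10., 6., 8., 8., 9., 5., 8.])
print(updates.mean())  # average effective depth over the sentence -> 8.125
```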

## Citations

### BibTeX entry and citation info