Update README.md

Updates README (fixes grammar errors)

README.md (changed):
```diff
@@ -16,14 +16,14 @@ This version of Hummingbird is only meant to demonstrate Efficient Attention for
 
 ## Model Details
 
-The
+The model consists of 1.1 Billion parameters with the following specifications:
 
 | Parameter | size |
-
+| :------------------- | :--- |
 | # Transformer Blocks | 10 |
 | Model Dimension | 3072 |
 | # Heads | 1 |
 
 
-The Attention Mechanism used is based on our newly proposed Efficient Attention from our paper, *You Need to Pay Better Attention: Rethinking the Mathematics of Attention Mechanism* ([arXiv:2403.01643](https://arxiv.org/abs/2403.01643)). We have chosen the number of
+The Attention Mechanism used is based on our newly proposed Efficient Attention from our paper, *You Need to Pay Better Attention: Rethinking the Mathematics of Attention Mechanism* ([arXiv:2403.01643](https://arxiv.org/abs/2403.01643)). We have chosen the number of heads to be 1 as an interesting case study since all current LMs use multiple heads.
```
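As a rough sanity check on the configuration in the table above, the sketch below (hypothetical, not part of the repository) estimates the total parameter count from the listed values, assuming the common 12·d² per-block approximation: 4·d² for the query/key/value/output projections plus 8·d² for a 4×-expansion feed-forward. The paper's Efficient Attention actually removes some of these matrices, and embeddings are ignored, so this is only an order-of-magnitude estimate.

```python
# Hypothetical sanity check for the README's configuration:
# 10 transformer blocks, model dimension 3072 (head count does not
# change the parameter count under this standard-layout assumption).

def approx_params(n_blocks: int, d_model: int) -> int:
    attention = 4 * d_model ** 2      # W_Q, W_K, W_V, W_O projections
    feed_forward = 8 * d_model ** 2   # two d x 4d linear layers
    return n_blocks * (attention + feed_forward)

total = approx_params(n_blocks=10, d_model=3072)
print(f"{total / 1e9:.2f}B parameters")  # prints "1.13B parameters"
```

The result, about 1.13 billion, is consistent with the "1.1 Billion parameters" stated in the README.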