Add model card
This PR adds a model card, linking to the paper and Github repository. It also adds a link to the project page and populates the relevant metadata, including the pipeline tag, so that the model can be discovered more easily.
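As a quick illustration of the discoverability point: once the `pipeline_tag` metadata is merged, the model surfaces in filtered Hub queries. A minimal sketch using `huggingface_hub`, assuming a recent version where `list_models` accepts a `pipeline_tag` keyword (the search term is just illustrative):

```python
# pip install huggingface_hub
from huggingface_hub import HfApi

api = HfApi()

# Find text-generation models matching "TokenFormer".
# `pipeline_tag` is the same field this PR populates in the card metadata.
for model in api.list_models(search="TokenFormer", pipeline_tag="text-generation", limit=10):
    print(model.id, model.pipeline_tag)
```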
README.md
CHANGED
```diff
@@ -1,10 +1,12 @@
 ---
 license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
 ---
 
 The *TokenFormer* is a **fully attention-based architecture**
 that unifies the computations of token-token and token-parameter interactions
-by entirely employing the attention mechanism, **maximizes the flexibility of neural network**.[(see paper)](https://
+by entirely employing the attention mechanism, **maximizes the flexibility of neural network**. [(see paper)](https://huggingface.co/papers/2410.23168).
 It contains four models of sizes
 150M, 450M, 900M, 1.5B. For each size, it's trained based on [gpt-neox](https://github.com/EleutherAI/gpt-neox) code base and uses [Pile](https://huggingface.co/datasets/EleutherAI/pile) with 300B tokens.
 All 4 model sizes are trained on the exact
@@ -19,8 +21,8 @@ same data, in the exact same order.
 - Language: English
 - Learn more: [TokenFormer's GitHub repository](https://github.com/Haiyang-W/TokenFormer)
 for training procedure, config files, and details on how to use.
-[See paper](https://
-details.
+[See paper](https://huggingface.co/papers/2410.23168) for more evals and implementation
+details. Also see the [project page](https://haiyang-w.github.io/tokenformer.github.io/).
 - Library: [GPT-NeoX](https://github.com/EleutherAI/gpt-neox)
 - License: Apache 2.0
 - Contact: to ask questions about this model, please email Haiyang Wang.
@@ -92,5 +94,4 @@ TokenFormer compared with Opensource Transformer-based LLMs.
 | Pythia | 2.8B | 64.7 | 59.3 | 74.0 | 64.1 | 32.9 | 59.7 | 59.1 |
 | **TokenFormer** | 1.5B | **64.7** | 60.0 | **74.8** | **64.8** | 32.0 | 59.7 | **59.3** |
 <figcaption>Zero-shot evaluation of Language Modeling. </figcaption>
-</figure>
-
+</figure>
```
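With `library_name: transformers` and `pipeline_tag: text-generation` in the front matter, the Hub will offer the standard text-generation usage path for this repo. A minimal sketch of what that usage could look like, assuming a hypothetical repo id `Haiyang-W/TokenFormer-1.5B` and that the checkpoint ships custom modeling code (TokenFormer is not a stock `transformers` architecture, so `trust_remote_code=True` is an assumption, not a confirmed requirement):

```python
# pip install transformers torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Haiyang-W/TokenFormer-1.5B",  # hypothetical repo id; check the Hub for the actual one
    trust_remote_code=True,              # assumption: custom TokenFormer modeling code
)

print(generator("TokenFormer is", max_new_tokens=40)[0]["generated_text"])
```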