nielsr (HF Staff) committed · verified
Commit a331ce8 · Parent(s): 006887e

Add model card


This PR adds a model card, linking to the paper and GitHub repository. It also adds a link to the project page and populates the relevant metadata, including the pipeline tag, so that the model can be discovered more easily.

Files changed (1):
  1. README.md (+6 -5)
README.md CHANGED
@@ -1,10 +1,12 @@
  ---
  license: apache-2.0
+ library_name: transformers
+ pipeline_tag: text-generation
  ---
 
  The *TokenFormer* is a **fully attention-based architecture**
  that unifies the computations of token-token and token-parameter interactions
- by entirely employing the attention mechanism, **maximizes the flexibility of neural network**.[(see paper)](https://arxiv.org/pdf/2410.23168).
+ by entirely employing the attention mechanism, **maximizes the flexibility of neural network**. [(see paper)](https://huggingface.co/papers/2410.23168).
  It contains four models of sizes
  150M, 450M, 900M, 1.5B. For each size, it's trained based on [gpt-neox](https://github.com/EleutherAI/gpt-neox) code base and uses [Pile](https://huggingface.co/datasets/EleutherAI/pile) with 300B tokens.
  All 4 model sizes are trained on the exact
@@ -19,8 +21,8 @@ same data, in the exact same order.
  - Language: English
  - Learn more: [TokenFormer's GitHub repository](https://github.com/Haiyang-W/TokenFormer)
  for training procedure, config files, and details on how to use.
- [See paper](https://arxiv.org/pdf/2410.23168) for more evals and implementation
- details.
+ [See paper](https://huggingface.co/papers/2410.23168) for more evals and implementation
+ details. Also see the [project page](https://haiyang-w.github.io/tokenformer.github.io/).
  - Library: [GPT-NeoX](https://github.com/EleutherAI/gpt-neox)
  - License: Apache 2.0
  - Contact: to ask questions about this model, please email Haiyang Wang.
@@ -92,5 +94,4 @@ TokenFormer compared with Opensource Transformer-based LLMs.
  | Pythia | 2.8B | 64.7 | 59.3 | 74.0 | 64.1 | 32.9 | 59.7 | 59.1 |
  | **TokenFormer** | 1.5B | **64.7** | 60.0 | **74.8** | **64.8** | 32.0 | 59.7 | **59.3** |
  <figcaption>Zero-shot evaluation of Language Modeling. </figcaption>
- </figure>
- 
+ </figure>
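The metadata this commit adds is the YAML front matter between the `---` markers at the top of README.md, which the Hub reads to populate the model's `license`, `library_name`, and `pipeline_tag`. As a minimal stdlib-only sketch of that structure (real Hub tooling uses a full YAML parser; this illustrative `parse_front_matter` helper only handles the flat `key: value` form used here):

```python
# Illustrative helper (not part of the Hub API): pull the flat key-value
# front matter out of a README, stopping at the closing "---" marker.
def parse_front_matter(readme_text):
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

# The README header as it stands after this commit.
readme = """---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---

The *TokenFormer* is a **fully attention-based architecture** ...
"""

print(parse_front_matter(readme))
# -> {'license': 'apache-2.0', 'library_name': 'transformers', 'pipeline_tag': 'text-generation'}
```

The `pipeline_tag: text-generation` entry is what lets the model surface under the text-generation filter on the Hub, and `library_name: transformers` tells the Hub which library's loading widget to show.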