Add model card
This PR adds a model card, linking to the paper and Github repository. It also adds a link to the project page and populates the relevant metadata, including the pipeline tag, so that the model can be discovered more easily.
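As a quick illustration of the discoverability point: once the `pipeline_tag` metadata is merged, the model surfaces in filtered Hub queries. A minimal sketch using `huggingface_hub`, assuming a recent version where `list_models` accepts a `pipeline_tag` keyword (the search term is just illustrative):

```python
# pip install huggingface_hub
from huggingface_hub import HfApi

api = HfApi()

# Find text-generation models matching "TokenFormer".
# `pipeline_tag` is the same field this PR populates in the card metadata.
for model in api.list_models(search="TokenFormer", pipeline_tag="text-generation", limit=10):
    print(model.id, model.pipeline_tag)
```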
README.md
CHANGED
```diff
@@ -1,10 +1,12 @@
 ---
 license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
 ---
 
 The *TokenFormer* is a **fully attention-based architecture**
 that unifies the computations of token-token and token-parameter interactions
-by entirely employing the attention mechanism, **maximizes the flexibility of neural network**.[(see paper)](https://
+by entirely employing the attention mechanism, **maximizes the flexibility of neural network**. [(see paper)](https://huggingface.co/papers/2410.23168).
 It contains four models of sizes
 150M, 450M, 900M, 1.5B. For each size, it's trained based on [gpt-neox](https://github.com/EleutherAI/gpt-neox) code base and uses [Pile](https://huggingface.co/datasets/EleutherAI/pile) with 300B tokens.
 All 4 model sizes are trained on the exact
@@ -19,8 +21,8 @@ same data, in the exact same order.
 - Language: English
 - Learn more: [TokenFormer's GitHub repository](https://github.com/Haiyang-W/TokenFormer)
 for training procedure, config files, and details on how to use.
-[See paper](https://
-details.
+[See paper](https://huggingface.co/papers/2410.23168) for more evals and implementation
+details. Also see the [project page](https://haiyang-w.github.io/tokenformer.github.io/).
 - Library: [GPT-NeoX](https://github.com/EleutherAI/gpt-neox)
 - License: Apache 2.0
 - Contact: to ask questions about this model, please email Haiyang Wang.
@@ -92,5 +94,4 @@ TokenFormer compared with Opensource Transformer-based LLMs.
 | Pythia | 2.8B | 64.7 | 59.3 | 74.0 | 64.1 | 32.9 | 59.7 | 59.1 |
 | **TokenFormer** | 1.5B | **64.7** | 60.0 | **74.8** | **64.8** | 32.0 | 59.7 | **59.3** |
 <figcaption>Zero-shot evaluation of Language Modeling. </figcaption>
-</figure>
-
+</figure>
```
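With `library_name: transformers` and `pipeline_tag: text-generation` in the front matter, the Hub will offer the standard text-generation usage path for this repo. A minimal sketch of what that usage could look like, assuming a hypothetical repo id `Haiyang-W/TokenFormer-1.5B` and that the checkpoint ships custom modeling code (TokenFormer is not a stock `transformers` architecture, so `trust_remote_code=True` is an assumption, not a confirmed requirement):

```python
# pip install transformers torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Haiyang-W/TokenFormer-1.5B",  # hypothetical repo id; check the Hub for the actual one
    trust_remote_code=True,              # assumption: custom TokenFormer modeling code
)

print(generator("TokenFormer is", max_new_tokens=40)[0]["generated_text"])
```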