Add missing metadata: library_name, pipeline_tag, and license
#2
by nielsr (HF Staff) - opened

README.md CHANGED

@@ -3,10 +3,27 @@ datasets:
 - EleutherAI/pile
 language:
 - en
+library_name: transformers
+pipeline_tag: text-generation
+license: mit
 ---
+
 # Model Card
 
-
+## Citation
+
+Please consider citing this paper if you use our work:
+
+```
+@article{arora2024simple,
+  title={Simple linear attention language models balance the recall-throughput tradeoff},
+  author={Arora, Simran and Eyuboglu, Sabri and Zhang, Michael and Timalsina, Aman and Alberti, Silas and Zinsley, Dylan and Zou, James and Rudra, Atri and Ré, Christopher},
+  journal={arXiv:2402.18668},
+  year={2024}
+}
+```
+
+This model is a pretrained Based model.
 
 As a quality reference, we include a pretrained Mamba model provided here: https://huggingface.co/hazyresearch/mamba-1b-50b and a pretrained attention (Llama architecture) model provided here: https://huggingface.co/hazyresearch/attn-1b-50bn
 
@@ -29,17 +46,45 @@ We include a series of benchmarks that you can use to evaluate quality:
 - SQUAD: https://huggingface.co/datasets/hazyresearch/based-squad
 
 
-
+Please reach out to simarora@stanford.edu, eyuboglu@stanford.edu, and mzhang20@stanford.edu with questions.
 
-
+Use the code below to load the Based checkpoints:
+```python
+import torch
+from transformers import AutoTokenizer
+from based.models.gpt import GPTLMHeadModel
 
+tokenizer = AutoTokenizer.from_pretrained("gpt2")
+model = GPTLMHeadModel.from_pretrained_hf("hazyresearch/based-360m")
 ```
-
-
-
-
-
-
+
+The following code will run text generation for a prompt and print out the response.
+```python
+input = tokenizer.encode("If I take one more step, it will be", return_tensors="pt").to("cuda")
+output = model.generate(input, max_length=20)
+print(tokenizer.decode(output[0]))
 ```
 
-
+**Note.** For the checkpoints from other models, you will need to install other dependencies and use slightly different code.
+
+To load the Attention models, use the following code:
+
+```python
+import torch
+from transformers import AutoTokenizer
+from based.models.transformer.gpt import GPTLMHeadModel
+
+tokenizer = AutoTokenizer.from_pretrained("gpt2")
+model = GPTLMHeadModel.from_pretrained_hf("hazyresearch/attn-360m").to("cuda")
+```
+
+To use the Mamba checkpoints, first run `pip install mamba-ssm` and then use the following code:
+
+```python
+import torch
+from transformers import AutoTokenizer
+from based.models.mamba import MambaLMHeadModel
+
+tokenizer = AutoTokenizer.from_pretrained("gpt2")
+model = MambaLMHeadModel.from_pretrained_hf("hazyresearch/mamba-360m").to("cuda")
+```
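
For reference, the complete metadata block after this change would read as follows. This is a sketch reconstructed from the first hunk: the opening `---` and the `datasets:` key are inferred from the hunk's `@@` context marker, so treat those two lines as assumptions rather than quoted content.

```yaml
---                             # assumed opener, not shown in the hunk
datasets:                       # taken from the @@ context marker
- EleutherAI/pile
language:
- en
library_name: transformers      # added by this PR
pipeline_tag: text-generation   # added by this PR
license: mit                    # added by this PR
---
```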
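
Read together, the two Based snippets in the second hunk leave a device mismatch implicit: the prompt tensor is moved to `"cuda"` but the model is not. Below is a minimal end-to-end sketch that makes the placement explicit, assuming `GPTLMHeadModel` supports `.to()` like a standard PyTorch module (the hunk itself calls `.to("cuda")` on the attention and Mamba models).

```python
import torch
from transformers import AutoTokenizer
from based.models.gpt import GPTLMHeadModel

# Load the GPT-2 tokenizer and the Based checkpoint; move the model to the
# GPU so it sits on the same device as the input ids below.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = GPTLMHeadModel.from_pretrained_hf("hazyresearch/based-360m").to("cuda")

# Encode a prompt, generate up to 20 tokens total, and decode the result.
input_ids = tokenizer.encode("If I take one more step, it will be", return_tensors="pt").to("cuda")
output = model.generate(input_ids, max_length=20)
print(tokenizer.decode(output[0]))
```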
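
The attention and Mamba snippets stop after loading the checkpoint. The note in the hunk warns that other models need slightly different code, so the following is only a hedged sketch: it assumes `MambaLMHeadModel.generate` mirrors the `generate(input_ids, max_length=...)` call shown for the Based model.

```python
import torch
from transformers import AutoTokenizer
from based.models.mamba import MambaLMHeadModel

# Requires `pip install mamba-ssm`, as noted in the diff above.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = MambaLMHeadModel.from_pretrained_hf("hazyresearch/mamba-360m").to("cuda")

# Assumption: the Mamba head exposes the same generate(...) interface as the
# Based model above; consult the based repository if the signature differs.
input_ids = tokenizer.encode("If I take one more step, it will be", return_tensors="pt").to("cuda")
output = model.generate(input_ids, max_length=20)
print(tokenizer.decode(output[0]))
```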