```bash
chainlit run app_pretrain.py
```

This will launch a web application where you can input text and see the model's generated responses.
### Downloading from Huggingface 🤗

To interact with the model by downloading it from Hugging Face:

- First, clone the repo locally.

```python
from transformer_blocks.gpt2 import GPT2
from gpt_Pretraining.text_generation import Text_Generation

# Load the pretrained 53M-parameter model from the Hugging Face Hub
model = GPT2.from_pretrained("NamrataThakur/Small_Language_Model_MHA_53M_Pretrained")
model.eval()

# ---------------------------- Check the generation to make sure everything is okay ----------------------------
generation = Text_Generation(model=model, device='cpu', tokenizer_model='gpt2',
                             arch_type='original')
start_context = "One day, a "
response = generation.text_generation(input_text=start_context, max_new_tokens=160, temp=0.5, top_k=10, kv_cache=False)
print(response)
```
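For context, `temp` and `top_k` control how each next token is sampled: temperature rescales the logits (lower values sharpen the distribution) and top-k restricts sampling to the k most likely tokens. A minimal, self-contained sketch of this standard technique in plain Python (illustrative only, not the project's actual `Text_Generation` implementation):

```python
import math
import random

def top_k_sample(logits, temp=0.5, top_k=10, seed=None):
    """Sample an index from `logits` using temperature scaling and top-k filtering."""
    rng = random.Random(seed)
    # Keep only the indices of the top_k highest logits
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature-scale the kept logits, then softmax them
    scaled = [logits[i] / temp for i in top]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index according to the resulting distribution
    r = rng.random()
    cum = 0.0
    for idx, p in zip(top, probs):
        cum += p
        if r < cum:
            return idx
    return top[-1]

# Lower temp makes the highest logit dominate; top_k caps the candidate set
logits = [2.0, 1.0, 0.5, -1.0]
print(top_k_sample(logits, temp=0.5, top_k=2, seed=0))  # prints 0 with this seed
```

With `temp=0.5, top_k=10` as in the snippet above, generation stays fairly focused while still allowing some variation between runs.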
## Model Architecture and Objective
Stories-SLM uses a standard GPT decoder-only transformer architecture with: