---
license: apache-2.0
---

This model is continually pre-trained from [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) with the structure proposed in [MemoryLLM](https://arxiv.org/abs/2402.04624).

To use the model, first clone the repository:
```
git clone git@github.com:wangyu-ustc/MemoryLLM.git
cd MemoryLLM
```
Then, from inside the `MemoryLLM` directory (the modeling files are imported locally), load the model:
```python
from modeling_memoryllm import MemoryLLM
from configuration_memoryllm import MemoryLLMConfig
from transformers import AutoTokenizer

model = MemoryLLM.from_pretrained("YuWangX/memoryllm-8b-chat")
tokenizer = AutoTokenizer.from_pretrained("YuWangX/memoryllm-8b-chat")
```

### How to use the model

Inject a piece of context into the model using the following script:
```python
model = model.cuda()

# Self-update: write the new context into the model's memory pool
ctx = "David likes eating apples."
model.inject_memory(tokenizer(ctx, return_tensors='pt', add_special_tokens=False).input_ids.cuda(), update_memory=True)

# Generation: ask about the injected context
messages = [{
    'role': 'user', 'content': "What fruits does David like?",
}]

inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
inputs = inputs[:, 1:]  # remove bos token

outputs = model.generate(input_ids=inputs.cuda(), max_new_tokens=20)
response = tokenizer.decode(outputs[0])
print(response)
```
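
Since each call to `inject_memory(..., update_memory=True)` updates the memory pool in place, contexts can be injected one after another and queried later. The sketch below illustrates that pattern, assuming sequential injection behaves as the API above suggests; the helper `ask` and the second context sentence are hypothetical and not part of the MemoryLLM API.

```python
# Hypothetical helper wrapping the generation snippet above;
# not part of the MemoryLLM API.
def ask(question, max_new_tokens=20):
    messages = [{'role': 'user', 'content': question}]
    inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
    inputs = inputs[:, 1:]  # remove bos token, as above
    outputs = model.generate(input_ids=inputs.cuda(), max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0])

# Inject several contexts sequentially; each call updates the memory pool,
# so earlier contexts should remain queryable within the pool's capacity.
for ctx in ["David likes eating apples.", "David lives in Seattle."]:
    ids = tokenizer(ctx, return_tensors='pt', add_special_tokens=False).input_ids.cuda()
    model.inject_memory(ids, update_memory=True)

print(ask("Where does David live?"))
```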