Update README.md
</div>

## TLDR

This repository contains the research preview of **LongLLaMA, a large language model capable of handling long contexts of 256k tokens or even more**.

LongLLaMA-Code is built upon the foundation of [Code Llama](https://huggingface.co/codellama/CodeLlama-7b-hf).

LongLLaMA-Code has **improved reasoning capabilities** compared to CodeLlama; in particular, we improve **GSM8K math reasoning from 13% to 17.4%**.

<p align="center" width="100%">
<img src="https://raw.githubusercontent.com/CStanKonrad/long_llama/main/assets/results.png" alt="LongLLaMA" style="width: 70%; min-width: 300px; display: block; margin: auto;">
</p>

## Overview
**LongLLaMA** is an [OpenLLaMA](https://github.com/openlm-research/open_llama) model finetuned with the FoT method, with three layers used for context extension. **Crucially, LongLLaMA is able to extrapolate much beyond the 8k context length seen in training: e.g., in the passkey retrieval task, it can handle inputs of length 256k**.

**LongLLaMA-Code** is a [Code Llama](https://huggingface.co/codellama/CodeLlama-7b-hf) model finetuned with the FoT method.
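The context-extension idea behind the FoT memory layers can be pictured with a toy, library-free sketch: a memory layer attends over key/value pairs cached from earlier chunks concatenated with the current local window, so its effective context grows with the cache rather than with the training window. This is an illustrative simplification, not the actual FoT implementation; all names and numbers below are made up.

```python
import math

def attend(query, keys, values):
    # Scaled dot-product attention of a single query over the visible key/value pairs.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

# Toy cache of key/value pairs from earlier chunks (hypothetical numbers).
memory_kv = [([float(i), 1.0], [float(i), 0.0]) for i in range(100)]
# Key/value pairs from the current, short local window.
local_kv = [([float(i), 2.0], [float(i), 1.0]) for i in range(8)]

# A "memory layer" sees cached plus local pairs, so the attended context
# (108 pairs here) is much longer than the local window (8 pairs).
keys = [k for k, _ in memory_kv + local_kv]
values = [v for _, v in memory_kv + local_kv]
out = attend([1.0, 1.0], keys, values)
print(len(out))  # 2
```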
<div align="center">
```python
import torch
from transformers import LlamaTokenizer, AutoModelForCausalLM

tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_code_7b")
model = AutoModelForCausalLM.from_pretrained(
    "syzymon/long_llama_code_7b", torch_dtype=torch.float32,
    mem_layers=[],
    mem_dtype='bfloat16',
    trust_remote_code=True,
)
```
```python
from transformers import LlamaTokenizer, LlamaForCausalLM
import torch

tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_code_7b")
model = LlamaForCausalLM.from_pretrained("syzymon/long_llama_code_7b", torch_dtype=torch.float32)
```
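The passkey retrieval task mentioned above can be sketched in a few lines: a short passkey is buried at a random position inside long filler text, and the model is asked to retrieve it. This is an illustrative reconstruction of the task format, not the exact evaluation prompt; the function name and filler sentence are made up.

```python
import random

def make_passkey_prompt(passkey: str, n_filler: int, seed: int = 0) -> str:
    # Bury the passkey at a random position inside repeated filler sentences,
    # then ask for it back; prompt length grows linearly with n_filler.
    rng = random.Random(seed)
    filler = "The grass is green. The sky is blue. The sun is yellow. "
    lines = [filler] * n_filler
    lines.insert(rng.randrange(n_filler), f"The pass key is {passkey}. Remember it. ")
    return "".join(lines) + "What is the pass key?"

prompt = make_passkey_prompt("71432", n_filler=1000)
print("71432" in prompt)  # True
```

Scaling `n_filler` up pushes such prompts toward the 256k-token lengths at which LongLLaMA is evaluated.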