Update README.md

README.md (changed)
@@ -14,20 +14,22 @@ Instruction-tuned version of the fully trained Open LLama 7B v2 model. The mode

- This model performs better on code compared to v1, due to the improvements made to the base model by the openlm-research team.
- The instruction model is trained on an improved instruction-tuning dataset compared to v1.

**NOTE**: The model was trained using the Alpaca prompt template.

**NOTE**: The fast tokenizer results in incorrect encoding; set the `use_fast = False` parameter when instantiating the tokenizer.

## License

- **Commercially Viable**

## Datasets used for Fine-Tuning

**Open-instruct**

**Open-instruct-v1**
- Mosaic/Dolly-HHRLHF + filtered OASST1 - cc-by-3.0

**Subset of COT SUBMIX (from FLAN v2) zero-shot examples**
- ESNLI - MIT
- ECQA - CDLA 1.0 - Sharing
- Strategy - MIT

@@ -35,7 +37,6 @@

- gsm8k - MIT
- aqua - MIT
- qasc - Apache 2.0

<br>
- Language model ([openlm-research/open_llama_v2_7b](https://huggingface.co/openlm-research/open_llama_v2_7b)) is under apache-2.0
- Dataset ([VMware/open-instruct](https://huggingface.co/datasets/VMware/open-instruct)) is under cc-by-sa-3.0

@@ -46,6 +47,7 @@

- Model Size: 7B parameters
- Dataset: Open-instruct

## Use in Transformers

```
…
output = tokenizer.decode(output1[0])
print(output)
```
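
Most of the snippet above is collapsed in this diff view. As a reference, here is a minimal sketch of equivalent usage; the repo id, the exact Alpaca template wording, and the generation settings are assumptions, not quoted from the card:

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "VMware/open-llama-7b-v2-open-instruct"  # assumed repo id

# The fast tokenizer mis-encodes for this model (see NOTE above), hence use_fast=False.
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# Alpaca-style prompt template (assumed wording; see NOTE above).
prompt_template = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)
prompt = prompt_template.format(
    instruction="How do attention mechanisms work in transformer models?"
)

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
output1 = model.generate(input_ids, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt.
output = tokenizer.decode(output1[0][input_ids.shape[1]:])
print(output)
```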

### Output

Sure, I can help you with that!

Attention mechanisms in transformer models are typically implemented using the attention mechanism in the self-attention layer. Self-attention allows the model to focus on different parts of the input sequence when processing it. This is achieved by computing a set of attention weights, which are used to weigh the contribution of each input element to the output.

@@ -129,8 +132,11 @@ The output of the `attention_weights` function is a NumPy tensor that represents

I hope this helps!</s>

<hr>
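
The sample answer above refers to an `attention_weights` function that is not shown in this view. For illustration only (this is not the code the model generated), a self-contained scaled-dot-product attention sketch in NumPy:

```
import numpy as np

def attention_weights(query, key):
    # Similarity of every query to every key, scaled by sqrt of the key dimension.
    d_k = key.shape[-1]
    scores = query @ key.T / np.sqrt(d_k)
    # Softmax over the key axis, with max-subtraction for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

# Four tokens with 8-dimensional embeddings; weights[i, j] says how much
# token i attends to token j, and the output is a weighted sum of the values.
rng = np.random.default_rng(0)
q, k, v = rng.random((3, 4, 8))
w = attention_weights(q, k)
print(w.shape, (w @ v).shape)  # (4, 4) (4, 8)
```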
## Finetuning details
The finetuning scripts will be available in our [RAIL Github Repository](https://github.com/vmware-labs/research-and-development-artificial-intelligence-lab/tree/main/instruction-tuning)
## Evaluation

**TODO**