Update README.md

This is the 8B-parameter version of our Uzbek-optimized Llama series. Also, check out our other models:

* **[Llama-3.2-1B-Instruct-Uz](https://huggingface.co/beruniy/Llama-3.2-1B-Instruct-Uz)**
* **[Llama-3.2-3B-Instruct-Uz](https://huggingface.co/beruniy/Llama-3.2-3B-Instruct-Uz)**

---

Our **Llama-3.1-8B-Instruct-uz** model has been continually pretrained with a context length of 4096 tokens on 3.6B tokens (67% English, 33% Uzbek) and then fine-tuned with SFT. Our customized tokenizer averages 1.7 tokens per Uzbek word versus roughly 3.5 in the original Llama models, which means about 2x faster inference and a longer effective context length on Uzbek text.
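As an informal way to check the fertility figures above, one can tokenize a sample Uzbek sentence with both tokenizers and compare tokens per word. This is only a sketch: the sample sentence is an arbitrary illustration, and the exact ratio varies with the text.

```python
def fertility(tokenizer, text: str) -> float:
    """Average number of tokens per whitespace-separated word."""
    n_tokens = len(tokenizer.encode(text, add_special_tokens=False))
    return n_tokens / len(text.split())

if __name__ == "__main__":
    from transformers import AutoTokenizer

    # Arbitrary sample sentence; actual ratios depend on the text.
    text = "Oʻzbek tili boy va qadimiy tillardan biridir."
    for model_id in ("meta-llama/Llama-3.1-8B-Instruct",
                     "beruniy/Llama-3.1-8B-Instruct-uz"):
        tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
        print(f"{model_id}: {fertility(tok, text):.2f} tokens/word")
```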

---

### Benchmarks 1B, 3B

| --------------------------------- | ----: | ----: | ----: | ----: | ----: | ----: | ----: |
| **[Llama-3.2 1B Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)** | 3.62 | 0.44 | 56.72 | 35.52 | 54.77 | 42.16 | 38.15 |
| **[Llama-3.2 1B Instruct Uz](https://huggingface.co/beruniy/Llama-3.2-1B-Instruct-uz)** | 16.64 | 10.20 | 81.42 | 82.73 | 63.49 | 10.75 | 26.29 |
| **[Llama-3.2 3B Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)** | 11.91 | 2.54 | 71.96 | 55.62 | 56.01 | 70.60 | 52.04 |
| **[Llama-3.2 3B Instruct Uz](https://huggingface.co/beruniy/Llama-3.2-3B-Instruct-Uz)** | 25.19 | 14.66 | 85.08 | 86.82 | 81.64 | 41.56 | 45.91 |

### Benchmarks 8B

| **[Llama-3.1 8B Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)** | 24.23 | 8.28 | 83.12 | 82.22 | 69.77 | 73.63 | 60.59 |
| **[Behbudiy Mistral 7B Uz](https://huggingface.co/behbudiy/Mistral-7B-Instruct-Uz)** | 28.09 | 15.96 | 86.26 | 88.42 | 83.41 | 55.51 | 47.09 |
| **[Behbudiy Llama 8B Uz](https://huggingface.co/behbudiy/Llama-3.1-8B-Instruct-Uz)** | 27.08 | 13.29 | 84.76 | 85.62 | 81.66 | 68.22 | 59.18 |
| **[Llama-3.1 8B Instruct Uz](https://huggingface.co/beruniy/Llama-3.1-8B-Instruct-Uz)** | 29.29 | 15.25 | 86.79 | 87.42 | 81.70 | 63.97 | 53.33 |
<!-- | **[Behbudiy Nemo 12B Uz](https://huggingface.co/behbudiy/Mistral-Nemo-Instruct-Uz)** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -->

The results show that our Uzbek-optimized models consistently outperform their base counterparts on the translation benchmarks (BLEU and COMET, evaluated on the FLORES+ Uz-En / En-Uz datasets) and on sentiment analysis in Uzbek. On the MMLU benchmark, which measures general language understanding across multiple tasks in English, and on the news classification task, our Uzbek-optimized models show a slight decline caused by catastrophic forgetting of the original English instruction following. (The official Llama models’ MMLU scores may differ from ours due to our evaluation method; see the links below for evaluation details.)

## How to use

The Llama-3.1-8B-Instruct-uz model can be used with transformers as shown below. We recommend preprocessing Uzbek input by replacing apostrophe variants (') with the sequence "APST" to benefit from our model's lower tokenizer fertility.
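A minimal sketch of that preprocessing step, assuming the replacement target is the literal string "APST" and using the same apostrophe character class as the `PATTERN` constant in the snippet below (the sample words are illustrative):

```python
import re

# Apostrophe-like characters to normalize (mirrors the PATTERN constant below).
APOSTROPHES = r"[’‘‚‛ʻʼʽʾʿˈˊˋˌˍ'\']"

def normalize_apostrophes(text: str) -> str:
    """Replace apostrophe variants with the placeholder sequence 'APST'."""
    return re.sub(APOSTROPHES, "APST", text)

print(normalize_apostrophes("o‘zbek tilini o’rganamiz"))  # oAPSTzbek tilini oAPSTrganamiz
```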
### Use with transformers

```python
import re

import torch
from transformers import AutoTokenizer

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
DTYPE = torch.bfloat16
MODEL_ID = "beruniy/Llama-3.1-8B-Instruct-uz"
# Apostrophe variants that are normalized to "APST" before tokenization.
PATTERN = r"[’‘‚‛ʻʼʽʾʿˈˊˋˌˍ'\']"

tok = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
```
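The excerpt above ends after loading the tokenizer. A sketch of how loading the model and generating might continue, assuming a standard transformers chat-template workflow; the prompt text, generation parameters, and `build_messages` helper are illustrative assumptions, not part of the original snippet:

```python
import re

PATTERN = r"[’‘‚‛ʻʼʽʾʿˈˊˋˌˍ'\']"  # same apostrophe class as above

def build_messages(user_text: str) -> list:
    """Normalize apostrophes to 'APST' and wrap the text as a chat message."""
    return [{"role": "user", "content": re.sub(PATTERN, "APST", user_text)}]

if __name__ == "__main__":
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_ID = "beruniy/Llama-3.1-8B-Instruct-uz"
    DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    tok = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16).to(DEVICE)

    msgs = build_messages("O‘zbekiston tarixi haqida qisqacha ma’lumot bering.")
    inputs = tok.apply_chat_template(
        msgs, add_generation_prompt=True, return_tensors="pt"
    ).to(DEVICE)
    out = model.generate(inputs, max_new_tokens=256)
    # Decode only the newly generated tokens.
    print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```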

## More

For more details and examples, refer to the base model below:

https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct