Update README.md

This is the 8B-parameter version of our Uzbek-optimized Llama series. Also, check out our other models:

* **[Llama-3.2-1B-Instruct-Uz](https://huggingface.co/beruniy/Llama-3.2-1B-Instruct-Uz)**
* **[Llama-3.2-3B-Instruct-Uz](https://huggingface.co/beruniy/Llama-3.2-3B-Instruct-Uz)**

---

Our **Llama-3.1-8B-Instruct-uz** model has been continually pretrained with a context length of 4096 tokens on 3.6B tokens (67% English, 33% Uzbek) and then fine-tuned with SFT. Our customized tokenizer averages 1.7 tokens per Uzbek word versus roughly 3.5 in the original Llama models, which means about 2x faster inference and a longer effective context length on Uzbek text.
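As an informal way to check the fertility figures above, one can tokenize a sample Uzbek sentence with both tokenizers and compare tokens per word. This is only a sketch: the sample sentence is an arbitrary illustration, and the exact ratio varies with the text.

```python
def fertility(tokenizer, text: str) -> float:
    """Average number of tokens per whitespace-separated word."""
    n_tokens = len(tokenizer.encode(text, add_special_tokens=False))
    return n_tokens / len(text.split())

if __name__ == "__main__":
    from transformers import AutoTokenizer

    # Arbitrary sample sentence; actual ratios depend on the text.
    text = "Oʻzbek tili boy va qadimiy tillardan biridir."
    for model_id in ("meta-llama/Llama-3.1-8B-Instruct",
                     "beruniy/Llama-3.1-8B-Instruct-uz"):
        tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
        print(f"{model_id}: {fertility(tok, text):.2f} tokens/word")
```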

---

### Benchmarks 1B, 3B

| --------------------------------- | ----: | ----: | ----: | ----: | ----: | ----: | ----: |
| **[Llama-3.2 1B Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)** | 3.62 | 0.44 | 56.72 | 35.52 | 54.77 | 42.16 | 38.15 |
| **[Llama-3.2 1B Instruct Uz](https://huggingface.co/beruniy/Llama-3.2-1B-Instruct-uz)** | 16.64 | 10.20 | 81.42 | 82.73 | 63.49 | 10.75 | 26.29 |
| **[Llama-3.2 3B Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)** | 11.91 | 2.54 | 71.96 | 55.62 | 56.01 | 70.60 | 52.04 |
| **[Llama-3.2 3B Instruct Uz](https://huggingface.co/beruniy/Llama-3.2-3B-Instruct-Uz)** | 25.19 | 14.66 | 85.08 | 86.82 | 81.64 | 41.56 | 45.91 |

### Benchmarks 8B

| **[Llama-3.1 8B Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)** | 24.23 | 8.28 | 83.12 | 82.22 | 69.77 | 73.63 | 60.59 |
| **[Behbudiy Mistral 7B Uz](https://huggingface.co/behbudiy/Mistral-7B-Instruct-Uz)** | 28.09 | 15.96 | 86.26 | 88.42 | 83.41 | 55.51 | 47.09 |
| **[Behbudiy Llama 8B Uz](https://huggingface.co/behbudiy/Llama-3.1-8B-Instruct-Uz)** | 27.08 | 13.29 | 84.76 | 85.62 | 81.66 | 68.22 | 59.18 |
| **[Llama-3.1 8B Instruct Uz](https://huggingface.co/beruniy/Llama-3.1-8B-Instruct-Uz)** | 29.29 | 15.25 | 86.79 | 87.42 | 81.70 | 63.97 | 53.33 |
<!-- | **[Behbudiy Nemo 12B Uz](https://huggingface.co/behbudiy/Mistral-Nemo-Instruct-Uz)** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -->

The results show that our Uzbek-optimized models consistently outperform their base counterparts on the translation benchmarks (BLEU and COMET, evaluated on the FLORES+ Uz-En / En-Uz datasets) and on sentiment analysis in Uzbek. On the MMLU benchmark, which measures general language understanding across multiple tasks in English, and on the news classification task, our Uzbek-optimized models show a slight decline caused by catastrophic forgetting of the original English instruction following. (The official Llama models’ MMLU scores may differ from ours due to our evaluation method; see the links below for evaluation details.)

## How to use

The Llama-3.1-8B-Instruct-uz model can be used with transformers as shown below. We recommend preprocessing Uzbek input by replacing apostrophe variants (') with the sequence "APST" to benefit from our model's lower tokenizer fertility.
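A minimal sketch of that preprocessing step, assuming the replacement target is the literal string "APST" and using the same apostrophe character class as the `PATTERN` constant in the snippet below (the sample words are illustrative):

```python
import re

# Apostrophe-like characters to normalize (mirrors the PATTERN constant below).
APOSTROPHES = r"[’‘‚‛ʻʼʽʾʿˈˊˋˌˍ'\']"

def normalize_apostrophes(text: str) -> str:
    """Replace apostrophe variants with the placeholder sequence 'APST'."""
    return re.sub(APOSTROPHES, "APST", text)

print(normalize_apostrophes("o‘zbek tilini o’rganamiz"))  # oAPSTzbek tilini oAPSTrganamiz
```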
### Use with transformers

```python
import re

import torch
from transformers import AutoTokenizer

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
DTYPE = torch.bfloat16
MODEL_ID = "beruniy/Llama-3.1-8B-Instruct-uz"
# Apostrophe variants that are normalized to "APST" before tokenization.
PATTERN = r"[’‘‚‛ʻʼʽʾʿˈˊˋˌˍ'\']"

tok = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
```
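The excerpt above ends after loading the tokenizer. A sketch of how loading the model and generating might continue, assuming a standard transformers chat-template workflow; the prompt text, generation parameters, and `build_messages` helper are illustrative assumptions, not part of the original snippet:

```python
import re

PATTERN = r"[’‘‚‛ʻʼʽʾʿˈˊˋˌˍ'\']"  # same apostrophe class as above

def build_messages(user_text: str) -> list:
    """Normalize apostrophes to 'APST' and wrap the text as a chat message."""
    return [{"role": "user", "content": re.sub(PATTERN, "APST", user_text)}]

if __name__ == "__main__":
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_ID = "beruniy/Llama-3.1-8B-Instruct-uz"
    DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    tok = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16).to(DEVICE)

    msgs = build_messages("O‘zbekiston tarixi haqida qisqacha ma’lumot bering.")
    inputs = tok.apply_chat_template(
        msgs, add_generation_prompt=True, return_tensors="pt"
    ).to(DEVICE)
    out = model.generate(inputs, max_new_tokens=256)
    # Decode only the newly generated tokens.
    print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```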

## More

For more details and examples, refer to the base model below:

https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct