bxod committed
Commit 145b85f · verified · 1 Parent(s): 449f6f3

Upload folder using huggingface_hub

README.md CHANGED
@@ -37,12 +37,12 @@ pipeline_tag: text-generation
 
 This is the 8B parameter version of our Uzbek-optimized Llama series. Also, check out our other models:
 
-* **[Llama-3.2-1B-Instruct-Uz](https://huggingface.co/beruniy/Llama-3.2-1B-Instruct-Uz)**
 * **[Llama-3.2-3B-Instruct-Uz](https://huggingface.co/beruniy/Llama-3.2-3B-Instruct-Uz)**
+* **[Llama-3.1-8B-Instruct-Uz](https://huggingface.co/beruniy/Llama-3.1-8B-Instruct-Uz)**
 
 ---
 
-Our **Llama-3.1-8B-Instruct-uz** model has been continually pretrained with a context length of 4096 tokens, on 3.6B tokens (67% English, 33% Uzbek), then SFT fine-tuned. Our customized tokenizer averages 1.7 tokens per Uzbek word vs. ~3.5 in the original Llama models, meaning 2x faster inference and a longer effective context length on Uzbek text.
+Our **Llama-3.2-1B-Instruct-uz** model has been continually pretrained with a context length of 2048 tokens, on 2.4B tokens (75% English, 25% Uzbek), then SFT fine-tuned. Our customized tokenizer averages 1.7 tokens per Uzbek word vs. ~3.5 in the original Llama models, meaning 2x faster inference and a longer effective context length on Uzbek text. You’ll be able to run this model on just 2 GB of VRAM (with quantization), perfect for small GPUs, edge devices, or even mobile scenarios.
 
 ---
 ### Benchmarks 1B, 3B
@@ -50,8 +50,8 @@ Our **Llama-3.1-8B-Instruct-uz** model has been continually pretrained with cont
 | --------------------------------- | ----: | ----: | ----: | ----: | ----: | ----: | ----: |
 | **[Llama-3.2 1B Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)** | 3.62 | 0.44 | 56.72 | 35.52 | 54.77 | 42.16 | 38.15 |
 | **[Llama-3.2 1B Instruct Uz](https://huggingface.co/beruniy/Llama-3.2-1B-Instruct-uz)** | 16.64 | 10.20 | 81.42 | 82.73 | 63.49 | 10.75 | 26.29 |
-| **[Llama-3.2 3B Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)** | 11.91 | 2.54 | 71.96 | 55.62 | 56.01 | 70.60 | 52.04 |
-| **[Llama-3.2 3B Instruct Uz](https://huggingface.co/beruniy/Llama-3.2-3B-Instruct-Uz)** | 25.19 | 14.66 | 85.08 | 86.82 | 81.64 | 41.56 | 45.91 |
+| **[Llama-3.2 3B Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)** | 11.91 | 2.54 | 71.96 | 55.62 | 56.01 | **70.60** | **52.04** |
+| **[Llama-3.2 3B Instruct Uz](https://huggingface.co/beruniy/Llama-3.2-3B-Instruct-Uz)** | **25.19** | **14.66** | **85.08** | **86.82** | **81.64** | 41.56 | 45.91 |
 
 
 ### Benchmarks 8B
@@ -60,8 +60,8 @@ Our **Llama-3.1-8B-Instruct-uz** model has been continually pretrained with cont
 | **[Llama-3.1 8B Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)** | 24.23 | 8.28 | 83.12 | 82.22 | 69.77 | 73.63 | 60.59 |
 | **[Behbudiy Mistral 7B Uz](https://huggingface.co/behbudiy/Mistral-7B-Instruct-Uz)** | 28.09 | 15.96 | 86.26 | 88.42 | 83.41 | 55.51 | 47.09 |
 | **[Behbudiy Llama 8B Uz](https://huggingface.co/behbudiy/Llama-3.1-8B-Instruct-Uz)** | 27.08 | 13.29 | 84.76 | 85.62 | 81.66 | 68.22 | 59.18 |
-| **[Llama-3.1 8B Instruct Uz](https://huggingface.co/beruniy/Llama-3.1-8B-Instruct-Uz)** | 29.29 | 15.25 | 86.79 | 87.42 | 81.70 | 63.97 | 53.33 |
-<!-- | **[Behbudiy Nemo 12B Uz](https://huggingface.co/behbudiy/Mistral-Nemo-Instruct-Uz)** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -->
+| **[Behbudiy Nemo 12B Uz](https://huggingface.co/behbudiy/Mistral-Nemo-Instruct-Uz)** | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| **[Llama-3.1 8B Instruct Uz](https://huggingface.co/beruniy/Llama-3.1-8B-Instruct-Uz)** | **0** | 0 | 0 | 0 | 0 | 0 | 0 |
 
 
 The results show that our Uzbek-optimized models consistently outperform their base counterparts on the translation benchmarks (BLEU and COMET, evaluated on the FLORES+ Uz-En / En-Uz datasets) and on sentiment analysis in Uzbek. On the MMLU benchmark, which measures general language understanding across multiple tasks in English, and on the news classification task, our Uzbek-optimized models show a slight decline due to catastrophic forgetting of the original English instruction following. (The official Llama model’s MMLU score may differ from ours because of our evaluation method; refer to the links below for evaluation details.)
@@ -70,7 +70,7 @@ We’re eager to see how these models will contribute to Uzbek open-source and b
 
 ## How to use
 
-The Llama-3.1-8B-Instruct-uz model can be used with transformers in the following way. We recommend preprocessing Uzbek input to replace the apostrophe (') with the sequence (APST) to take advantage of our model's lower tokenizer fertility.
+The Llama-3.2-1B-Instruct-uz model can be used with transformers in the following way. We recommend preprocessing Uzbek input to replace the apostrophe (') with the sequence (APST) to take advantage of our model's lower tokenizer fertility.
 
 
 ### Use with transformers
@@ -82,7 +82,7 @@ import langid
 
 DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 DTYPE = torch.bfloat16
-MODEL_ID = "beruniy/Llama-3.1-8B-Instruct-uz"
+MODEL_ID = "beruniy/Llama-3.2-1B-Instruct-uz"
 PATTERN = r"[’‘‚‛ʻʼʽʾʿˈˊˋˌˍ'\']"
 
 tok = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
@@ -170,5 +170,5 @@ template = "Given the above question and choices, choose the single best answer
 
 ## More
 For more details and examples, refer to the base model below:
-https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
+https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct
 
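The README's "How to use" section recommends replacing apostrophe-like characters with the sequence APST before tokenizing Uzbek input, using the PATTERN constant visible in its snippet. A minimal sketch of that normalization step, assuming a plain regex substitution with "APST" as the replacement string (the tokenizer and model calls from the card's full snippet are omitted here):

```python
import re

# Apostrophe-like characters, copied from the card's PATTERN constant.
PATTERN = r"[’‘‚‛ʻʼʽʾʿˈˊˋˌˍ'\']"

def normalize_apostrophes(text: str) -> str:
    # Replace every apostrophe-like character with the literal sequence
    # "APST", as the card recommends, to lower tokenizer fertility.
    return re.sub(PATTERN, "APST", text)

print(normalize_apostrophes("ta’lim va o‘qish"))  # taAPSTlim va oAPSTqish
```

Text without apostrophes passes through unchanged, so the step is safe to apply to all input before calling the tokenizer.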
model-00001-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8195fc750476aad59a0426740760901b7866d97fa836221c293bdad65e514986
+oid sha256:be3123ed34cecd9b7ecc42e6c7f7c0c583a0b4cb14c5b2897cbc660d76e88c1d
 size 4976706864
model-00002-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:23f3b378f4d3292cb30268fe9c9ddd6b4c7a9e3df4cb710634759961e75da8d0
+oid sha256:7cc831ada8f495ea3ec8194603e9d5f696180e02f3b262eb7004e1e293265591
 size 4999802720
model-00003-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:aaa5e485846647084849ef1783872bf3edc5ca43d27ee93e16506f53a0c681c7
+oid sha256:489d233c856d12525760dff36c1bd0fa34995757d9d2b7114da8f4227d44a08b
 size 4915916176
model-00004-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5f4b6a3bafa5897b3ea8d55380a831372d594c7e3f2d0a252e6bf32228c711a7
+oid sha256:d9ec981a088b776c708679845ecaa85607c34053c7d20960cf05d65d5b6eca5e
 size 1168147000
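The safetensors entries above are Git LFS pointer files: short text files of "key value" lines recording the spec version, the blob's sha256 oid, and its size in bytes, while the actual weights live in LFS storage. A hedged sketch (the helper names are hypothetical, not part of any library) that parses such a pointer and verifies a downloaded blob against it:

```python
import hashlib
import os

def parse_lfs_pointer(pointer_text: str) -> dict:
    # A v1 pointer is a handful of "key value" lines; split each on the
    # first space and collect the fields into a dict.
    fields = {}
    for line in pointer_text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

def verify_lfs_blob(pointer_text: str, blob_path: str) -> bool:
    # Compare the downloaded blob's size and sha256 digest against the
    # pointer; check the cheap size field first before hashing.
    fields = parse_lfs_pointer(pointer_text)
    expected_digest = fields["oid"].partition(":")[2]
    if os.path.getsize(blob_path) != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with open(blob_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_digest

# Pointer content taken from the model-00004 diff above.
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:d9ec981a088b776c708679845ecaa85607c34053c7d20960cf05d65d5b6eca5e\n"
    "size 1168147000\n"
)
print(parse_lfs_pointer(pointer)["size"])  # 1168147000
```

This mirrors what `git lfs` does when it replaces pointers with real blobs on checkout: any mismatch in oid or size indicates a corrupt or incomplete download.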