Upload folder using huggingface_hub
README.md
CHANGED
@@ -37,12 +37,12 @@ pipeline_tag: text-generation
 
 This is the 8B parameter version of our Uzbek-optimized Llama series. Also, check out our other models:
 
-* **[Llama-3.2-1B-Instruct-Uz](https://huggingface.co/beruniy/Llama-3.2-1B-Instruct-Uz)**
 * **[Llama-3.2-3B-Instruct-Uz](https://huggingface.co/beruniy/Llama-3.2-3B-Instruct-Uz)**
+* **[Llama-3.1-8B-Instruct-Uz](https://huggingface.co/beruniy/Llama-3.1-8B-Instruct-Uz)**
 
 ---
 
-Our **Llama-3.
+Our **Llama-3.2-1B-Instruct-uz** model has been continually pretrained with a context length of 2048 tokens on 2.4B tokens (75% English, 25% Uzbek), then SFT fine-tuned. Our customized tokenizer averages 1.7 tokens per Uzbek word vs. ~3.5 in the original Llama models, meaning roughly 2x faster inference and a longer effective context length on Uzbek text. You'll be able to run this model on just 2 GB of VRAM (with quantization), perfect for small GPUs, edge devices, or even mobile scenarios.
 
 ---
 ### Benchmarks 1B, 3B

@@ -50,8 +50,8 @@ Our **Llama-3.1-8B-Instruct-uz** model has been continually pretrained with cont
 | --------------------------------- | ----: | ----: | ----: | ----: | ----: | ----: | ----: |
 | **[Llama-3.2 1B Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)** | 3.62 | 0.44 | 56.72 | 35.52 | 54.77 | 42.16 | 38.15 |
 | **[Llama-3.2 1B Instruct Uz](https://huggingface.co/beruniy/Llama-3.2-1B-Instruct-uz)** | 16.64 | 10.20 | 81.42 | 82.73 | 63.49 | 10.75 | 26.29 |
-| **[Llama-3.2 3B Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)** | 11.91 | 2.54 | 71.96 | 55.62 | 56.01 | 70.60 | 52.04 |
-| **[Llama-3.2 3B Instruct Uz](https://huggingface.co/beruniy/Llama-3.2-3B-Instruct-Uz)** | 25.19 | 14.66 | 85.08 | 86.82 | 81.64 | 41.56 | 45.91 |
+| **[Llama-3.2 3B Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)** | 11.91 | 2.54 | 71.96 | 55.62 | 56.01 | **70.60** | **52.04** |
+| **[Llama-3.2 3B Instruct Uz](https://huggingface.co/beruniy/Llama-3.2-3B-Instruct-Uz)** | **25.19** | **14.66** | **85.08** | **86.82** | **81.64** | 41.56 | 45.91 |
 
 
 ### Benchmarks 8B

@@ -60,8 +60,8 @@ Our **Llama-3.1-8B-Instruct-uz** model has been continually pretrained with cont
 | **[Llama-3.1 8B Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)** | 24.23 | 8.28 | 83.12 | 82.22 | 69.77 | 73.63 | 60.59 |
 | **[Behbudiy Mistral 7B Uz](https://huggingface.co/behbudiy/Mistral-7B-Instruct-Uz)** | 28.09 | 15.96 | 86.26 | 88.42 | 83.41 | 55.51 | 47.09 |
 | **[Behbudiy Llama 8B Uz](https://huggingface.co/behbudiy/Llama-3.1-8B-Instruct-Uz)** | 27.08 | 13.29 | 84.76 | 85.62 | 81.66 | 68.22 | 59.18 |
-| **[
-
+| **[Behbudiy Nemo 12B Uz](https://huggingface.co/behbudiy/Mistral-Nemo-Instruct-Uz)** | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| **[Llama-3.1 8B Instruct Uz](https://huggingface.co/beruniy/Llama-3.1-8B-Instruct-Uz)** | **0** | 0 | 0 | 0 | 0 | 0 | 0 |
 
 
 The results show that our Uzbek-optimized models consistently outperform their base counterparts on the translation benchmarks (BLEU and COMET), evaluated on the FLORES+ Uz-En / En-Uz datasets, and on sentiment analysis in Uzbek. On the MMLU benchmark, which measures general language understanding across multiple tasks in English, and on the news classification task, our Uzbek-optimized model showed a slight decline because of catastrophic forgetting of its original English instruction-following ability. (The official Llama model's MMLU score may differ from ours due to our evaluation method; see the links below for evaluation details.)

@@ -70,7 +70,7 @@ We're eager to see how these models will contribute to Uzbek open-source and b
 
 ## How to use
 
-The Llama-3.
+The Llama-3.2-1B-Instruct-uz model can be used with transformers in the following way. We recommend preprocessing Uzbek input by replacing apostrophe variants (') with the sequence (APST) to benefit from our model's lower tokenizer fertility.
 
 
 ### Use with transformers

@@ -82,7 +82,7 @@ import langid
 
 DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 DTYPE = torch.bfloat16
-MODEL_ID = "beruniy/Llama-3.
+MODEL_ID = "beruniy/Llama-3.2-1B-Instruct-uz"
 PATTERN = r"[’‘‚‛ʻʼʽʾʿˈˊˋˌˍ'\']"
 
 tok = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)

@@ -170,5 +170,5 @@ template = "Given the above question and choices, choose the single best answer
 
 ## More
 For more details and examples, refer to the base model below:
-https://huggingface.co/meta-llama/Llama-3.
+https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct
 
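The apostrophe preprocessing recommended in the README's "How to use" section can be sketched in a few lines. `PATTERN` is copied verbatim from the model card's snippet; the `preprocess_uz` helper name is my own illustration, not code from the repository:

```python
import re

# Apostrophe-like characters to normalize (copied from the model card's PATTERN).
PATTERN = r"[’‘‚‛ʻʼʽʾʿˈˊˋˌˍ'\']"

def preprocess_uz(text: str) -> str:
    """Replace every apostrophe variant with the APST placeholder sequence."""
    return re.sub(PATTERN, "APST", text)

print(preprocess_uz("so’z"))  # soAPSTz
```

Running the substitution before tokenization is what lets the customized tokenizer reach the lower per-word token counts the card describes.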
model-00001-of-00004.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:be3123ed34cecd9b7ecc42e6c7f7c0c583a0b4cb14c5b2897cbc660d76e88c1d
 size 4976706864
model-00002-of-00004.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:7cc831ada8f495ea3ec8194603e9d5f696180e02f3b262eb7004e1e293265591
 size 4999802720
model-00003-of-00004.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:489d233c856d12525760dff36c1bd0fa34995757d9d2b7114da8f4227d44a08b
 size 4915916176
model-00004-of-00004.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:d9ec981a088b776c708679845ecaa85607c34053c7d20960cf05d65d5b6eca5e
 size 1168147000
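Each pointer diff above fills in the `oid sha256:` field of a git-lfs pointer (version line, oid line, size line). After downloading a shard, its digest can be re-checked against the pointer; this is a generic sketch with a helper of my own, using the digest recorded for the first shard:

```python
import hashlib

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through sha256 in 1 MiB chunks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# oid recorded for model-00001-of-00004.safetensors in the diff above.
EXPECTED = "be3123ed34cecd9b7ecc42e6c7f7c0c583a0b4cb14c5b2897cbc660d76e88c1d"
# Uncomment once the shard has been downloaded locally:
# assert file_sha256("model-00001-of-00004.safetensors") == EXPECTED
```

Chunked reading keeps memory flat even for the ~5 GB shards listed in the pointers.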