bxod committed
Commit 145b85f · verified · 1 Parent(s): 449f6f3

Upload folder using huggingface_hub

README.md CHANGED
@@ -37,12 +37,12 @@ pipeline_tag: text-generation
 
 This is the 8B parameter version of our Uzbek-optimized Llama series. Also, check out our other models:
 
-* **[Llama-3.2-1B-Instruct-Uz](https://huggingface.co/beruniy/Llama-3.2-1B-Instruct-Uz)**
 * **[Llama-3.2-3B-Instruct-Uz](https://huggingface.co/beruniy/Llama-3.2-3B-Instruct-Uz)**
+* **[Llama-3.1-8B-Instruct-Uz](https://huggingface.co/beruniy/Llama-3.1-8B-Instruct-Uz)**
 
 ---
 
-Our **Llama-3.1-8B-Instruct-uz** model has been continually pretrained with a context length of 4096 tokens, on 3.6B tokens (67% English, 33% Uzbek), then SFT fine-tuned. Our customized tokenizer averages 1.7 tokens per Uzbek word vs. ~3.5 in the original Llama models, meaning 2x faster inference and a longer effective context length on Uzbek text.
+Our **Llama-3.2-1B-Instruct-uz** model has been continually pretrained with a context length of 2048 tokens, on 2.4B tokens (75% English, 25% Uzbek), then SFT fine-tuned. Our customized tokenizer averages 1.7 tokens per Uzbek word vs. ~3.5 in the original Llama models, meaning 2x faster inference and a longer effective context length on Uzbek text. You’ll be able to run this model on just 2 GB of VRAM (with quantization), perfect for small GPUs, edge devices, or even mobile scenarios.
 
 ---
 ### Benchmarks 1B, 3B
@@ -50,8 +50,8 @@ Our **Llama-3.1-8B-Instruct-uz** model has been continually pretrained with cont
 | --------------------------------- | ----: | ----: | ----: | ----: | ----: | ----: | ----: |
 | **[Llama-3.2 1B Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)** | 3.62 | 0.44 | 56.72 | 35.52 | 54.77 | 42.16 | 38.15 |
 | **[Llama-3.2 1B Instruct Uz](https://huggingface.co/beruniy/Llama-3.2-1B-Instruct-uz)** | 16.64 | 10.20 | 81.42 | 82.73 | 63.49 | 10.75 | 26.29 |
-| **[Llama-3.2 3B Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)** | 11.91 | 2.54 | 71.96 | 55.62 | 56.01 | 70.60 | 52.04 |
-| **[Llama-3.2 3B Instruct Uz](https://huggingface.co/beruniy/Llama-3.2-3B-Instruct-Uz)** | 25.19 | 14.66 | 85.08 | 86.82 | 81.64 | 41.56 | 45.91 |
+| **[Llama-3.2 3B Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)** | 11.91 | 2.54 | 71.96 | 55.62 | 56.01 | **70.60** | **52.04** |
+| **[Llama-3.2 3B Instruct Uz](https://huggingface.co/beruniy/Llama-3.2-3B-Instruct-Uz)** | **25.19** | **14.66** | **85.08** | **86.82** | **81.64** | 41.56 | 45.91 |
 
 
 ### Benchmarks 8B
@@ -60,8 +60,8 @@ Our **Llama-3.1-8B-Instruct-uz** model has been continually pretrained with cont
 | **[Llama-3.1 8B Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)** | 24.23 | 8.28 | 83.12 | 82.22 | 69.77 | 73.63 | 60.59 |
 | **[Behbudiy Mistral 7B Uz](https://huggingface.co/behbudiy/Mistral-7B-Instruct-Uz)** | 28.09 | 15.96 | 86.26 | 88.42 | 83.41 | 55.51 | 47.09 |
 | **[Behbudiy Llama 8B Uz](https://huggingface.co/behbudiy/Llama-3.1-8B-Instruct-Uz)** | 27.08 | 13.29 | 84.76 | 85.62 | 81.66 | 68.22 | 59.18 |
-| **[Llama-3.1 8B Instruct Uz](https://huggingface.co/beruniy/Llama-3.1-8B-Instruct-Uz)** | 29.29 | 15.25 | 86.79 | 87.42 | 81.70 | 63.97 | 53.33 |
-<!-- | **[Behbudiy Nemo 12B Uz](https://huggingface.co/behbudiy/Mistral-Nemo-Instruct-Uz)** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -->
+| **[Behbudiy Nemo 12B Uz](https://huggingface.co/behbudiy/Mistral-Nemo-Instruct-Uz)** | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| **[Llama-3.1 8B Instruct Uz](https://huggingface.co/beruniy/Llama-3.1-8B-Instruct-Uz)** | **0** | 0 | 0 | 0 | 0 | 0 | 0 |
 
 
 The results show that our Uzbek-optimized models consistently outperform their base counterparts on the translation benchmarks (BLEU and COMET, evaluated on the FLORES+ Uz-En / En-Uz datasets) and on sentiment analysis in Uzbek. On the MMLU benchmark, which measures general language understanding across multiple tasks in English, and on the news classification task, our Uzbek-optimized models show a slight decline due to catastrophic forgetting of the original English instruction following. (The official Llama model’s MMLU score may differ from ours because of our evaluation method; refer to the links below for evaluation details.)
@@ -70,7 +70,7 @@ We’re eager to see how these models will contribute to Uzbek open-source and b
 
 ## How to use
 
-The Llama-3.1-8B-Instruct-uz model can be used with transformers in the following way. We recommend preprocessing Uzbek input to replace the apostrophe (') with the sequence (APST) to take advantage of our model's lower tokenizer fertility.
+The Llama-3.2-1B-Instruct-uz model can be used with transformers in the following way. We recommend preprocessing Uzbek input to replace the apostrophe (') with the sequence (APST) to take advantage of our model's lower tokenizer fertility.
 
 
 ### Use with transformers
@@ -82,7 +82,7 @@ import langid
 
 DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 DTYPE = torch.bfloat16
-MODEL_ID = "beruniy/Llama-3.1-8B-Instruct-uz"
+MODEL_ID = "beruniy/Llama-3.2-1B-Instruct-uz"
 PATTERN = r"[’‘‚‛ʻʼʽʾʿˈˊˋˌˍ'\']"
 
 tok = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
@@ -170,5 +170,5 @@ template = "Given the above question and choices, choose the single best answer
 
 ## More
 For more details and examples, refer to the base model below:
-https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
+https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct
 
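The README's "How to use" section recommends replacing apostrophe-like characters with the sequence APST before tokenizing Uzbek input, using the PATTERN constant visible in its snippet. A minimal sketch of that normalization step, assuming a plain regex substitution with "APST" as the replacement string (the tokenizer and model calls from the card's full snippet are omitted here):

```python
import re

# Apostrophe-like characters, copied from the card's PATTERN constant.
PATTERN = r"[’‘‚‛ʻʼʽʾʿˈˊˋˌˍ'\']"

def normalize_apostrophes(text: str) -> str:
    # Replace every apostrophe-like character with the literal sequence
    # "APST", as the card recommends, to lower tokenizer fertility.
    return re.sub(PATTERN, "APST", text)

print(normalize_apostrophes("ta’lim va o‘qish"))  # taAPSTlim va oAPSTqish
```

Text without apostrophes passes through unchanged, so the step is safe to apply to all input before calling the tokenizer.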
model-00001-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8195fc750476aad59a0426740760901b7866d97fa836221c293bdad65e514986
+oid sha256:be3123ed34cecd9b7ecc42e6c7f7c0c583a0b4cb14c5b2897cbc660d76e88c1d
 size 4976706864
model-00002-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:23f3b378f4d3292cb30268fe9c9ddd6b4c7a9e3df4cb710634759961e75da8d0
+oid sha256:7cc831ada8f495ea3ec8194603e9d5f696180e02f3b262eb7004e1e293265591
 size 4999802720
model-00003-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:aaa5e485846647084849ef1783872bf3edc5ca43d27ee93e16506f53a0c681c7
+oid sha256:489d233c856d12525760dff36c1bd0fa34995757d9d2b7114da8f4227d44a08b
 size 4915916176
model-00004-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5f4b6a3bafa5897b3ea8d55380a831372d594c7e3f2d0a252e6bf32228c711a7
+oid sha256:d9ec981a088b776c708679845ecaa85607c34053c7d20960cf05d65d5b6eca5e
 size 1168147000
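The safetensors entries above are Git LFS pointer files: short text files of "key value" lines recording the spec version, the blob's sha256 oid, and its size in bytes, while the actual weights live in LFS storage. A hedged sketch (the helper names are hypothetical, not part of any library) that parses such a pointer and verifies a downloaded blob against it:

```python
import hashlib
import os

def parse_lfs_pointer(pointer_text: str) -> dict:
    # A v1 pointer is a handful of "key value" lines; split each on the
    # first space and collect the fields into a dict.
    fields = {}
    for line in pointer_text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

def verify_lfs_blob(pointer_text: str, blob_path: str) -> bool:
    # Compare the downloaded blob's size and sha256 digest against the
    # pointer; check the cheap size field first before hashing.
    fields = parse_lfs_pointer(pointer_text)
    expected_digest = fields["oid"].partition(":")[2]
    if os.path.getsize(blob_path) != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with open(blob_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_digest

# Pointer content taken from the model-00004 diff above.
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:d9ec981a088b776c708679845ecaa85607c34053c7d20960cf05d65d5b6eca5e\n"
    "size 1168147000\n"
)
print(parse_lfs_pointer(pointer)["size"])  # 1168147000
```

This mirrors what `git lfs` does when it replaces pointers with real blobs on checkout: any mismatch in oid or size indicates a corrupt or incomplete download.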