---
language:
- en
- ko
license: cc-by-nc-4.0
tags:
- dnotitia
- nlp
- llm
- slm
- conversation
- chat
base_model:
- meta-llama/Meta-Llama-3.1-8B
library_name: transformers
pipeline_tag: text-generation
---
[QuantFactory](https://hf.co/QuantFactory)

# QuantFactory/Llama-DNA-1.0-8B-Instruct-GGUF
This is a quantized version of [dnotitia/Llama-DNA-1.0-8B-Instruct](https://huggingface.co/dnotitia/Llama-DNA-1.0-8B-Instruct) created using llama.cpp.

# Original Model Card

# DNA 1.0 8B Instruct

<p align="center">
<img src="assets/dna-logo.png" width="400" style="margin: 40px auto;">
</p>

**DNA 1.0 8B Instruct** is a <u>state-of-the-art (**SOTA**)</u> bilingual language model based on the Llama architecture, optimized for Korean language understanding and generation while maintaining strong English capabilities. Its development combined model merging via spherical linear interpolation (**SLERP**) with Llama 3.1 8B Instruct, knowledge distillation (**KD**) using Llama 3.1 405B as the teacher model, and continual pre-training (**CPT**) on a high-quality Korean dataset. The training pipeline was completed with supervised fine-tuning (**SFT**) and direct preference optimization (**DPO**) to align the model with human preferences and improve its instruction following.
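
As a rough illustration of the SLERP merging step mentioned above, the sketch below interpolates between two flattened weight vectors along the arc between them rather than along a straight line. The model card does not publish the merge recipe, so the function name `slerp`, the interpolation factor `t`, and the toy vectors are assumptions for illustration only, not the authors' implementation.

```python
import math


def slerp(w_a, w_b, t, eps=1e-8):
    """Spherical linear interpolation between two equally sized weight vectors.

    t = 0 returns w_a, t = 1 returns w_b. Hypothetical helper for illustration.
    """
    norm_a = math.sqrt(sum(x * x for x in w_a))
    norm_b = math.sqrt(sum(x * x for x in w_b))

    def lerp():
        # Plain linear interpolation, used when the angle is ill-defined.
        return [(1 - t) * a + t * b for a, b in zip(w_a, w_b)]

    if norm_a < eps or norm_b < eps:
        return lerp()

    # Angle between the two vectors, clamped to guard against rounding error.
    cos_omega = sum(a * b for a, b in zip(w_a, w_b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)

    if sin_omega < eps:
        # Vectors are (anti)parallel: the arc is degenerate, fall back to lerp.
        return lerp()

    coef_a = math.sin((1 - t) * omega) / sin_omega
    coef_b = math.sin(t * omega) / sin_omega
    return [coef_a * a + coef_b * b for a, b in zip(w_a, w_b)]


# Toy example: halfway between two orthogonal unit vectors.
merged = slerp([1.0, 0.0], [0.0, 1.0], t=0.5)
print([round(x, 4) for x in merged])  # roughly [0.7071, 0.7071]
```

Unlike a linear average, which would give `[0.5, 0.5]` here, the interpolated vector keeps unit length. In an actual merge a function like this would be applied tensor by tensor, with `t` controlling how far the result moves toward the second model.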

DNA 1.0 8B Instruct was fine-tuned on approximately 10B tokens of carefully curated data and underwent extensive instruction tuning to improve its ability to follow complex instructions and hold natural conversations.

- **Developed by:** Dnotitia Inc.
- **Supported Languages:** Korean, English
- **Vocab Size:** 128,256
- **Context Length:** 131,072 tokens (128k)
- **License:** CC BY-NC 4.0

<div style="padding: 2px 8px; background-color: hsl(240, 100%, 50%, 0.1); border-radius: 5px">
  <p><strong>NOTICE (translated from Korean):</strong></p>
  <p>This model can be used for commercial purposes. If you would like to use it commercially, please reach out through <a href="https://www.dnotitia.com/contact/post-form">Contact us</a>. After a brief consultation, we will approve commercial use.</p>
  <p>Try DNA-powered Mnemos Assistant! <a href="https://request-demo.dnotitia.ai/">Beta Open →</a></p>
</div>

## Training Procedure

<p align="center">
<img src="assets/training-procedure.png" width="600" style="margin: 40px auto;">
</p>
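
As a rough illustration of the final DPO stage named in the model overview, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. The function name `dpo_loss`, the `beta` value, and the numbers are illustrative assumptions; the card does not describe the authors' actual training code or hyperparameters.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    # Implicit rewards: how much more likely the policy makes each response
    # than the reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, loss is log(2).
print(round(dpo_loss(-10.0, -10.0, -10.0, -10.0), 4))
# Policy favours the chosen response more than the reference does: lower loss.
print(round(dpo_loss(-5.0, -15.0, -10.0, -10.0), 4))
```

The loss shrinks as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model, which is what pushes the model toward preferred outputs.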

## Evaluation

We evaluated DNA 1.0 8B Instruct against other prominent language models of similar size across various benchmarks, including Korean-specific tasks and general language understanding metrics. More details will be provided in the upcoming <u>Technical Report</u>.

| Language | Benchmark | **dnotitia/Llama-DNA-1.0-8B-Instruct** | LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct | LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct | yanolja/EEVE-Korean-Instruct-10.8B-v1.0 | Qwen/Qwen2.5-7B-Instruct | meta-llama/Llama-3.1-8B-Instruct | mistralai/Mistral-7B-Instruct-v0.3 | NCSOFT/Llama-VARCO-8B-Instruct | upstage/SOLAR-10.7B-Instruct-v1.0 |
|----------|------------|----------------------------------------|--------------------------------------|--------------------------------------|-----------------------------------------|--------------------------|----------------------------------|------------------------------------|--------------------------------|-----------------------------------|
| Korean | KMMLU | **53.26** (1st) | 45.30 | 45.28 | 42.17 | <u>45.66</u> | 41.66 | 31.45 | 38.49 | 41.50 |
| | KMMLU-hard | **29.46** (1st) | 23.17 | 20.78 | 19.25 | <u>24.78</u> | 20.49 | 17.86 | 19.83 | 20.61 |
| | KoBEST | **83.40** (1st) | 79.05 | 80.13 | <u>81.67</u> | 78.51 | 67.56 | 63.77 | 72.99 | 73.26 |
| | Belebele | **57.99** (1st) | 40.97 | 45.11 | 49.40 | <u>54.85</u> | 54.70 | 40.31 | 53.17 | 48.68 |
| | CSATQA | <u>43.32</u> (2nd) | 40.11 | 34.76 | 39.57 | **45.45** | 36.90 | 27.27 | 32.62 | 34.22 |
| English | MMLU | 66.64 (3rd) | 65.27 | 64.32 | 63.63 | **74.26** | <u>68.26</u> | 62.04 | 63.25 | 65.30 |
| | MMLU-Pro | **43.05** (1st) | 40.73 | 38.90 | 32.79 | <u>42.5</u> | 40.92 | 33.49 | 37.11 | 30.25 |
| | GSM8K | **80.52** (1st) | 65.96 | <u>80.06</u> | 56.18 | 75.74 | 75.82 | 49.66 | 64.14 | 69.22 |

- The highest scores are in **bold**, and the second-highest scores are <u>underlined</u>.

**Evaluation Protocol**

To make our evaluation results easy to reproduce, the tools and settings used are listed below:

| Benchmark | Evaluation setting | Metric | Evaluation tool |
|------------|--------------------|-------------------------------------|-----------------|
| KMMLU | 5-shot | macro\_avg / exact\_match | lm-eval-harness |
| KMMLU-hard | 5-shot | macro\_avg / exact\_match | lm-eval-harness |
| KoBEST | 5-shot | macro\_avg / f1 | lm-eval-harness |
| Belebele | 0-shot | acc | lm-eval-harness |
| CSATQA | 0-shot | acc\_norm | lm-eval-harness |
| MMLU | 5-shot | macro\_avg / acc | lm-eval-harness |
| MMLU-Pro | 5-shot | macro\_avg / exact\_match | lm-eval-harness |
| GSM8K | 5-shot | acc, exact\_match & strict\_extract | lm-eval-harness |
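
Several rows in the table above report a `macro_avg` metric. Assuming this denotes the unweighted mean of per-subtask scores (the card does not define it further), the sketch below contrasts it with a micro average that pools all questions. The subtask names and counts are made up for illustration.

```python
def macro_average(per_task):
    """Unweighted mean of per-subtask accuracies.

    per_task maps a subtask name to a (correct, total) pair.
    """
    scores = [correct / total for correct, total in per_task.values()]
    return sum(scores) / len(scores)


def micro_average(per_task):
    """Accuracy over all questions pooled together, for comparison."""
    correct = sum(c for c, _ in per_task.values())
    total = sum(t for _, t in per_task.values())
    return correct / total


# Hypothetical subtasks with very different sizes.
results = {"law": (30, 100), "math": (9, 10)}
print(f"macro: {macro_average(results):.3f}")  # each subtask counts equally
print(f"micro: {micro_average(results):.3f}")  # large subtasks dominate
```

With these toy numbers the macro average is about 0.600 while the micro average is about 0.355, which shows why the aggregation method matters when comparing reported scores.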

## Quickstart

This model requires `transformers >= 4.43.0`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the tokenizer and model; device_map='auto' places the weights on the available device(s).
tokenizer = AutoTokenizer.from_pretrained('dnotitia/Llama-DNA-1.0-8B-Instruct')
model = AutoModelForCausalLM.from_pretrained('dnotitia/Llama-DNA-1.0-8B-Instruct', device_map='auto')
# Print generated text as it is produced, hiding the prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

conversation = [
    {"role": "system", "content": "You are a helpful assistant, Dnotitia DNA."},
    {"role": "user", "content": "너의 이름은?"},  # Korean for "What is your name?"
]
# Apply the chat template and tokenize in one step.
inputs = tokenizer.apply_chat_template(conversation,
                                       add_generation_prompt=True,
                                       return_dict=True,
                                       return_tensors="pt").to(model.device)
_ = model.generate(**inputs, streamer=streamer)
```
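
The `transformers >= 4.43.0` requirement stated at the top of this section can be checked before loading the model. The sketch below is a minimal, standard-library-only comparison; the helper names `version_tuple` and `meets_minimum` are hypothetical, and the parsing ignores pre-release suffixes, which a dedicated parser such as `packaging.version` handles more carefully.

```python
def version_tuple(version, width=3):
    """Convert '4.43.0' or '4.44.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'dev0' or 'rc1'
        parts.append(int(piece))
    # Pad so that '4.43' compares equal to '4.43.0'.
    while len(parts) < width:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum="4.43.0"):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


# In practice the installed version would come from transformers.__version__.
for candidate in ("4.42.4", "4.43.0", "4.45.1"):
    print(candidate, meets_minimum(candidate))
```

Comparing integer tuples avoids the pitfall of comparing version strings directly, where `"4.9.0"` would sort after `"4.43.0"`.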

## Limitations

While DNA 1.0 8B Instruct demonstrates strong performance, users should be aware of the following limitations:

- The model may occasionally generate biased or inappropriate content
- Responses are based on training data and may not reflect current information
- The model may sometimes produce factually incorrect or inconsistent answers
- Performance may vary depending on the complexity and domain of the task
- Generated content should be reviewed for accuracy and appropriateness

## License

This model is released under the CC BY-NC 4.0 license. For commercial usage inquiries, please [Contact us](https://www.dnotitia.com/contact/post-form).

## Appendix

- KMMLU scores comparison chart:
<img src="assets/comparison-chart.png" width="100%" style="margin: 40px auto;">

- DNA 1.0 8B Instruct model architecture <sup>[1]</sup>:
<img src="assets/model-architecture.png" width="500" style="margin: 40px auto;">

[1]: <https://www.linkedin.com/posts/sebastianraschka_the-llama-32-1b-and-3b-models-are-my-favorite-activity-7248317830943686656-yyYD/>

- The median percentage difference in the model's weights before and after the merge (our SFT model + Llama 3.1 8B Instruct):
<img src="assets/ours-vs-merged.png" width="100%" style="margin: 40px auto;">
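
The card does not say exactly how the weight-difference statistic in the last chart was computed. One plausible reading, sketched below under that assumption, is the median over a tensor's elements of the absolute change relative to the pre-merge value, expressed as a percentage. The function name, the layer names, and the values are hypothetical.

```python
import statistics


def median_percent_difference(before, after, eps=1e-12):
    """Median element-wise relative change, in percent, between two
    equally sized flattened weight lists."""
    diffs = [
        abs(a - b) / max(abs(b), eps) * 100.0  # eps avoids division by zero
        for b, a in zip(before, after)
    ]
    return statistics.median(diffs)


# Hypothetical per-layer weights before and after merging.
layers = {
    "layer_0": ([1.0, 2.0, -4.0], [1.1, 2.0, -3.0]),
    "layer_1": ([0.5, -0.5, 1.0, 2.0], [0.5, -0.5, 1.0, 2.0]),
}
for name, (before, after) in layers.items():
    print(name, f"{median_percent_difference(before, after):.2f}%")
```

Using the median rather than the mean keeps a few large outlier weights from dominating the per-layer figure.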

## Citation

If you use or discuss this model in your academic research, please cite the project to help spread awareness:

```
@article{dnotitiadna2024,
  title = {Dnotitia DNA 1.0 8B Instruct},
  author = {Jungyup Lee and Jemin Kim and Sang Park and Seungjae Lee},
  year = {2024},
  url = {https://huggingface.co/dnotitia/DNA-1.0-8B-Instruct},
  version = {1.0},
}
```