---
license: cc-by-nc-4.0
base_model: Edentns/DataVortexS-10.7B-dpo-v1.11
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: nhn_dpo_v3_DataVortexS-10.7B-dpo-v1.11_DPO
  results: []
---

# ENERGY-DRINK-LOVE/DataVortexS_dpov3
## Our Team
- Youjin Chung
- Jingyeom Kim
## Model

### Base Model

- Edentns/DataVortexS-10.7B-dpo-v1.11

### Hardware and Software
- Hardware: 8× A100 GPUs for training our model
- Software: DeepSpeed library & Hugging Face TRL trainer
### Dataset

- DPO dataset
  - In-house DPO dataset (built from AI-hub datasets)
  - Translations of English DPO datasets such as OpenOrca DPO (ENERGY-DRINK-LOVE/translate_share_gpt_dedup_llama_SFT_1024, translated with our own model)
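For illustration, a preference example in the `prompt`/`chosen`/`rejected` layout that TRL's `DPOTrainer` consumes might look like the sketch below (the texts are hypothetical placeholders, not samples from the actual dataset):

```python
# One DPO preference record: a prompt plus a preferred ("chosen") and a
# dispreferred ("rejected") completion. Field contents are illustrative only.
record = {
    "prompt": "What is the capital of South Korea?",
    "chosen": "The capital of South Korea is Seoul.",
    "rejected": "I'm not sure.",
}

# DPOTrainer expects exactly these three text fields per example.
assert set(record) == {"prompt", "chosen", "rejected"}
```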
### Training Method

- DPO (Direct Preference Optimization) with the Hugging Face TRL trainer
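A minimal, dependency-free sketch of the standard DPO objective for a single preference pair (assuming the usual formulation: `beta` is the DPO temperature, and the four arguments are per-sequence log-probabilities from the policy and the frozen reference model):

```python
import math

def dpo_loss(pi_chosen_logp, pi_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one pair: -log(sigmoid(beta * margin)), where the margin
    compares the policy's log-ratios against the reference model's."""
    margin = ((pi_chosen_logp - ref_chosen_logp)
              - (pi_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy matches the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-1.0, -2.0, -1.0, -2.0))  # → 0.6931...
```

Pushing the chosen completion up (or the rejected one down) relative to the reference widens the margin and drives the loss below `log(2)`.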
## Benchmark
| Average | Ko-ARC | Ko-HellaSwag | Ko-MMLU | Ko-TruthfulQA | Ko-CommonGen V2 |
|---|---|---|---|---|---|
| 60.18 | 56.23 | 69.15 | 52.76 | 67.87 | 54.9 |
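As a quick sanity check, the reported Average is the arithmetic mean of the five benchmark scores:

```python
# Scores from the benchmark table above.
scores = {
    "Ko-ARC": 56.23,
    "Ko-HellaSwag": 69.15,
    "Ko-MMLU": 52.76,
    "Ko-TruthfulQA": 67.87,
    "Ko-CommonGen V2": 54.90,
}
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # → 60.18
```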
