Weyaxi committed on
Commit b93a89b · verified · 1 Parent(s): 74707cf

add model card

Files changed (1):
  README.md +84 -46
README.md CHANGED
@@ -102,11 +102,33 @@ model-index:
  url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
  name: Open LLM Leaderboard
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
  <details><summary>See axolotl config</summary>

  axolotl version: `0.4.1`
@@ -186,67 +208,83 @@ fsdp:
  fsdp_config:

  save_safetensors: true
-
  ```

  </details><br>

- # Humanish-Qwen2.5-7B-Instruct

- This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on an unknown dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 0.0002
- - train_batch_size: 2
- - eval_batch_size: 8
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 2
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 32
- - total_eval_batch_size: 16
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 10
- - training_steps: 341

- ### Training results

- ### Framework versions

- - PEFT 0.13.0
- - Transformers 4.45.1
- - Pytorch 2.3.1+cu121
- - Datasets 2.21.0
- - Tokenizers 0.20.0
- # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
- Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_HumanLLMs__Humanish-Qwen2.5-7B-Instruct)

- | Metric              | Value |
- |---------------------|------:|
- | Avg.                | 26.67 |
- | IFEval (0-Shot)     | 72.84 |
- | BBH (3-Shot)        | 34.48 |
- | MATH Lvl 5 (4-Shot) |  0.00 |
- | GPQA (0-shot)       |  6.49 |
- | MuSR (0-shot)       |  8.42 |
- | MMLU-PRO (5-shot)   | 37.76 |
  url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
  name: Open LLM Leaderboard
  ---
+ <div align="center">
+   <img src="https://cdn-avatars.huggingface.co/v1/production/uploads/63da3d7ae697e5898cb86854/H-vpXOX6KZu01HnV87Jk5.jpeg" width="320" height="320" />
+   <h1>Enhancing Human-Like Responses in Large Language Models</h1>
+ </div>

+ <p align="center">
+ &nbsp;&nbsp;| 🤗 <a href="https://huggingface.co/collections/HumanLLMs/human-like-humanish-llms-6759fa68f22e11eb1a10967e">Models</a>&nbsp;&nbsp;|
+ &nbsp;&nbsp;📊 <a href="https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset">Dataset</a>&nbsp;&nbsp;|
+ &nbsp;&nbsp;📄 <a href="https://arxiv.org/abs/2501.05032">Paper</a>&nbsp;&nbsp;|
+ </p>
+
+ # 🚀 Human-Like-Qwen2.5-7B-Instruct
+
+ This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), specifically optimized to generate more human-like, conversational responses.
+
+ The fine-tuning process employed both [Low-Rank Adaptation (LoRA)](https://arxiv.org/abs/2106.09685) and [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290) to enhance natural language understanding, conversational coherence, and emotional intelligence in interactions.
+
+ The process of creating these models is detailed in the research paper ["Enhancing Human-Like Responses in Large Language Models"](https://arxiv.org/abs/2501.05032).
+
+ # 🛠️ Training Configuration
+
+ - **Base Model:** Qwen2.5-7B-Instruct
+ - **Framework:** Axolotl v0.4.1
+ - **Hardware:** 2x NVIDIA A100 (80 GB) GPUs
+ - **Training Time:** ~2 hours 15 minutes
+ - **Dataset:** Synthetic dataset with ≈11,000 samples across 256 diverse topics
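As context for the DPO training mentioned above, the per-example DPO objective can be sketched in plain Python (a minimal scalar illustration, not the actual TRL/Axolotl implementation; the `beta` value and log-probabilities below are made up):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is the summed token log-probability of a full response under
    either the trained policy or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logit = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logit)))

# When the policy has not moved from the reference, the loss is log(2) ≈ 0.693
loss = dpo_loss(-10.0, -12.0, -10.0, -12.0)
```

Training drives the loss below log(2) by widening the policy's gap between the preferred (human-like) and rejected (formal) responses relative to the reference model.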
 
 
  <details><summary>See axolotl config</summary>

  axolotl version: `0.4.1`

  fsdp_config:

  save_safetensors: true

  ```

  </details><br>

+ # 💬 Prompt Template
+
+ You can use the ChatML prompt template with this model:
+
+ ### ChatML
+
+ ```
+ <|im_start|>system
+ {system}<|im_end|>
+ <|im_start|>user
+ {user}<|im_end|>
+ <|im_start|>assistant
+ {assistant}<|im_end|>
+ ```
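For illustration, the template above can also be filled in by hand; a small sketch (the helper name `to_chatml` is ours, not part of the model's tooling):

```python
def to_chatml(messages):
    """Render a list of {role, content} messages in the ChatML format above."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"},
])
# Append the opening assistant header so the model continues as the assistant
prompt += "<|im_start|>assistant\n"
```

In practice, prefer the tokenizer's built-in chat template (shown next) so the special tokens always match the model's training format.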
+
+ This prompt template is available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating), which means you can format messages using the
+ `tokenizer.apply_chat_template()` method:
+
+ ```python
+ messages = [
+     {"role": "system", "content": "You are a helpful AI assistant."},
+     {"role": "user", "content": "Hello!"}
+ ]
+ gen_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
+ output = model.generate(gen_input)
+ ```
 
+ # 🤖 Models

+ | Model | Download |
+ |:---------------------:|:-----------------------------------------------------------------------:|
+ | Human-Like-Llama-3-8B-Instruct | 🤗 [HuggingFace](https://huggingface.co/HumanLLMs/Human-Like-LLama3-8B-Instruct) |
+ | Human-Like-Qwen-2.5-7B-Instruct | 🤗 [HuggingFace](https://huggingface.co/HumanLLMs/Human-Like-Qwen2.5-7B-Instruct) |
+ | Human-Like-Mistral-Nemo-Instruct | 🤗 [HuggingFace](https://huggingface.co/HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407) |

+ # 🎯 Benchmark Results

+ | **Group** | **Model** | **Average** | **IFEval** | **BBH** | **MATH Lvl 5** | **GPQA** | **MuSR** | **MMLU-PRO** |
+ |--------------------|----------------------------------|-------------|------------|----------|----------------|----------|----------|--------------|
+ | **Llama Models**   | Human-Like-Llama-3-8B-Instruct   | 22.37       | **64.97**  | 28.01    | 8.45           | 0.78     | **2.00** | 30.01        |
+ |                    | Llama-3-8B-Instruct              | 23.57       | 74.08      | 28.24    | 8.68           | 1.23     | 1.60     | 29.60        |
+ |                    | *Difference (Human-Like)*        | -1.20       | **-9.11**  | -0.23    | -0.23          | -0.45    | +0.40    | +0.41        |
+ | **Qwen Models**    | Human-Like-Qwen-2.5-7B-Instruct  | 26.66       | 72.84      | 34.48    | 0.00           | 6.49     | 8.42     | 37.76        |
+ |                    | Qwen-2.5-7B-Instruct             | 26.86       | 75.85      | 34.89    | 0.00           | 5.48     | 8.45     | 36.52        |
+ |                    | *Difference (Human-Like)*        | -0.20       | -3.01      | -0.41    | 0.00           | **+1.01**| -0.03    | **+1.24**    |
+ | **Mistral Models** | Human-Like-Mistral-Nemo-Instruct | 22.88       | **54.51**  | 32.70    | 7.62           | 5.03     | 9.39     | 28.00        |
+ |                    | Mistral-Nemo-Instruct            | 23.53       | 63.80      | 29.68    | 5.89           | 5.37     | 8.48     | 27.97        |
+ |                    | *Difference (Human-Like)*        | -0.65       | **-9.29**  | **+3.02**| **+1.73**      | -0.34    | +0.91    | +0.03        |
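The *Difference* rows are simply the Human-Like score minus the corresponding base-model score; for example, the Qwen pair above can be checked with:

```python
# Scores copied from the benchmark table above (Qwen pair)
human_like = {"Average": 26.66, "IFEval": 72.84, "BBH": 34.48,
              "MATH Lvl 5": 0.00, "GPQA": 6.49, "MuSR": 8.42, "MMLU-PRO": 37.76}
base = {"Average": 26.86, "IFEval": 75.85, "BBH": 34.89,
        "MATH Lvl 5": 0.00, "GPQA": 5.48, "MuSR": 8.45, "MMLU-PRO": 36.52}

# Positive values mean the Human-Like model scores higher than its base model
diff = {k: round(human_like[k] - base[k], 2) for k in human_like}
```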
 
 
+ # 📊 Dataset

+ The dataset used for fine-tuning was generated using LLaMA 3 models. It includes 10,884 samples across 256 distinct topics such as technology, daily life, science, history, and the arts. Each sample consists of:

+ - **Human-like responses:** Natural, conversational answers mimicking human dialogue.
+ - **Formal responses:** Structured and precise answers with a more formal tone.

+ The dataset has been open-sourced and is available at:

+ - 👉 [Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)

+ More details on the dataset creation process can be found in the accompanying research paper.
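Each sample thus maps naturally onto a DPO preference record; a hypothetical example of the shape (the field names and text here are illustrative only — see the dataset card for the actual schema):

```python
# Illustrative record; consult the dataset card for the real column names.
sample = {
    "prompt": "What's your favourite season?",
    # Human-like (preferred) response
    "chosen": "Oh, I love autumn! The colours, the crisp air... it just feels cosy, you know?",
    # Formal (rejected) response
    "rejected": "Autumn is often considered favourable due to its moderate temperatures and seasonal foliage.",
}
```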
 
 
 
 
 
 
+ # 📝 Citation

+ ```
+ @misc{çalık2025enhancinghumanlikeresponseslarge,
+       title={Enhancing Human-Like Responses in Large Language Models},
+       author={Ethem Yağız Çalık and Talha Rüzgar Akkuş},
+       year={2025},
+       eprint={2501.05032},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL},
+       url={https://arxiv.org/abs/2501.05032},
+ }
+ ```