Update README.md
README.md CHANGED
@@ -16,6 +16,9 @@ language:
 ## Model Overview
 This model is built on top of **[Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)** and finetuned using **LoRA** (Low-Rank Adaptation) and **RLHF**-style reward optimization, leveraging **vLLM** for fast inference. It is designed to respond with a specific structure (i.e., `<reasoning> ... </reasoning>` and `<final_argument> ... </final_argument>` sections) and maximize the number of well-formed argument-objection pairs.
 
+# Training script
+Script here: https://colab.research.google.com/drive/15DVOLcs3dopw0xPQgxaG3LVwj266XWVS?usp=sharing
+
 ## Key Features
 - **Base Model**: Qwen/Qwen2.5-3B-Instruct
 - **Quantization & Optimization**: