Update README.md
README.md
CHANGED
@@ -20,23 +20,9 @@ This model is a merged version of [mistralai/Devstral-Small-2507](https://huggin
 ## Model Details
 
 - **Base Model:** [mistralai/Devstral-Small-2507](https://huggingface.co/mistralai/Devstral-Small-2507)
-- **LoRA Adapter:** [pankajmathur/Devstral-Small-2507-sft-v1-adapter](https://huggingface.co/pankajmathur/Devstral-Small-2507-sft-v1-adapter)
-- **Training Dataset:** [pankajmathur/OpenThoughts-Agent-v1-SFT](https://huggingface.co/datasets/pankajmathur/OpenThoughts-Agent-v1-SFT)
 - **Parameters:** ~24B
 - **Precision:** bfloat16
 
-## Training Configuration
-
-The LoRA adapter was trained with the following configuration:
-- **LoRA Rank (r):** 32
-- **LoRA Alpha:** 16
-- **LoRA Dropout:** 0.05
-- **Target Modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
-- **Sequence Length:** 8192
-- **Learning Rate:** 0.0001
-- **Optimizer:** AdamW 8-bit
-- **Epochs:** 3
-
 ## Usage
 
 
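The hunk header describes this model as a merged version of the base model, and the removed lines record the adapter's rank (r = 32) and alpha (16). As a hedged illustration, assuming the standard LoRA formulation in which the low-rank update B·A is scaled by alpha / r (0.5 with these values) and added to the base weight, a merge can be sketched on toy matrices. The helper names (`matmul`, `transpose`, `merge_lora`) and all matrix values are hypothetical; the diff does not say which tool performed the merge.

```python
# A minimal sketch of merging a LoRA adapter into one base weight matrix.
# Assumption: the standard LoRA formulation, W_merged = W + (alpha / r) * (B @ A),
# where A has shape (r, in_features) and B has shape (out_features, r).
# The matrices below are toy values, not weights from the actual model.


def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def merge_lora(w, a, b, alpha, r):
    """Return W + (alpha / r) * (B @ A) as a new matrix; W is not modified."""
    scale = alpha / r
    delta = matmul(b, a)  # shape: (out_features, in_features)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Rank and alpha taken from the Training Configuration removed in this diff.
R, ALPHA = 32, 16
IN_FEATURES = 3  # toy size for illustration

w = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]                          # base weight (out x in)
a = [[1.0] * IN_FEATURES for _ in range(R)]    # hypothetical LoRA A (r x in)
b = [[0.25] * R, [0.5] * R]                    # hypothetical LoRA B (out x r)

merged = merge_lora(w, a, b, ALPHA, R)
print(merged)  # [[5.0, 4.0, 4.0], [8.0, 9.0, 8.0]]

# The merged layer reproduces the base output plus the scaled adapter path.
x = [[1.0, 2.0, 3.0]]                          # one input row (batch x in)
merged_out = matmul(x, transpose(merged))
base_out = matmul(x, transpose(w))
lora_out = matmul(matmul(x, transpose(a)), transpose(b))
adapter_out = [[bo + (ALPHA / R) * lo for bo, lo in zip(brow, lrow)]
               for brow, lrow in zip(base_out, lora_out)]
print(merged_out == adapter_out)  # True
```

Because the scaled update is folded directly into W, the merged layer has the same shape as the base layer and needs no separate adapter weights at inference time.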