The model was fine-tuned on a specialized corpus consisting of:
2. Retrieved documents: For each synthetic query, relevant documents were retrieved using the BM25 ranking algorithm.
3. Generated answers: Responses to the synthetic queries were created based on the retrieved documents.
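The retrieval step above can be sketched in plain Python. This is a minimal, self-contained BM25 scorer (the standard Okapi formula with the common smoothed IDF), not the exact implementation used to build the corpus; the example documents and query are illustrative only.

```python
import math

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each document in `corpus` against `query` with Okapi BM25.

    `query` is a list of tokens; `corpus` is a list of token lists.
    Returns one score per document.
    """
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N  # average document length
    # Document frequency for each distinct query term.
    df = {t: sum(1 for d in corpus if t in d) for t in set(query)}
    scores = []
    for doc in corpus:
        s = 0.0
        for t in query:
            f = doc.count(t)  # term frequency in this document
            if f == 0:
                continue
            # Smoothed IDF, always non-negative.
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            # Length-normalized term-frequency saturation.
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores

docs = [
    "the cat sat on the mat".split(),
    "dogs chase cats in the park".split(),
    "quantum mechanics of small systems".split(),
]
query = "cat mat".split()
print(bm25_scores(query, docs))
```

For each synthetic query, the top-scoring documents under this kind of ranking would be passed to the answer-generation step.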
```yaml
Training Hyperparameters:
  Max Steps: 3000
  Learning Rate: 3e-4
  Batch Size: 2 per device
  Gradient Accumulation Steps: 4
  Max Sequence Length: 8192
  Weight Decay: 0.001
  Warmup Ratio: 0.03
  LR Scheduler: Linear
  Optimizer: paged_adamw_32bit

LoRA Configuration:
  LoRA Alpha: 16
  LoRA Dropout: 0.1
  LoRA R: 64
  Target Modules:
    - gate_proj
    - down_proj
    - up_proj
    - q_proj
    - v_proj
    - k_proj
    - o_proj

Quantization:
  Quantization: 4-bit
  Quantization Type: nf4
  Compute Dtype: float16
```
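A few quantities follow directly from these hyperparameters: the effective batch size per device, the number of linear-warmup steps, and the trainable-parameter count a rank-64 LoRA adapter adds to each targeted projection. The sketch below works these out in plain Python; the 4096×4096 projection size is a hypothetical example, since the README does not state the base model's hidden size, and the schedule function is a simplified stand-in for the trainer's linear scheduler.

```python
# Effective batch size: samples per optimizer step on one device.
per_device_batch = 2
grad_accum_steps = 4
effective_batch = per_device_batch * grad_accum_steps
print("effective batch per device:", effective_batch)

# Warmup length under a warmup ratio of 0.03 over 3000 steps.
max_steps = 3000
warmup_ratio = 0.03
warmup_steps = int(max_steps * warmup_ratio)
print("linear warmup steps:", warmup_steps)

def linear_schedule_lr(step, base_lr=3e-4, max_steps=3000, warmup_steps=90):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (max_steps - step) / (max_steps - warmup_steps)

def lora_params(d_in, d_out, r=64):
    """Trainable parameters a rank-r LoRA adapter adds to a d_in x d_out
    weight: matrix A is d_in x r and matrix B is r x d_out."""
    return r * (d_in + d_out)

# Hypothetical 4096-dim q_proj, for illustration only.
print("LoRA params for one 4096x4096 projection:", lora_params(4096, 4096))
```

With seven target modules per transformer layer, the adapter size scales linearly with the number of layers while the 4-bit base weights stay frozen.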
## Usage