# f7e65a49fd1b64424ee9370116158c3f

This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B on the ccdv/patent-classification [abstract] dataset. It achieves the following results on the evaluation set:
- Loss: 9.8133
- Data Size: 1.0
- Epoch Runtime: 181.2153
- Accuracy: 0.6496
- F1 Macro: 0.6002
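The reported Accuracy and F1 Macro follow their standard definitions: F1 Macro averages the per-class F1 scores with equal weight, so it penalizes poor performance on rare patent classes. A minimal, dependency-free sketch of both metrics (equivalent to `sklearn.metrics.f1_score(..., average="macro")`, which the training script presumably used):

```python
def accuracy(y_true, y_pred):
    """Fraction of exactly matching labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_macro(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if (tp + fp) else 0.0
        rec = tp / (tp + fn) if (tp + fn) else 0.0
        scores.append(2 * prec * rec / (prec + rec) if (prec + rec) else 0.0)
    return sum(scores) / len(scores)
```

For example, with `y_true = [0, 0, 1, 1]` and `y_pred = [0, 1, 1, 1]`, accuracy is 0.75 while macro F1 is 11/15 ≈ 0.733, since the two per-class F1 scores (2/3 and 4/5) are averaged equally.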
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
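The total batch sizes follow from the per-device values and the device count: with 4 GPUs at a per-device batch size of 8 (and, presumably, no gradient accumulation, since none is listed), the effective batch is 8 × 4 = 32. A small sketch of that arithmetic using the hyperparameters above:

```python
# Hyperparameters as reported in this card.
config = {
    "learning_rate": 5e-05,
    "train_batch_size": 8,   # per device
    "eval_batch_size": 8,    # per device
    "num_devices": 4,
    "grad_accum_steps": 1,   # assumption: not listed, so taken as 1
}

# Effective (total) batch sizes across all devices.
total_train_batch = (config["train_batch_size"]
                     * config["num_devices"]
                     * config["grad_accum_steps"])
total_eval_batch = config["eval_batch_size"] * config["num_devices"]

print(total_train_batch, total_eval_batch)  # matches the reported 32 / 32
```

Note that training was configured for 50 epochs with a constant learning rate, but the results table below stops at epoch 10, so the run appears to have ended early (e.g. via an early-stopping callback).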
### Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Accuracy | F1 Macro |
|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 11.2749 | 0 | 10.8773 | 0.1332 | 0.0815 |
| No log | 1 | 781 | 8.1784 | 0.0078 | 12.2238 | 0.2564 | 0.1565 |
| No log | 2 | 1562 | 5.6207 | 0.0156 | 14.3409 | 0.4868 | 0.3293 |
| No log | 3 | 2343 | 4.8647 | 0.0312 | 19.1654 | 0.5723 | 0.4255 |
| 0.1452 | 4 | 3124 | 4.6690 | 0.0625 | 24.6738 | 0.5897 | 0.4789 |
| 4.5955 | 5 | 3905 | 4.2943 | 0.125 | 36.3555 | 0.6120 | 0.5563 |
| 4.1436 | 6 | 4686 | 4.0531 | 0.25 | 58.2511 | 0.6430 | 0.5795 |
| 3.3867 | 7 | 5467 | 4.0679 | 0.5 | 99.3198 | 0.6573 | 0.6119 |
| 2.3892 | 8 | 6248 | 4.3190 | 1.0 | 183.3381 | 0.6605 | 0.6187 |
| 0.5282 | 9 | 7029 | 7.4207 | 1.0 | 180.1089 | 0.6693 | 0.6170 |
| 0.5747 | 10 | 7810 | 9.8133 | 1.0 | 181.2153 | 0.6496 | 0.6002 |
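The table shows validation loss bottoming out at epoch 6 (4.0531) and then climbing sharply once the full dataset is in use, while accuracy peaks at epoch 9 (0.6693); the headline figures above are simply the final-epoch (epoch 10) numbers. A small sketch of reading off the best checkpoints from the table, under the assumption that one would select by validation loss or by accuracy rather than by recency:

```python
# (epoch, validation_loss, accuracy) rows copied from the results table.
rows = [
    (0, 11.2749, 0.1332), (1, 8.1784, 0.2564), (2, 5.6207, 0.4868),
    (3, 4.8647, 0.5723), (4, 4.6690, 0.5897), (5, 4.2943, 0.6120),
    (6, 4.0531, 0.6430), (7, 4.0679, 0.6573), (8, 4.3190, 0.6605),
    (9, 7.4207, 0.6693), (10, 9.8133, 0.6496),
]

best_by_loss = min(rows, key=lambda r: r[1])  # lowest validation loss
best_by_acc = max(rows, key=lambda r: r[2])   # highest accuracy

print(best_by_loss[0], best_by_acc[0])  # epochs 6 and 9, not the final epoch 10
```

This is worth checking before using the published checkpoint: if the final weights correspond to epoch 10, an earlier checkpoint would likely generalize better.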
### Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.3.0
- Tokenizers 0.22.1