---
library_name: peft
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.3
tags:
- generated_from_trainer
model-index:
- name: resultados
results: []
language:
- es
---
# resultados
This is a model fine-tuned from [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)
on a question-answer dataset about Acuerdo 009, using batch_size=1 for 10 epochs
with a total GPU VRAM usage of 24 GB,
reaching a final validation loss of:
- Loss: 0.2677
## Model description
This model is being used in a chatbot system that answers questions about Acuerdo 009. For now the model is being tested in the style of [Chatbot Arena](https://lmarena.ai/)
to measure its performance as a direct chat, and the use of RAG for answering questions
against up-to-date documents is also being evaluated.
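
As a minimal sketch of how the chatbot can load this adapter for inference, assuming the adapter weights live in a local `resultados/` directory (a placeholder; substitute your actual adapter path or Hub repo id):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-Instruct-v0.3"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# "resultados" is an assumed adapter path; point this at the trained adapter.
model = PeftModel.from_pretrained(base, "resultados")

messages = [{"role": "user", "content": "¿Qué establece el acuerdo 009 sobre ...?"}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)
with torch.no_grad():
    out = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```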
## Intended uses & limitations
More information needed
## Training and evaluation data
This model results from training "mistralai/Mistral-7B-Instruct-v0.3" on a question-answer dataset about Acuerdo 009 of the Universidad
del Valle, comprising 1,700 examples. The dataset was built by students of the Ingeniería de Sistemas (Systems Engineering) program with the
support of the engineering faculty's academic vice-dean's office (vicedecanatura académica de ingeniería).
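
As an illustration of how each question-answer pair can be serialized into Mistral's instruction format for training, here is a sketch; the dataset's actual column names are not documented on this card, so `pregunta`/`respuesta` are assumptions:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

# Hypothetical example row; the real dataset schema is not published here.
row = {
    "pregunta": "¿Cuántos créditos exige el acuerdo 009?",
    "respuesta": "El acuerdo 009 establece que ...",
}

# The tokenizer's chat template wraps the turns in Mistral's
# [INST] ... [/INST] markers for us.
text = tokenizer.apply_chat_template(
    [{"role": "user", "content": row["pregunta"]},
     {"role": "assistant", "content": row["respuesta"]}],
    tokenize=False,
)
print(text)  # "<s>[INST] ... [/INST] ... </s>"
```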
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch reproducing them follows the list):
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: paged_adamw_8bit with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 10
- mixed_precision_training: Native AMP
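
A minimal sketch of a `transformers.TrainingArguments` configuration matching the values above; the output directory and the evaluation cadence are assumptions (the 100-step cadence is inferred from the results table), and the LoRA/PEFT adapter settings are not recorded on this card, so they are omitted:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="resultados",         # assumed output directory
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # effective train batch size = 1 * 4 = 4
    num_train_epochs=10,
    lr_scheduler_type="linear",
    optim="paged_adamw_8bit",        # betas=(0.9, 0.999), eps=1e-08 are the defaults
    fp16=True,                       # "Native AMP" mixed precision
    seed=42,
    eval_strategy="steps",
    eval_steps=100,                  # inferred from the evaluation steps below
)
```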
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.2627 | 0.2694 | 100 | 1.2111 |
| 1.0079 | 0.5387 | 200 | 1.0255 |
| 0.8729 | 0.8081 | 300 | 0.8972 |
| 0.7103 | 1.0754 | 400 | 0.8024 |
| 0.6555 | 1.3448 | 500 | 0.7070 |
| 0.5711 | 1.6141 | 600 | 0.6281 |
| 0.6438 | 1.8835 | 700 | 0.5783 |
| 0.5111 | 2.1508 | 800 | 0.5160 |
| 0.4312 | 2.4202 | 900 | 0.4764 |
| 0.4467 | 2.6896 | 1000 | 0.4446 |
| 0.4222 | 2.9589 | 1100 | 0.4124 |
| 0.3802 | 3.2263 | 1200 | 0.3931 |
| 0.2767 | 3.4956 | 1300 | 0.3718 |
| 0.3598 | 3.7650 | 1400 | 0.3577 |
| 0.2838 | 4.0323 | 1500 | 0.3447 |
| 0.3169 | 4.3017 | 1600 | 0.3349 |
| 0.2737 | 4.5710 | 1700 | 0.3273 |
| 0.2425 | 4.8404 | 1800 | 0.3138 |
| 0.1814 | 5.1077 | 1900 | 0.3092 |
| 0.2372 | 5.3771 | 2000 | 0.3004 |
| 0.258 | 5.6465 | 2100 | 0.2953 |
| 0.2488 | 5.9158 | 2200 | 0.2911 |
| 0.2052 | 6.1832 | 2300 | 0.2926 |
| 0.1973 | 6.4525 | 2400 | 0.2929 |
| 0.2595 | 6.7219 | 2500 | 0.2828 |
| 0.1843 | 6.9912 | 2600 | 0.2771 |
| 0.1912 | 7.2586 | 2700 | 0.2784 |
| 0.2303 | 7.5279 | 2800 | 0.2777 |
| 0.2396 | 7.7973 | 2900 | 0.2697 |
| 0.2031 | 8.0646 | 3000 | 0.2708 |
| 0.1567 | 8.3340 | 3100 | 0.2730 |
| 0.1605 | 8.6034 | 3200 | 0.2690 |
| 0.1741 | 8.8727 | 3300 | 0.2674 |
| 0.1727 | 9.1401 | 3400 | 0.2709 |
| 0.1779 | 9.4094 | 3500 | 0.2666 |
| 0.1469 | 9.6788 | 3600 | 0.2687 |
| 0.1967 | 9.9481 | 3700 | 0.2677 |
### Framework versions
- PEFT 0.15.1
- Transformers 4.51.0
- Pytorch 2.6.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1 |