RobertoMDLP committed on
Commit
45cfbb7
verified
1 Parent(s): 06952e8

Update README.md

Files changed (1)
  1. README.md +76 -88
README.md CHANGED
@@ -1,97 +1,85 @@
  ---
- library_name: transformers
- license: apache-2.0
- base_model: google/vit-base-patch16-224-in21k
- tags:
- - generated_from_trainer
  datasets:
- - imagefolder
  metrics:
  - accuracy
  - f1
  - precision
  - recall
- model-index:
- - name: vit-tom-jerry-model
-   results:
-   - task:
-       name: Image Classification
-       type: image-classification
-     dataset:
-       name: imagefolder
-       type: imagefolder
-       config: default
-       split: validation
-       args: default
-     metrics:
-     - name: Accuracy
-       type: accuracy
-       value: 0.991578947368421
-     - name: F1
-       type: f1
-       value: 0.9911287912744658
-     - name: Precision
-       type: precision
-       value: 0.9911287912744658
-     - name: Recall
-       type: recall
-       value: 0.9911287912744658
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # vit-tom-jerry-model
-
- This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the imagefolder dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.0403
- - Accuracy: 0.9916
- - F1: 0.9911
- - Precision: 0.9911
- - Recall: 0.9911
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0002
- - train_batch_size: 8
- - eval_batch_size: 8
- - seed: 42
- - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: linear
- - num_epochs: 5
- - mixed_precision_training: Native AMP
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
- |:-------------:|:------:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|
- | 0.0808 | 0.3597 | 100 | 0.1168 | 0.9705 | 0.9694 | 0.9646 | 0.9759 |
- | 0.212 | 0.7194 | 200 | 0.1209 | 0.9705 | 0.9691 | 0.9667 | 0.9719 |
- | 0.0008 | 1.0791 | 300 | 0.0403 | 0.9916 | 0.9911 | 0.9911 | 0.9911 |
- | 0.0041 | 1.4388 | 400 | 0.0464 | 0.9895 | 0.9889 | 0.9884 | 0.9894 |
- | 0.0004 | 1.7986 | 500 | 0.1313 | 0.9684 | 0.9671 | 0.9627 | 0.9732 |
- | 0.0005 | 2.1583 | 600 | 0.0855 | 0.9811 | 0.9802 | 0.9767 | 0.9845 |
-
-
- ### Framework versions
-
- - Transformers 4.55.0
- - Pytorch 2.6.0+cu124
- - Datasets 4.0.0
- - Tokenizers 0.21.4
  ---
+ license: cc0-1.0
  datasets:
+ - RobertoMDLP/tom_and_jerry
+ language:
+ - en
  metrics:
  - accuracy
  - f1
  - precision
  - recall
+ base_model:
+ - google/vit-base-patch16-224-in21k
  ---

+ # Tom and Jerry Image Classification with ViT
+
+ This model is a fine-tuned variant of **google/vit-base-patch16-224-in21k** that classifies images containing:
+ - Tom
+ - Jerry
+
+ ## Methodology
+
+ 1. **Dataset preparation**
+ The [`RobertoMDLP/tom_and_jerry`](https://huggingface.co/datasets/RobertoMDLP/tom_and_jerry) dataset was used, with two classes (*Tom*, *Jerry*).
+ The dataset was split into 70% for training, 15% for validation, and 15% for testing.
+
+ 2. **Preprocessing**
+ Images were resized to 224×224 pixels and normalized using the pretrained `ViTImageProcessor` from `google/vit-base-patch16-224-in21k`.
+ No data augmentation was applied.
+
+ 3. **Training**
+ The base **ViT** model was trained with full fine-tuning.
+ The configuration included:
+ - Batch size: 8 (training and evaluation)
+ - Learning rate: 2e-4
+ - Epochs: 3
+ - Evaluation strategy: every 100 steps
+ - Mixed precision (FP16)
+ - Early stopping with a patience of 3 evaluations
+ - Best-model selection by validation *accuracy*
+
+ 4. **Evaluation**
+ Performance was measured with Accuracy, F1, Precision, and Recall.
+ The checkpoint with the best validation Accuracy was selected.
+
+ ## Results
+
+ ### Metrics summary (best checkpoint)
+
+ | Metric      | Value   |
+ |-------------|---------|
+ | Accuracy    | 0.9916  |
+ | F1          | 0.9911  |
+ | Precision   | 0.9911  |
+ | Recall      | 0.9911  |
+ | Loss (eval) | 0.0403  |
+
+ ### Progress by step
+
+ | Step | Train Loss | Val Loss | Accuracy | F1 | Precision | Recall |
+ |------|-----------:|---------:|----------:|---------:|----------:|---------:|
+ | 100 | 0.0808 | 0.1168 | 0.9705 | 0.9694 | 0.9646 | 0.9759 |
+ | 200 | 0.2120 | 0.1209 | 0.9705 | 0.9691 | 0.9667 | 0.9719 |
+ | 300 | 0.0008 | 0.0403 | 0.9916 | 0.9911 | 0.9911 | 0.9911 |
+ | 400 | 0.0041 | 0.0464 | 0.9895 | 0.9889 | 0.9884 | 0.9894 |
+ | 500 | 0.0004 | 0.1313 | 0.9684 | 0.9671 | 0.9627 | 0.9732 |
+ | 600 | 0.0005 | 0.0855 | 0.9811 | 0.9802 | 0.9767 | 0.9845 |
+
+ ### Final metrics
+
+ **Training**
+ - Epoch: 2.1583
+ - Loss: 0.0394
+ - Time: 6 min 3 s
+ - Speed: 30.58 samples/s
+
+ **Evaluation**
+ - Accuracy: 0.9916
+ - F1: 0.9911
+ - Precision: 0.9911
+ - Recall: 0.9911
+ - Loss: 0.0403
+ - Time: 6.33 s
+ - Speed: 74.97 samples/s
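
The added README combines two rules: early stopping with a patience of 3 evaluations, and selection of the checkpoint with the best validation accuracy. As a rough check, the sketch below replays both rules against the per-step accuracies reported in the README. It is a simplified stand-in, not the Hugging Face `EarlyStoppingCallback` itself: the function name `replay_early_stopping` is hypothetical, and it assumes that only a strictly higher accuracy resets the patience counter.

```python
from typing import List, Optional, Tuple

# (step, validation accuracy) pairs copied from the per-step table in the README.
EVAL_LOG: List[Tuple[int, float]] = [
    (100, 0.9705),
    (200, 0.9705),
    (300, 0.9916),
    (400, 0.9895),
    (500, 0.9684),
    (600, 0.9811),
]


def replay_early_stopping(
    log: List[Tuple[int, float]], patience: int = 3
) -> Tuple[Optional[int], float, Optional[int]]:
    """Return (best_step, best_accuracy, stop_step) for the logged evaluations.

    Assumption: only a strictly higher accuracy counts as an improvement.
    Training stops once `patience` consecutive evaluations fail to improve;
    stop_step is None if that never happens within the log.
    """
    best_step: Optional[int] = None
    best_acc = float("-inf")
    stale = 0  # consecutive evaluations without improvement
    for step, acc in log:
        if acc > best_acc:
            best_step, best_acc, stale = step, acc, 0
        else:
            stale += 1
            if stale >= patience:
                return best_step, best_acc, step
    return best_step, best_acc, None


if __name__ == "__main__":
    best_step, best_acc, stop_step = replay_early_stopping(EVAL_LOG, patience=3)
    print(f"best checkpoint: step {best_step} (accuracy {best_acc})")
    print(f"training stopped at step: {stop_step}")
```

Under these assumptions the replay selects step 300 as the best checkpoint and stops at step 600. That agrees with the reported best accuracy of 0.9916 and with the per-step table ending at step 600.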