RobertoMDLP
/

vit-tom-jerry-model

@@ -1,97 +1,85 @@
 ---
-library_name: transformers
-license: apache-2.0
-base_model: google/vit-base-patch16-224-in21k
-tags:
-- generated_from_trainer
 datasets:
-- imagefolder
 metrics:
 - accuracy
 - f1
 - precision
 - recall
-model-index:
-- name: vit-tom-jerry-model
-  results:
-  - task:
-      name: Image Classification
-      type: image-classification
-    dataset:
-      name: imagefolder
-      type: imagefolder
-      config: default
-      split: validation
-      args: default
-    metrics:
-    - name: Accuracy
-      type: accuracy
-      value: 0.991578947368421
-    - name: F1
-      type: f1
-      value: 0.9911287912744658
-    - name: Precision
-      type: precision
-      value: 0.9911287912744658
-    - name: Recall
-      type: recall
-      value: 0.9911287912744658
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# vit-tom-jerry-model
-This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the imagefolder dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.0403
-- Accuracy: 0.9916
-- F1: 0.9911
-- Precision: 0.9911
-- Recall: 0.9911
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 0.0002
-- train_batch_size: 8
-- eval_batch_size: 8
-- seed: 42
-- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: linear
-- num_epochs: 5
-- mixed_precision_training: Native AMP
-### Training results
-| Training Loss | Epoch  | Step | Validation Loss | Accuracy | F1     | Precision | Recall |
-|:-------------:|:------:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|
-| 0.0808        | 0.3597 | 100  | 0.1168          | 0.9705   | 0.9694 | 0.9646    | 0.9759 |
-| 0.212         | 0.7194 | 200  | 0.1209          | 0.9705   | 0.9691 | 0.9667    | 0.9719 |
-| 0.0008        | 1.0791 | 300  | 0.0403          | 0.9916   | 0.9911 | 0.9911    | 0.9911 |
-| 0.0041        | 1.4388 | 400  | 0.0464          | 0.9895   | 0.9889 | 0.9884    | 0.9894 |
-| 0.0004        | 1.7986 | 500  | 0.1313          | 0.9684   | 0.9671 | 0.9627    | 0.9732 |
-| 0.0005        | 2.1583 | 600  | 0.0855          | 0.9811   | 0.9802 | 0.9767    | 0.9845 |
-### Framework versions
-- Transformers 4.55.0
-- Pytorch 2.6.0+cu124
-- Datasets 4.0.0
-- Tokenizers 0.21.4

 ---
+license: cc0-1.0
 datasets:
+- RobertoMDLP/tom_and_jerry
+language:
+- en
 metrics:
 - accuracy
 - f1
 - precision
 - recall
+base_model:
+- google/vit-base-patch16-224-in21k
 ---
+# Tom and Jerry Image Classification with ViT
+This model is a fine-tuned variant of **google/vit-base-patch16-224-in21k** that classifies images containing:
+- Tom
+- Jerry
+## Methodology
+1. **Dataset preparation**
+   The [`RobertoMDLP/tom_and_jerry`](https://huggingface.co/datasets/RobertoMDLP/tom_and_jerry) dataset was used, with two classes (*Tom*, *Jerry*).
+   The data was split into 70% training, 15% validation, and 15% test.
+2. **Preprocessing**
+   Images were resized to 224×224 pixels and normalized with the pretrained `ViTImageProcessor` from `google/vit-base-patch16-224-in21k`.
+   No data augmentation was applied.
+3. **Training**
+   The base **ViT** model was fully fine-tuned.
+   The configuration was:
+   - Batch size: 8 (training and evaluation)
+   - Learning rate: 2e-4
+   - Epochs: 3
+   - Evaluation strategy: every 100 steps
+   - Mixed precision (FP16)
+   - Early stopping with a patience of 3 evaluations
+   - Best-model selection by validation *accuracy*
+4. **Evaluation**
+   Performance was measured with Accuracy, F1, Precision, and Recall.
+   The checkpoint with the best validation Accuracy was selected.
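The 70/15/15 split from step 1 can be illustrated with a minimal standard-library sketch. The card does not say how the split was produced, so the shuffling, the `seed` value, the `split_indices` helper, and the sample count of 1000 are all assumptions made for illustration.

```python
import random


def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle sample indices and cut them into train/validation/test parts.

    Hypothetical helper: the fractions come from the card, everything else
    (shuffling, seed, rounding) is an assumption.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # local RNG, global state untouched

    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains (about 15%)
    return train, val, test


# Illustrative sample count, not the real dataset size.
train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Giving the remainder to the test set keeps the three parts disjoint and exhaustive even when the fractions do not divide the sample count evenly.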
+## Results
+### Metric summary (best checkpoint)
+| Metric      | Value   |
+|-------------|---------|
+| Accuracy    | 0.9916  |
+| F1          | 0.9911  |
+| Precision   | 0.9911  |
+| Recall      | 0.9911  |
+| Loss (eval) | 0.0403  |
+### Metrics by step
+| Step | Train Loss | Val Loss | Accuracy  | F1       | Precision | Recall   |
+|------|-----------:|---------:|----------:|---------:|----------:|---------:|
+| 100  | 0.0808     | 0.1168   | 0.9705    | 0.9694   | 0.9646    | 0.9759   |
+| 200  | 0.2120     | 0.1209   | 0.9705    | 0.9691   | 0.9667    | 0.9719   |
+| 300  | 0.0008     | 0.0403   | 0.9916    | 0.9911   | 0.9911    | 0.9911   |
+| 400  | 0.0041     | 0.0464   | 0.9895    | 0.9889   | 0.9884    | 0.9894   |
+| 500  | 0.0004     | 0.1313   | 0.9684    | 0.9671   | 0.9627    | 0.9732   |
+| 600  | 0.0005     | 0.0855   | 0.9811    | 0.9802   | 0.9767    | 0.9845   |
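The table is consistent with early stopping at a patience of 3 evaluations: accuracy peaks at step 300 and the next three evaluations do not beat it. The sketch below replays that logic on the rounded accuracies from the table. It is a simplified illustration, not the library's own callback; `early_stopping_step` is a hypothetical helper.

```python
def early_stopping_step(history, patience):
    """Return (stop_step, best_step) for a metric where higher is better.

    `history` is a list of (step, value) pairs in evaluation order.
    `stop_step` is None if the patience is never exhausted.
    """
    best_value = float("-inf")
    best_step = None
    evals_without_improvement = 0

    for step, value in history:
        if value > best_value:  # strict improvement resets the counter
            best_value, best_step = value, step
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                return step, best_step
    return None, best_step


# Validation accuracy per evaluation step, copied from the table above.
accuracy_by_step = [
    (100, 0.9705), (200, 0.9705), (300, 0.9916),
    (400, 0.9895), (500, 0.9684), (600, 0.9811),
]

stop_step, best_step = early_stopping_step(accuracy_by_step, patience=3)
print(stop_step, best_step)  # 600 300
```

Under these assumptions training halts at step 600 and the step-300 checkpoint is the one kept, which matches the best-checkpoint metrics reported in this card.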
+### Final metrics
+**Training**
+- Epoch: 2.1583
+- Loss: 0.0394
+- Time: 6 min 3 s
+- Speed: 30.58 samples/s
+**Evaluation**
+- Accuracy: 0.9916
+- F1: 0.9911
+- Precision: 0.9911
+- Recall: 0.9911
+- Loss: 0.0403
+- Time: 6.33 s
+- Speed: 74.97 samples/s
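As a rough sanity check, the figures above allow a back-of-the-envelope estimate of the dataset sizes. This is an estimate only: it assumes a single device, no gradient accumulation, and that the evaluation speed is simply samples divided by time. None of these are stated in the card.

```python
# Values copied from the card.
final_step = 600                 # last evaluated step
final_epoch = 2.1583             # epoch reached when training stopped
train_batch_size = 8
eval_seconds = 6.33
eval_samples_per_second = 74.97

# Optimizer steps needed for one pass over the training split.
steps_per_epoch = round(final_step / final_epoch)

# Upper bound on the training split size (the last batch may be partial).
approx_train_samples = steps_per_epoch * train_batch_size

# Approximate size of the evaluation split.
approx_eval_samples = round(eval_seconds * eval_samples_per_second)

print(steps_per_epoch, approx_train_samples, approx_eval_samples)  # 278 2224 475
```

The ratio of roughly 2224 training samples to 475 evaluation samples is close to 70:15, in line with the split described in this card.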