thomashk2001/tom_and_jerry_vit_model

@@ -1,78 +1,75 @@
 ---
-library_name: transformers
-license: apache-2.0
-base_model: google/vit-base-patch16-224-in21k
 tags:
-- generated_from_trainer
-metrics:
-- accuracy
-- precision
-- recall
-- f1
-model-index:
-- name: tom_and_jerry_vit_model
-  results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# tom_and_jerry_vit_model
-This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.1530
-- Accuracy: 0.9562
-- Precision: 0.9526
-- Recall: 0.9587
-- F1: 0.9553
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 0.0002
-- train_batch_size: 64
-- eval_batch_size: 64
-- seed: 42
-- optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: linear
-- num_epochs: 5
-### Training results
-| Training Loss | Epoch  | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
-|:-------------:|:------:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|
-| 0.8223        | 0.4167 | 25   | 0.4506          | 0.8893   | 0.8939    | 0.8653 | 0.8742 |
-| 0.2676        | 0.8333 | 50   | 0.2195          | 0.9392   | 0.9343    | 0.9376 | 0.9356 |
-| 0.1896        | 1.25   | 75   | 0.1816          | 0.9526   | 0.9490    | 0.9504 | 0.9493 |
-| 0.1085        | 1.6667 | 100  | 0.1940          | 0.9380   | 0.9316    | 0.9381 | 0.9344 |
-| 0.1618        | 2.0833 | 125  | 0.1806          | 0.9477   | 0.9390    | 0.9493 | 0.9434 |
-| 0.0784        | 2.5    | 150  | 0.1582          | 0.9574   | 0.9524    | 0.9570 | 0.9546 |
-| 0.071         | 2.9167 | 175  | 0.1803          | 0.9416   | 0.9364    | 0.9413 | 0.9386 |
-| 0.0533        | 3.3333 | 200  | 0.1539          | 0.9611   | 0.9623    | 0.9600 | 0.9605 |
-| 0.0383        | 3.75   | 225  | 0.1446          | 0.9647   | 0.9654    | 0.9642 | 0.9646 |
-| 0.0264        | 4.1667 | 250  | 0.1619          | 0.9513   | 0.9447    | 0.9546 | 0.9488 |
-| 0.0227        | 4.5833 | 275  | 0.1524          | 0.9550   | 0.9498    | 0.9579 | 0.9531 |
-| 0.0343        | 5.0    | 300  | 0.1530          | 0.9562   | 0.9526    | 0.9587 | 0.9553 |
-### Framework versions
-- Transformers 4.55.2
-- Pytorch 2.8.0+cu129
-- Datasets 4.0.0
-- Tokenizers 0.21.4

 ---
+language:
+- "es"
+pretty_name: "Tom and Jerry Image Classification VIT Model"
 tags:
+- "vision"
+- "image-classification"
+license: "cc0-1.0"
+task_categories:
+- "image-classification"
 ---
+# ViT model fine-tuned for Tom and Jerry image classification
+## Base model: 'google/vit-base-patch16-224-in21k'
+The ViT model was fine-tuned to classify Tom and Jerry images into the following categories:
+- Tom: Tom is in the image
+- Jerry: Jerry is in the image
+- Tom_and_Jerry: both Tom and Jerry are in the image
+- None: neither is in the image
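The four categories amount to a decision on two flags: whether Tom is present and whether Jerry is present. A minimal sketch of that mapping follows. It assumes the classes are mutually exclusive (one label per image); the function name `label_for` is hypothetical and is not part of the model or the dataset.

```python
def label_for(tom_present: bool, jerry_present: bool) -> str:
    """Map character presence to one of the four category names in the card.

    Assumption: exactly one label applies to each image.
    """
    if tom_present and jerry_present:
        return "Tom_and_Jerry"
    if tom_present:
        return "Tom"
    if jerry_present:
        return "Jerry"
    return "None"


if __name__ == "__main__":
    for flags in [(True, False), (False, True), (True, True), (False, False)]:
        print(flags, "->", label_for(*flags))
```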
+## Methodology
+- The model was fine-tuned on the thomashk2001/tom_and_jerry_dataset dataset, which is divided into train, eval, and testing splits.
+- The splits are stratified, so every possible label is represented in each split.
+- The images were preprocessed with ViTImageProcessor, loaded from the 'google/vit-base-patch16-224-in21k' checkpoint.
+- The training arguments were:
+```
+from transformers import TrainingArguments
+
+training_args = TrainingArguments(
+    output_dir="./vit_tom_jerry_mdl",  # Checkpoints and saved model
+    per_device_train_batch_size=64,  # Train batch size per device
+    per_device_eval_batch_size=64,  # Eval batch size per device
+    num_train_epochs=5,  # Number of epochs
+    learning_rate=2e-4,  # Learning rate
+    eval_strategy="steps",  # Evaluate on a step schedule (see eval_steps)
+    eval_steps=25,  # Evaluate every 25 steps
+    save_strategy="steps",  # Save checkpoints on a step schedule (see save_steps)
+    save_steps=100,  # Save a checkpoint every 100 steps
+    save_total_limit=5,  # Maximum number of checkpoints kept, including the best model
+    load_best_model_at_end=True,  # Load the best model when training ends
+    logging_dir="./logs",  # Log directory
+    logging_steps=10,  # Log every 10 steps
+    remove_unused_columns=False,  # Keep dataset columns the model's forward() does not accept
+    metric_for_best_model="f1",  # Metric used to select the best model
+    greater_is_better=True,  # Higher F1 is better
+)
+```
+- The model was fine-tuned with the arguments defined in the previous step, using early stopping with a patience of 3.
+## Training results
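To illustrate what a patience of 3 means, here is a simplified patience rule applied to the F1 values from the training results table. This is a sketch, not the library's implementation: the card does not say how early stopping was wired in (with the Trainer API it is commonly done through `EarlyStoppingCallback`, which is an assumption here), and the helper name `early_stopping_step` is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def early_stopping_step(
    history: Sequence[Tuple[int, float]], patience: int = 3
) -> Tuple[Optional[int], int]:
    """Return (stop_step, best_step) under a simple patience rule.

    history holds (step, metric) pairs in evaluation order; higher is better.
    stop_step is None if the patience budget is never exhausted.
    Assumption: history is non-empty.
    """
    best_step, best_value = history[0]
    stale = 0  # consecutive evaluations without a new best value
    for step, value in history[1:]:
        if value > best_value:
            best_step, best_value, stale = step, value, 0
        else:
            stale += 1
            if stale >= patience:
                return step, best_step
    return None, best_step


# F1 per evaluation step, copied from the training results table.
F1_HISTORY = [
    (25, 0.8742), (50, 0.9356), (75, 0.9493), (100, 0.9344),
    (125, 0.9434), (150, 0.9546), (175, 0.9386), (200, 0.9605),
    (225, 0.9646), (250, 0.9488), (275, 0.9531), (300, 0.9553),
]

if __name__ == "__main__":
    stop_step, best_step = early_stopping_step(F1_HISTORY, patience=3)
    print("stop at step:", stop_step, "best step:", best_step)
```

Under this rule the three evaluations after step 225 (250, 275, and 300) do not improve F1, so the patience budget runs out at step 300, the last step logged in the table.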
+| Step | Training Loss | Validation Loss | Accuracy  | Precision | Recall  | F1      |
+|------|---------------|----------------|-----------|-----------|--------|---------|
+| 25   | 0.8223        | 0.4506         | 0.8893    | 0.8939    | 0.8653 | 0.8742  |
+| 50   | 0.2676        | 0.2195         | 0.9392    | 0.9343    | 0.9376 | 0.9356  |
+| 75   | 0.1896        | 0.1816         | 0.9526    | 0.9490    | 0.9504 | 0.9493  |
+| 100  | 0.1085        | 0.1940         | 0.9380    | 0.9316    | 0.9381 | 0.9344  |
+| 125  | 0.1618        | 0.1806         | 0.9477    | 0.9390    | 0.9493 | 0.9434  |
+| 150  | 0.0784        | 0.1582         | 0.9574    | 0.9524    | 0.9570 | 0.9546  |
+| 175  | 0.0710        | 0.1803         | 0.9416    | 0.9364    | 0.9413 | 0.9386  |
+| 200  | 0.0533        | 0.1539         | 0.9611    | 0.9623    | 0.9600 | 0.9605  |
+| 225  | 0.0383        | 0.1446         | 0.9647    | 0.9654    | 0.9642 | 0.9646  |
+| 250  | 0.0264        | 0.1619         | 0.9513    | 0.9447    | 0.9546 | 0.9488  |
+| 275  | 0.0227        | 0.1524         | 0.9550    | 0.9498    | 0.9579 | 0.9531  |
+| 300  | 0.0343        | 0.1530         | 0.9562    | 0.9526    | 0.9587 | 0.9553  |
+## Best model
+- Step: 225
+- Training Loss: 0.0383
+- Validation Loss: 0.1446
+- Accuracy: 0.9647
+- Precision: 0.9654
+- Recall: 0.9642
+- F1 Score: 0.9646
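As a cross-check, the following sketch re-derives the best step from the values in the training results table. The selection metric in the training arguments is F1; the sketch also looks at the other columns to see whether they point to the same step. The variable names are illustrative only.

```python
# Columns: step, validation_loss, accuracy, precision, recall, f1
# Values copied from the training results table.
ROWS = [
    (25, 0.4506, 0.8893, 0.8939, 0.8653, 0.8742),
    (50, 0.2195, 0.9392, 0.9343, 0.9376, 0.9356),
    (75, 0.1816, 0.9526, 0.9490, 0.9504, 0.9493),
    (100, 0.1940, 0.9380, 0.9316, 0.9381, 0.9344),
    (125, 0.1806, 0.9477, 0.9390, 0.9493, 0.9434),
    (150, 0.1582, 0.9574, 0.9524, 0.9570, 0.9546),
    (175, 0.1803, 0.9416, 0.9364, 0.9413, 0.9386),
    (200, 0.1539, 0.9611, 0.9623, 0.9600, 0.9605),
    (225, 0.1446, 0.9647, 0.9654, 0.9642, 0.9646),
    (250, 0.1619, 0.9513, 0.9447, 0.9546, 0.9488),
    (275, 0.1524, 0.9550, 0.9498, 0.9579, 0.9531),
    (300, 0.1530, 0.9562, 0.9526, 0.9587, 0.9553),
]
FIELDS = ("step", "validation_loss", "accuracy", "precision", "recall", "f1")
records = [dict(zip(FIELDS, row)) for row in ROWS]

# Highest F1 (the metric_for_best_model) and lowest validation loss.
best_by_f1 = max(records, key=lambda r: r["f1"])
best_by_loss = min(records, key=lambda r: r["validation_loss"])

# Step at which each higher-is-better metric peaks.
peak_steps = {
    name: max(records, key=lambda r, n=name: r[n])["step"]
    for name in ("accuracy", "precision", "recall", "f1")
}

if __name__ == "__main__":
    print("best by F1:", best_by_f1)
    print("best by validation loss:", best_by_loss["step"])
    print("peak step per metric:", peak_steps)
```

In this table every metric peaks at step 225, and the validation loss is lowest there as well, so the choice of best step does not depend on which column is used.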