thomashk2001 committed on
Commit d85e27f · verified · 1 Parent(s): 7698c1c

Fine-tuned ViT on Tom & Jerry dataset

Files changed (1)
  1. README.md +73 -70
README.md CHANGED
@@ -1,75 +1,78 @@
-
  ---
- language:
- - "en"
- pretty_name: "Tom and Jerry Image Classification VIT Model"
  tags:
- - "vision"
- - "image-classification"
- license: "cc0-1.0"
- task_categories:
- - "image-classification"
  ---

- # ViT model fine-tuned for Tom and Jerry image classification
- ## Base model: 'google/vit-base-patch16-224-in21k'
- The ViT model was fine-tuned to classify Tom and Jerry images into the following categories:
- - Tom: Tom is in the image
- - Jerry: Jerry is in the image
- - Tom_and_Jerry: Both Tom and Jerry are in the image
- - None: Neither is in the image
-
- ## Methodology
- - The model was fine-tuned on the thomashk2001/tom_and_jerry_dataset dataset, which is divided into train, eval, and testing splits.
- - The splits are stratified, so every label is represented in each split.
- - The images were preprocessed with ViTImageProcessor, loaded from 'google/vit-base-patch16-224-in21k'.
- - The training arguments were:
- ```
- training_args = TrainingArguments(
-     output_dir="./vit_tom_jerry_mdl",  # Checkpoints and saved model
-     per_device_train_batch_size=64,  # Train batch size
-     per_device_eval_batch_size=64,  # Eval batch size
-     num_train_epochs=5,  # Number of epochs
-     learning_rate=2e-4,  # Learning rate
-     eval_strategy="steps",  # Evaluate every `eval_steps` steps
-     eval_steps=25,  # How often the model is evaluated
-     save_strategy="steps",  # Save a checkpoint every `save_steps` steps
-     save_steps=100,
-     save_total_limit=5,  # Max checkpoints kept, including the best model
-     load_best_model_at_end=True,  # Load the best model at the end
-     logging_dir="./logs",  # Log directory
-     logging_steps=10,  # Log every 10 steps
-     remove_unused_columns=False,
-     metric_for_best_model="f1",  # Metric used to pick the best model
-     greater_is_better=True,  # Higher F1 is better
- )
- ```
- - Fine-tuning was run with the parameters defined in the previous step, using early stopping with a patience of 3.
-
-
-
- ## Training results:
- | Step | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
- |------|---------------|-----------------|----------|-----------|--------|--------|
- | 25 | 0.8223 | 0.4506 | 0.8893 | 0.8939 | 0.8653 | 0.8742 |
- | 50 | 0.2676 | 0.2195 | 0.9392 | 0.9343 | 0.9376 | 0.9356 |
- | 75 | 0.1896 | 0.1816 | 0.9526 | 0.9490 | 0.9504 | 0.9493 |
- | 100 | 0.1085 | 0.1940 | 0.9380 | 0.9316 | 0.9381 | 0.9344 |
- | 125 | 0.1618 | 0.1806 | 0.9477 | 0.9390 | 0.9493 | 0.9434 |
- | 150 | 0.0784 | 0.1582 | 0.9574 | 0.9524 | 0.9570 | 0.9546 |
- | 175 | 0.0710 | 0.1803 | 0.9416 | 0.9364 | 0.9413 | 0.9386 |
- | 200 | 0.0533 | 0.1539 | 0.9611 | 0.9623 | 0.9600 | 0.9605 |
- | 225 | 0.0383 | 0.1446 | 0.9647 | 0.9654 | 0.9642 | 0.9646 |
- | 250 | 0.0264 | 0.1619 | 0.9513 | 0.9447 | 0.9546 | 0.9488 |
- | 275 | 0.0227 | 0.1524 | 0.9550 | 0.9498 | 0.9579 | 0.9531 |
- | 300 | 0.0343 | 0.1530 | 0.9562 | 0.9526 | 0.9587 | 0.9553 |
-
- ## Best model
- - Step: 225
- - Training Loss: 0.0383
- - Validation Loss: 0.1446
- - Accuracy: 0.9647
- - Precision: 0.9654
- - Recall: 0.9642
- - F1 Score: 0.9646
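
Reviewer note: the removed best-model figures can be sanity-checked by replaying the logged F1 values through a patience-3 early-stopping rule. The sketch below is plain Python for illustration, not the Trainer's own code. `replay_early_stopping` is a hypothetical helper, and it assumes a zero improvement threshold, with the counter resetting only on a strictly higher F1.

```python
# (step, F1) pairs copied from the training results table in this diff.
F1_HISTORY = [
    (25, 0.8742), (50, 0.9356), (75, 0.9493), (100, 0.9344),
    (125, 0.9434), (150, 0.9546), (175, 0.9386), (200, 0.9605),
    (225, 0.9646), (250, 0.9488), (275, 0.9531), (300, 0.9553),
]


def replay_early_stopping(history, patience=3):
    """Return (best_step, best_f1, stop_step) for a patience-based rule.

    stop_step is None when the patience budget is never exhausted.
    Hypothetical helper: assumes "higher is better" and no threshold.
    """
    best_step, best_f1 = None, float("-inf")
    bad_evals = 0  # consecutive evaluations without a new best F1
    for step, f1 in history:
        if f1 > best_f1:
            best_step, best_f1 = step, f1
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return best_step, best_f1, step
    return best_step, best_f1, None


best_step, best_f1, stop_step = replay_early_stopping(F1_HISTORY)
print(best_step, best_f1, stop_step)
```

Under these assumptions the best evaluation is step 225 (F1 0.9646), which matches the list above. The third consecutive non-improving evaluation falls on step 300, which is also the last logged step.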
  ---
+ library_name: transformers
+ license: apache-2.0
+ base_model: google/vit-base-patch16-224-in21k
  tags:
+ - generated_from_trainer
+ metrics:
+ - accuracy
+ - precision
+ - recall
+ - f1
+ model-index:
+ - name: tom_and_jerry_vit_model
+   results: []
  ---

+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # tom_and_jerry_vit_model
+
+ This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.1530
+ - Accuracy: 0.9562
+ - Precision: 0.9526
+ - Recall: 0.9587
+ - F1: 0.9553
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0002
+ - train_batch_size: 64
+ - eval_batch_size: 64
+ - seed: 42
+ - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+ - lr_scheduler_type: linear
+ - num_epochs: 5
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
+ |:-------------:|:------:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|
+ | 0.8223 | 0.4167 | 25 | 0.4506 | 0.8893 | 0.8939 | 0.8653 | 0.8742 |
+ | 0.2676 | 0.8333 | 50 | 0.2195 | 0.9392 | 0.9343 | 0.9376 | 0.9356 |
+ | 0.1896 | 1.25 | 75 | 0.1816 | 0.9526 | 0.9490 | 0.9504 | 0.9493 |
+ | 0.1085 | 1.6667 | 100 | 0.1940 | 0.9380 | 0.9316 | 0.9381 | 0.9344 |
+ | 0.1618 | 2.0833 | 125 | 0.1806 | 0.9477 | 0.9390 | 0.9493 | 0.9434 |
+ | 0.0784 | 2.5 | 150 | 0.1582 | 0.9574 | 0.9524 | 0.9570 | 0.9546 |
+ | 0.071 | 2.9167 | 175 | 0.1803 | 0.9416 | 0.9364 | 0.9413 | 0.9386 |
+ | 0.0533 | 3.3333 | 200 | 0.1539 | 0.9611 | 0.9623 | 0.9600 | 0.9605 |
+ | 0.0383 | 3.75 | 225 | 0.1446 | 0.9647 | 0.9654 | 0.9642 | 0.9646 |
+ | 0.0264 | 4.1667 | 250 | 0.1619 | 0.9513 | 0.9447 | 0.9546 | 0.9488 |
+ | 0.0227 | 4.5833 | 275 | 0.1524 | 0.9550 | 0.9498 | 0.9579 | 0.9531 |
+ | 0.0343 | 5.0 | 300 | 0.1530 | 0.9562 | 0.9526 | 0.9587 | 0.9553 |
+
+
+ ### Framework versions

+ - Transformers 4.55.2
+ - Pytorch 2.8.0+cu129
+ - Datasets 4.0.0
+ - Tokenizers 0.21.4
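
Reviewer note: the Epoch column in the new results table can be cross-checked against the listed hyperparameters. The sketch below is an illustrative calculation, not something the Trainer emits. It assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept; none of these is stated in the card. The helper names `steps_per_epoch`, `epoch_at`, and `train_size_bounds` are hypothetical.

```python
TRAIN_BATCH_SIZE = 64   # train_batch_size from the hyperparameter list
NUM_EPOCHS = 5          # num_epochs from the hyperparameter list
FINAL_STEP = 300        # last row of the results table (epoch 5.0)


def steps_per_epoch(final_step, num_epochs):
    """Optimizer steps in one epoch, derived from the last logged step."""
    return final_step // num_epochs


def epoch_at(step, per_epoch):
    """Fractional epoch reached after `step` optimizer steps."""
    return round(step / per_epoch, 4)


def train_size_bounds(per_epoch, batch_size):
    """Smallest and largest training-set size that yields `per_epoch`
    batches when the last, possibly partial, batch is kept (assumption)."""
    return (per_epoch - 1) * batch_size + 1, per_epoch * batch_size


per_epoch = steps_per_epoch(FINAL_STEP, NUM_EPOCHS)
print(per_epoch)
print(epoch_at(25, per_epoch), epoch_at(225, per_epoch))
print(train_size_bounds(per_epoch, TRAIN_BATCH_SIZE))
```

With 60 steps per epoch, step 25 maps to epoch 0.4167 and step 225 to epoch 3.75, as in the table. Under the stated assumptions the training split would hold between 3,777 and 3,840 images.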