---
license: cc0-1.0
datasets:
- RobertoMDLP/tom_and_jerry
language:
- en
metrics:
- accuracy
- f1
- precision
- recall
base_model:
- google/vit-base-patch16-224-in21k
---

# Tom and Jerry Image Classification with ViT

This model is a fine-tuned variant of **google/vit-base-patch16-224-in21k** that classifies images containing:

- Tom
- Jerry

## Methodology

1. **Dataset preparation**

   The [`RobertoMDLP/tom_and_jerry`](https://huggingface.co/datasets/RobertoMDLP/tom_and_jerry) dataset was used, with two classes (*Tom*, *Jerry*).

   The data was split into 70% training, 15% validation and 15% test.

2. **Preprocessing**

   Images were resized to 224×224 pixels and normalized with the pretrained `ViTImageProcessor` from `google/vit-base-patch16-224-in21k`.

   No data augmentation was applied.

3. **Training**

   The base **ViT** model was fully fine-tuned with the following configuration:

   - Batch size: 8 (training and evaluation)
   - Learning rate: 2e-4
   - Epochs: 3
   - Evaluation strategy: every 100 steps
   - Mixed precision (FP16)
   - Early stopping with a patience of 3 evaluations
   - Best model selected by validation *accuracy*

4. **Evaluation**

   Performance was measured with Accuracy, F1, Precision and Recall.

   The checkpoint with the best validation Accuracy was selected.
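The preprocessing step can be made explicit with a small sketch. The 224×224 resize, rescale to [0, 1], and per-channel mean/std of 0.5 assumed below match the documented defaults of `ViTImageProcessor` for this checkpoint; in practice the processor handles all of this, so this is purely illustrative.

```python
# Sketch of the preprocessing described in step 2, emulated with
# Pillow/NumPy. Assumes the default ViTImageProcessor settings for
# google/vit-base-patch16-224-in21k: resize to 224x224, rescale to
# [0, 1], then normalize with mean 0.5 and std 0.5 per channel.
import numpy as np
from PIL import Image


def preprocess(image: Image.Image) -> np.ndarray:
    image = image.convert("RGB").resize((224, 224), Image.BILINEAR)
    pixels = np.asarray(image, dtype=np.float32) / 255.0  # rescale to [0, 1]
    pixels = (pixels - 0.5) / 0.5                         # normalize to [-1, 1]
    return pixels.transpose(2, 0, 1)                      # HWC -> CHW


# Dummy frame standing in for a Tom and Jerry still
frame = Image.new("RGB", (640, 480), color=(128, 64, 0))
tensor = preprocess(frame)  # shape (3, 224, 224), values in [-1, 1]
```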
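The training configuration in step 3 maps naturally onto the Hugging Face `Trainer` API. The sketch below reconstructs it under that assumption; `output_dir` and the checkpoint save strategy are illustrative, not taken from the card.

```python
# Sketch of the training setup from step 3, assuming the Hugging Face
# Trainer API was used (the card does not say so explicitly).
from transformers import EarlyStoppingCallback, TrainingArguments

training_args = TrainingArguments(
    output_dir="vit-tom-and-jerry",    # illustrative name
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=2e-4,
    num_train_epochs=3,
    eval_strategy="steps",             # evaluate every 100 steps
    eval_steps=100,
    save_strategy="steps",             # must match eval strategy for
    save_steps=100,                    # load_best_model_at_end
    fp16=True,                         # mixed precision
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",  # best checkpoint by validation accuracy
)

# Stop after 3 evaluations without improvement, as described above
early_stopping = EarlyStoppingCallback(early_stopping_patience=3)
```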

## Results

### Metrics summary (best checkpoint)

| Metric      | Value  |
|-------------|--------|
| Accuracy    | 0.9916 |
| F1          | 0.9911 |
| Precision   | 0.9911 |
| Recall      | 0.9911 |
| Loss (eval) | 0.0403 |
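The Accuracy, F1, Precision, and Recall figures above can be recomputed from raw predictions. A minimal pure-Python sketch for the binary (Tom vs. Jerry) case, assuming label 1 is the positive class (the card does not state which averaging was used):

```python
# Recompute Accuracy, F1, Precision and Recall for a binary
# classification from predicted and true labels. Pure Python;
# the choice of positive class is an assumption for illustration.

def binary_metrics(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy example: 10 samples with one false positive and one false negative
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
metrics = binary_metrics(y_true, y_pred)  # all four metrics equal 0.8 here
```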

### Evolution over training steps

| Step | Train Loss | Val Loss | Accuracy |     F1 | Precision | Recall |
|------|-----------:|---------:|---------:|-------:|----------:|-------:|
| 100  |     0.0808 |   0.1168 |   0.9705 | 0.9694 |    0.9646 | 0.9759 |
| 200  |     0.2120 |   0.1209 |   0.9705 | 0.9691 |    0.9667 | 0.9719 |
| 300  |     0.0008 |   0.0403 |   0.9916 | 0.9911 |    0.9911 | 0.9911 |
| 400  |     0.0041 |   0.0464 |   0.9895 | 0.9889 |    0.9884 | 0.9894 |
| 500  |     0.0004 |   0.1313 |   0.9684 | 0.9671 |    0.9627 | 0.9732 |
| 600  |     0.0005 |   0.0855 |   0.9811 | 0.9802 |    0.9767 | 0.9845 |

### Final metrics

**Training**

- Epoch: 2.1583
- Loss: 0.0394
- Time: 6 min 3 s
- Throughput: 30.58 samples/s

**Evaluation**

- Accuracy: 0.9916
- F1: 0.9911
- Precision: 0.9911
- Recall: 0.9911
- Loss: 0.0403
- Time: 6.33 s
- Throughput: 74.97 samples/s

### Framework versions

- Transformers 4.55.0
- Pytorch 2.6.0+cu124
- Datasets 4.0.0
- Tokenizers 0.21.4