---
license: apache-2.0
base_model: google/vit-base-patch16-224-in21k
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: vit-eGTZANplus
  results: []
datasets:
- ghermoso/egtzan_plus
pipeline_tag: image-classification
---
# Vision Transformer (ViT) for Music Genre Classification

## Model Overview

- **Model Name:** [ghermoso/vit-eGTZANplus](https://huggingface.co/ghermoso/vit-eGTZANplus)

- **Task:** Image Classification

- **Dataset:** [egtzan_plus](https://huggingface.co/datasets/ghermoso/egtzan_plus)

- **Model Architecture:** [Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)

- **Fine-tuned from:** [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k), fine-tuned on the [egtzan_plus](https://huggingface.co/datasets/ghermoso/egtzan_plus) dataset.

The model achieves the following results on the evaluation set:
- Loss: 0.8358
- Accuracy: 0.7460
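
For readers who want to try the checkpoint, the following is a minimal sketch of inference through the `transformers` image-classification pipeline. It assumes the repository `ghermoso/vit-eGTZANplus` loads with the standard pipeline and that the input is an image of an audio clip prepared the same way as the training data; this card does not describe that preprocessing, so treat the input format as an assumption. The helper names `classify_image` and `top_prediction` are hypothetical and not part of any library.

```python
from typing import Dict, List

# Repository name taken from the model card above.
MODEL_ID = "ghermoso/vit-eGTZANplus"


def top_prediction(predictions: List[Dict[str, object]]) -> Dict[str, object]:
    """Return the highest-scoring entry from a pipeline-style result list.

    Each entry is expected to look like {"label": str, "score": float},
    which is the shape returned by the image-classification pipeline.
    """
    return max(predictions, key=lambda p: p["score"])


def classify_image(image_path: str, top_k: int = 5) -> List[Dict[str, object]]:
    """Classify one image with the fine-tuned ViT checkpoint.

    Assumption: the image at image_path was produced with the same
    preprocessing as the egtzan_plus training images.
    """
    # Imported lazily so top_prediction works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=MODEL_ID)
    return classifier(image_path, top_k=top_k)
```

Calling `top_prediction(classify_image("spectrogram.png"))` would return the most likely genre label and its score; `spectrogram.png` is a placeholder file name. Building the pipeline inside the function keeps the sketch short, but for batch use it would be cheaper to create the classifier once and reuse it.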