---
license: apache-2.0
base_model: google/vit-base-patch16-224-in21k
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: vit-eGTZANplus
  results: []
datasets:
- ghermoso/egtzan_plus
pipeline_tag: image-classification
---

# Vision Transformer (ViT) for Music Genre Classification

## Model Overview

- **Model Name:** [ghermoso/vit-eGTZANplus](https://huggingface.co/ghermoso/vit-eGTZANplus)
- **Task:** Image Classification
- **Dataset:** [egtzan_plus](https://huggingface.co/datasets/ghermoso/egtzan_plus)
- **Model Architecture:** [Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)
- **Fine-tuned from model:** This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the [egtzan_plus](https://huggingface.co/datasets/ghermoso/egtzan_plus) dataset.

It achieves the following results on the evaluation set:

- Loss: 0.8358
- Accuracy: 0.7460
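
## Example Usage

Since the card lists `image-classification` as the pipeline tag, the checkpoint can probably be loaded with the generic `transformers` pipeline. The following is a minimal sketch, not a verified recipe. It assumes the repository ships a compatible image processor config and that the input is an image representation of an audio clip (for example a spectrogram); the card does not state how audio is converted to images. The file name `spectrogram.png` and the helpers `classify_image` and `top_prediction` are hypothetical names introduced for illustration.

```python
from typing import Any

# Model name taken from this card.
MODEL_ID = "ghermoso/vit-eGTZANplus"


def top_prediction(predictions: list[dict[str, Any]]) -> tuple[str, float]:
    """Return the (label, score) pair with the highest score."""
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], float(best["score"])


def classify_image(image_path: str, top_k: int = 3) -> list[dict[str, Any]]:
    """Classify one image file (hypothetical helper).

    Returns a list of {"label": ..., "score": ...} dicts, as produced by
    the transformers image-classification pipeline.
    """
    # Imported lazily so top_prediction() works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=MODEL_ID)
    return classifier(image_path, top_k=top_k)


if __name__ == "__main__":
    # "spectrogram.png" is an assumed input file, not part of the model repo.
    preds = classify_image("spectrogram.png")
    label, score = top_prediction(preds)
    print(f"{label}: {score:.3f}")
```

The pipeline downloads the weights on first use, so the first call needs network access. The label names returned depend on the `id2label` mapping stored with the checkpoint, which this card does not list.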