---
language: vi
license: mit
tags:
- vision-transformer
- image-classification
- vietnamese
- scene-classification
- pytorch
- transformers
base_model: google/vit-base-patch16-224
datasets:
- custom
pipeline_tag: image-classification
---
# Vietnamese Scene Classification with Vision Transformer
Fine-tuned [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) (86M parameters) for Vietnamese scene classification.
## Model Details
- **Architecture:** Vision Transformer (ViT-Base, patch size 16, 224×224 input)
- **Parameters:** 86M
- **Base Model:** google/vit-base-patch16-224 (pretrained on ImageNet-21k, fine-tuned on ImageNet-1k at 224×224)
- **Task:** Multi-class Vietnamese scene classification
- **Framework:** PyTorch + Hugging Face Transformers
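As a quick sanity check on the 86M figure, the arithmetic below counts tokens and parameters for a ViT-Base/16 encoder. It uses the standard ViT-Base hyperparameters: hidden size 768, 12 layers, MLP size 3072, learned position embeddings, and no pooling layer. It also assumes a single linear head over the 8 scene classes listed on this card. The count is derived by hand, not read from the checkpoint.

```python
# Back-of-the-envelope size check for ViT-Base/16 at 224x224.
# Assumes standard ViT-Base hyperparameters and an 8-class linear head.
IMAGE_SIZE, PATCH_SIZE, CHANNELS = 224, 16, 3
HIDDEN, LAYERS, MLP = 768, 12, 3072
NUM_CLASSES = 8  # assumption: one output per scene class on this card


def linear(n_in: int, n_out: int) -> int:
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out


num_patches = (IMAGE_SIZE // PATCH_SIZE) ** 2  # 14 x 14 = 196
seq_len = num_patches + 1  # plus the [CLS] token

embeddings = (
    linear(PATCH_SIZE * PATCH_SIZE * CHANNELS, HIDDEN)  # patch projection
    + HIDDEN  # learnable [CLS] token
    + seq_len * HIDDEN  # learned position embeddings
)
per_layer = (
    2 * HIDDEN  # LayerNorm before attention (gain + bias)
    + 3 * linear(HIDDEN, HIDDEN)  # query, key, value projections
    + linear(HIDDEN, HIDDEN)  # attention output projection
    + 2 * HIDDEN  # LayerNorm before the MLP
    + linear(HIDDEN, MLP) + linear(MLP, HIDDEN)  # two-layer MLP
)
encoder = embeddings + LAYERS * per_layer + 2 * HIDDEN  # + final LayerNorm
head = linear(HIDDEN, NUM_CLASSES)
total = encoder + head

print(f"tokens per image: {seq_len}")
print(f"encoder params:   {encoder:,}")
print(f"total params:     {total:,}")
```

This prints 197 tokens, 85,798,656 encoder parameters and 85,804,808 in total. Nearly all of the ~86M comes from the pretrained encoder; the 8-way head adds only 6,152. Setting `NUM_CLASSES = 1000` gives about 86.6M, the size with the base checkpoint's 1,000-class ImageNet head.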
## Training
- **Data Augmentation:** RandomResizedCrop, RandomHorizontalFlip, ColorJitter
- **Optimizer:** AdamW (lr=3e-5, weight_decay=0.01)
- **Scheduler:** OneCycleLR with cosine annealing
- **Gradient Clipping:** max_norm=1.0
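The schedule and clipping rules can be sketched in plain Python. This mirrors how PyTorch's `torch.optim.lr_scheduler.OneCycleLR` computes the learning rate with `anneal_strategy="cos"` (learning-rate part only, momentum cycling omitted). It also mirrors how `torch.nn.utils.clip_grad_norm_` rescales all gradients by their combined L2 norm. The sketch assumes that lr=3e-5 is the peak (`max_lr`) and that PyTorch's defaults `pct_start=0.3`, `div_factor=25` and `final_div_factor=1e4` were kept. The card does not state these values, and the function names are illustrative.

```python
import math
from typing import List


def cosine_anneal(start: float, end: float, pct: float) -> float:
    """Cosine interpolation: returns start at pct=0 and end at pct=1."""
    return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)


def one_cycle_lr(step: int, total_steps: int, max_lr: float = 3e-5,
                 pct_start: float = 0.3, div_factor: float = 25.0,
                 final_div_factor: float = 1e4) -> float:
    """Learning rate at 0-based `step` of a cosine one-cycle schedule (assumed defaults)."""
    initial_lr = max_lr / div_factor          # warm-up starting point
    min_lr = initial_lr / final_div_factor    # value reached at the last step
    peak_step = pct_start * total_steps - 1   # last step of the warm-up phase
    if step <= peak_step:
        # Phase 1: cosine ramp from initial_lr up to max_lr.
        return cosine_anneal(initial_lr, max_lr, step / peak_step)
    # Phase 2: cosine decay from max_lr down to min_lr.
    return cosine_anneal(max_lr, min_lr, (step - peak_step) / (total_steps - 1 - peak_step))


def clip_by_global_norm(grads: List[float], max_norm: float = 1.0) -> List[float]:
    """Scale all gradients together so their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-6))  # never scales gradients up
    return [g * scale for g in grads]


schedule = [one_cycle_lr(s, total_steps=100) for s in range(100)]
print(f"start {schedule[0]:.2e}, peak {max(schedule):.2e}, end {schedule[-1]:.2e}")
print(clip_by_global_norm([3.0, 4.0]))  # norm 5, rescaled to a norm of about 1
```

Clipping by the global norm, rather than per tensor, preserves the direction of the overall update and only shrinks its length.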
- **Validation Accuracy:** 94%+
## Scene Classes (Vietnamese)
| English | Vietnamese |
|---------|-----------|
| Beach | Bãi biển |
| City | Thành phố |
| Forest | Rừng |
| Mountain | Núi |
| Rice Field | Ruộng lúa |
| Market | Chợ |
| Temple | Chùa |
| River | Sông |
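The card does not say which index each class has in `model.config.id2label`. If the labels are stored under their English names, a small lookup can turn a prediction into the Vietnamese name from the table above. The index order used here (table order) is hypothetical, so check the checkpoint's `config.json` for the real mapping. Dictionaries of this shape are what `id2label=` and `label2id=` expect when a label set is passed to `from_pretrained` for fine-tuning.

```python
# English -> Vietnamese names, copied from the table above.
SCENE_NAMES_VI = {
    "Beach": "Bãi biển",
    "City": "Thành phố",
    "Forest": "Rừng",
    "Mountain": "Núi",
    "Rice Field": "Ruộng lúa",
    "Market": "Chợ",
    "Temple": "Chùa",
    "River": "Sông",
}

# Hypothetical index order (table order); the real checkpoint may differ.
id2label = {i: name for i, name in enumerate(SCENE_NAMES_VI)}
label2id = {name: i for i, name in id2label.items()}


def to_vietnamese(label: str) -> str:
    """Return the Vietnamese name for an English label, or the label unchanged."""
    return SCENE_NAMES_VI.get(label, label)


print(to_vietnamese(id2label[4]))  # Ruộng lúa (under the hypothetical order)
```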
## Features
- Attention rollout visualization across 12 transformer layers
- Per-class precision, recall, and F1 metrics
- Formatted confusion matrix output
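The repository's own implementation of these features is not reproduced here. As a dependency-free sketch of the underlying math, per-class precision, recall and F1 can be read off a confusion matrix with rows as true classes and columns as predicted classes. Attention rollout, as described by Abnar and Zuidema (2020), averages each layer's attention over heads and adds the identity to account for the residual connection. It then renormalizes the rows and multiplies the layers together. All function names are hypothetical; a real version would operate on NumPy or PyTorch tensors.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n: int) -> List[List[int]]:
    """cm[t][p] counts samples of true class t predicted as class p."""
    cm = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_metrics(cm: List[List[int]]) -> List[Tuple[float, float, float]]:
    """(precision, recall, f1) for each class; 0.0 where a ratio is undefined."""
    n = len(cm)
    out = []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[t][c] for t in range(n))  # column sum
        actual = sum(cm[c])  # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        out.append((precision, recall, f1))
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def attention_rollout(layers: List[List[Matrix]]) -> Matrix:
    """layers[l][h] is the (tokens x tokens) attention of head h in layer l, first layer first."""
    rollout: Matrix = []
    for heads in layers:
        size = len(heads[0])
        # Average over heads, then add the identity for the residual connection.
        a = [[sum(h[i][j] for h in heads) / len(heads) + (1.0 if i == j else 0.0)
              for j in range(size)] for i in range(size)]
        for row in a:  # renormalize so each row sums to 1 again
            total = sum(row)
            row[:] = [v / total for v in row]
        # Later layers multiply on the left: rollout_l = A_l @ rollout_(l-1)
        rollout = matmul(a, rollout) if rollout else a
    return rollout


cm = confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], n=3)
for c, (p, r, f) in enumerate(per_class_metrics(cm)):
    print(f"class {c}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

For ViT-Base at 224×224 each layer's attention is 197×197. Row 0 of the rollout (the [CLS] token), without its first entry, gives 196 patch scores that can be reshaped into a 14×14 heatmap.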
## Usage
```python
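import torch
from PIL import Image
from transformers import ViTForImageClassification, ViTImageProcessor

# Fine-tuned weights from this repo; preprocessing config from the base checkpoint
model = ViTForImageClassification.from_pretrained("sanvo/vietnamese-vit-classification")
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")

# .convert("RGB") handles grayscale/RGBA files, since ViT expects 3 channels
image = Image.open("scene.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():  # inference only, so skip gradient tracking
    outputs = model(**inputs)

predicted_class = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_class])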
```
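To report a confidence score instead of only the top label, the logits can be turned into probabilities with a softmax. With PyTorch this is `outputs.logits.softmax(-1)`. The dependency-free sketch below shows the same computation on a plain list, using made-up logits and a hypothetical `top_k` helper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: Sequence[float], id2label: Dict[int, str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most likely (label, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Made-up logits for three classes, purely for illustration.
example_labels = {0: "Beach", 1: "City", 2: "Forest"}
for label, p in top_k([2.0, 0.5, -1.0], example_labels, k=2):
    print(f"{label}: {p:.1%}")
```

With the model above, the equivalent call would be `top_k(outputs.logits[0].tolist(), model.config.id2label)`.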
## Links
- **GitHub:** [svn05/vietnamese-vit-classification](https://github.com/svn05/vietnamese-vit-classification)