---
language: en
license: apache-2.0
tags:
- vision
- image-classification
- vit
- fine-tuned
- transformers
datasets:
- your-dataset-name
model-index:
- name: ViT-Large-Patch16-224 Fine-tuned Model
  results:
  - task:
      name: Image Classification
      type: image-classification
    metrics:
    - name: Validation Loss
      type: loss
      value: 0.3268
---

# Vision Transformer (ViT) Fine-Tuned Model

This repository contains a fine-tuned version of **[google/vit-large-patch16-224](https://huggingface.co/google/vit-large-patch16-224)**, optimized for a custom image classification task.

---

## 📌 Model Overview

- **Base model**: `google/vit-large-patch16-224`
- **Architecture**: Vision Transformer (ViT)
- **Patch size**: 16×16
- **Image resolution**: 224×224
- **Frameworks**: PyTorch, Hugging Face Transformers
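With a 224×224 input and 16×16 patches, ViT splits each image into a 14×14 grid. The encoder then processes those 196 patch embeddings plus one `[CLS]` token. The small plain-Python helper below works through that arithmetic. The function name `vit_sequence_length` is illustrative only and is not part of any library.

```python
def vit_sequence_length(image_size: int, patch_size: int, use_cls_token: bool = True) -> int:
    """Number of tokens a ViT encoder processes for a square input image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size  # 224 // 16 = 14
    num_patches = patches_per_side ** 2  # 14 * 14 = 196
    return num_patches + (1 if use_cls_token else 0)


# Each RGB patch is flattened to 16 * 16 * 3 = 768 values before the linear projection
values_per_patch = 16 * 16 * 3
print(vit_sequence_length(224, 16), values_per_patch)  # 197 768
```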

---

## 📊 Performance

| Metric | Value |
|--------|-------|
| **Final Validation Loss** | **0.3268** |
| **Lowest Validation Loss** | **0.2548** (Epoch 18) |

Validation loss decreases across the reported epochs to a minimum of 0.2548 at epoch 18. It then rises to 0.3268 by the final epoch (40), which indicates overfitting in the later epochs.
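The overfitting observation comes down to comparing the best and final validation losses. The minimal sketch below does that comparison, using the validation losses from the Training Results table as input. That table lists only selected epochs. The helper names are hypothetical.

```python
def best_checkpoint(history):
    """Return the (epoch, val_loss) pair with the lowest validation loss.

    `history` is a list of (epoch, val_loss) pairs in training order.
    """
    return min(history, key=lambda item: item[1])


def overfitting_gap(history):
    """Relative increase of the final validation loss over the best one."""
    _, best_loss = best_checkpoint(history)
    _, final_loss = history[-1]
    return (final_loss - best_loss) / best_loss


# Validation losses at the epochs listed in the Training Results table
history = [
    (1, 0.5601),
    (5, 0.3421),
    (10, 0.2901),
    (14, 0.2737),
    (18, 0.2548),
    (40, 0.3268),
]

best_epoch, best_loss = best_checkpoint(history)
print(f"Best epoch: {best_epoch} (val loss {best_loss})")
print(f"Final loss is {overfitting_gap(history):.1%} above the best")
```

In the Hugging Face `Trainer`, the same selection is available through `TrainingArguments(load_best_model_at_end=True, metric_for_best_model="eval_loss")`. This card does not state whether that option was used.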

---

## 🔧 Training Configuration

| Hyperparameter | Value |
|----------------|-------|
| **Learning rate** | `2e-5` |
| **Train batch size** | `20` |
| **Eval batch size** | `8` |
| **Optimizer** | AdamW (`betas=(0.9, 0.999)`, `eps=1e-8`) |
| **LR scheduler** | Linear |
| **Epochs** | `40` |
| **Seed** | `42` |
| **Framework versions** | Transformers 4.52.4, PyTorch 2.6.0+cu124, Datasets 3.6.0, Tokenizers 0.21.2 |
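The linear scheduler decays the learning rate from `2e-5` toward zero over the course of training. The sketch below reproduces that shape in plain Python, following the formula used by Transformers' `get_linear_schedule_with_warmup`. It assumes 960 total steps, taken from the Training Results table, and zero warmup steps, because no warmup setting is reported.

```python
def linear_lr(step: int, base_lr: float = 2e-5,
              num_training_steps: int = 960, num_warmup_steps: int = 0) -> float:
    """Learning rate at a given optimizer step under a linear schedule.

    Warmup ramps linearly from 0 to base_lr, then the rate decays linearly
    to 0 at num_training_steps. Warmup defaults to 0 (an assumption).
    """
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return base_lr * max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


for step in (0, 240, 480, 960):
    print(step, linear_lr(step))
```

With these assumptions, the learning rate halves by step 480 (epoch 20) and reaches zero at the last step.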

---

## 📂 Training Results

| Epoch | Step | Validation Loss |
|-------|------|-----------------|
| 1     | 24   | 0.5601 |
| 5     | 120  | 0.3421 |
| 10    | 240  | 0.2901 |
| 14    | 336  | 0.2737 |
| 18    | 432  | **0.2548** |
| 40    | 960  | 0.3268 |
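The Step column grows by 24 per epoch (960 / 40). Combined with the train batch size of 20, this bounds the size of the training set. The quick consistency check below rests on three assumptions, none of which the card states: training on a single device, `gradient_accumulation_steps=1`, and the Trainer's default behavior of keeping the last incomplete batch. The helper names are hypothetical.

```python
def steps_per_epoch(results):
    """Infer optimizer steps per epoch from (epoch, global_step) pairs."""
    epoch, step = results[-1]
    per_epoch = step // epoch
    # Every row should sit exactly on an epoch boundary
    if any(s != e * per_epoch for e, s in results):
        raise ValueError("steps are not a constant multiple of epochs")
    return per_epoch


def train_size_range(per_epoch, batch_size):
    """Range of training-set sizes N with ceil(N / batch_size) == per_epoch."""
    return (per_epoch - 1) * batch_size + 1, per_epoch * batch_size


# (epoch, step) pairs from the table above
results = [(1, 24), (5, 120), (10, 240), (14, 336), (18, 432), (40, 960)]
per_epoch = steps_per_epoch(results)
low, high = train_size_range(per_epoch, batch_size=20)
print(f"{per_epoch} steps/epoch -> {low} to {high} training images")
```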

---

## 🛠 Intended Uses

- Image classification on data with characteristics similar to the training dataset.
- As a starting point for further fine-tuning on domain-specific classification tasks.

---

## ⚠ Limitations

- Trained on a **custom dataset**, so it may not generalize well to unrelated domains without additional fine-tuning.
- No guarantees are made regarding fairness, bias, or other ethical implications. These cannot be assessed without an analysis of the training dataset.

---

## 🚀 How to Use

You can use this model in two main ways:

### **1️⃣ Using the High-Level `pipeline` API**
```python
from transformers import pipeline

pipe = pipeline("image-classification", model="rakib730/output-models")

# Classify an image from a URL
result = pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")
print(result)  # a list of {"label": ..., "score": ...} dicts, highest score first
```

### **2️⃣ Using the Processor and Model Directly**
```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import requests
import torch

# Load processor and model
processor = AutoImageProcessor.from_pretrained("rakib730/output-models")
model = AutoModelForImageClassification.from_pretrained("rakib730/output-models")

# Load an image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess (resize and normalize to the model's expected input)
inputs = processor(images=image, return_tensors="pt")

# Inference without gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

# Pick the class with the highest logit and map it to its label name
logits = outputs.logits
predicted_class_id = logits.argmax(-1).item()

print("Predicted class:", model.config.id2label[predicted_class_id])
```