---
language: en
license: apache-2.0
tags:
- vision
- image-classification
- vit
- fine-tuned
- transformers
datasets:
- your-dataset-name
model-index:
- name: ViT-Large-Patch16-224 Fine-tuned Model
  results:
  - task:
      name: Image Classification
      type: image-classification
    metrics:
    - name: Validation Loss
      type: loss
      value: 0.3268
---

# Vision Transformer (ViT) Fine-Tuned Model

This repository contains a fine-tuned version of **[google/vit-large-patch16-224](https://huggingface.co/google/vit-large-patch16-224)**, optimized for a custom image classification task.

---

## 📌 Model Overview

- **Base model**: `google/vit-large-patch16-224`
- **Architecture**: Vision Transformer (ViT)
- **Patch size**: 16×16
- **Image resolution**: 224×224
- **Frameworks**: PyTorch, Hugging Face Transformers

---

## 📊 Performance

| Metric | Value |
|--------|-------|
| **Final Validation Loss** | **0.3268** |
| **Lowest Validation Loss** | **0.2548** (Epoch 18) |

Training loss and validation loss trends indicate good convergence with slight overfitting after ~30 epochs.

---

## 🔧 Training Configuration

| Hyperparameter | Value |
|----------------|-------|
| **Learning rate** | `2e-5` |
| **Train batch size** | `20` |
| **Eval batch size** | `8` |
| **Optimizer** | AdamW (`betas=(0.9, 0.999)`, `eps=1e-8`) |
| **LR scheduler** | Linear |
| **Epochs** | `40` |
| **Seed** | `42` |
| **Framework versions** | Transformers 4.52.4, PyTorch 2.6.0+cu124, Datasets 3.6.0, Tokenizers 0.21.2 |

---

## 📂 Training Results

| Epoch | Step | Validation Loss |
|-------|------|-----------------|
| 1 | 24 | 0.5601 |
| 5 | 120 | 0.3421 |
| 10 | 240 | 0.2901 |
| 14 | 336 | 0.2737 |
| 18 | 432 | **0.2548** |
| 40 | 960 | 0.3268 |

---

## 🛠 Intended Uses

- Image classification on datasets with characteristics similar to the training dataset.
- Fine-tuning for domain-specific classification tasks.

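
For readers who want to reason about the learning rate at a given checkpoint, the linear scheduler listed in the training configuration can be sketched in plain Python. This is a minimal illustration, not the training code: the helper name `linear_lr` is hypothetical, the total of 960 steps is read off the Training Results table (epoch 40 at step 960, i.e. 24 steps per epoch), and `warmup_steps=0` is an assumption because the card does not state a warmup.

```python
def linear_lr(step, base_lr=2e-5, total_steps=960, warmup_steps=0):
    """Learning rate at an optimizer step under linear warmup and linear decay.

    base_lr and total_steps come from the card's tables;
    warmup_steps=0 is an assumption (the card does not mention warmup).
    """
    if warmup_steps and step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / warmup_steps
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, at the best-validation epoch, and at the end.
for epoch in (0, 18, 40):
    step = epoch * 24  # 24 optimizer steps per epoch, per the results table
    print(f"epoch {epoch:2d}  step {step:3d}  lr {linear_lr(step):.3e}")
```

Under these assumptions, the learning rate has decayed to a little over half of its initial value at epoch 18, where validation loss is lowest.
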

---

## ⚠ Limitations

- Trained on a **custom dataset**, so it may not generalize well to unrelated domains without additional fine-tuning.
- No guarantees are made regarding fairness, bias, or ethical implications without a dataset analysis.

---

## 🚀 How to Use

You can use this model in two main ways:

### **1️⃣ Using the High-Level `pipeline` API**

```python
from transformers import pipeline

# Build an image-classification pipeline from the fine-tuned checkpoint
pipe = pipeline("image-classification", model="rakib730/output-models")

# Classify an image from a URL
result = pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")
print(result)
```

### **2️⃣ Using the Processor and Model Directly**

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import requests
import torch

# Load processor and model
processor = AutoImageProcessor.from_pretrained("rakib730/output-models")
model = AutoModelForImageClassification.from_pretrained("rakib730/output-models")

# Load an image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess
inputs = processor(images=image, return_tensors="pt")

# Inference (no gradients needed)
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
predicted_class_id = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_id])
```
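
The direct approach prints only the single most likely class. If ranked labels with confidence scores are wanted, similar in shape to what the `pipeline` API returns, a softmax over the logits can be applied. The following is a minimal, dependency-free sketch: the helper `top_k_predictions`, the example logits, and the label names are all hypothetical and chosen only for illustration.

```python
import math


def top_k_predictions(logits, id2label, k=3):
    """Convert raw logits into the k most probable labels with scores."""
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank class indices by probability, highest first, and keep the top k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [{"label": id2label[i], "score": probs[i]} for i in ranked]


# Hypothetical logits and labels, for illustration only.
example_logits = [2.0, 0.5, -1.0]
example_labels = {0: "class_a", 1: "class_b", 2: "class_c"}

for pred in top_k_predictions(example_logits, example_labels, k=2):
    print(pred["label"], round(pred["score"], 4))
```

With the model loaded as in the second example, the same helper could be called as `top_k_predictions(outputs.logits[0].tolist(), model.config.id2label)`.
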