DejanX13/vit-house-classifier

+---
+language: sr
+license: apache-2.0
+tags:
+- image-classification
+- vision
+- vit
+- house-condition
+datasets:
+- custom
+metrics:
+- accuracy
+---
+# Fine-tuned ViT for House Condition Classification
+This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) for classifying house conditions into 4 categories.
+## Model Description
+This Vision Transformer (ViT) model has been fine-tuned to classify house images into four condition categories:
+- **dobre** (good condition)
+- **nepoznato** (unknown condition)
+- **oronule** (dilapidated condition)
+- **srednje** (medium condition)
+## Training Details
+### Training Data
+- Training set: 757 images
+- Validation set: 80 images
+- Test set: 79 images
+### Training Hyperparameters
+- Epochs: 10
+- Batch size: 16
+- Learning rate: 2e-5
+- Optimizer: AdamW
+- Seed: 42 (for reproducibility)
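As a rough sanity check on the figures above, the sketch below derives the split proportions and the number of optimizer steps implied by the listed batch size and epoch count. It assumes a single device, no gradient accumulation, and that the final partial batch of each epoch is kept. None of these is stated in the card, and early stopping may have ended training sooner, so the step count is an upper bound.

```python
import math

# Figures taken from the card
split_sizes = {"train": 757, "validation": 80, "test": 79}
batch_size = 16
epochs = 10

total_images = sum(split_sizes.values())
split_fractions = {name: n / total_images for name, n in split_sizes.items()}

# Assumptions (not stated in the card): single device, no gradient
# accumulation, last partial batch of each epoch kept.
steps_per_epoch = math.ceil(split_sizes["train"] / batch_size)
max_optimizer_steps = steps_per_epoch * epochs

for name, fraction in split_fractions.items():
    print(f"{name}: {split_sizes[name]} images ({fraction:.1%})")
print(f"steps per epoch: {steps_per_epoch}")
print(f"upper bound on optimizer steps: {max_optimizer_steps}")
```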
+## Evaluation Results
+### Validation Set Performance
+- **Accuracy**: 80.0%
+- **Loss**: 0.7827
+### Per-Class Metrics (Validation)
+| Class      | Precision | Recall | F1-Score | Support |
+|------------|-----------|--------|----------|----------|
+| dobre      | 0.83      | 0.50   | 0.62     | 10       |
+| nepoznato  | 1.00      | 0.83   | 0.91     | 24       |
+| oronule    | 0.71      | 0.80   | 0.75     | 15       |
+| srednje    | 0.73      | 0.87   | 0.79     | 31       |
+### Confusion Matrix (Validation)
+```
+[[ 5  0  0  5]    # dobre
+ [ 1 20  1  2]    # nepoznato
+ [ 0  0 12  3]    # oronule
+ [ 0  0  4 27]]   # srednje
+```
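The per-class table can be recomputed from the confusion matrix. The sketch below does this in plain Python, treating rows as true labels and columns as predicted labels. The card does not state that orientation, but the row sums match the Support column. The helper name `per_class_metrics` is illustrative and not part of the original training code.

```python
# Validation confusion matrix copied from the card.
# Assumption: rows are true labels, columns are predicted labels.
labels = ["dobre", "nepoznato", "oronule", "srednje"]
confusion = [
    [5, 0, 0, 5],
    [1, 20, 1, 2],
    [0, 0, 12, 3],
    [0, 0, 4, 27],
]


def per_class_metrics(matrix):
    """Return {class index: (precision, recall, f1, support)}."""
    n = len(matrix)
    metrics = {}
    for k in range(n):
        true_positive = matrix[k][k]
        support = sum(matrix[k])                          # row sum: true count
        predicted = sum(matrix[i][k] for i in range(n))   # column sum: predicted count
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[k] = (precision, recall, f1, support)
    return metrics


metrics = per_class_metrics(confusion)
correct = sum(confusion[k][k] for k in range(len(confusion)))
accuracy = correct / sum(map(sum, confusion))

for k, (precision, recall, f1, support) in metrics.items():
    print(f"{labels[k]:<10} P={precision:.3f} R={recall:.3f} F1={f1:.3f} n={support}")
print(f"accuracy: {accuracy:.3f}")
```

The same numbers should come out of `sklearn.metrics.classification_report` when it is given the underlying label and prediction arrays, which the card does not include.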
+## Usage
+```python
+from transformers import ViTForImageClassification, ViTImageProcessor
+from PIL import Image
+import torch
+# Load the model and image processor (replace the placeholder with the model's Hub repository ID)
+model = ViTForImageClassification.from_pretrained("YOUR_USERNAME/YOUR_MODEL_NAME")
+processor = ViTImageProcessor.from_pretrained("YOUR_USERNAME/YOUR_MODEL_NAME")
+# Load the image and convert it to RGB so it has three channels
+image = Image.open("path_to_image.jpg").convert("RGB")
+inputs = processor(images=image, return_tensors="pt")
+# Run inference without tracking gradients
+with torch.no_grad():
+    outputs = model(**inputs)
+# id2label is keyed by integer class index once the config is loaded, so index it with an int
+predicted_class_idx = outputs.logits.argmax(-1).item()
+predicted_label = model.config.id2label[predicted_class_idx]
+print(f"Predicted class: {predicted_label}")
+# Print the probability of each class
+probs = torch.nn.functional.softmax(outputs.logits, dim=-1)[0]
+for idx, prob in enumerate(probs):
+    label = model.config.id2label[idx]
+    print(f"{label}: {prob.item():.2%}")
+```
+## Limitations and Bias
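+- The model was trained on a single custom dataset of house images and may not generalize well to other architectural styles or regions.
+- Performance varies by class: recall for "dobre" (good condition) is 0.50 on the validation set, the lowest of the four classes.
+- Adjacent condition categories are often confused. On the validation set, 5 of 10 "dobre" images were predicted as "srednje", 3 "oronule" images were predicted as "srednje", and 4 "srednje" images were predicted as "oronule".
+- The dataset is small (757 training images), and per-class validation support ranges from 10 to 31 images, so the per-class metrics are noisy.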
+## Training Procedure
+The model was fine-tuned using the Hugging Face Transformers library with the following approach:
+1. Pre-trained weights from google/vit-base-patch16-224-in21k were used as initialization
+2. The classification head was replaced with a new 4-class classifier
+3. All model parameters were fine-tuned on the custom dataset
+4. Early stopping was used to limit overfitting, and checkpoints were saved during training
+5. Images were converted to RGB to ensure consistent 3-channel input
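The card does not say how early stopping was configured. As a generic illustration of step 4, and not the author's actual setup, the sketch below applies patience-based early stopping to a list of per-epoch validation losses and tracks the best epoch as the checkpoint to keep. The function name `run_early_stopping`, the `patience` value, and the loss values are all hypothetical.

```python
def run_early_stopping(val_losses, patience=2):
    """Scan per-epoch validation losses and stop after `patience`
    consecutive epochs without improvement.

    Returns (best_epoch, stopped_epoch), both 1-indexed.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    stopped_epoch = 0

    for epoch, loss in enumerate(val_losses, start=1):
        stopped_epoch = epoch
        if loss < best_loss:
            # New best loss: this is where a checkpoint would be saved.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop training early

    return best_epoch, stopped_epoch


# Hypothetical validation losses, one per epoch (not taken from the card).
example_losses = [1.20, 0.95, 0.84, 0.80, 0.83, 0.86, 0.90]
best_epoch, stopped_epoch = run_early_stopping(example_losses, patience=2)
print(f"best epoch: {best_epoch}, stopped after epoch: {stopped_epoch}")
```

In Transformers, the comparable built-in mechanism is `EarlyStoppingCallback` combined with `load_best_model_at_end=True` in `TrainingArguments`. Whether the author used it is not stated.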
+## Citation
+If you use this model, please cite:
+```bibtex
+@misc{house-condition-vit,
+  author = {Your Name},
+  title = {Fine-tuned ViT for House Condition Classification},
+  year = {2025},
+  publisher = {Hugging Face},
+  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/YOUR_MODEL_NAME}}
+}
+```
+## Model Card Authors
+This model card was created by the model author.