---
language: en
license: apache-2.0
tags:
- image-classification
- vision
- vit
- house-condition
datasets:
- custom
metrics:
- accuracy
library_name: transformers
pipeline_tag: image-classification
---

# Fine-tuned ViT for House Condition Classification

This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) for classifying house conditions into 4 categories.

## Model Description

This Vision Transformer (ViT) model has been fine-tuned to classify house images into four condition categories (original dataset label names in parentheses; a minimal label mapping is sketched below):
- **good** (`dobre`)
- **unknown** (`nepoznato`)
- **ruined** (`oronule`)
- **medium** (`srednje`)
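
For convenience, a minimal Python mapping between the dataset labels and the English names used in this card. The id order is an assumption (alphabetical by dataset label); check `model.config.id2label` for the authoritative mapping.

```python
# Assumed id order (alphabetical by dataset label); verify against model.config.id2label.
ID2LABEL = {0: "dobre", 1: "nepoznato", 2: "oronule", 3: "srednje"}

# English names used throughout this model card.
LABEL_EN = {"dobre": "good", "nepoznato": "unknown", "oronule": "ruined", "srednje": "medium"}
```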

## Training Details

### Training Data
- **Total dataset**: 935 images
- **Training set**: 776 images
- **Validation set**: 80 images
- **Test set**: 79 images
- **Classes**: 4 (dobre, nepoznato, oronule, srednje)
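
The exact splitting code is not part of this card; below is a minimal sketch of how such a split can be produced with the `datasets` library (the `imagefolder` layout and the two-stage split are assumptions):

```python
from datasets import load_dataset

# Assumed layout: an image folder with one subdirectory per class.
ds = load_dataset("imagefolder", data_dir="house_images/")["train"]

# Hold out the test set first, then carve the validation set from the remainder.
split = ds.train_test_split(test_size=79, seed=42)               # 856 train+val / 79 test
split2 = split["train"].train_test_split(test_size=80, seed=42)  # 776 train / 80 val
train_ds, val_ds, test_ds = split2["train"], split2["test"], split["test"]
```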

### Training Hyperparameters
- **Epochs**: 10
- **Batch size**: 16 per device
- **Learning rate**: 2e-5
- **Optimizer**: AdamW
- **Seed**: 42 (for reproducibility)
- **Training time**: 5m 45s
- **Samples per second**: 22.43
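
A minimal `TrainingArguments` sketch matching the values above (`output_dir` is a placeholder; AdamW is the `Trainer` default optimizer):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="vit-house-condition",  # placeholder
    num_train_epochs=10,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    seed=42,
    eval_strategy="steps",  # evaluate every 50 steps (see Training Procedure)
    eval_steps=50,
    save_steps=50,
    load_best_model_at_end=True,
)
```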

## Evaluation Results

### Validation Set Performance
- **Accuracy**: 81.2%
- **Loss**: 0.5629

### Training Set Performance
- **Final Training Loss**: 0.5295

### Per-Class Metrics (Validation)

| Class      | Precision | Recall | F1-Score | Support |
|------------|-----------|--------|----------|---------|
| good       | 0.78      | 0.70   | 0.74     | 10      |
| unknown    | 1.00      | 0.83   | 0.91     | 24      |
| ruined     | 0.62      | 1.00   | 0.77     | 15      |
| medium     | 0.85      | 0.74   | 0.79     | 31      |

**Overall Metrics:**
- Accuracy: 81.2% (65/80 correct)
- Macro Average: Precision=0.81, Recall=0.82, F1=0.80
- Weighted Average: Precision=0.84, Recall=0.81, F1=0.82

### Confusion Matrix (Validation)

```
Actual ↓      Predicted →
           good  unknown  ruined  medium
good       [  7      0      0      3 ]
unknown    [  1     20      2      1 ]
ruined     [  0      0     15      0 ]
medium     [  1      0      7     23 ]
```

**Key Insights:**
- 'unknown' class has perfect precision (1.00) - no false positives
- 'ruined' class has perfect recall (1.00) - catches all ruined houses
- Main confusion: 'medium' condition sometimes mistaken for 'ruined' (7 cases)
- 'good' houses occasionally misclassified as 'medium' (3 cases)
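
The tables above can be reproduced from raw predictions with scikit-learn. A sketch, assuming a list `val_samples` of `(PIL.Image, label_id)` pairs and class ids in the order good, unknown, ruined, medium:

```python
import torch
from sklearn.metrics import classification_report, confusion_matrix
from transformers import ViTForImageClassification, ViTImageProcessor

model = ViTForImageClassification.from_pretrained("YOUR_USERNAME/YOUR_MODEL_NAME")
processor = ViTImageProcessor.from_pretrained("YOUR_USERNAME/YOUR_MODEL_NAME")

y_true, y_pred = [], []
for image, label in val_samples:  # assumed: 80 validation (image, label_id) pairs
    inputs = processor(image.convert("RGB"), return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    y_pred.append(logits.argmax(-1).item())
    y_true.append(label)

print(classification_report(y_true, y_pred,
                            target_names=["good", "unknown", "ruined", "medium"]))
print(confusion_matrix(y_true, y_pred))
```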

## Usage

```python
from transformers import ViTForImageClassification, ViTImageProcessor
from PIL import Image
import torch

# Load model and processor
model = ViTForImageClassification.from_pretrained("YOUR_USERNAME/YOUR_MODEL_NAME")
processor = ViTImageProcessor.from_pretrained("YOUR_USERNAME/YOUR_MODEL_NAME")

# Load and preprocess image
image = Image.open("path_to_image.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt")

# Make prediction
with torch.no_grad():
    outputs = model(**inputs)

predicted_class_idx = outputs.logits.argmax(-1).item()
predicted_label = model.config.id2label[predicted_class_idx]  # id2label keys are ints

print(f"Predicted class: {predicted_label}")

# Get probabilities
probs = torch.nn.functional.softmax(outputs.logits, dim=-1)[0]
for idx, prob in enumerate(probs):
    label = model.config.id2label[idx]
    print(f"{label}: {prob.item():.2%}")
```
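
Alternatively, the high-level `pipeline` API wraps the same preprocessing and inference in a single call:

```python
from transformers import pipeline

classifier = pipeline("image-classification", model="YOUR_USERNAME/YOUR_MODEL_NAME")
print(classifier("path_to_image.jpg"))  # list of {"label": ..., "score": ...} dicts
```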

## Limitations and Bias

- The model was trained on a specific dataset of house images and may not generalize well to different architectural styles or regions
- Performance varies by class - see validation metrics for details
- The model may have difficulty distinguishing between similar condition categories
- Dataset size: 935 images (relatively small for deep learning)
- Images are from a specific geographical/architectural context

## Training Procedure

The model was fine-tuned using the Hugging Face Transformers library with the following approach (a code sketch follows the list):

1. **Pre-trained weights**: Initialized from google/vit-base-patch16-224-in21k
2. **Classification head**: Replaced with a new 4-class classifier
3. **Fine-tuning**: All model parameters were fine-tuned on the custom dataset
4. **Data preprocessing**: Images converted to RGB to ensure consistent 3-channel input
5. **Evaluation strategy**: Evaluated every 50 steps with checkpoint saving
6. **Best model selection**: Best model automatically loaded based on validation performance
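
A minimal sketch of steps 1, 2, 5, and 6 above. The dataset variables and the label order are assumptions, and `training_args` refers to the sketch in the hyperparameters section:

```python
from transformers import Trainer, ViTForImageClassification

# Steps 1-2: load the pre-trained backbone; a fresh 4-class head is initialized
# because the in21k checkpoint ships without a fine-tuned classifier.
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=4,
    id2label={0: "dobre", 1: "nepoznato", 2: "oronule", 3: "srednje"},  # assumed order
    label2id={"dobre": 0, "nepoznato": 1, "oronule": 2, "srednje": 3},
)

# Steps 5-6 are covered by eval_steps/save_steps/load_best_model_at_end
# in the TrainingArguments sketch shown earlier.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,  # assumed: preprocessed datasets with pixel_values/labels
    eval_dataset=val_ds,
)
trainer.train()
```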

## Base Model

[google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k)

Vision Transformer (ViT) model pre-trained on ImageNet-21k at resolution 224x224.

## Framework Versions

- Transformers: 4.57.1
- PyTorch: 2.x
- Datasets: 3.x
- Python: 3.13

## Citation

If you use this model, please cite:

```bibtex
@misc{house-condition-vit,
  author = {Your Name},
  title = {Fine-tuned ViT for House Condition Classification},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/YOUR_MODEL_NAME}}
}
```

## Model Card Authors

This model card was created by the model author.

## Additional Information

- Repository: [GitHub Repository URL]
- Contact: [Your Email or Contact]