DejanX13 committed on
Commit 2975692 · verified · 1 Parent(s): 1893235

Upload README.md with huggingface_hub

  1. README.md +126 -0
README.md ADDED
@@ -0,0 +1,126 @@
+ ---
+ language: sr
+ license: apache-2.0
+ tags:
+ - image-classification
+ - vision
+ - vit
+ - house-condition
+ datasets:
+ - custom
+ metrics:
+ - accuracy
+ ---
+
+ # Fine-tuned ViT for House Condition Classification
+
+ This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) for classifying house conditions into 4 categories.
+
+ ## Model Description
+
+ This Vision Transformer (ViT) model has been fine-tuned to classify house images into four condition categories:
+ - **dobre** (good condition)
+ - **nepoznato** (unknown condition)
+ - **oronule** (dilapidated condition)
+ - **srednje** (medium condition)
+
+ ## Training Details
+
+ ### Training Data
+ - Training set: 757 images
+ - Validation set: 80 images
+ - Test set: 79 images
+
+ ### Training Hyperparameters
+ - Epochs: 10
+ - Batch size: 16
+ - Learning rate: 2e-5
+ - Optimizer: AdamW
+ - Seed: 42 (for reproducibility)
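As a quick worked example of what these numbers imply, the sketch below computes the split proportions and an upper bound on the number of optimizer steps. It assumes a single device, no gradient accumulation, a final partial batch that is kept, and that all 10 epochs run (early stopping may end training sooner). None of these assumptions is stated in the card.

```python
import math

# Split sizes and hyperparameters as reported in the model card.
SPLITS = {"train": 757, "validation": 80, "test": 79}
BATCH_SIZE = 16
EPOCHS = 10

total_images = sum(SPLITS.values())
fractions = {name: count / total_images for name, count in SPLITS.items()}

# Assumption: the last, smaller batch of each epoch is kept, hence ceil().
steps_per_epoch = math.ceil(SPLITS["train"] / BATCH_SIZE)
# Upper bound only: early stopping can end training before this.
max_steps = steps_per_epoch * EPOCHS

print(f"total images: {total_images}")
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
print(f"steps per epoch: {steps_per_epoch}, max steps: {max_steps}")
```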
+
+ ## Evaluation Results
+
+ ### Validation Set Performance
+ - **Accuracy**: 80.0%
+ - **Loss**: 0.7827
+
+ ### Per-Class Metrics (Validation)
+ | Class     | Precision | Recall | F1-Score | Support |
+ |-----------|-----------|--------|----------|---------|
+ | dobre     | 0.83      | 0.50   | 0.62     | 10      |
+ | nepoznato | 1.00      | 0.83   | 0.91     | 24      |
+ | oronule   | 0.71      | 0.80   | 0.75     | 15      |
+ | srednje   | 0.73      | 0.87   | 0.79     | 31      |
+
+ ### Confusion Matrix (Validation)
+ Rows are true labels and columns are predicted labels, both in the order dobre, nepoznato, oronule, srednje.
+
+ ```
+ [[ 5  0  0  5]   # dobre
+  [ 1 20  1  2]   # nepoznato
+  [ 0  0 12  3]   # oronule
+  [ 0  0  4 27]]  # srednje
+ ```
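The per-class table and the overall accuracy can be re-derived from this matrix. The following is a minimal sketch in plain Python, assuming rows are true labels and columns are predictions (which is consistent with the reported support values); the function name `per_class_metrics` is illustrative and is not part of the original training code.

```python
# Validation confusion matrix from the model card.
LABELS = ["dobre", "nepoznato", "oronule", "srednje"]
CONFUSION = [
    [5, 0, 0, 5],
    [1, 20, 1, 2],
    [0, 0, 12, 3],
    [0, 0, 4, 27],
]


def per_class_metrics(cm):
    """Return a (precision, recall, f1, support) tuple for each class."""
    n = len(cm)
    metrics = []
    for k in range(n):
        true_pos = cm[k][k]
        support = sum(cm[k])                         # row sum: images truly in class k
        predicted = sum(cm[r][k] for r in range(n))  # column sum: images predicted as k
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1, support))
    return metrics


metrics = per_class_metrics(CONFUSION)
correct = sum(CONFUSION[i][i] for i in range(len(CONFUSION)))
accuracy = correct / sum(sum(row) for row in CONFUSION)

for label, (p, r, f1, support) in zip(LABELS, metrics):
    print(f"{label:<10} precision={p:.3f} recall={r:.3f} f1={f1:.3f} support={support}")
print(f"accuracy={accuracy:.3f}")
```

For example, precision for "dobre" is 5 / (5 + 1), because one "nepoznato" image was also predicted as "dobre", while its recall is 5 / 10.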
+
+ ## Usage
+
+ ```python
+ from transformers import ViTForImageClassification, ViTImageProcessor
+ from PIL import Image
+ import torch
+
+ # Load model and processor
+ model = ViTForImageClassification.from_pretrained("YOUR_USERNAME/YOUR_MODEL_NAME")
+ processor = ViTImageProcessor.from_pretrained("YOUR_USERNAME/YOUR_MODEL_NAME")
+
+ # Load and preprocess image (convert to RGB for a consistent 3-channel input)
+ image = Image.open("path_to_image.jpg").convert("RGB")
+ inputs = processor(image, return_tensors="pt")
+
+ # Make prediction without tracking gradients
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ predicted_class_idx = outputs.logits.argmax(-1).item()
+ # id2label keys are ints once the config is loaded, so index with the int directly
+ predicted_label = model.config.id2label[predicted_class_idx]
+
+ print(f"Predicted class: {predicted_label}")
+
+ # Get probabilities for every class
+ probs = torch.nn.functional.softmax(outputs.logits, dim=-1)[0]
+ for idx, prob in enumerate(probs):
+     label = model.config.id2label[idx]
+     print(f"{label}: {prob.item():.2%}")
+ ```
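+
+ ## Limitations and Bias
+
+ - The model was trained on a specific dataset of house images and may not generalize well to different architectural styles or regions
+ - Performance varies by class: recall for the "dobre" (good condition) class is 0.50 on the validation set (5 of 10 images), the lowest of the four classes
+ - Adjacent condition categories are confused with each other: on the validation set, all 5 misclassified "dobre" images were predicted as "srednje", and 4 of 31 "srednje" images were predicted as "oronule"
+ - The dataset is small (757 training images, 80 validation images), so per-class metrics rest on as few as 10 examples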
+
+ ## Training Procedure
+
+ The model was fine-tuned using the Hugging Face Transformers library with the following approach:
+ 1. Pre-trained weights from google/vit-base-patch16-224-in21k were used as initialization
+ 2. The classification head was replaced with a new 4-class classifier
+ 3. All model parameters were fine-tuned on the custom dataset
+ 4. Early stopping and checkpoint saving were employed to prevent overfitting
+ 5. Images were converted to RGB to ensure consistent 3-channel input
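The steps above could be set up roughly as follows. This is only a sketch, not the author's training script: the helper names (`build_model`, `build_training_args`, `to_rgb`) and the output directory are hypothetical, the label order is assumed to match the order used in the confusion matrix, and the early stopping and checkpoint settings are left out because the card does not specify them.

```python
# Assumption: class indices follow the order shown in the confusion matrix.
LABELS = ["dobre", "nepoznato", "oronule", "srednje"]
ID2LABEL = dict(enumerate(LABELS))
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}
BASE_CHECKPOINT = "google/vit-base-patch16-224-in21k"


def build_model():
    """Load the pre-trained backbone with a freshly initialized 4-class head."""
    from transformers import ViTForImageClassification

    return ViTForImageClassification.from_pretrained(
        BASE_CHECKPOINT,
        num_labels=len(LABELS),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )


def to_rgb(image):
    """Convert a PIL image to RGB so every input has 3 channels."""
    return image.convert("RGB")


def build_training_args(output_dir="vit-house-condition"):
    """Hyperparameters from the card; output_dir is a hypothetical name."""
    from transformers import TrainingArguments

    return TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=10,
        per_device_train_batch_size=16,
        learning_rate=2e-5,
        seed=42,
    )
```

All parameters stay trainable by default, which corresponds to step 3. The Transformers imports sit inside the functions so that the label mappings can be reused without loading the library.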
+
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```bibtex
+ @misc{house-condition-vit,
+   author = {Your Name},
+   title = {Fine-tuned ViT for House Condition Classification},
+   year = {2025},
+   publisher = {Hugging Face},
+   howpublished = {\url{https://huggingface.co/YOUR_USERNAME/YOUR_MODEL_NAME}}
+ }
+ ```
+
+ ## Model Card Authors
+
+ This model card was created by the model author.