---
license: mit
datasets:
- ArtifactClfDurham/OrientalMuseum-white
language:
- en
base_model:
- google/efficientnet-b0
tags:
- artifact
- museum
---

# Artifact Classification Model v2 - Best Model Usage Guide

This directory contains the improved v2 artifact classification model, which classifies museum artifacts by both object type and material.

## Hosted Model

The best model is available on Hugging Face at: **[SpyC0der77/artifact-efficientnet](https://huggingface.co/SpyC0der77/artifact-efficientnet)**

You can use the model directly from Hugging Face without downloading it locally.

## Model Overview

The v2 model is an advanced multi-output neural network that predicts two attributes simultaneously:
- **Object Name**: The type/category of the artifact (e.g., "vase", "statue", "pottery")
- **Material**: The material composition (e.g., "ceramic", "bronze", "stone")

### Key Improvements Over v1
- **EfficientNet Backbone**: Uses EfficientNet-B0 instead of ResNet-50 for better feature extraction
- **Attention Mechanism**: Includes an attention layer to focus on relevant features
- **Advanced Training**: Incorporates CutMix augmentation, Focal Loss, and mixed precision training
- **Better Regularization**: Uses dropout and batch normalization for improved generalization

## Architecture & Usage

The v2 model uses an EfficientNet-B0 backbone with an attention mechanism for multi-output classification. It processes RGB images of artifacts and outputs predictions for both object type and material composition.

### Input
- **Format**: RGB images (224×224 pixels after preprocessing)
- **Preprocessing**: Resize to 256×256, center crop to 224×224, normalize with ImageNet statistics

### Output
- **Object Classification**: Predicts artifact type (e.g., "vase", "statue", "pottery")
- **Material Classification**: Predicts material composition (e.g., "ceramic", "bronze", "stone")
- **Confidence Scores**: Probability scores for each prediction
- **Format**: Dictionary with 'object_name' and 'material' logits

## Model Architecture

```python
ImprovedMultiOutputModel(
    backbone: EfficientNet-B0 (pretrained)
    attention: Linear(1280 → 512 → 1280) with Sigmoid
    object_classifier: Linear(1280 → 1024 → 512 → num_object_classes)
    material_classifier: Linear(1280 → 1024 → 512 → num_material_classes)
)
```

### Input Requirements
- **Image Size**: 224×224 pixels (automatically resized and cropped)
- **Format**: RGB images
- **Normalization**: ImageNet normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

### Output Format
Returns a dictionary with:
- `'object_name'`: Logits for object classification
- `'material'`: Logits for material classification
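Because both entries are raw logits, they need a softmax before they can be read as confidence scores. A minimal sketch of that decoding step follows; `decode_predictions` and the label lists passed to it are hypothetical, since the card does not say how class names are stored.

```python
import torch


def decode_predictions(outputs, object_labels, material_labels):
    """Turn the model's logit dictionary into labels with softmax confidences."""
    results = {}
    for key, labels in (("object_name", object_labels), ("material", material_labels)):
        probs = torch.softmax(outputs[key], dim=1)  # logits -> probabilities
        conf, idx = probs.max(dim=1)                # top-1 class per image
        results[key] = [(labels[i], c) for i, c in zip(idx.tolist(), conf.tolist())]
    return results
```

Each entry in the result is a list with one `(label, confidence)` pair per image in the batch.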

## Training Details

The model was trained with the following configuration:

- **Dataset**: ArtifactClfDurham/OrientalMuseum-white
- **Training Split**: 85% of data
- **Validation Split**: 15% of data
- **Batch Size**: 32
- **Epochs**: 20
- **Optimizer**: AdamW with differential learning rates
  - Backbone: 2e-4 (0.1× base LR)
  - Heads: 2e-3 (base LR)
- **Augmentation**: Advanced (CutMix, rotation, color jitter, Gaussian blur)
- **Loss Function**: Cross-Entropy (or Focal Loss if enabled)
- **Scheduler**: Cosine annealing with warmup
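The differential learning rates and the warmup-plus-cosine schedule can be expressed with AdamW parameter groups and a `LambdaLR` multiplier. This is a sketch under assumptions: the weight decay, the warmup length, the total step count, and whether the schedule advances per step or per epoch are not stated in the configuration above, and the two `nn.Linear` modules merely stand in for the real backbone and heads.

```python
import math

import torch
import torch.nn as nn


def build_optimizer(backbone, heads, base_lr=2e-3, backbone_scale=0.1, weight_decay=1e-2):
    """AdamW with a smaller learning rate for the backbone than for the heads."""
    return torch.optim.AdamW(
        [
            {"params": backbone.parameters(), "lr": base_lr * backbone_scale},
            {"params": heads.parameters(), "lr": base_lr},
        ],
        weight_decay=weight_decay,  # assumed value
    )


def warmup_cosine(step, warmup_steps, total_steps):
    """LR multiplier: linear warmup, then cosine decay towards zero."""
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


# Stand-in modules so the sketch runs without the real model.
backbone, heads = nn.Linear(8, 8), nn.Linear(8, 4)
optimizer = build_optimizer(backbone, heads)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda s: warmup_cosine(s, warmup_steps=100, total_steps=1000)
)
```

Keeping the multiplier in a plain function makes the schedule easy to inspect or plot before training starts.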

### Advanced Training Features

- **CutMix Augmentation**: Randomly mixes image patches between samples
- **Focal Loss**: Addresses class imbalance (optional)
- **Mixed Precision**: Automatic mixed precision training for speed
- **Gradient Scaling**: Prevents gradient underflow
- **Early Stopping**: Saves best model based on validation accuracy
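To make the optional Focal Loss concrete, here is a sketch of the standard multi-class formulation applied to both heads. The `gamma` default and the equal weighting of the two heads are assumptions; the card does not give the values used in training.

```python
import torch
import torch.nn.functional as F


def focal_loss(logits, targets, gamma=2.0):
    """Multi-class focal loss: cross-entropy scaled by (1 - p_t) ** gamma."""
    ce = F.cross_entropy(logits, targets, reduction="none")  # -log(p_t) per sample
    p_t = torch.exp(-ce)                                     # probability of the true class
    return ((1.0 - p_t) ** gamma * ce).mean()


def multi_output_loss(outputs, object_targets, material_targets, gamma=2.0):
    """Sum of the two head losses; equal weighting is an assumption."""
    return (focal_loss(outputs["object_name"], object_targets, gamma)
            + focal_loss(outputs["material"], material_targets, gamma))
```

With `gamma=0` the scaling factor is 1 and the function reduces to ordinary cross-entropy, which is a convenient sanity check; larger values down-weight examples the model already classifies confidently.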

## Troubleshooting

### Common Issues

1. **CUDA Out of Memory**
   - Reduce batch size: `--batch_size 8`
   - Use CPU: Set device to "cpu"

2. **Import Errors**
   - Ensure all dependencies are installed
   - Check Python path includes project root

3. **Model Loading Errors**
   - Verify the model file path is correct
   - Ensure PyTorch version compatibility

4. **Low Confidence Scores**
   - Model may not be trained on similar artifacts
   - Check image preprocessing matches training setup
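Several of the issues above (out-of-memory errors and model loading errors) can be reduced by choosing the device explicitly and mapping the checkpoint onto it at load time. The sketch below assumes the checkpoint is either a bare state dict or a dictionary wrapping one under a `model_state_dict` key; that key name is hypothetical, as is the helper name `load_weights`.

```python
import torch


def pick_device():
    """Prefer CUDA when available, otherwise fall back to the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def load_weights(model, checkpoint_path, device):
    """Load a checkpoint onto `device`, accepting a bare or wrapped state dict."""
    checkpoint = torch.load(checkpoint_path, map_location=device)
    # Hypothetical wrapper key; some training scripts save {"model_state_dict": ...}.
    state_dict = checkpoint.get("model_state_dict", checkpoint)
    model.load_state_dict(state_dict)
    return model.to(device).eval()
```

Passing `map_location` lets a checkpoint saved on a GPU machine load on a CPU-only one, and `load_state_dict` raises a descriptive error if the architecture does not match the saved weights.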

### Performance Tips

- Use GPU for faster inference
- Process images in batches for efficiency
- Consider model quantization for deployment
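Batched inference can be sketched as follows. The function assumes the images are already preprocessed and stacked into one tensor, and that the model returns the `'object_name'`/`'material'` dictionary described earlier; `predict_in_batches` is a hypothetical helper name.

```python
import torch


@torch.inference_mode()
def predict_in_batches(model, images, batch_size=32, device="cpu"):
    """Run a stack of preprocessed images through the model in fixed-size chunks."""
    model = model.to(device).eval()
    collected = {"object_name": [], "material": []}
    for start in range(0, images.shape[0], batch_size):
        batch = images[start:start + batch_size].to(device)
        outputs = model(batch)
        for key in collected:
            collected[key].append(outputs[key].cpu())  # free device memory early
    return {key: torch.cat(chunks) for key, chunks in collected.items()}
```

Lowering `batch_size` trades throughput for memory, which is also the first thing to try when inference runs out of GPU memory.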

## Model Limitations

- Trained specifically on Oriental Museum artifacts
- May not generalize well to artifacts from other cultures/regions
- Performance depends on image quality and lighting
- Multi-output nature may have trade-offs between object and material accuracy