---
license: mit
datasets:
- ArtifactClfDurham/OrientalMuseum-white
language:
- en
base_model:
- google/efficientnet-b0
tags:
- artifact
- museum
---

# Artifact Classification Model v2 - Best Model Usage Guide

This directory contains the improved v2 artifact classification model, which classifies museum artifacts by both object type and material.

## Hosted Model

The best model is available on Hugging Face at: **[SpyC0der77/artifact-efficientnet](https://huggingface.co/SpyC0der77/artifact-efficientnet)**

You can use the model directly from Hugging Face without downloading it locally.

## Model Overview

The v2 model is an advanced multi-output neural network that predicts two attributes simultaneously:

- **Object Name**: The type/category of the artifact (e.g., "vase", "statue", "pottery")
- **Material**: The material composition (e.g., "ceramic", "bronze", "stone")

### Key Improvements Over v1

- **EfficientNet Backbone**: Uses EfficientNet-B0 instead of ResNet-50 for better feature extraction
- **Attention Mechanism**: Includes an attention layer to focus on relevant features
- **Advanced Training**: Incorporates CutMix augmentation, Focal Loss, and mixed precision training
- **Better Regularization**: Uses dropout and batch normalization for improved generalization

## Architecture & Usage

The v2 model uses an EfficientNet-B0 backbone with an attention mechanism for multi-output classification. It processes RGB images of artifacts and outputs predictions for both object type and material composition.

### Input

- **Format**: RGB images (224×224 pixels after preprocessing)
- **Preprocessing**: Resize to 256×256, center crop to 224×224, normalize with ImageNet statistics

### Output

- **Object Classification**: Predicts artifact type (e.g., "vase", "statue", "pottery")
- **Material Classification**: Predicts material composition (e.g., "ceramic", "bronze", "stone")
- **Confidence Scores**: Probability scores for each prediction, obtained by applying softmax to the corresponding logits
- **Format**: Dictionary with 'object_name' and 'material' logits

## Model Architecture

```python
ImprovedMultiOutputModel(
  backbone: EfficientNet-B0 (pretrained)
  attention: Linear(1280 → 512 → 1280) with Sigmoid
  object_classifier: Linear(1280 → 1024 → 512 → num_object_classes)
  material_classifier: Linear(1280 → 1024 → 512 → num_material_classes)
)
```

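One plausible reading of the attention entry in the summary above is a squeeze-and-excitation-style gate: the 1280-dimensional feature vector passes through the bottleneck, a sigmoid turns the result into per-feature weights in (0, 1), and the features are multiplied by those weights before they reach the two classifier heads. The sketch below illustrates that idea in plain Python with toy dimensions (4 → 2 → 4 standing in for 1280 → 512 → 1280). The ReLU between the two linear layers, the element-wise multiplication, and every weight value are assumptions for illustration, not details taken from the released code.

```python
import math

def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def attention_gate(features, w1, b1, w2, b2):
    """Reweight features with a sigmoid gate computed from the features themselves."""
    hidden = [max(0.0, h) for h in linear(features, w1, b1)]  # assumed ReLU
    gate = [sigmoid(g) for g in linear(hidden, w2, b2)]       # weights in (0, 1)
    return [f * g for f, g in zip(features, gate)]

# Toy stand-in for the 1280 -> 512 -> 1280 bottleneck, with made-up weights.
W1 = [[0.5, -0.5, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]]
B1 = [0.0, 0.0]
W2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
B2 = [0.0, 0.0, 0.0, 0.0]

features = [2.0, 0.0, 1.0, 3.0]
gated = attention_gate(features, W1, B1, W2, B2)
print(gated)
```

Because the gate only rescales each feature, the output keeps the same dimensionality as the input, which is consistent with both heads taking 1280 inputs.
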
### Input Requirements

- **Image Size**: 224×224 pixels (automatically resized and cropped)
- **Format**: RGB images
- **Normalization**: ImageNet normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

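To make the arithmetic behind these requirements concrete, here is a small framework-free sketch of the crop geometry and the per-pixel normalization. It assumes 8-bit pixel values are first scaled to [0, 1] before the mean and standard deviation are applied, which is the usual convention for ImageNet statistics; the function names are illustrative only.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def center_crop_box(size=256, crop=224):
    """Return (left, top, right, bottom) of a centered crop in a size x size image."""
    offset = (size - crop) // 2
    return (offset, offset, offset + crop, offset + crop)

def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then apply ImageNet normalization."""
    return tuple(
        (channel / 255.0 - mean) / std
        for channel, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )

box = center_crop_box()
white = normalize_pixel((255, 255, 255))
print(box, white)
```

With a 256×256 resize and a 224×224 crop, 16 pixels are trimmed from each edge.
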
### Output Format

Returns a dictionary with:

- `'object_name'`: Logits for object classification
- `'material'`: Logits for material classification

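Since both entries are raw logits, turning them into a label and a confidence score takes a softmax followed by an argmax. The sketch below shows that step in plain Python. It assumes the logits for a single image have already been converted to lists of floats (for a PyTorch tensor, `.tolist()` would do this). The logit values and label lists are hypothetical, as the real class lists come from the training data.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(outputs, label_names):
    """Map each head's logits to a (predicted label, confidence) pair."""
    result = {}
    for head, logits in outputs.items():
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        result[head] = (label_names[head][best], probs[best])
    return result

# Hypothetical logits and label lists for illustration.
outputs = {"object_name": [2.0, 0.5, -1.0], "material": [0.1, 3.0, 0.2]}
label_names = {
    "object_name": ["vase", "statue", "pottery"],
    "material": ["ceramic", "bronze", "stone"],
}
predictions = decode(outputs, label_names)
print(predictions)
```
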
## Training Details

The model was trained with the following configuration:

- **Dataset**: ArtifactClfDurham/OrientalMuseum-white
- **Training Split**: 85% of data
- **Validation Split**: 15% of data
- **Batch Size**: 32
- **Epochs**: 20
- **Optimizer**: AdamW with differential learning rates
  - Backbone: 2e-4 (0.1× base LR)
  - Heads: 2e-3 (base LR)
- **Augmentation**: Advanced (CutMix, rotation, color jitter, Gaussian blur)
- **Loss Function**: Cross-Entropy (or Focal Loss if enabled)
- **Scheduler**: Cosine annealing with warmup

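The interaction between the two learning rates and the scheduler can be sketched as a single function evaluated once per parameter group. The version below uses linear warmup followed by cosine decay. The warmup length, the number of steps per epoch, the per-step granularity, and the minimum learning rate of 0 are all assumptions, since the configuration above does not state them.

```python
import math

BASE_LR = 2e-3                # classifier heads (base LR from the list above)
BACKBONE_LR = 0.1 * BASE_LR   # backbone: 0.1x the base LR

def lr_at(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Hypothetical schedule: 20 epochs of 100 steps each, with one warmup epoch.
TOTAL_STEPS, WARMUP_STEPS = 2000, 100
for step in (0, 99, 100, 1050, 1999):
    backbone = lr_at(step, TOTAL_STEPS, WARMUP_STEPS, BACKBONE_LR)
    heads = lr_at(step, TOTAL_STEPS, WARMUP_STEPS, BASE_LR)
    print(f"step {step:4d}  backbone {backbone:.2e}  heads {heads:.2e}")
```

Both groups follow the same curve shape, so the backbone stays at one tenth of the head learning rate throughout training.
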
### Advanced Training Features

- **CutMix Augmentation**: Pastes a random patch from one training image onto another and mixes their labels accordingly
- **Focal Loss**: Down-weights easy examples to address class imbalance (optional)
- **Mixed Precision**: Automatic mixed precision training for speed
- **Gradient Scaling**: Scales the loss to prevent gradient underflow in reduced precision
- **Early Stopping**: Saves the best model based on validation accuracy

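To illustrate why Focal Loss helps with class imbalance, the sketch below compares it with plain cross-entropy for a single sample. It uses the standard formulation, `-alpha * (1 - p_t)^gamma * log(p_t)`, where `p_t` is the predicted probability of the true class. The values `gamma=2.0` and `alpha=1.0` are commonly used defaults assumed here, not settings confirmed for this model.

```python
import math

def cross_entropy(probs, target):
    """Cross-entropy for one sample, given class probabilities."""
    return -math.log(probs[target])

def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: -alpha * (1 - p_t)^gamma * log(p_t)."""
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

easy = [0.9, 0.05, 0.05]  # confident, correct prediction for class 0
hard = [0.2, 0.5, 0.3]    # class 0 is the true label but gets low probability

for name, probs in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(probs, 0)
    fl = focal_loss(probs, 0)
    print(f"{name}: cross-entropy {ce:.4f}, focal {fl:.4f}, ratio {fl / ce:.2f}")
```

The easy example keeps about 1% of its cross-entropy loss while the hard one keeps 64%, so gradients are dominated by examples the model still gets wrong. With `gamma=0` the function reduces to ordinary cross-entropy.
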
## Troubleshooting

### Common Issues

1. **CUDA Out of Memory**
   - Reduce the batch size: `--batch_size 8`
   - Use the CPU: set the device to "cpu"

2. **Import Errors**
   - Ensure all dependencies are installed
   - Check that the Python path includes the project root

3. **Model Loading Errors**
   - Verify the model file path is correct
   - Ensure PyTorch version compatibility

4. **Low Confidence Scores**
   - The model may not have been trained on similar artifacts
   - Check that image preprocessing matches the training setup

### Performance Tips

- Use GPU for faster inference
- Process images in batches for efficiency
- Consider model quantization for deployment

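Batching usually comes down to grouping inputs into fixed-size chunks and running the model once per chunk. A minimal helper for the grouping step might look like the following. The helper name and the file names are hypothetical, and the batch size of 32 simply mirrors the training configuration; a smaller value may be needed on memory-constrained devices.

```python
from itertools import islice

def iter_batches(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# Hypothetical image paths for illustration.
paths = [f"artifact_{i:03d}.jpg" for i in range(70)]
batches = list(iter_batches(paths, 32))
print([len(batch) for batch in batches])
```
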
## Model Limitations

- Trained specifically on Oriental Museum artifacts
- May not generalize well to artifacts from other cultures/regions
- Performance depends on image quality and lighting
- The multi-output design may involve trade-offs between object and material accuracy