---
base_model:
- google/vit-base-patch16-224-in21k
library_name: transformers
tags:
- image-classification
- vision-transformer
- just-for-fun
---

# MaxVision: Max vs. Not Max Classifier

## Model Overview

**MaxVision** is a fun, hobby AI vision classifier designed to distinguish between images of Max, a black and white
sprocker spaniel, and all other images. The model has been trained using personal photos of Max and general images of
other dogs and non-dog subjects to improve its classification accuracy. It is intended purely for personal and
experimental use.

## Model Details

- **Developed by:** Patrick Skillen
- **Use Case:** Identifying whether an image contains Max
- **Architecture:** Vision Transformer (ViT), fine-tuned from `google/vit-base-patch16-224-in21k`
- **Training Dataset:** Curated personal dataset of Max and various non-Max images
- **Framework:** PyTorch with Hugging Face Transformers
- **Training Platform:** Google Colab
- **Labels:**
  - `0`: Max
  - `1`: Not Max
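
As a rough illustration of how the two labels relate to a classifier's raw output, the sketch below converts a pair of logits into a label and a confidence using softmax. The `ID2LABEL` dictionary mirrors the label list above; the `decode` helper and the example logits are hypothetical and only show the mechanics, not values produced by this model.

```python
import math

# Label mapping as listed in this card (0 = Max, 1 = Not Max).
ID2LABEL = {0: "Max", 1: "Not Max"}


def decode(logits):
    """Turn raw logits into (label, confidence) via a numerically stable softmax."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits, purely for illustration.
label, confidence = decode([2.0, -1.0])
print(label, round(confidence, 3))
```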

## Intended Use

This model is built as a fun, personal experiment in AI/ML and image classification. It is not intended for commercial
applications, biometric identification, or general dog breed classification.

## Limitations & Biases

- The model is heavily biased toward distinguishing Max from non-Max images and is not robust for identifying specific
  breeds or other dogs.
- Performance may degrade on images with low resolution, extreme lighting conditions, or unusual poses.
- Limited dataset size and personal image selection may affect generalizability.
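
One way to soften these limitations in practice is to treat low-confidence predictions as undecided rather than forcing a Max/Not Max answer. The sketch below assumes input in the shape returned by the `transformers` image-classification pipeline (a list of dicts with `label` and `score`). The `decide` helper, the 0.8 threshold, the label strings, and the sample scores are all hypothetical; the actual label strings depend on the model's configuration.

```python
def decide(results, threshold=0.8):
    """Return the top label, or "Unsure" when its score is below `threshold`."""
    top = max(results, key=lambda r: r["score"])
    return top["label"] if top["score"] >= threshold else "Unsure"


# Made-up scores in the pipeline's output shape, for illustration only.
blurry = [{"label": "Max", "score": 0.55}, {"label": "Not Max", "score": 0.45}]
clear = [{"label": "Not Max", "score": 0.97}, {"label": "Max", "score": 0.03}]

print(decide(blurry))
print(decide(clear))
```

A suitable threshold would need to be chosen by checking it against your own images.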

## How to Use

Try it in the Hugging Face Space at https://huggingface.co/spaces/paddeh/is-it-max

To run inference locally, use the Hugging Face `transformers` library. Below is a sample inference script using
`pipeline`:
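
```python
from transformers import pipeline

classifier = pipeline("image-classification", model="paddeh/is-it-max")

# The pipeline returns a list of {"label", "score"} dicts, highest score first.
top = classifier("path/to/image.jpg")[0]
# Map the label name back to its class index (assumes the config's label2id matches 0 = Max, 1 = Not Max).
label_id = classifier.model.config.label2id[top["label"]]
print("Max" if label_id == 0 else "Not Max")
```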

Alternatively, load the model directly and preprocess the image with `torchvision` transforms:

```python
import torch
from PIL import Image
from torchvision import transforms
from transformers import ViTForImageClassification, ViTImageProcessor

# Hub repo ID (as used in the pipeline example), or a local directory containing the model files.
model_path = "paddeh/is-it-max"

model = ViTForImageClassification.from_pretrained(model_path)
model.eval()
processor = ViTImageProcessor.from_pretrained(model_path)

# Resize, convert to a tensor, and normalize with the processor's statistics.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=processor.image_mean, std=processor.image_std),
])

image = Image.open("path/to/image.jpg").convert("RGB")  # ensure three channels
pixel_values = transform(image).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    logits = model(pixel_values).logits

prediction = torch.argmax(logits, dim=1)
print("Max" if prediction.item() == 0 else "Not Max")
```

## Model Performance

As this is a personal hobby project, there is no formal benchmark. The model has only been tested informally, using
validation images from Max’s personal collection and images of various other dog breeds.
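
For anyone repeating this kind of informal check, a minimal sketch of tallying results is shown below. It assumes you have collected true and predicted class IDs yourself (0 = Max, 1 = Not Max); the `summarize` helper and the sample lists are hypothetical placeholders, not results from this model.

```python
def summarize(y_true, y_pred, positive=0):
    """Compute accuracy, precision and recall, treating `positive` (0 = Max) as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Placeholder labels for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
print(summarize(y_true, y_pred))
```

Reporting precision and recall for the Max class alongside accuracy helps when the test set contains far more non-Max images than Max images.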

## Ethical Considerations

Since this model is built for personal use, there are no significant ethical concerns. However, users should be mindful
of data privacy and not use the model for unauthorized biometric identification of pets or people.

## Future Improvements

- Expand the dataset with more diverse images of Max in different lighting conditions and settings.
- Improve augmentation techniques to enhance robustness.
- Experiment with fine-tuning other architectures, such as CLIP or Swin Transformer, for better accuracy.

---

**Disclaimer:** This model is intended for personal and educational use only. It is not designed for commercial
applications or general-purpose image recognition.