---
license: apache-2.0
tags:
- image-classification
- computer-vision
- checkbox-detection
- efficientnet
datasets:
- wendys-llc/chkbx
metrics:
- accuracy
- f1
- precision
- recall
base_model: google/efficientnet-b0
model-index:
- name: checkbox-classifier-efficientnet
  results:
  - task:
      type: image-classification
      name: Image Classification
    dataset:
      type: wendys-llc/chkbx
      name: Checkbox Detection Dataset
      split: validation
    metrics:
    - type: accuracy
      value: 0.97
      name: Validation Accuracy
library_name: transformers
pipeline_tag: image-classification
---

# Checkbox State Classifier - EfficientNet-B0

A fine-tuned EfficientNet-B0 model for binary classification of UI checkbox states (checked or unchecked). It reaches ~95% accuracy on the validation set.

## Model Description

This model is fine-tuned from [google/efficientnet-b0](https://huggingface.co/google/efficientnet-b0) on the [wendys-llc/chkbx](https://huggingface.co/datasets/wendys-llc/chkbx) dataset. It is designed to classify the state of UI checkboxes in screenshots and interface images.
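
Because this is an image classifier, it assigns one label to the whole input image and does not locate checkboxes. To handle a screenshot that contains several checkboxes, one option is to crop each checkbox first and classify the crops in a single batch. The sketch below assumes you already have bounding boxes from some other source (for example a detector or a known UI layout). `classify_checkboxes` is a hypothetical helper, and `classifier` is an image-classification pipeline like the one shown under Usage:

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

# A bounding box in PIL's (left, upper, right, lower) pixel convention.
Box = Tuple[int, int, int, int]


def classify_checkboxes(
    screenshot: Any,
    boxes: Sequence[Box],
    classifier: Callable[[List[Any]], List[List[Dict[str, Any]]]],
) -> List[Dict[str, Any]]:
    """Crop each checkbox region and classify all crops in one batch.

    `screenshot` is expected to be a PIL image (anything with a
    `.crop(box)` method works). `boxes` are assumed to come from
    elsewhere, e.g. a detector or a known UI layout.
    """
    crops = [screenshot.crop(box) for box in boxes]
    # For a list input, the pipeline returns one list of
    # {"label", "score"} dicts per image, sorted by score.
    predictions = classifier(crops)
    return [
        {"box": box, "label": preds[0]["label"], "score": preds[0]["score"]}
        for box, preds in zip(boxes, predictions)
    ]
```

For example, `classify_checkboxes(Image.open("form.png"), [(10, 10, 34, 34)], classifier)` returns one `{"box", "label", "score"}` dict per box. The file name and coordinates here are purely illustrative.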

### Key Features

- **No `trust_remote_code` required** - uses the native EfficientNet support in transformers
- **Fast inference** - EfficientNet-B0 is the smallest variant of the EfficientNet family
- **High accuracy** - ~95% on the validation set
- **Simple API** - works with the transformers `pipeline` out of the box

## Usage

### Quick Start with Pipeline (Recommended)

```python
from transformers import pipeline
from PIL import Image

# Load the model
classifier = pipeline("image-classification", model="wendys-llc/checkbox-classifier-efficientnet")

# Classify an image
image = Image.open("checkbox.jpg")
results = classifier(image)

# Print the score for each label
for result in results:
    print(f"{result['label']}: {result['score']:.2%}")

# Get just the top prediction
top_result = classifier(image, top_k=1)[0]
print(f"Checkbox is: {top_result['label']} (confidence: {top_result['score']:.2%})")
```
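
In practice it can help to send low-confidence predictions to manual review instead of accepting them automatically. The following minimal sketch assumes pipeline-style output, meaning a list of `{"label": ..., "score": ...}` dicts. The function name `decide` and the 0.9 cutoff are hypothetical choices, not values from this model card, and the cutoff should be tuned on your own labeled data:

```python
from typing import Any, Dict, List, Optional


def decide(results: List[Dict[str, Any]], threshold: float = 0.9) -> Optional[str]:
    """Return the top label, or None if its score is below `threshold`.

    `threshold` is a hypothetical cutoff; tune it on labeled data.
    """
    top = max(results, key=lambda r: r["score"])
    return top["label"] if top["score"] >= threshold else None
```

With this helper, `decide(classifier(image))` returns the top label (for example `"checked"` or `"unchecked"`), or `None` when the model is not confident enough.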

### Using AutoModel and AutoImageProcessor

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
import torch
from PIL import Image

# Load model and processor
processor = AutoImageProcessor.from_pretrained("wendys-llc/checkbox-classifier-efficientnet")
model = AutoModelForImageClassification.from_pretrained("wendys-llc/checkbox-classifier-efficientnet")

# Prepare image (convert to RGB in case the screenshot is RGBA or grayscale)
image = Image.open("checkbox.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# Get predicted class
predicted_class_idx = logits.argmax(-1).item()
predicted_label = model.config.id2label[predicted_class_idx]

# Get confidence scores
probabilities = torch.nn.functional.softmax(logits, dim=-1)
confidence = probabilities.max().item()

print(f"Prediction: {predicted_label} (confidence: {confidence:.2%})")
```
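
The last two steps apply a softmax over the two logits and report the larger probability as the confidence. If $z_{\text{checked}}$ and $z_{\text{unchecked}}$ denote the two logits (which index belongs to which label is defined by `model.config.id2label`), then with only two classes the softmax reduces to a logistic sigmoid of the logit difference:

$$
p_{\text{checked}}
  = \frac{e^{z_{\text{checked}}}}{e^{z_{\text{checked}}} + e^{z_{\text{unchecked}}}}
  = \frac{1}{1 + e^{-(z_{\text{checked}} - z_{\text{unchecked}})}},
\qquad
p_{\text{unchecked}} = 1 - p_{\text{checked}}
$$

As a result, the reported confidence $\max(p_{\text{checked}}, p_{\text{unchecked}})$ is always at least 0.5 and depends only on the gap between the two logits. For example, illustrative logits of 2.0 and 0.0 give $1/(1+e^{-2}) \approx 0.881$.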

### Batch Processing

```python
from transformers import pipeline
from PIL import Image

classifier = pipeline("image-classification", model="wendys-llc/checkbox-classifier-efficientnet")

# Process multiple images in a single call
images = [Image.open(f"checkbox_{i}.jpg") for i in range(1, 4)]
results = classifier(images)

# A list input returns one list of predictions per image, sorted by score
for i, result in enumerate(results):
    top_pred = result[0]  # Get top prediction
    print(f"Image {i+1}: {top_pred['label']} ({top_pred['score']:.2%})")
```
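
The card's metadata lists accuracy, F1, precision, and recall as metrics. If you have ground-truth labels for a batch, you can score the top predictions directly. The plain-Python sketch below assumes the label strings `checked` and `unchecked` and treats `checked` as the positive class. That choice is an assumption, so change `positive` if errors on unchecked boxes matter more to you. `binary_metrics` is a hypothetical helper name:

```python
from typing import Dict, Sequence


def binary_metrics(
    predictions: Sequence[str],
    references: Sequence[str],
    positive: str = "checked",
) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for a two-label task."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    pairs = list(zip(predictions, references))

    # Confusion-matrix counts relative to the chosen positive label.
    tp = sum(p == positive and r == positive for p, r in pairs)
    fp = sum(p == positive and r != positive for p, r in pairs)
    fn = sum(p != positive and r == positive for p, r in pairs)
    correct = sum(p == r for p, r in pairs)

    # Guard every ratio against division by zero.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, `binary_metrics([r[0]["label"] for r in results], true_labels)` scores the batch above, where `true_labels` is your own (hypothetical) list of reference labels in the same order as `images`.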

## Model Details

### Architecture

- **Base Model**: google/efficientnet-b0
- **Model Type**: EfficientNet for Image Classification
- **Number of Labels**: 2 (checked, unchecked)
- **Input Size**: 224x224 RGB images
- **Framework**: PyTorch via Transformers

### Training Details

- **Dataset**: [wendys-llc/chkbx](https://huggingface.co/datasets/wendys-llc/chkbx)
  - ~4,800 training samples
  - ~1,200 validation samples
- **Training Configuration**:
  - Epochs: up to 15 (with early stopping)
  - Batch Size: 64 (on A100)
  - Optimizer: AdamW with its default learning rate
  - Mixed Precision: FP16
  - Hardware: NVIDIA A100 GPU
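
To get a rough sense of the training budget these settings imply, the step counts can be computed directly. This assumes the approximate sample counts above, a single GPU, and no gradient accumulation (none is stated). Under those assumptions, about 20% of the data is held out for validation, each epoch is 75 optimizer steps, and training runs for at most 1,125 steps:

```python
import math

# Approximate figures from the Training Details list.
train_samples = 4_800
val_samples = 1_200
batch_size = 64
max_epochs = 15

# Fraction of the data held out for validation.
val_fraction = val_samples / (train_samples + val_samples)

# Optimizer steps per epoch (a final partial batch would still count as a step).
steps_per_epoch = math.ceil(train_samples / batch_size)

# Upper bound on optimizer steps; early stopping can end training sooner.
max_steps = steps_per_epoch * max_epochs

print(f"validation fraction: {val_fraction:.0%}")
print(f"steps per epoch: {steps_per_epoch}")
print(f"max steps: {max_steps}")
```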

## Acknowledgments

- Base model: [google/efficientnet-b0](https://huggingface.co/google/efficientnet-b0)
- Dataset: [wendys-llc/chkbx](https://huggingface.co/datasets/wendys-llc/chkbx)
- Framework: [Hugging Face Transformers](https://github.com/huggingface/transformers)

## License

This model is licensed under the Apache License 2.0. See the [full license text](https://www.apache.org/licenses/LICENSE-2.0) for details.