	---
	license: apache-2.0
	tags:
	- image-classification
	- computer-vision
	- checkbox-detection
	- efficientnet
	datasets:
	- wendys-llc/chkbx
	metrics:
	- accuracy
	- f1
	- precision
	- recall
	base_model: google/efficientnet-b0
	model-index:
	- name: checkbox-classifier-efficientnet
	  results:
	  - task:
	      type: image-classification
	      name: Image Classification
	    dataset:
	      type: wendys-llc/chkbx
	      name: Checkbox Detection Dataset
	      split: validation
	    metrics:
	    - type: accuracy
	      value: 0.97
	      name: Validation Accuracy
	library_name: transformers
	pipeline_tag: image-classification
	---

	# Checkbox State Classifier - EfficientNet-B0

	A fine-tuned EfficientNet-B0 model for binary classification of checkbox states (checked/unchecked). It reaches roughly 95% accuracy on the validation split when classifying UI checkboxes.

	## Model Description

	This model is fine-tuned from [google/efficientnet-b0](https://huggingface.co/google/efficientnet-b0) on the [wendys-llc/chkbx](https://huggingface.co/datasets/wendys-llc/chkbx) dataset. It's designed to classify UI checkboxes in screenshots and interface images.

	### Key Features
	- No `trust_remote_code` required - Uses native transformers support
	- Fast inference - EfficientNet-B0 is optimized for speed
	- High accuracy - ~95% on validation set
	- Simple API - Works with transformers pipeline out of the box

	## Usage

	### Quick Start with Pipeline (Recommended)

	```python
	from transformers import pipeline
	from PIL import Image

	# Load the model
	classifier = pipeline("image-classification", model="wendys-llc/checkbox-classifier-efficientnet")

	# Classify an image
	image = Image.open("checkbox.jpg")
	results = classifier(image)

	# Print results
	for result in results:
	    print(f"{result['label']}: {result['score']:.2%}")

	# Get just the top prediction
	top_result = classifier(image, top_k=1)[0]
	print(f"Checkbox is: {top_result['label']} (confidence: {top_result['score']:.2%})")
	```
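
Downstream code often needs a yes/no answer rather than a list of scores. The following is a minimal sketch of one way to reduce the pipeline output to a boolean with a confidence cutoff. The helper name `is_checked`, the `0.8` threshold, and the exact label strings `"checked"` and `"unchecked"` are assumptions for illustration; check `model.config.id2label` for the labels this checkpoint actually emits.

```python
from typing import Dict, List, Optional


def is_checked(results: List[Dict], threshold: float = 0.8) -> Optional[bool]:
    """Reduce image-classification pipeline output to True/False/None.

    `results` is a list of {"label": str, "score": float} dicts, the shape
    the pipeline returns for a single image. Returns None when the top
    score falls below `threshold` (a hypothetical cutoff; tune it on your data).
    """
    top = max(results, key=lambda r: r["score"])
    if top["score"] < threshold:
        return None  # not confident enough to decide
    # Assumed label string; verify against model.config.id2label.
    return top["label"] == "checked"


# Hand-written scores in the pipeline's output format (not real model output)
sample = [
    {"label": "checked", "score": 0.93},
    {"label": "unchecked", "score": 0.07},
]
print(is_checked(sample))
```

With a live pipeline, the call would be `is_checked(classifier(image))`. Returning `None` for low-confidence cases lets the caller route those images to manual review instead of forcing a guess.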

	### Using AutoModel and AutoImageProcessor

	```python
	from transformers import AutoImageProcessor, AutoModelForImageClassification
	import torch
	from PIL import Image

	# Load model and processor
	processor = AutoImageProcessor.from_pretrained("wendys-llc/checkbox-classifier-efficientnet")
	model = AutoModelForImageClassification.from_pretrained("wendys-llc/checkbox-classifier-efficientnet")

	# Prepare image (convert to RGB so grayscale or RGBA files yield 3 channels)
	image = Image.open("checkbox.jpg").convert("RGB")
	inputs = processor(images=image, return_tensors="pt")

	# Get prediction
	with torch.no_grad():
	    outputs = model(**inputs)
	    logits = outputs.logits

	# Get predicted class
	predicted_class_idx = logits.argmax(-1).item()
	predicted_label = model.config.id2label[predicted_class_idx]

	# Get confidence scores
	probabilities = torch.nn.functional.softmax(logits, dim=-1)
	confidence = probabilities.max().item()

	print(f"Prediction: {predicted_label} (confidence: {confidence:.2%})")
	```
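
For intuition about the confidence step, the softmax can be worked through by hand. This is a small sketch in plain Python using made-up logits, not outputs from this model; the function name `softmax` and the example values are illustrative only.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    # Subtract the largest logit first for numerical stability.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-class logits, ordered as in model.config.id2label
example_logits = [2.0, -1.0]
probs = softmax(example_logits)
confidence = max(probs)
print(f"probabilities: {[round(p, 4) for p in probs]}, confidence: {confidence:.2%}")
```

With only two classes, the confidence depends solely on the gap between the two logits, so a gap near zero signals an uncertain prediction.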

	### Batch Processing

	```python
	from transformers import pipeline
	from PIL import Image

	classifier = pipeline("image-classification", model="wendys-llc/checkbox-classifier-efficientnet")

	# Process multiple images
	images = [Image.open(f"checkbox_{i}.jpg") for i in range(1, 4)]
	results = classifier(images)

	for i, result in enumerate(results):
	    top_pred = result[0]  # Get top prediction
	    print(f"Image {i+1}: {top_pred['label']} ({top_pred['score']:.2%})")
	```
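
When processing many images, a per-label tally is often more useful than per-image prints. Below is a minimal sketch of such a summary; the helper name `summarize_batch` and the sample scores are hypothetical, and the input is assumed to have the shape the pipeline returns for a list of images (one list of `{"label", "score"}` dicts per image).

```python
from collections import Counter


def summarize_batch(batch_results):
    """Count top-1 labels across a batch of pipeline results.

    `batch_results` holds one entry per image; each entry is a list of
    {"label", "score"} dicts. The top prediction is chosen by score rather
    than by list position.
    """
    top_labels = [
        max(preds, key=lambda p: p["score"])["label"] for preds in batch_results
    ]
    return Counter(top_labels)


# Hand-written results for three images (not real model output)
fake_results = [
    [{"label": "checked", "score": 0.98}, {"label": "unchecked", "score": 0.02}],
    [{"label": "unchecked", "score": 0.91}, {"label": "checked", "score": 0.09}],
    [{"label": "checked", "score": 0.77}, {"label": "unchecked", "score": 0.23}],
]
counts = summarize_batch(fake_results)
print(dict(counts))
```

With real data, pass the list returned by `classifier(images)`. For long image lists, the pipeline call also accepts a `batch_size` argument to control how many images are forwarded through the model at once.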

	## Model Details

	### Architecture
	- Base Model: google/efficientnet-b0
	- Model Type: EfficientNet for Image Classification
	- Number of Labels: 2 (checked, unchecked)
	- Input Size: 224x224 RGB images
	- Framework: PyTorch via Transformers

	### Training Details
	- Dataset: [wendys-llc/chkbx](https://huggingface.co/datasets/wendys-llc/chkbx)
	  - ~4,800 training samples
	  - ~1,200 validation samples
	- Training Configuration:
	  - Epochs: 15 (with early stopping)
	  - Batch Size: 64 (on A100)
	  - Learning Rate: Default AdamW
	  - Mixed Precision: FP16
	- Hardware: NVIDIA A100 GPU
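
The "15 epochs with early stopping" setting can be illustrated with a patience-based rule. This is only a sketch of the general technique: the model card does not state the patience value or the monitored metric, so `patience=3`, the function name `early_stopping_epoch`, and the scores below are all assumptions. In Transformers, `EarlyStoppingCallback` offers this behavior for `Trainer`, though the card does not say whether it was used here.

```python
def early_stopping_epoch(val_scores, patience=3, max_epochs=15):
    """Return the 1-based epoch at which training would stop.

    `val_scores` holds one validation score per epoch (higher is better).
    Training stops once `patience` consecutive epochs fail to beat the best
    score seen so far, or when `max_epochs` is reached.
    """
    best = float("-inf")
    epochs_without_gain = 0
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        if score > best:
            best = score
            epochs_without_gain = 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                return epoch
    return min(len(val_scores), max_epochs)


# Made-up validation accuracies, not results from this model
scores = [0.80, 0.88, 0.92, 0.94, 0.95, 0.95, 0.94, 0.95, 0.93]
stop = early_stopping_epoch(scores, patience=3)
print(f"Training would stop after epoch {stop}")
```

A tie with the best score does not count as an improvement in this sketch, which is a design choice; some implementations instead require the gain to exceed a minimum threshold.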

	## Acknowledgments

	- Base model: [google/efficientnet-b0](https://huggingface.co/google/efficientnet-b0)
	- Dataset: [wendys-llc/chkbx](https://huggingface.co/datasets/wendys-llc/chkbx)
	- Framework: [HuggingFace Transformers](https://github.com/huggingface/transformers)

	## License

	This model is licensed under the Apache 2.0 License. See the [full license text](https://www.apache.org/licenses/LICENSE-2.0) for details.