Granitagushi committed
Commit a2b9220 · verified · 1 Parent(s): 9291f82

Upload 3 files

Files changed (3)
  1. README.md +40 -0
  2. app.py +42 -0
  3. requirements.txt +2 -0
README.md ADDED
# CLIP Zero-Shot Classification on the Oxford-IIIT Pet Dataset

## Model Details
- **Model Name**: CLIP (Contrastive Language-Image Pre-training)
- **Model Version**: openai/clip-vit-large-patch14
- **Task**: Zero-shot image classification
- **Dataset**: Oxford-IIIT Pet Dataset

## Evaluation Results
The model was evaluated on the Oxford-IIIT Pet dataset using zero-shot classification, with the following results:

- **Accuracy**: 0.8800 (88.00%)
- **Precision**: 0.8768 (87.68%)
- **Recall**: 0.8800 (88.00%)

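For context, here is a minimal sketch of how numbers like these could be reproduced. The actual evaluation script is not part of this commit, so the dataset loader, the weighted averaging, and the per-image pipeline loop are assumptions.

```python
# Hypothetical evaluation sketch (not the committed script): score zero-shot
# CLIP on the Oxford-IIIT Pet test split and compute the reported metrics.
from sklearn.metrics import accuracy_score, precision_score, recall_score
from torchvision.datasets import OxfordIIITPet
from transformers import pipeline

clip_detector = pipeline(model="openai/clip-vit-large-patch14",
                         task="zero-shot-image-classification")
test_set = OxfordIIITPet(root="data", split="test", download=True)

y_true, y_pred = [], []
for image, target in test_set:
    scores = clip_detector(image, candidate_labels=test_set.classes)
    y_true.append(test_set.classes[target])
    y_pred.append(scores[0]["label"])  # results are sorted, so [0] is top-1

print("Accuracy: ", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="weighted"))
print("Recall:   ", recall_score(y_true, y_pred, average="weighted"))
```

Note that weighted-average recall is mathematically identical to accuracy, which is consistent with the matching 0.8800 figures above.
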
## Model Description
CLIP (Contrastive Language-Image Pre-training) is a neural network trained on a large variety of (image, text) pairs. It can be instructed in natural language to perform a wide range of classification tasks without being optimized directly for any particular benchmark. This zero-shot capability is particularly useful when labeled data is scarce or expensive to obtain.

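To make this concrete, the sketch below shows roughly what the zero-shot pipeline does under the hood, using the standard `CLIPModel`/`CLIPProcessor` classes; the prompt template, label subset, and image path are illustrative assumptions, not part of this repository's code.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

image = Image.open("example_images/cat.jpg")  # illustrative path
texts = [f"a photo of a {breed}" for breed in ("Siamese", "pug", "beagle")]

# Embed the image and all candidate texts, then score every image-text pair
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-text similarity scores

# Softmax over the candidate texts turns similarities into class probabilities
probs = logits.softmax(dim=-1)[0]
print(dict(zip(texts, probs.tolist())))
```
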
## Dataset
The Oxford-IIIT Pet Dataset contains 37 categories of cat and dog breeds with roughly 200 images per class. The images vary widely in scale, pose, and lighting, and every image carries a ground-truth breed annotation.

## Usage
```python
from PIL import Image
from transformers import pipeline

# Load the zero-shot classification pipeline
checkpoint = "openai/clip-vit-large-patch14"
detector = pipeline(model=checkpoint, task="zero-shot-image-classification")

# Define candidate labels (pass the full list of 37 breed names; see app.py)
labels = ['Siamese', 'Birman', 'shiba inu', 'staffordshire bull terrier']

# Load an image and run inference
image = Image.open("example_images/cat.jpg")
results = detector(image, candidate_labels=labels)
print(results[0])  # highest-scoring label and its score
```
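
Zero-shot accuracy is sensitive to how the candidate labels are phrased. Internally, the pipeline wraps each label in a hypothesis template before scoring it against the image, and the template can be overridden; the wording below is only an illustration, continuing the snippet above.

```python
# Continues the Usage snippet; the template wording is a free choice.
results = detector(
    image,
    candidate_labels=labels,
    hypothesis_template="a photo of a {}, a breed of pet.",
)
```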

## Limitations
- The model's performance may vary with the quality and characteristics of the input images.
- Zero-shot classification may not match the accuracy of a model fine-tuned for the specific task.
- Predictions are restricted to the provided candidate labels, so result quality depends on how relevant and complete those labels are.
app.py ADDED
import gradio as gr
from transformers import pipeline

# Load both models: a ViT fine-tuned on Oxford-IIIT Pets and zero-shot CLIP
vit_classifier = pipeline("image-classification", model="kuhs/vit-base-oxford-iiit-pets")
clip_detector = pipeline(model="openai/clip-vit-large-patch14", task="zero-shot-image-classification")

# The 37 breed names of the Oxford-IIIT Pet dataset
labels_oxford_pets = [
    'Siamese', 'Birman', 'shiba inu', 'staffordshire bull terrier', 'basset hound', 'Bombay', 'japanese chin',
    'chihuahua', 'german shorthaired', 'pomeranian', 'beagle', 'english cocker spaniel', 'american pit bull terrier',
    'Ragdoll', 'Persian', 'Egyptian Mau', 'miniature pinscher', 'Sphynx', 'Maine Coon', 'keeshond', 'yorkshire terrier',
    'havanese', 'leonberger', 'wheaten terrier', 'american bulldog', 'english setter', 'boxer', 'newfoundland', 'Bengal',
    'samoyed', 'British Shorthair', 'great pyrenees', 'Abyssinian', 'pug', 'saint bernard', 'Russian Blue', 'scottish terrier'
]

def classify_pet(image):
    """Run both classifiers on one image and return their label -> score maps."""
    vit_results = vit_classifier(image)
    vit_output = {result['label']: result['score'] for result in vit_results}

    clip_results = clip_detector(image, candidate_labels=labels_oxford_pets)
    clip_output = {result['label']: result['score'] for result in clip_results}

    return {"ViT Classification": vit_output, "CLIP Zero-Shot Classification": clip_output}

example_images = [
    ["example_images/dog1.jpeg"],
    ["example_images/dog2.jpeg"],
    ["example_images/leonberger.jpg"],
    ["example_images/snow_leopard.jpeg"],
    ["example_images/cat.jpg"]
]

iface = gr.Interface(
    fn=classify_pet,
    inputs=gr.Image(type="filepath"),
    outputs=gr.JSON(),
    title="Pet Classification Comparison",
    description="Upload an image of a pet, and compare results from a fine-tuned ViT model and a zero-shot CLIP model.",
    examples=example_images
)

iface.launch()
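
For a quick sanity check outside the Gradio UI, the handler can be called directly. This hypothetical snippet is not part of the committed app and assumes the bundled example image is present:

```python
# Hypothetical smoke test: call the handler directly (e.g., in a REPL after
# running the definitions above) instead of going through the web interface.
from pprint import pprint

pprint(classify_pet("example_images/cat.jpg"))
```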
requirements.txt ADDED
transformers
torch