saadmann18 committed on
Commit 97e38b1 · 1 Parent(s): 980bcf6

initial commit

Files changed (4):
  1. .gitignore +1 -0
  2. README.md +138 -0
  3. app.py +47 -0
  4. requirements.txt +9 -0
.gitignore ADDED
@@ -0,0 +1 @@
venv
README.md CHANGED
@@ -11,3 +11,141 @@ short_description: https://www.marqo.ai/blog/how-to-create-a-hugging-face-space
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Fashion Item Classifier

A Gradio-based web application that classifies fashion items from image URLs using the CLIP (Contrastive Language-Image Pre-training) model.

## Steps to Create This Hugging Face Space

Based on the guide from [Marqo's blog post](https://www.marqo.ai/blog/how-to-create-a-hugging-face-space), here are the steps that were followed:

### 1. Create an Account
- Head to [Hugging Face](https://huggingface.co/) and create an account
- Follow the sign-up process with your details

### 2. Confirm Your Email Address
- Check your email to confirm your account
- This enables access to all Hugging Face features, including Spaces

### 3. Head to Spaces
- After confirming your email, log in and click on **Spaces** in the main navigation bar
- This is where you manage and deploy your models and apps

### 4. Create a New Space
- Click **Create New Space**
- Configure the following settings:
  - **Owner**: Your Hugging Face account name
  - **Space name**: Choose a descriptive name (e.g., 'fashion-classifier')
  - **Short Description**: Optional description of your project
  - **License**: Optional
  - **Space SDK**: Select **Gradio**
  - **Gradio template**: Keep as **Blank**
  - **Space hardware**: Use **CPU basic • 2 CPU • 16 GB • FREE** for the free tier
  - **Privacy**: Select **Public** to share with others
- Click **Create Space**

### 5. Install Git
- If you don't have Git, download it from [Git's official page](https://git-scm.com/downloads)
- Install it for your operating system
- Verify the installation by running: `git --version`

### 6. Clone the Hugging Face Space
```bash
git clone https://huggingface.co/spaces/your-username/your-space
```
Replace `your-username` and `your-space` with your actual username and space name.

### 7. Open the Folder in VSCode
- Navigate to the cloned folder
- Open it in Visual Studio Code (VSCode)
- Initially, you'll only have the `.gitattributes` and `README.md` files

### 8. Create an app.py File
- Create a new file named `app.py` in VSCode
- This contains the main application code for your fashion item classifier; a minimal placeholder sketch follows, and the full version added in this commit appears later in the diff

### 9. Add Dependencies
- Create a `requirements.txt` file
- List all required Python packages for your application, as in the example below
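
For reference, a minimal `requirements.txt` for this app might look like the following (the full file added in this commit is shown at the end of the diff):

```
transformers
torch
requests
Pillow
gradio
```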

### 10. Test Your App Locally
Create a virtual environment and test locally:
```bash
# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run the app
python app.py
```

### 11. Upload to Hugging Face Hub
- Create a `.gitignore` file to exclude unnecessary files (like `venv/`); its one-line content in this commit is shown after the commands below
- Commit and push your code:
```bash
git add .
git commit -m "Initial commit"
git push origin main
```
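
The `.gitignore` added in this commit is a single line:

```
venv
```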

## Development Challenges and Solutions

### Problem 1: PyTorch Meta Tensor Error
**Issue**: The original `Marqo/marqo-fashionSigLIP` model encountered a meta tensor error:
```
NotImplementedError: Cannot copy out of meta tensor; no data!
Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to()
when moving module from meta to a different device.
```

**Root Cause**: This error occurred due to compatibility issues between the custom SigLIP model and newer versions of PyTorch/transformers. The model was being initialized with meta tensors (tensors with shape and dtype but no actual data), and the `open_clip` loading path then tried to move them to a device with `.to()`, which cannot copy data out of meta tensors.
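The failure mode can be reproduced in isolation; a minimal sketch:

```python
import torch

# A module created on the meta device has parameters without storage.
# Moving it with .to() raises the same NotImplementedError seen above;
# torch.nn.Module.to_empty() is the supported way to materialize it.
layer = torch.nn.Linear(2, 2, device="meta")
layer.to("cpu")  # NotImplementedError: Cannot copy out of meta tensor; no data!
```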

**Attempted Solutions** (a loading sketch follows the list):
1. **Environment Variables**: Tried setting `PYTORCH_CUDA_ALLOC_CONF` and disabling meta-device initialization
2. **Model Parameters**: Attempted using `torch_dtype=torch.float32`, `device_map="cpu"`, and `low_cpu_mem_usage=False`
3. **Accelerate Library**: Installed the `accelerate` library as requested by the error messages
4. **PyTorch Version Downgrade**: Attempted to downgrade PyTorch to version 2.1.0 (not available for Windows)
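
For illustration, attempt 2 amounted to something like the following sketch (`trust_remote_code=True` is an assumption here, since the model ships custom code; the exact combination of parameters tried may have differed):

```python
import torch
from transformers import AutoModel

# Sketch of the attempted workaround parameters; none of these
# resolved the meta tensor error in this project
model = AutoModel.from_pretrained(
    "Marqo/marqo-fashionSigLIP",
    trust_remote_code=True,     # assumption: model uses custom modeling code
    torch_dtype=torch.float32,  # force full-precision weights
    device_map="cpu",           # pin everything to the CPU
    low_cpu_mem_usage=False,    # avoid meta-tensor lazy initialization
)
```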

**Final Solution**: Replaced the problematic model with the standard OpenAI CLIP model:
- **Original Model**: `Marqo/marqo-fashionSigLIP` (custom SigLIP implementation)
- **Final Model**: `openai/clip-vit-base-patch32` (standard CLIP model)

### Problem 2: Model Architecture Differences
**Issue**: The code structure needed to be adapted for the different model architecture.

**Solution**: Updated the prediction function to use CLIP's unified text-image processing (sketched below):
- **Before**: Separate text preprocessing and feature extraction using `get_text_features()` and `get_image_features()`
- **After**: Combined processing using `processor(images=image, text=fashion_items)` and `model(**inputs)`
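
Roughly, the change looks like this (a sketch; the image path and label list are illustrative):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
labels = ["top", "trousers", "bottom"]
image = Image.open("example.jpg")  # hypothetical local image

# Before (SigLIP attempt): separate feature extraction, e.g.
#   text_features = model.get_text_features(**text_inputs)
#   image_features = model.get_image_features(**image_inputs)
# followed by a manual similarity + softmax step.

# After (standard CLIP): one processor call and one forward pass
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # image-to-text probabilities
print(dict(zip(labels, probs[0].tolist())))
```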

### Problem 3: Windows Command Compatibility
**Issue**: The original tutorial used Unix/Linux commands (`source venv/bin/activate`) which don't work in Windows PowerShell.

**Solution**: Used Windows-compatible commands (see the snippet below):
- **Virtual Environment Activation**: Used direct Python execution via `venv\Scripts\python.exe` instead of activating the environment
- **Package Installation**: `venv\Scripts\python.exe -m pip install -r requirements.txt`
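
Concretely, the Windows workflow runs the venv's interpreter directly (the second command is the direct-execution equivalent of `python app.py`):

```bash
# Run the venv's Python directly; no activation step needed
venv\Scripts\python.exe -m pip install -r requirements.txt
venv\Scripts\python.exe app.py
```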

### Final Model Choice: OpenAI CLIP
**Selected Model**: `openai/clip-vit-base-patch32`

**Reasons for Selection**:
1. **Stability**: Well-tested and widely used in production environments
2. **Compatibility**: Full compatibility with current PyTorch and transformers versions
3. **Performance**: Strong performance on image-text classification tasks
4. **Documentation**: Extensive documentation and community support
5. **Simplicity**: Straightforward implementation without custom code requirements

**Trade-offs**:
- **Specialization**: Less specialized for fashion items than the original SigLIP model
- **Accuracy**: May have slightly lower accuracy on fashion-specific classifications
- **Model Size**: Standard CLIP model size vs. a potentially optimized SigLIP

The final implementation successfully classifies fashion items from image URLs into the categories 'top', 'trousers', and 'bottom'.
app.py ADDED
@@ -0,0 +1,47 @@
import gradio as gr
from transformers import CLIPProcessor, CLIPModel
import torch
import requests
from PIL import Image
from io import BytesIO

fashion_items = ['top', 'trousers', 'bottom']

# Load model and processor - using the standard CLIP model instead of Marqo's SigLIP
model_name = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(model_name)

# CLIP processes text and images together, so no separate text preprocessing is needed

# Prediction function
def predict_from_url(url):
    # Check if the URL is empty
    if not url:
        return {"Error": "Please input a URL"}

    try:
        # Fetch the image with a timeout so a dead URL can't hang the app
        image = Image.open(BytesIO(requests.get(url, timeout=10).content))
    except Exception as e:
        return {"Error": f"Failed to load image: {str(e)}"}

    # Batch the candidate labels together with the image in a single processor call
    inputs = processor(images=image, text=fashion_items, return_tensors="pt", padding=True)

    with torch.no_grad():
        outputs = model(**inputs)
        logits_per_image = outputs.logits_per_image
        text_probs = logits_per_image.softmax(dim=-1)

    # Map each label to its probability for Gradio's Label component
    return {fashion_items[i]: float(text_probs[0, i]) for i in range(len(fashion_items))}

# Gradio interface
demo = gr.Interface(
    fn=predict_from_url,
    inputs=gr.Textbox(label="Enter Image URL"),
    outputs=gr.Label(label="Classification Results"),
    title="Fashion Item Classifier",
    allow_flagging="never"
)

# Launch the interface
demo.launch()
requirements.txt ADDED
@@ -0,0 +1,9 @@
transformers
torch
requests
Pillow
open_clip_torch
ftfy

# This is only needed for local deployment
gradio