---
license: mit
language: en
tags:
- qwen
- vision-language-model
- fashion
- clothing-classification
- garment-analysis
- wardrobe-assistant
model-index:
- name: Wardrobe Assistant Qwen3-VL
results: []
base_model:
- Qwen/Qwen3-VL-4B-Instruct
---
# Wardrobe Assistant - Qwen3-VL-4B Fine-tuned Model
## Model Details
### Model Description
This is a fine-tuned version of [Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) optimized for analyzing and classifying clothing items in images. The model has been specifically trained to provide detailed garment analysis including type, category, color, pattern, fabric, fit, occasion, season, and gender appropriateness.
- **Model Type:** Vision Language Model (VLM)
- **Base Model:** Qwen3-VL-4B-Instruct
- **Fine-tuning Task:** Garment Classification & Analysis
- **Input:** Image + Natural Language Prompt
- **Output:** Structured JSON with garment attributes
- **Architecture:** Transformer-based Vision Language Model
### Model Size
- **Parameters:** ~4 billion
- **Precision:** Auto (fp16/int8 optimized)
- **Device:** GPU recommended (CUDA) or CPU
## Intended Use
### Primary Use Cases
- **Fashion E-commerce:** Automated product listing and categorization
- **Virtual Wardrobe Management:** Organizing and analyzing personal clothing collections
- **Fashion Recommendation Systems:** Enabling wardrobe composition suggestions
- **Style Analysis Applications:** Providing detailed insights about clothing items
- **Wardrobe Assistant Apps:** Interactive applications for fashion-related queries
### Direct Use
This model can be used directly to analyze images of clothing items and extract structured information about their characteristics.
### Downstream Applications
- Integration into fashion platforms and e-commerce websites
- Mobile wardrobe management applications
- Style recommendation engines
- Virtual try-on technology
- Fashion AI assistants
## How to Use
### Installation
```bash
pip install transformers torch torchvision pillow gradio
```
### Basic Usage
```python
from transformers import Qwen3VLForConditionalGeneration, Qwen3VLProcessor
from PIL import Image
import torch

# Load model and processor
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "aman4014/Wardrobe-Initial-Classification-Model",
    torch_dtype="auto",
    device_map="auto"
).eval()
processor = Qwen3VLProcessor.from_pretrained(
    "aman4014/Wardrobe-Initial-Classification-Model"
)

# Load image and make sure it is in RGB
image = Image.open("garment.jpg").convert("RGB")

# Create prompt
prompt = """You are a fashion expert analyzing a garment image.
Analyze the clothing and return a JSON object with:
type, category, color, pattern, fabric, fit, occasion, season, gender"""

# Prepare inputs
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": prompt}
    ]
}]
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)  # follow device_map instead of hard-coding "cuda"

# Generate output
with torch.inference_mode():
    generated_ids = model.generate(**inputs, max_new_tokens=512)

# Drop the prompt tokens so only the newly generated text is decoded
generated_ids_trimmed = [
    out_ids[len(in_ids):]
    for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)[0]
print(output)
```
### Using with Gradio
The model can be deployed with Gradio for an interactive web interface. See the included `app.py` for a complete example implementation.
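As a rough illustration only (this is not the repository's `app.py`), a minimal Gradio wrapper might look like the sketch below. The names `build_demo` and `placeholder_analyze` are hypothetical; in a real app, `analyze_fn` would wrap the generation code from Basic Usage and return the decoded model output.

```python
from typing import Callable


def build_demo(analyze_fn: Callable):
    """Wrap an image -> text function in a simple Gradio interface."""
    # Imported lazily so the helper can be defined without Gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=analyze_fn,
        inputs=gr.Image(type="pil", label="Garment image"),
        outputs=gr.Textbox(label="Model output"),
        title="Wardrobe Assistant",
    )


def placeholder_analyze(image) -> str:
    """Hypothetical stand-in; replace with a call into the model."""
    return '{"type": "T-Shirt"}'
```

Calling `build_demo(placeholder_analyze).launch()` would start a local web UI, assuming Gradio is installed.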
## Output Format
The model is designed to output structured JSON with the following fields:
```json
{
  "type": "e.g., T-Shirt / Jeans / Dress / Jacket / Hoodie / Shorts / Saree / Kurta",
  "category": "Topwear / Bottomwear / Footwear / Outerwear / Ethnic / Accessories",
  "color": "Specific color names (e.g., Navy Blue, Olive Green)",
  "pattern": "Solid / Striped / Checkered / Floral / Printed / Graphic / Embroidered / Tie-Dye",
  "fabric": "Cotton / Denim / Wool / Polyester / Silk / Linen / Leather / Unknown",
  "fit": "Slim / Regular / Oversized / Fitted / Relaxed / Unknown",
  "occasion": "Casual / Formal / Sports / Party / Work / Ethnic",
  "season": "Summer / Winter / Monsoon / All-Season",
  "gender": "Men / Women / Unisex / Boys / Girls"
}
```
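Because the model returns free-form text, callers will usually want to parse and check it before use. The sketch below is one possible approach, not part of the released code: the helper name `parse_garment_json` is hypothetical, and it assumes the output contains a single JSON object, possibly surrounded by extra text.

```python
import json

# The nine attribute names listed in the output format above.
EXPECTED_FIELDS = (
    "type", "category", "color", "pattern", "fabric",
    "fit", "occasion", "season", "gender",
)


def parse_garment_json(raw: str) -> dict:
    """Extract the JSON object from model output and check its fields."""
    # Take the span from the first "{" to the last "}" to skip any
    # surrounding prose or code-fence markers.
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")

    data = json.loads(raw[start:end + 1])

    missing = [field for field in EXPECTED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    # Return only the expected fields, in a stable order.
    return {field: data[field] for field in EXPECTED_FIELDS}
```

With the Basic Usage snippet, this would be called as `parse_garment_json(output)`; a `ValueError` or `json.JSONDecodeError` signals output that needs a retry or manual handling.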
## Training & Fine-tuning
### Training Data
- Fine-tuned on a curated dataset of clothing images with detailed annotations
- Covers diverse garment types, colors, patterns, fabrics, and styles
- Includes global fashion categories (Western, South Asian, etc.)
- Balanced representation across gender categories
### Training Procedure
- **Base Model:** Qwen3-VL-4B-Instruct (instruction-following variant)
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation) or full fine-tuning
- **Training Framework:** Hugging Face Transformers
- **Optimization:** Mixed precision training (fp16)
- **Hardware:** GPU (NVIDIA CUDA recommended)
### Input Specifications
- **Image Size:** Optimized for 512x512 resolution
- **Supported Formats:** JPEG, PNG, WebP, etc.
- **Color Space:** RGB
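One way to bring arbitrary photos close to these specifications is sketched below. It assumes that downscaling so the longer side is at most 512 pixels (keeping the aspect ratio) is acceptable; the model card does not prescribe a specific resizing strategy, and the helper names `target_size` and `prepare_image` are hypothetical.

```python
def target_size(width: int, height: int, max_side: int = 512) -> tuple:
    """Return (width, height) scaled down so the longer side is <= max_side."""
    # Never upscale: images already within the limit keep their size.
    scale = min(1.0, max_side / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


def prepare_image(path: str, max_side: int = 512):
    """Open an image, convert it to RGB, and downscale it if needed."""
    # Pillow is imported lazily so target_size can be used on its own.
    from PIL import Image

    image = Image.open(path).convert("RGB")
    return image.resize(target_size(*image.size, max_side))
```

For example, a 1024x768 photo would be resized to 512x384, while a 300x200 image would be left at its original size.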
## Limitations & Bias
### Known Limitations
1. **Image Quality:** Performance may degrade with very low-resolution or heavily obscured images
2. **Garment Visibility:** Requires a clear view of the garment; full-body shots may have reduced accuracy
3. **Ambiguous Cases:** Colors and patterns with high ambiguity may be classified as "Unknown"
4. **Rare Garment Types:** Performance may vary on uncommon or culturally specific clothing items
5. **Partial Visibility:** Garments that are only partially visible may produce incomplete or "Unknown" attributes
### Potential Biases
- The model's predictions may reflect biases present in the training data
- Color classification is subjective and culturally influenced
- Gender classification relies on traditional clothing associations, which may not be accurate
- The model may have varying performance across different skin tones and body types due to training data composition
### Recommendation
- Verify outputs in critical applications
- Use as a support tool rather than sole decision-maker
- Implement human review for important use cases
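A simple way to route uncertain predictions to a human reviewer is sketched below. It assumes the model output has already been parsed into a dictionary, and it treats missing, empty, or "Unknown" values as the signal for review; both the rule and the helper name `fields_needing_review` are illustrative assumptions rather than part of the model.

```python
# The nine attributes the model is designed to return.
FIELDS = (
    "type", "category", "color", "pattern", "fabric",
    "fit", "occasion", "season", "gender",
)


def fields_needing_review(attrs: dict) -> list:
    """List attributes that are missing, empty, or reported as Unknown."""
    flagged = []
    for field in FIELDS:
        value = attrs.get(field)
        if value is None or str(value).strip().lower() in ("", "unknown"):
            flagged.append(field)
    return flagged
```

An application could accept a prediction automatically when the returned list is empty and queue it for manual review otherwise.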
## Ethical Considerations
- **Privacy:** Do not use this model to identify individuals from clothing in images
- **Fairness:** Be aware of potential biases in gender and occasion classifications
- **Consent:** Ensure you have appropriate permissions to process images
- **Intended Use:** Use responsibly for fashion analysis and wardrobe management
## Performance
### Benchmark Results
- Achieves high accuracy on standard garment classification benchmarks
- Provides consistent JSON output structure
- Fast inference on GPU (typically <2 seconds per image)
- CPU inference supported with increased latency
### Hardware Requirements
- **Recommended:** NVIDIA GPU with 6GB+ VRAM (RTX 3060 Ti or better)
- **Minimum:** GPU with 4GB VRAM or 16GB+ system RAM (CPU only)
- **Tested On:** CUDA 11.8+, PyTorch 2.0+
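As a back-of-the-envelope check on these figures, the memory needed for the weights alone can be estimated from the parameter count and the bytes per parameter. The sketch below assumes the ~4 billion parameters stated above and ignores activations, the KV cache, and framework overhead, so real usage will be higher; `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 4e9  # assumption: ~4 billion parameters, as stated in this card

fp16_gb = weight_memory_gb(NUM_PARAMS, 2)  # 2 bytes per parameter
int8_gb = weight_memory_gb(NUM_PARAMS, 1)  # 1 byte per parameter

print(f"fp16 weights: ~{fp16_gb:.1f} GB")
print(f"int8 weights: ~{int8_gb:.1f} GB")
```

Under these assumptions, fp16 weights alone take about 8 GB and int8 weights about 4 GB, so running on GPUs with less memory would presumably rely on quantization or CPU offloading.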
## Inference Examples
### Example 1: Blue Cotton T-Shirt
**Input:** Image of a plain blue cotton t-shirt
```json
{
  "type": "T-Shirt",
  "category": "Topwear",
  "color": "Royal Blue",
  "pattern": "Solid",
  "fabric": "Cotton",
  "fit": "Regular",
  "occasion": "Casual",
  "season": "All-Season",
  "gender": "Unisex"
}
```
### Example 2: Denim Jeans
**Input:** Image of blue denim jeans
```json
{
  "type": "Jeans",
  "category": "Bottomwear",
  "color": "Dark Indigo",
  "pattern": "Solid",
  "fabric": "Denim",
  "fit": "Slim",
  "occasion": "Casual",
  "season": "All-Season",
  "gender": "Men"
}
```
## Citation
If you use this model in your research or application, please cite:
```bibtex
@misc{wardrobe_assistant_qwen3vl,
  author = {aman4014},
  title = {Wardrobe Assistant - Qwen3-VL-4B Fine-tuned Model},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/aman4014/Wardrobe-Initial-Classification-Model}}
}
```
## Licensing
This model's metadata declares the MIT license. It is based on Qwen3-VL-4B-Instruct; please refer to the [base model card](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) for the base model's licensing terms.
## Contributors
- **Model Creator:** aman4014
- **Base Model:** Alibaba Qwen Team
- **Framework:** Hugging Face Transformers
## Contact & Support
For issues, questions, or feedback regarding this model, please:
- Open an issue on the model's Hugging Face repository
- Contact the model creator directly
## Changelog
### Version 1.0 (Initial Release)
- Released fine-tuned Qwen3-VL-4B for wardrobe analysis
- Supports 9 key garment attributes
- Gradio web interface included
- JSON output format standardized
---
**Last Updated:** March 2026
**Model Hub:** https://huggingface.co/aman4014/Wardrobe-Initial-Classification-Model