Wardrobe Assistant - Qwen3-VL-4B Fine-tuned Model
Model Details
Model Description
This is a fine-tuned version of Qwen3-VL-4B-Instruct optimized for analyzing and classifying clothing items in images. The model has been specifically trained to provide detailed garment analysis including type, category, color, pattern, fabric, fit, occasion, season, and gender appropriateness.
- Model Type: Vision Language Model (VLM)
- Base Model: Qwen3-VL-4B-Instruct
- Fine-tuning Task: Garment Classification & Analysis
- Input: Image + Natural Language Prompt
- Output: Structured JSON with garment attributes
- Architecture: Transformer-based Vision Language Model
Model Size
- Parameters: ~4 billion
- Precision: auto (torch_dtype="auto"; typically fp16 on GPU, with optional int8 quantization)
- Device: GPU recommended (CUDA) or CPU
Intended Use
Primary Use Cases
- Fashion E-commerce: Automated product listing and categorization
- Virtual Wardrobe Management: Organizing and analyzing personal clothing collections
- Fashion Recommendation Systems: Enabling wardrobe composition suggestions
- Style Analysis Applications: Providing detailed insights about clothing items
- Wardrobe Assistant Apps: Interactive applications for fashion-related queries
Direct Use
This model can be used directly to analyze images of clothing items and extract structured information about their characteristics.
Downstream Applications
- Integration into fashion platforms and e-commerce websites
- Mobile wardrobe management applications
- Style recommendation engines
- Virtual try-on technology
- Fashion AI assistants
How to Use
Installation
pip install transformers torch torchvision pillow gradio accelerate
Basic Usage
from transformers import Qwen3VLForConditionalGeneration, Qwen3VLProcessor
from PIL import Image
import torch
# Load model and processor
model = Qwen3VLForConditionalGeneration.from_pretrained(
"aman4014/Wardrobe-Initial-Classification-Model",
torch_dtype="auto",
device_map="auto"
).eval()
processor = Qwen3VLProcessor.from_pretrained(
"aman4014/Wardrobe-Initial-Classification-Model"
)
# Load image
image = Image.open("garment.jpg")
# Create prompt
prompt = """You are a fashion expert analyzing a garment image.
Analyze the clothing and return a JSON object with:
type, category, color, pattern, fabric, fit, occasion, season, gender"""
# Prepare inputs
messages = [{
"role": "user",
"content": [
{"type": "image", "image": image},
{"type": "text", "text": prompt}
]
}]
inputs = processor.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
return_dict=True,
return_tensors="pt"
).to(model.device)  # follow the device chosen by device_map="auto" (GPU or CPU)
# Generate output
with torch.inference_mode():
generated_ids = model.generate(**inputs, max_new_tokens=512)
generated_ids_trimmed = [
out_ids[len(in_ids):]
for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output = processor.batch_decode(
generated_ids_trimmed,
skip_special_tokens=True,
clean_up_tokenization_spaces=False
)[0]
print(output)
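In practice the decoded text may wrap the JSON in Markdown code fences or include stray prose around it, so it is worth parsing defensively before using the result. A minimal sketch (the extract_json helper is illustrative, not part of this repository):

```python
import json
import re

def extract_json(raw_output: str) -> dict:
    """Pull the first JSON object out of the model's raw text output.

    Chat models sometimes wrap JSON in ```json fences or add surrounding
    prose, so we strip fences and locate the outermost braces first.
    """
    # Remove optional Markdown code fences such as ```json ... ```
    cleaned = re.sub(r"```(?:json)?", "", raw_output).strip()
    # Take the span from the first '{' to the last '}'
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    return json.loads(cleaned[start : end + 1])

raw = '```json\n{"type": "T-Shirt", "category": "Topwear"}\n```'
print(extract_json(raw))  # {'type': 'T-Shirt', 'category': 'Topwear'}
```

If parsing fails, re-prompting the model (or falling back to "Unknown" attributes) is usually preferable to crashing the pipeline.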
Using with Gradio
The model can be deployed with Gradio for an interactive web interface. See the included app.py for a complete example implementation.
Output Format
The model is designed to output structured JSON with the following fields:
{
"type": "e.g., T-Shirt / Jeans / Dress / Jacket / Hoodie / Shorts / Saree / Kurta",
"category": "Topwear / Bottomwear / Footwear / Outerwear / Ethnic / Accessories",
"color": "Specific color names (e.g., Navy Blue, Olive Green)",
"pattern": "Solid / Striped / Checkered / Floral / Printed / Graphic / Embroidered / Tie-Dye",
"fabric": "Cotton / Denim / Wool / Polyester / Silk / Linen / Leather / Unknown",
"fit": "Slim / Regular / Oversized / Fitted / Relaxed / Unknown",
"occasion": "Casual / Formal / Sports / Party / Work / Ethnic",
"season": "Summer / Winter / Monsoon / All-Season",
"gender": "Men / Women / Unisex / Boys / Girls"
}
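Because most of these fields have a closed vocabulary, a lightweight validator can catch malformed predictions before they reach downstream systems. A sketch using the field names and value sets from the schema above (the validate helper itself is hypothetical):

```python
# Closed vocabularies copied from the output-format schema above.
ALLOWED = {
    "category": {"Topwear", "Bottomwear", "Footwear", "Outerwear", "Ethnic", "Accessories"},
    "occasion": {"Casual", "Formal", "Sports", "Party", "Work", "Ethnic"},
    "season":   {"Summer", "Winter", "Monsoon", "All-Season"},
    "gender":   {"Men", "Women", "Unisex", "Boys", "Girls"},
}
REQUIRED_FIELDS = ("type", "category", "color", "pattern",
                   "fabric", "fit", "occasion", "season", "gender")

def validate(pred: dict) -> list[str]:
    """Return a list of problems; an empty list means the output is usable."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in pred]
    for field, allowed in ALLOWED.items():
        if field in pred and pred[field] not in allowed:
            problems.append(f"unexpected {field!r} value: {pred[field]!r}")
    return problems

pred = {"type": "Jeans", "category": "Bottomwear", "color": "Dark Indigo",
        "pattern": "Solid", "fabric": "Denim", "fit": "Slim",
        "occasion": "Casual", "season": "All-Season", "gender": "Men"}
print(validate(pred))  # []
```

Open-ended fields such as type and color are deliberately left unchecked here, since their value spaces are not closed.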
Training & Fine-tuning
Training Data
- Fine-tuned on a curated dataset of clothing images with detailed annotations
- Covers diverse garment types, colors, patterns, fabrics, and styles
- Includes global fashion categories (Western, South Asian, etc.)
- Balanced representation across gender categories
Training Procedure
- Base Model: Qwen3-VL-4B-Instruct (instruction-following variant)
- Fine-tuning Method: LoRA (Low-Rank Adaptation) or full fine-tuning
- Training Framework: Hugging Face Transformers
- Optimization: Mixed precision training (fp16)
- Hardware: GPU (NVIDIA CUDA recommended)
Input Specifications
- Image Size: Optimized for 512x512 resolution
- Supported Formats: JPEG, PNG, WebP, etc.
- Color Space: RGB
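Since the model is tuned around 512x512 inputs, downscaling very large photos before inference keeps latency predictable. A minimal, dependency-free sketch of the size computation (the processor typically performs its own resizing as well, so this is an optional pre-step; target_size is an illustrative helper):

```python
def target_size(width: int, height: int, longest: int = 512) -> tuple[int, int]:
    """Scale (width, height) so the longest side equals `longest`,
    preserving the aspect ratio. Never returns a dimension below 1."""
    scale = longest / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))

print(target_size(1024, 768))  # (512, 384)
```

With Pillow this could be applied as: img = Image.open(path).convert("RGB"); img = img.resize(target_size(*img.size)).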
Limitations & Bias
Known Limitations
- Image Quality: Performance may degrade with very low-resolution or heavily obscured images
- Garment Visibility: Requires clear view of the garment; full-body shots may have reduced accuracy
- Ambiguous Cases: Colors and patterns with high ambiguity may be classified as "Unknown"
- Rare Garment Types: Performance may vary on uncommon or culturally specific clothing items
- Partial Visibility: Garments that are only partially visible may produce incomplete or "Unknown" attributes
Potential Biases
- The model's predictions may reflect biases present in the training data
- Color classification is subjective and culturally influenced
- Gender classification relies on traditional clothing associations, which may not match how a garment is actually worn
- The model may have varying performance across different skin tones and body types due to training data composition
Recommendation
- Verify outputs in critical applications
- Use as a support tool rather than sole decision-maker
- Implement human review for important use cases
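One way to operationalize the human-review recommendation is to route any prediction containing "Unknown" or missing attributes to a reviewer. A minimal sketch; which fields to check, and the routing policy, are application-specific assumptions:

```python
def needs_review(pred: dict) -> bool:
    """Flag a prediction for human review if any of the attributes that
    the model may classify as "Unknown" is unknown or missing.
    Illustrative only; the field set and policy are application choices."""
    core = ("fabric", "fit", "color", "pattern")
    return any(pred.get(field, "Unknown") == "Unknown" for field in core)

print(needs_review({"fabric": "Unknown", "fit": "Slim",
                    "color": "Navy Blue", "pattern": "Solid"}))  # True
print(needs_review({"fabric": "Cotton", "fit": "Slim",
                    "color": "Navy Blue", "pattern": "Solid"}))  # False
```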
Ethical Considerations
- Privacy: Do not use this model to identify individuals from clothing in images
- Fairness: Be aware of potential biases in gender and occasion classifications
- Consent: Ensure you have appropriate permissions to process images
- Intended Use: Use responsibly for fashion analysis and wardrobe management
Performance
Benchmark Results
- High accuracy on garment classification in the creator's evaluation (no quantitative benchmark figures are published for this checkpoint)
- Provides consistent JSON output structure
- Fast inference on GPU (typically <2 seconds per image)
- CPU inference supported with increased latency
Hardware Requirements
- Recommended: NVIDIA GPU with 6GB+ VRAM (RTX 3060 Ti or better)
- Minimum: GPU with 4GB VRAM or 16GB+ system RAM (CPU only)
- Tested On: CUDA 11.8+, PyTorch 2.0+
Inference Examples
Example 1: Blue Cotton T-Shirt
Input: Image of a plain blue cotton t-shirt
{
"type": "T-Shirt",
"category": "Topwear",
"color": "Royal Blue",
"pattern": "Solid",
"fabric": "Cotton",
"fit": "Regular",
"occasion": "Casual",
"season": "All-Season",
"gender": "Unisex"
}
Example 2: Denim Jeans
Input: Image of blue denim jeans
{
"type": "Jeans",
"category": "Bottomwear",
"color": "Dark Indigo",
"pattern": "Solid",
"fabric": "Denim",
"fit": "Slim",
"occasion": "Casual",
"season": "All-Season",
"gender": "Men"
}
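Parsed outputs like the two examples above can be aggregated into simple wardrobe statistics, such as item counts per category, which is the building block of the virtual-wardrobe use case. A sketch (the items list is illustrative):

```python
from collections import Counter

# Two parsed predictions, trimmed to the fields used for the summary.
items = [
    {"type": "T-Shirt", "category": "Topwear",    "season": "All-Season"},
    {"type": "Jeans",   "category": "Bottomwear", "season": "All-Season"},
]

# Count garments per category across the wardrobe.
by_category = Counter(item["category"] for item in items)
print(dict(by_category))  # {'Topwear': 1, 'Bottomwear': 1}
```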
Citation
If you use this model in your research or application, please cite:
@misc{wardrobe_assistant_qwen3vl,
author = {aman4014},
title = {Wardrobe Assistant - Qwen3-VL-4B Fine-tuned Model},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/aman4014/Wardrobe-Initial-Classification-Model}}
}
Licensing
This model is based on Qwen3-VL-4B-Instruct. Please refer to the Qwen3 License for the base model's licensing terms.
Contributors
- Model Creator: aman4014
- Base Model: Alibaba Qwen Team
- Framework: Hugging Face Transformers
Contact & Support
For issues, questions, or feedback regarding this model, please:
- Open an issue on the model's Hugging Face repository
- Contact the model creator directly
Changelog
Version 1.0 (Initial Release)
- Released fine-tuned Qwen3-VL-4B for wardrobe analysis
- Supports 9 key garment attributes
- Gradio web interface included
- JSON output format standardized
Last Updated: March 2026
Model Hub: https://huggingface.co/aman4014/Wardrobe-Initial-Classification-Model