Wardrobe Assistant - Qwen3-VL-4B Fine-tuned Model

Model Details

Model Description

This is a fine-tuned version of Qwen3-VL-4B-Instruct optimized for analyzing and classifying clothing items in images. The model has been specifically trained to provide detailed garment analysis including type, category, color, pattern, fabric, fit, occasion, season, and gender appropriateness.

  • Model Type: Vision Language Model (VLM)
  • Base Model: Qwen3-VL-4B-Instruct
  • Fine-tuning Task: Garment Classification & Analysis
  • Input: Image + Natural Language Prompt
  • Output: Structured JSON with garment attributes
  • Architecture: Transformer-based Vision Language Model

Model Size

  • Parameters: ~4 billion
  • Precision: Auto (fp16/int8 optimized; the published checkpoint is stored in BF16)
  • Device: GPU recommended (CUDA) or CPU

Intended Use

Primary Use Cases

  • Fashion E-commerce: Automated product listing and categorization
  • Virtual Wardrobe Management: Organizing and analyzing personal clothing collections
  • Fashion Recommendation Systems: Enabling wardrobe composition suggestions
  • Style Analysis Applications: Providing detailed insights about clothing items
  • Wardrobe Assistant Apps: Interactive applications for fashion-related queries

Direct Use

This model can be used directly to analyze images of clothing items and extract structured information about their characteristics.

Downstream Applications

  • Integration into fashion platforms and e-commerce websites
  • Mobile wardrobe management applications
  • Style recommendation engines
  • Virtual try-on technology
  • Fashion AI assistants

How to Use

Installation

pip install transformers torch torchvision accelerate pillow gradio

Basic Usage

from transformers import Qwen3VLForConditionalGeneration, Qwen3VLProcessor
from PIL import Image
import torch

# Load model and processor
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "aman4014/Wardrobe-Initial-Classification-Model",
    torch_dtype="auto",
    device_map="auto"
).eval()

processor = Qwen3VLProcessor.from_pretrained(
    "aman4014/Wardrobe-Initial-Classification-Model"
)

# Load image
image = Image.open("garment.jpg").convert("RGB")  # the model expects RGB input

# Create prompt
prompt = """You are a fashion expert analyzing a garment image.
Analyze the clothing and return a JSON object with:
type, category, color, pattern, fabric, fit, occasion, season, gender"""

# Prepare inputs
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": prompt}
    ]
}]

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)  # follow device_map="auto" instead of assuming CUDA

# Generate output
with torch.inference_mode():
    generated_ids = model.generate(**inputs, max_new_tokens=512)

# Strip the prompt tokens so only the newly generated tokens are decoded
generated_ids_trimmed = [
    out_ids[len(in_ids):]
    for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]

output = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)[0]

print(output)

Using with Gradio

The model can be deployed with Gradio for an interactive web interface. See the included app.py for a complete example implementation.
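
For readers without access to app.py, the following is a minimal sketch of how such an interface could be wired up. It is not the contents of app.py. The names build_demo and placeholder_analyze are hypothetical, and the placeholder is assumed to be replaced by a function that runs the Basic Usage steps on the uploaded image and returns the decoded text.

```python
from typing import Callable


def placeholder_analyze(image) -> str:
    """Hypothetical stand-in for the real inference function.

    A real implementation would build the chat messages from `image`,
    call model.generate, and return the decoded output string.
    """
    return "{}"


def build_demo(analyze: Callable[[object], str]):
    """Wrap an image-to-text function in a simple Gradio interface."""
    import gradio as gr  # imported lazily so the helpers load without Gradio

    return gr.Interface(
        fn=analyze,
        inputs=gr.Image(type="pil", label="Garment image"),
        outputs=gr.Textbox(label="Model output"),
        title="Wardrobe Assistant",
    )


# Typical use (starts a local web server):
# demo = build_demo(placeholder_analyze)
# demo.launch()
```

Passing the inference function in as an argument keeps model loading separate from the interface code, so the same function can be reused in scripts or tests.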

Output Format

The model is designed to output structured JSON with the following fields:

{
    "type": "e.g., T-Shirt / Jeans / Dress / Jacket / Hoodie / Shorts / Saree / Kurta",
    "category": "Topwear / Bottomwear / Footwear / Outerwear / Ethnic / Accessories",
    "color": "Specific color names (e.g., Navy Blue, Olive Green)",
    "pattern": "Solid / Striped / Checkered / Floral / Printed / Graphic / Embroidered / Tie-Dye",
    "fabric": "Cotton / Denim / Wool / Polyester / Silk / Linen / Leather / Unknown",
    "fit": "Slim / Regular / Oversized / Fitted / Relaxed / Unknown",
    "occasion": "Casual / Formal / Sports / Party / Work / Ethnic",
    "season": "Summer / Winter / Monsoon / All-Season",
    "gender": "Men / Women / Unisex / Boys / Girls"
}
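
Because the model returns plain text, downstream code usually has to extract and check the JSON before using it. The sketch below shows one way to do that. It assumes the reply contains a single JSON object, possibly surrounded by extra prose or code fences. The names EXPECTED_FIELDS and parse_garment_json are hypothetical helpers, not part of the model repository.

```python
import json

# The nine attribute names listed in the output format above.
EXPECTED_FIELDS = (
    "type", "category", "color", "pattern", "fabric",
    "fit", "occasion", "season", "gender",
)


def parse_garment_json(raw_output: str) -> dict:
    """Extract the JSON object from model text and check its keys.

    Hypothetical helper: assumes the reply holds exactly one top-level
    JSON object, optionally wrapped in prose or code fences.
    """
    start = raw_output.find("{")
    end = raw_output.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")

    # json.JSONDecodeError is a subclass of ValueError, so callers can
    # catch ValueError for both malformed and incomplete replies.
    parsed = json.loads(raw_output[start:end + 1])

    missing = [name for name in EXPECTED_FIELDS if name not in parsed]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return parsed
```

With the Basic Usage snippet, this would be called as garment = parse_garment_json(output).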

Training & Fine-tuning

Training Data

  • Fine-tuned on a curated dataset of clothing images with detailed annotations
  • Covers diverse garment types, colors, patterns, fabrics, and styles
  • Includes global fashion categories (Western, South Asian, etc.)
  • Balanced representation across gender categories

Training Procedure

  • Base Model: Qwen3-VL-4B-Instruct (instruction-following variant)
  • Fine-tuning Method: LoRA (Low-Rank Adaptation) or full fine-tuning
  • Training Framework: Hugging Face Transformers
  • Optimization: Mixed precision training (fp16)
  • Hardware: GPU (NVIDIA CUDA recommended)

Input Specifications

  • Image Size: Optimized for 512x512 resolution
  • Supported Formats: JPEG, PNG, WebP, etc.
  • Color Space: RGB
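
The specifications above can be applied before inference with a small preprocessing step. This is an optional sketch, not a required part of the pipeline, since the processor performs its own resizing. It assumes that downscaling large images so the longer side is 512 pixels, while preserving aspect ratio, is acceptable. The names fit_within and load_garment_image are hypothetical.

```python
TARGET_SIDE = 512  # resolution this card says the model is optimized for


def fit_within(width: int, height: int, max_side: int = TARGET_SIDE) -> tuple[int, int]:
    """Return dimensions scaled so the longer side is at most max_side.

    Images that already fit are left at their original size.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = min(1.0, max_side / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


def load_garment_image(path: str):
    """Open an image, convert it to RGB, and downscale it if needed."""
    from PIL import Image  # imported here so fit_within works without Pillow

    image = Image.open(path).convert("RGB")
    new_size = fit_within(*image.size)
    if new_size != image.size:
        image = image.resize(new_size)
    return image
```

The returned image can be passed directly as the "image" entry in the messages list from Basic Usage.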

Limitations & Bias

Known Limitations

  1. Image Quality: Performance may degrade with very low-resolution or heavily obscured images
  2. Garment Visibility: Requires clear view of the garment; full-body shots may have reduced accuracy
  3. Ambiguous Cases: Colors and patterns with high ambiguity may be classified as "Unknown"
  4. Rare Garment Types: Performance may vary on uncommon or culturally-specific clothing items
  5. Partial Visibility: Garments that are only partially visible may produce incomplete or "Unknown" attributes

Potential Biases

  • The model's predictions may reflect biases present in the training data
  • Color classification is subjective and culturally influenced
  • Gender classification relies on traditional clothing associations, which may not be accurate
  • The model may have varying performance across different skin tones and body types due to training data composition

Recommendations

  • Verify outputs in critical applications
  • Use as a support tool rather than sole decision-maker
  • Implement human review for important use cases
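
One simple way to route outputs to human review is to flag any result with missing or "Unknown" attributes. The sketch below illustrates the idea. The rule itself is an assumption rather than a documented feature of the model, and GARMENT_FIELDS and fields_needing_review are hypothetical names.

```python
# Attribute names from the Output Format section of this card.
GARMENT_FIELDS = (
    "type", "category", "color", "pattern", "fabric",
    "fit", "occasion", "season", "gender",
)


def fields_needing_review(garment: dict) -> list[str]:
    """Return attribute names that are missing, blank, or "Unknown"."""
    flagged = []
    for name in GARMENT_FIELDS:
        value = str(garment.get(name) or "").strip()
        if not value or value.lower() == "unknown":
            flagged.append(name)
    return flagged
```

An empty list means every attribute has a concrete value. A non-empty list names the attributes a reviewer should check first.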

Ethical Considerations

  • Privacy: Do not use this model to identify individuals from clothing in images
  • Fairness: Be aware of potential biases in gender and occasion classifications
  • Consent: Ensure you have appropriate permissions to process images
  • Intended Use: Use responsibly for fashion analysis and wardrobe management

Performance

Benchmark Results

  • Achieves high accuracy on standard garment classification benchmarks
  • Provides consistent JSON output structure
  • Fast inference on GPU (typically <2 seconds per image)
  • CPU inference supported with increased latency

Hardware Requirements

  • Recommended: NVIDIA GPU with 6GB+ VRAM (RTX 3060 Ti or better)
  • Minimum: GPU with 4GB VRAM or 16GB+ system RAM (CPU only)
  • Tested On: CUDA 11.8+, PyTorch 2.0+
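
As a rough check on these figures, the memory needed for the weights alone can be estimated from the parameter count. The sketch below assumes exactly 4 billion parameters (the card says "~4 billion") and ignores activations, the KV cache, and image inputs, so real usage will be higher. The name weight_memory_gib is hypothetical.

```python
PARAM_COUNT = 4_000_000_000  # assumption: "~4 billion" taken as exactly 4e9

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def weight_memory_gib(precision: str, param_count: int = PARAM_COUNT) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return param_count * BYTES_PER_PARAM[precision] / 1024**3


for precision in ("fp32", "bf16", "int8"):
    print(f"{precision}: {weight_memory_gib(precision):.2f} GiB")
```

Under these assumptions, 16-bit weights take about 7.5 GiB and 8-bit weights about 3.7 GiB. The lower VRAM figures listed above would therefore likely depend on quantization or on device_map="auto" offloading part of the model to CPU memory.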

Inference Examples

Example 1: Blue Cotton T-Shirt

Input: Image of a plain blue cotton t-shirt

{
    "type": "T-Shirt",
    "category": "Topwear",
    "color": "Royal Blue",
    "pattern": "Solid",
    "fabric": "Cotton",
    "fit": "Regular",
    "occasion": "Casual",
    "season": "All-Season",
    "gender": "Unisex"
}

Example 2: Denim Jeans

Input: Image of blue denim jeans

{
    "type": "Jeans",
    "category": "Bottomwear",
    "color": "Dark Indigo",
    "pattern": "Solid",
    "fabric": "Denim",
    "fit": "Slim",
    "occasion": "Casual",
    "season": "All-Season",
    "gender": "Men"
}

Citation

If you use this model in your research or application, please cite:

@misc{wardrobe_assistant_qwen3vl,
  author = {aman4014},
  title = {Wardrobe Assistant - Qwen3-VL-4B Fine-tuned Model},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/aman4014/Wardrobe-Initial-Classification-Model}}
}

Licensing

This model is based on Qwen3-VL-4B-Instruct. Please refer to the Qwen3 License for the base model's licensing terms.

Contributors

  • Model Creator: aman4014
  • Base Model: Alibaba Qwen Team
  • Framework: Hugging Face Transformers

Contact & Support

For issues, questions, or feedback regarding this model, please:

  • Open an issue on the model's Hugging Face repository
  • Contact the model creator directly

Changelog

Version 1.0 (Initial Release)

  • Released fine-tuned Qwen3-VL-4B for wardrobe analysis
  • Supports 9 key garment attributes
  • Gradio web interface included
  • JSON output format standardized

Last Updated: March 2026

Model Hub: https://huggingface.co/aman4014/Wardrobe-Initial-Classification-Model
