Qwen3.5 Grocery Multi-task Model

A pruned Qwen3.5-0.8B vision-language model (12 text layers, down from 24) fine-tuned for grocery product detection and classification.

Model Details

  • Base model: Qwen/Qwen3.5-0.8B (pruned to 12 text layers)
  • Tasks: Product classification (356 classes) + grid-based object detection
  • Parameters: 860.7M total (backbone: 858.2M, cls head: 1.4M, det head: 1.0M)
  • Training step: 7500
  • Validation accuracy (classification): 0.801
  • Validation loss: 0.6929

Architecture

  • Backbone: Qwen3.5 vision encoder (12 ViT layers, 768 hidden) + merger + 12 text transformer blocks (1024 hidden)
  • Classification head: Linear(1024, 1024) → GELU → Dropout → Linear(1024, 356) on mean-pooled features
  • Detection head: Anchor-free 14x14 grid, predicts [conf, x_off, y_off, w, h] per cell
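The two heads above can be sketched as small PyTorch modules. The layer sequence and dimensions follow the bullets (1024 hidden, 356 classes, 14×14 grid, 5 values per cell), and the resulting parameter counts match the ~1.4M / ~1.0M figures quoted earlier; the dropout rate and the use of a single linear projection in the detection head are assumptions, not confirmed details of the released checkpoints.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 356
HIDDEN = 1024   # text hidden size from the card
GRID = 14       # detection grid resolution

class ClassificationHead(nn.Module):
    """Linear -> GELU -> Dropout -> Linear on mean-pooled features."""
    def __init__(self, hidden=HIDDEN, num_classes=NUM_CLASSES, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.GELU(),
            nn.Dropout(dropout),            # rate is an assumption
            nn.Linear(hidden, num_classes),
        )

    def forward(self, pooled):              # pooled: (B, hidden)
        return self.net(pooled)             # logits: (B, num_classes)

class DetectionHead(nn.Module):
    """Anchor-free head: 5 values [conf, x_off, y_off, w, h] per grid cell."""
    def __init__(self, hidden=HIDDEN, grid=GRID):
        super().__init__()
        self.grid = grid
        self.proj = nn.Linear(hidden, grid * grid * 5)

    def forward(self, pooled):              # pooled: (B, hidden)
        out = self.proj(pooled)
        return out.view(-1, self.grid, self.grid, 5)
```

With these shapes, `ClassificationHead` has ~1.41M parameters and `DetectionHead` ~1.0M, consistent with the breakdown in Model Details.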

Files

  • model.safetensors - Backbone weights (vision encoder + text layers)
  • cls_head.safetensors - Classification head weights
  • det_head.safetensors - Detection head weights
  • config.json - Model config (pruned 12-layer Qwen3.5)
  • tokenizer.json / tokenizer_config.json - Tokenizer files

Usage

import torch
from safetensors.torch import load_file
from transformers import AutoModelForImageTextToText, AutoProcessor

# Load backbone
model = AutoModelForImageTextToText.from_pretrained(
    "heiertech/qwen35-grocery-multitask",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Load classification head
cls_state = load_file("cls_head.safetensors")
# ... attach to your ClassificationHead module

# Load detection head
det_state = load_file("det_head.safetensors")
# ... attach to your DetectionHead module
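Once the detection head is attached, its per-cell `[conf, x_off, y_off, w, h]` output has to be decoded into boxes. A minimal sketch of that step is below; it assumes sigmoid-activated outputs where `(x_off, y_off)` are offsets within a cell and `(w, h)` are box sizes relative to the whole image. This decoding convention is an assumption, not something the card specifies.

```python
import torch

def decode_grid(pred, conf_thresh=0.5):
    """Turn a (G, G, 5) grid prediction into (conf, x1, y1, x2, y2) boxes
    in normalized [0, 1] image coordinates."""
    grid = pred.shape[0]
    boxes = []
    for gy in range(grid):
        for gx in range(grid):
            conf, xo, yo, w, h = pred[gy, gx].tolist()
            if conf < conf_thresh:
                continue
            # Cell-relative offsets -> box center in image coordinates.
            cx = (gx + xo) / grid
            cy = (gy + yo) / grid
            boxes.append((conf, cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes
```

In practice you would follow this with non-maximum suppression (e.g. `torchvision.ops.nms`) before using the boxes.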