Vet-Rate Vision Phi (q4f32_1 WebGPU)

A WebGPU-optimized compilation of Microsoft's Phi-3.5 Vision model for browser-based inference.

Model Description

This is a quantized (q4f32_1) version of microsoft/Phi-3.5-vision-instruct compiled for WebGPU using MLC-LLM. It's designed for use with WebLLM in standard browsers without requiring experimental Chrome flags.

Key Features

  • 🚀 Browser-native: Runs entirely in-browser via WebGPU
  • 📷 Vision capable: Supports image understanding and analysis
  • ⚡ Optimized: q4f32_1 quantization for efficient memory usage
  • 🔒 Privacy-first: All processing happens locally on your device
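
Because everything runs in-browser, WebGPU availability is the main prerequisite. A minimal feature check before loading the model (the `navigator.gpu` check is the standard WebGPU detection; the `typeof` guard simply keeps the snippet safe outside a browser):

```javascript
// Detect WebGPU support before attempting to load the model.
// In a browser, navigator.gpu is defined only when WebGPU is available;
// the typeof guard keeps this safe in non-browser environments.
const hasWebGPU =
  typeof navigator !== "undefined" && "gpu" in navigator;

if (!hasWebGPU) {
  console.warn("WebGPU is not available; the model cannot run here.");
}
```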

Technical Specifications

Property          Value
Base Model        microsoft/Phi-3.5-vision-instruct
Quantization      q4f32_1 (4-bit weights, float32 compute dtype)
Model Size        ~2.6 GB (quantized weights)
WASM Library      6.6 MB
Context Window    131,072 tokens
Parameters        ~4B
Vision Encoder    CLIP ViT-L/14 (336px)
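
Models that are not in WebLLM's prebuilt list typically have to be registered through a custom appConfig so CreateMLCEngine can resolve the weights and the compiled WASM library. A sketch under that assumption — the two URLs below are illustrative placeholders, not confirmed paths for this repository:

```javascript
// Register the model with WebLLM via a custom appConfig.
// Both URLs are placeholders; substitute the actual weight repo
// and compiled WASM library location for this model.
const appConfig = {
  model_list: [
    {
      model: "https://huggingface.co/Vet-Rate-org/Vet-Rate-Vision-Phi", // quantized weights (~2.6 GB)
      model_id: "Vet-Rate-Vision-Phi",
      model_lib: "https://example.com/path/to/phi-3.5-vision-q4f32_1-webgpu.wasm" // 6.6 MB WASM library
    }
  ]
};

// Then pass it alongside the model id:
// const engine = await CreateMLCEngine("Vet-Rate-Vision-Phi", { appConfig });
```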

Usage with WebLLM

import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Vet-Rate-org/Vet-Rate-Vision-Phi");

// Text-only chat
const textResponse = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }]
});

// Vision chat (with image)
const visionResponse = await engine.chat.completions.create({
  messages: [{
    role: "user",
    content: [
      { type: "image_url", image_url: { url: "data:image/jpeg;base64,..." } },
      { type: "text", text: "What do you see in this image?" }
    ]
  }]
});
