MINDI 1.5 Vision-Coder

Built by MINDIGENOUS.AI - Faaz, Mumbai, India

Model Description

MINDI 1.5 is a multimodal agentic AI coding model that generates frontend code (HTML/CSS/JS, Next.js, React, Tailwind) from text prompts and UI screenshots.

Architecture

Component        Details
Base LLM         Qwen/Qwen2.5-Coder-7B-Instruct (7.62B params)
Vision Encoder   CLIP ViT-L/14 (frozen, 304M params)
LoRA Adapters    r=64, alpha=128 (161.5M trainable params)
Fusion           VisionLanguageFusion with text_gate (16.8M params)
Total            8.1B params, 182.5M trainable (2.25%)
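As a quick sanity check on the figures above, the arithmetic can be reproduced in a few lines. This is only a sketch: the numbers are copied from the table, the variable names are illustrative, and the LoRA scaling factor alpha / r is the usual LoRA convention rather than something stated in this card.

```python
# Parameter counts copied from the architecture table, in millions.
base_llm = 7620.0           # Qwen/Qwen2.5-Coder-7B-Instruct
vision_encoder = 304.0      # CLIP ViT-L/14, frozen
lora = 161.5                # LoRA adapters, trainable
fusion = 16.8               # VisionLanguageFusion, trainable
trainable_reported = 182.5  # "Total" row of the table

# Sum of the itemized components.
total = base_llm + vision_encoder + lora + fusion
trainable_fraction = trainable_reported / total

# Trainable parameters that the table does not itemize.
unlisted_trainable = trainable_reported - (lora + fusion)

# Usual LoRA scaling factor (assumption: standard alpha / r convention).
lora_scaling = 128 / 64

print(f"total: {total / 1000:.2f}B")
print(f"trainable share: {trainable_fraction:.2%}")
print(f"unlisted trainable: {unlisted_trainable:.1f}M")
print(f"LoRA scaling: {lora_scaling}")
```

The itemized trainable components (LoRA plus fusion) add up to slightly less than the reported 182.5M. The card does not say which component accounts for the remainder.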

Training

  • Three-phase progressive training on AMD MI300X (192 GB) and Modal A100 (40 GB) GPUs
  • Dataset: 1.45M examples, 860M tokens
  • Final loss: 0.25–0.40 range
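The dataset figures above imply an average example length, which can help when choosing a sequence length. A minimal sketch using only the two numbers from the list; the variable names are illustrative.

```python
# Dataset size as reported in the training summary.
examples = 1_450_000
tokens = 860_000_000

# Mean number of tokens per training example.
avg_tokens = tokens / examples

print(f"about {avg_tokens:.0f} tokens per example on average")
```

This is a mean only. The card gives no information about the length distribution.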

Checkpoint Structure

checkpoints/
├── phase3_final/           ← Best checkpoint for inference
│   ├── lora/               ← LoRA adapter weights
│   ├── vision/             ← Vision projection weights
│   └── fusion/             ← Fusion layer weights
├── phase3_all_step2500_final/
├── phase3_all_step2000/
├── phase3_all_step1500/
└── ... (earlier phases)
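Before loading a checkpoint, it can be useful to confirm that the three subdirectories shown above are present. The helper below is a hypothetical sketch, not part of the MINDI codebase. It checks only that the directories exist and assumes nothing about the files inside them.

```python
from pathlib import Path

# Subdirectories listed under phase3_final/ in the tree above.
EXPECTED_COMPONENTS = ("lora", "vision", "fusion")


def missing_components(checkpoint_dir):
    """Return the expected component subdirectories that are absent."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_COMPONENTS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_components("checkpoints/phase3_final")
    if missing:
        print("Missing components:", ", ".join(missing))
    else:
        print("Checkpoint layout looks complete.")
```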

Usage

import torch

from src.model.mindi_model import MINDI15

# Build the model: Qwen2.5-Coder base LLM plus the CLIP vision encoder.
model = MINDI15(
    model_name="Qwen/Qwen2.5-Coder-7B-Instruct",
    clip_model="openai/clip-vit-large-patch14",
    hidden_size=3584,
    num_visual_tokens=256,
    torch_dtype=torch.bfloat16,
)
# Load the LoRA, vision, and fusion weights from the final checkpoint.
model.load("checkpoints/phase3_final")
model.eval()

# Generate frontend code from a text prompt.
response = model.generate(
    prompt="Build a Next.js landing page",
    max_new_tokens=2048,
    temperature=0.7,
)
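Since the example loads the model in bfloat16, a rough estimate of the memory needed for the weights alone may help when picking hardware. This is a back-of-the-envelope sketch. It uses the 8.1B total from the architecture summary and 2 bytes per bfloat16 parameter, and it ignores activations, the KV cache, and framework overhead. The function name is illustrative.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


total_params = 8.1e9  # total parameter count reported in the model card

bf16_gib = weight_memory_gib(total_params, 2)  # bfloat16: 2 bytes per parameter
fp32_gib = weight_memory_gib(total_params, 4)  # float32: 4 bytes per parameter

print(f"bfloat16 weights: {bf16_gib:.1f} GiB")
print(f"float32 weights:  {fp32_gib:.1f} GiB")
```

Actual usage during generation will be higher, especially with max_new_tokens=2048.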

Special Tokens (22 total, 11 pairs)

think, code, file, critique, suggest, search, error, fix, vision, sandbox, context
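The card lists the token names but not their literal strings. The sketch below shows one way paired tokens like these could be used to split a response into sections. It assumes an XML-like form such as <think>...</think>, which is purely an assumption; the open_fmt and close_fmt parameters are there so the real format can be substituted. The function name is hypothetical.

```python
import re

# The 11 names listed above; each is assumed to have an opening and a closing token.
TAG_NAMES = [
    "think", "code", "file", "critique", "suggest", "search",
    "error", "fix", "vision", "sandbox", "context",
]


def extract_sections(text, open_fmt="<{name}>", close_fmt="</{name}>"):
    """Collect the text between each opening/closing token pair.

    The default token format is an assumption, not the model's documented format.
    """
    sections = {}
    for name in TAG_NAMES:
        pattern = (
            re.escape(open_fmt.format(name=name))
            + r"(.*?)"
            + re.escape(close_fmt.format(name=name))
        )
        matches = re.findall(pattern, text, flags=re.DOTALL)
        if matches:
            sections[name] = [m.strip() for m in matches]
    return sections


# Hypothetical response text, for illustration only.
sample = "<think>Plan the layout.</think><code>export default function Page() {}</code>"
print(extract_sections(sample))
```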

Built By

Faaz - MINDIGENOUS.AI | Mumbai, India | April–May 2026
