---
license: apache-2.0
language:
- en
tags:
- code-generation
- nextjs
- react
- typescript
- vision
- multimodal
- mindi
- mindigenous
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
---

# MINDI 1.5 Vision-Coder

**Built by MINDIGENOUS.AI — Faaz, Mumbai, India**

## Model Description

MINDI 1.5 is a multimodal, agentic coding model that generates frontend code (HTML/CSS/JS, Next.js, React, Tailwind) from text prompts and UI screenshots.

## Architecture

| Component | Details |
|-----------|---------|
| Base LLM | Qwen/Qwen2.5-Coder-7B-Instruct (7.62B params) |
| Vision Encoder | CLIP ViT-L/14 (frozen, 304M params) |
| LoRA Adapters | r=64, alpha=128 (161.5M trainable params) |
| Fusion | VisionLanguageFusion with text_gate (16.8M params) |
| Total | 8.1B params, 182.5M trainable (2.25%) |

An illustrative, unofficial sketch of a gated fusion layer of this shape is given at the end of this card.

## Training

- **3-phase progressive training** on an AMD MI300X (192 GB) plus a Modal A100 (40 GB)
- **Dataset**: 1.45M examples, 860M tokens
- **Final loss**: in the 0.25–0.40 range

## Checkpoint Structure

```
checkpoints/
├── phase3_final/                ← Best checkpoint for inference
│   ├── lora/                    ← LoRA adapter weights
│   ├── vision/                  ← Vision projection weights
│   └── fusion/                  ← Fusion layer weights
├── phase3_all_step2500_final/
├── phase3_all_step2000/
├── phase3_all_step1500/
└── ... (earlier phases)
```

## Usage

```python
from src.model.mindi_model import MINDI15
import torch

# hidden_size matches Qwen2.5-7B; num_visual_tokens is the number of
# visual embeddings injected into the language model.
model = MINDI15(
    model_name="Qwen/Qwen2.5-Coder-7B-Instruct",
    clip_model="openai/clip-vit-large-patch14",
    hidden_size=3584,
    num_visual_tokens=256,
    torch_dtype=torch.bfloat16,
)
model.load("checkpoints/phase3_final")  # loads LoRA + vision + fusion weights
model.eval()

response = model.generate(
    prompt="Build a Next.js landing page",
    max_new_tokens=2048,
    temperature=0.7,
)
```

## Special Tokens (22 total, 11 pairs)

`think`, `code`, `file`, `critique`, `suggest`, `search`, `error`, `fix`, `vision`, `sandbox`, `context`

A sketch of registering these tags with the base tokenizer is given at the end of this card.

## Built By

Faaz — MINDIGENOUS.AI | Mumbai, India | April–May 2026
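
## Illustrative Sketches

The card documents the fusion layer only as "VisionLanguageFusion with text_gate (16.8M params)". The following is a minimal sketch of what a gated vision-language fusion of that shape could look like. The dimensions come from the card (CLIP ViT-L/14 features are 1024-d, Qwen2.5-7B hidden states are 3584-d), but the projection/gate layout, pooling, and all naming here are assumptions, not MINDI's actual implementation.

```python
import torch
import torch.nn as nn

class VisionLanguageFusionSketch(nn.Module):
    """Illustrative gated fusion (NOT the official MINDI module):
    visual tokens are projected into the LLM's hidden space and
    scaled by a gate computed from the text stream."""

    def __init__(self, vision_dim: int = 1024, hidden_size: int = 3584):
        super().__init__()
        self.visual_proj = nn.Linear(vision_dim, hidden_size)  # vision -> LLM space
        self.text_gate = nn.Linear(hidden_size, hidden_size)   # per-dim gate from text

    def forward(self, visual_feats: torch.Tensor, text_hidden: torch.Tensor) -> torch.Tensor:
        # visual_feats: (batch, num_visual_tokens, vision_dim)
        # text_hidden:  (batch, seq_len, hidden_size)
        v = self.visual_proj(visual_feats)                              # (B, 256, 3584)
        # Gate each visual token by a pooled summary of the text context.
        gate = torch.sigmoid(self.text_gate(text_hidden.mean(dim=1)))  # (B, 3584)
        return v * gate.unsqueeze(1)                                    # gated visual tokens

fusion = VisionLanguageFusionSketch()
vis = torch.randn(2, 256, 1024)   # dummy CLIP features
txt = torch.randn(2, 32, 3584)    # dummy text hidden states
fused = fusion(vis, txt)          # (2, 256, 3584), ready to splice into the sequence
```

With these dimensions the two linear layers total roughly 16.5M parameters, in the same ballpark as the 16.8M reported in the architecture table.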
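The LoRA hyperparameters are stated (r=64, alpha=128) but the target modules are not. A typical PEFT configuration reproducing those values for a Qwen2-style decoder might look like the following; `target_modules` and `lora_dropout` are assumptions.

```python
from peft import LoraConfig

# r and lora_alpha match the card; the target list is a common choice for
# Qwen2-style attention/MLP blocks and is not confirmed by the card.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```

For what it's worth, applying r=64 to all seven projections across Qwen2.5-7B's 28 decoder layers works out to about 161.5M adapter parameters, consistent with the figure in the architecture table.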
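The card lists 11 tags forming 22 special tokens but does not spell out their surface form. Assuming the conventional `<tag>`/`</tag>` pairing (an assumption, not confirmed by the card), registering them with the base tokenizer would look roughly like this:

```python
from transformers import AutoTokenizer

TAGS = ["think", "code", "file", "critique", "suggest", "search",
        "error", "fix", "vision", "sandbox", "context"]

# Assumes <tag>/</tag> naming; the card only states 22 tokens in 11 pairs.
special_tokens = [f"<{t}>" for t in TAGS] + [f"</{t}>" for t in TAGS]

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
tokenizer.add_special_tokens({"additional_special_tokens": special_tokens})
# The model's embedding matrix must be resized to match, e.g.:
# model.resize_token_embeddings(len(tokenizer))
```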