---
license: apache-2.0
language:
- en
tags:
- code-generation
- nextjs
- react
- typescript
- vision
- multimodal
- mindi
- mindigenous
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
---

# MINDI 1.5 Vision-Coder
|
**Built by MINDIGENOUS.AI - Faaz, Mumbai, India**
|
## Model Description
MINDI 1.5 is a multimodal agentic AI coding model that generates frontend code
(HTML/CSS/JS, Next.js, React, Tailwind) from text prompts and UI screenshots.
|
## Architecture
| Component | Details |
|-----------|---------|
| Base LLM | Qwen/Qwen2.5-Coder-7B-Instruct (7.62B params) |
| Vision Encoder | CLIP ViT-L/14 (frozen, 304M params) |
| LoRA Adapters | r=64, alpha=128 (161.5M trainable params) |
| Fusion | VisionLanguageFusion with text_gate (16.8M params) |
| Total | 8.1B params, 182.5M trainable (2.25%) |

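The totals in the Architecture table can be cross-checked with a few lines of arithmetic. The sketch below only reuses the numbers from the table; treating the fusion layer as trainable is an assumption, and the variable names are illustrative.

```python
# Parameter counts copied from the Architecture table, in millions.
base_llm = 7620.0        # Qwen2.5-Coder-7B-Instruct
vision_encoder = 304.0   # CLIP ViT-L/14, frozen
lora = 161.5             # LoRA adapters, trainable
fusion = 16.8            # VisionLanguageFusion; ASSUMED to be trainable
total_trainable = 182.5  # stated in the "Total" row

# Trainable parameters that the table does not itemize.
unlisted_trainable = total_trainable - (lora + fusion)

# Overall size: frozen base LLM and encoder plus every trainable parameter.
total = base_llm + vision_encoder + total_trainable
trainable_share = 100.0 * total_trainable / total

print(f"unlisted trainable: {unlisted_trainable:.1f}M")
print(f"total: {total / 1000:.2f}B")
print(f"trainable share: {trainable_share:.2f}%")
```

Under these assumptions the stated 8.1B total and 2.25% trainable share are consistent with the per-component figures. The roughly 4.2M trainable parameters not itemized in the table may correspond to the vision projection weights stored with the checkpoint, though the model card does not say so explicitly.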
## Training
- **3-phase progressive training** on AMD MI300X (192 GB) and Modal A100 (40 GB) GPUs
- **Dataset**: 1.45M examples, 860M tokens
- **Final loss**: 0.25–0.40

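As a rough sense of scale, the dataset figures above imply an average example length of a few hundred tokens. This is a back-of-the-envelope sketch using only the two rounded numbers quoted in the list, so the result is approximate.

```python
# Dataset figures quoted in the Training list (both are rounded).
num_examples = 1_450_000
num_tokens = 860_000_000

avg_tokens_per_example = num_tokens / num_examples
print(f"average tokens per example: {avg_tokens_per_example:.0f}")
```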
## Checkpoint Structure
```
checkpoints/
├── phase3_final/                 ← Best checkpoint for inference
│   ├── lora/                     ← LoRA adapter weights
│   ├── vision/                   ← Vision projection weights
│   └── fusion/                   ← Fusion layer weights
├── phase3_all_step2500_final/
├── phase3_all_step2000/
├── phase3_all_step1500/
└── ... (earlier phases)
```

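Before loading, it can help to confirm that a downloaded checkpoint directory contains the three subdirectories shown in the tree. The helper below is a hypothetical convenience, not part of the MINDI codebase; it only checks that the directories exist and knows nothing about the files inside them.

```python
from pathlib import Path

# Subdirectories listed under phase3_final/ in the checkpoint tree.
EXPECTED_PARTS = ("lora", "vision", "fusion")


def missing_checkpoint_parts(checkpoint_dir: str) -> list[str]:
    """Return the expected subdirectories that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_PARTS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_checkpoint_parts("checkpoints/phase3_final")
    if missing:
        print("Incomplete checkpoint, missing:", ", ".join(missing))
    else:
        print("Checkpoint layout looks complete.")
```

An empty list means all three parts are present; any returned names point to a partial download or a wrong path.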
## Usage
```python
# Assumes the repository's `src` package is on the import path.
import torch

from src.model.mindi_model import MINDI15

# Build the model: Qwen2.5-Coder base LLM plus the CLIP vision encoder.
model = MINDI15(
    model_name="Qwen/Qwen2.5-Coder-7B-Instruct",
    clip_model="openai/clip-vit-large-patch14",
    hidden_size=3584,
    num_visual_tokens=256,
    torch_dtype=torch.bfloat16,
)
# Load the recommended inference checkpoint and switch to eval mode.
model.load("checkpoints/phase3_final")
model.eval()

# Text-only generation from a prompt.
response = model.generate(
    prompt="Build a Next.js landing page",
    max_new_tokens=2048,
    temperature=0.7,
)
print(response)
```

## Special Tokens (22 total, 11 pairs)
`think`, `code`, `file`, `critique`, `suggest`, `search`, `error`, `fix`, `vision`, `sandbox`, `context`
|
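The eleven names expand to 22 tokens when each one has an opening and a closing form. The sketch below illustrates that expansion and a simple way to pull one kind of section out of generated text. The XML-style spelling (`<code>` and `</code>`) is an assumption made for illustration; the actual token strings in the MINDI tokenizer may differ, and `token_pair` and `extract_sections` are hypothetical helpers.

```python
import re

# The eleven pair names listed above.
PAIR_NAMES = [
    "think", "code", "file", "critique", "suggest", "search",
    "error", "fix", "vision", "sandbox", "context",
]


def token_pair(name: str) -> tuple[str, str]:
    # ASSUMPTION: XML-style spelling; the real tokenizer strings may differ.
    return f"<{name}>", f"</{name}>"


# Flatten the pairs into a single list of 2 * 11 tokens.
ALL_TOKENS = [tok for name in PAIR_NAMES for tok in token_pair(name)]


def extract_sections(text: str, name: str) -> list[str]:
    """Return the contents of every `name` section found in text."""
    open_tok, close_tok = token_pair(name)
    pattern = re.escape(open_tok) + r"(.*?)" + re.escape(close_tok)
    return [m.strip() for m in re.findall(pattern, text, flags=re.DOTALL)]


sample = "<think>Plan the layout.</think><code>export default function Page() {}</code>"
print(len(ALL_TOKENS))
print(extract_sections(sample, "code"))
```

If the real tokens use a different spelling, only `token_pair` needs to change; the extraction logic stays the same.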
## Built By
Faaz - MINDIGENOUS.AI | Mumbai, India | April–May 2026
|