---
license: apache-2.0
language:
- en
tags:
- code-generation
- nextjs
- react
- typescript
- vision
- multimodal
- mindi
- mindigenous
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
---
# MINDI 1.5 Vision-Coder
**Built by MINDIGENOUS.AI - Faaz, Mumbai, India**
## Model Description
MINDI 1.5 is a multimodal agentic AI coding model that generates frontend code
(HTML/CSS/JS, Next.js, React, Tailwind) from text prompts and UI screenshots.
## Architecture
| Component | Details |
|-----------|---------|
| Base LLM | Qwen/Qwen2.5-Coder-7B-Instruct (7.62B params) |
| Vision Encoder | CLIP ViT-L/14 (frozen, 304M params) |
| LoRA Adapters | r=64, alpha=128 (161.5M trainable params) |
| Fusion | VisionLanguageFusion with text_gate (16.8M params) |
| Total | 8.1B params, 182.5M trainable (2.25%) |
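The parameter counts in the table can be sanity-checked with a little arithmetic. The sketch below assumes the layer dimensions from the published Qwen2.5-7B configuration (hidden size 3584, MLP size 18944, 28 layers, 4 key/value heads of dimension 128) and assumes that LoRA adapters are attached to all seven linear projections in every decoder layer; the card does not list the target modules, so that part is an inference, not a documented fact.

```python
# Dimensions assumed from the published Qwen2.5-7B config (not stated in this card)
HIDDEN = 3584
INTERMEDIATE = 18944
NUM_LAYERS = 28
KV_DIM = 4 * 128  # 4 key/value heads x head dimension 128

# LoRA hyperparameters from the architecture table
RANK = 64
ALPHA = 128

# (in_features, out_features) of each linear projection in one decoder layer.
# Assumption: every one of these carries a LoRA adapter.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank, projections, num_layers):
    """Each adapter adds A (rank x d_in) and B (d_out x rank) per projection."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in projections.values())
    return per_layer * num_layers


lora_params = lora_param_count(RANK, PROJECTIONS, NUM_LAYERS)
fusion_params = 16.8e6      # from the table
trainable_total = 182.5e6   # from the table
total_params = 8.1e9        # from the table

print(f"LoRA params:        {lora_params / 1e6:.1f}M")
print(f"LoRA scaling:       {ALPHA / RANK}")
print(f"Trainable fraction: {trainable_total / total_params:.2%}")
print(f"Not itemized:       {(trainable_total - lora_params - fusion_params) / 1e6:.1f}M")
```

Under these assumptions the LoRA count comes out at about 161.5M, matching the table. Roughly 4.2M of the 182.5M trainable parameters are not itemized in the table.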
## Training
- **3-Phase Progressive Training** on AMD MI300X 192GB + Modal A100 40GB
- **Dataset**: 1.45M examples, 860M tokens
- **Final loss**: 0.25–0.40 range
## Checkpoint Structure
```
checkpoints/
├── phase3_final/                ← Best checkpoint for inference
│   ├── lora/                    ← LoRA adapter weights
│   ├── vision/                  ← Vision projection weights
│   └── fusion/                  ← Fusion layer weights
├── phase3_all_step2500_final/
├── phase3_all_step2000/
├── phase3_all_step1500/
└── ... (earlier phases)
```
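Before loading, it can help to confirm that a downloaded checkpoint contains the three subdirectories shown in the tree above. The following is a minimal sketch; `missing_checkpoint_parts` is a hypothetical helper written for illustration and is not part of the MINDI codebase.

```python
from pathlib import Path

# Subdirectories listed under phase3_final/ in the checkpoint tree
EXPECTED_PARTS = ("lora", "vision", "fusion")


def missing_checkpoint_parts(checkpoint_dir):
    """Return the expected subdirectories that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_PARTS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_checkpoint_parts("checkpoints/phase3_final")
    if missing:
        print("Missing checkpoint parts:", ", ".join(missing))
    else:
        print("Checkpoint layout looks complete.")
```

The check only looks at directory names; it does not validate the weight files inside them.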
## Usage
```python
import torch

from src.model.mindi_model import MINDI15

# Build the model: Qwen2.5-Coder base LLM plus the CLIP vision encoder
model = MINDI15(
    model_name="Qwen/Qwen2.5-Coder-7B-Instruct",
    clip_model="openai/clip-vit-large-patch14",
    hidden_size=3584,
    num_visual_tokens=256,
    torch_dtype=torch.bfloat16,
)

# Load the trained weights from the final phase-3 checkpoint
model.load("checkpoints/phase3_final")
model.eval()

# Generate frontend code from a text-only prompt
response = model.generate(
    prompt="Build a Next.js landing page",
    max_new_tokens=2048,
    temperature=0.7,
)
```
## Special Tokens (22 total, 11 pairs)
think, code, file, critique, suggest, search, error, fix, vision, sandbox, context
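The 11 names expand into 22 tokens because each name has an opening and a closing marker. The sketch below illustrates that expansion. The literal token strings are not given in this card, so the `<|name_start|>` / `<|name_end|>` format used here is purely hypothetical, as is the `build_special_tokens` helper.

```python
# The 11 pair names listed in this section
PAIR_NAMES = [
    "think", "code", "file", "critique", "suggest", "search",
    "error", "fix", "vision", "sandbox", "context",
]


def build_special_tokens(names):
    """Expand each name into an opening and a closing token.

    Hypothetical format: the real MINDI token strings may differ.
    """
    tokens = []
    for name in names:
        tokens.append(f"<|{name}_start|>")
        tokens.append(f"<|{name}_end|>")
    return tokens


SPECIAL_TOKENS = build_special_tokens(PAIR_NAMES)
print(len(SPECIAL_TOKENS))  # 22
```

With a Hugging Face tokenizer, a list like this is typically registered through `tokenizer.add_special_tokens({"additional_special_tokens": SPECIAL_TOKENS})`, followed by `model.resize_token_embeddings(len(tokenizer))`; whether MINDI registers its tokens this way is not stated here.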
## Built By
Faaz - MINDIGENOUS.AI | Mumbai, India | April–May 2026