---
license: apache-2.0
language:
- en
tags:
- code-generation
- nextjs
- react
- typescript
- vision
- multimodal
- mindi
- mindigenous
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
---

# MINDI 1.5 Vision-Coder

**Built by MINDIGENOUS.AI (Faaz, Mumbai, India)**

## Model Description
MINDI 1.5 is a multimodal agentic AI coding model that generates frontend code
(HTML/CSS/JS, Next.js, React, Tailwind) from text prompts and UI screenshots.

## Architecture
| Component | Details |
|-----------|---------|
| Base LLM | Qwen/Qwen2.5-Coder-7B-Instruct (7.62B params) |
| Vision Encoder | CLIP ViT-L/14 (frozen, 304M params) |
| LoRA Adapters | r=64, alpha=128 (161.5M trainable params) |
| Fusion | VisionLanguageFusion with text_gate (16.8M params) |
| Total | 8.1B params, 182.5M trainable (2.25%) |
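
For reference, the adapter settings in the table correspond to a standard `peft` configuration. A minimal sketch, assuming the adapters target all attention and MLP projections of the Qwen2 blocks; the card does not list the target modules, though that choice does reproduce the ~161.5M trainable figure:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# r and alpha come from the table above; target_modules and dropout are
# assumptions, not stated in this card.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    lora_dropout=0.05,  # assumed
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # ~161.5M trainable with these targets
```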

## Training
- **3-phase progressive training** on an AMD MI300X (192 GB) and an NVIDIA A100 (40 GB, via Modal)
- **Dataset**: 1.45M examples, 860M tokens
- **Final training loss**: 0.25–0.40

## Checkpoint Structure
```
checkpoints/
├── phase3_final/           ← Best checkpoint for inference
│   ├── lora/               ← LoRA adapter weights
│   ├── vision/             ← Vision projection weights
│   └── fusion/             ← Fusion layer weights
├── phase3_all_step2500_final/
├── phase3_all_step2000/
├── phase3_all_step1500/
└── ... (earlier phases)
```
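
Only `phase3_final/` is needed for inference, so a pattern-filtered download avoids pulling every intermediate checkpoint. A minimal sketch using `huggingface_hub`; the repo id below is a placeholder for wherever this model is hosted:

```python
from huggingface_hub import snapshot_download

# Repo id is hypothetical; substitute the actual Hub path of this model.
local_dir = snapshot_download(
    repo_id="MINDIGENOUS/MINDI-1.5-Vision-Coder",
    allow_patterns=["checkpoints/phase3_final/*"],
)
print(local_dir)  # local cache directory containing checkpoints/phase3_final/
```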

## Usage
```python
from src.model.mindi_model import MINDI15
import torch

# Assemble the base LLM, the frozen CLIP vision tower, and the fusion layer.
model = MINDI15(
    model_name="Qwen/Qwen2.5-Coder-7B-Instruct",
    clip_model="openai/clip-vit-large-patch14",
    hidden_size=3584,            # hidden width of Qwen2.5-Coder-7B
    num_visual_tokens=256,       # visual tokens injected per image
    torch_dtype=torch.bfloat16,
)
model.load("checkpoints/phase3_final")  # restores LoRA, vision, and fusion weights
model.eval()

response = model.generate(
    prompt="Build a Next.js landing page",
    max_new_tokens=2048,
    temperature=0.7,
)
```
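
The snippet above exercises the text-only path. This card does not document how `generate` accepts a screenshot, but since the vision tower is a stock CLIP ViT-L/14, the encoder side can be illustrated with standard `transformers` classes; the handoff into the fusion layer is the undocumented, model-specific part:

```python
from PIL import Image
import torch
from transformers import CLIPImageProcessor, CLIPVisionModel

# Standard preprocessing for the frozen CLIP ViT-L/14 tower from the table.
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
encoder = CLIPVisionModel.from_pretrained(
    "openai/clip-vit-large-patch14", torch_dtype=torch.bfloat16
)
encoder.eval()

image = Image.open("screenshot.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    out = encoder(pixel_values=pixel_values.to(torch.bfloat16))

# (1, 257, 1024): CLS token plus a 16x16 patch grid. The fusion layer would
# map these into the 256 visual tokens the model consumes; how MINDI15 wires
# this into generate() is not documented in this card.
print(out.last_hidden_state.shape)
```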

## Special Tokens (22 total, 11 pairs)
think, code, file, critique, suggest, search, error, fix, vision, sandbox, context
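
The exact token strings are not printed in this card; assuming each name above becomes an opening/closing pair (e.g. `<think>` and `</think>`, which matches the 22-token count), registering them looks like this sketch:

```python
from transformers import AutoTokenizer

# Token strings are an assumption: 11 named tags at 22 tokens implies an
# opening and a closing form per tag.
tags = ["think", "code", "file", "critique", "suggest", "search",
        "error", "fix", "vision", "sandbox", "context"]
special = [f"<{t}>" for t in tags] + [f"</{t}>" for t in tags]

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
added = tokenizer.add_special_tokens({"additional_special_tokens": special})
print(added)  # 22 if none of the strings are already in the vocabulary

# The base model's embedding matrix must then grow to match:
# model.resize_token_embeddings(len(tokenizer))
```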

## Built By
Faaz | MINDIGENOUS.AI | Mumbai, India | April–May 2026