---
license: apache-2.0
language:
- en
tags:
- code-generation
- nextjs
- react
- typescript
- vision
- multimodal
- mindi
- mindigenous
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
---
# MINDI 1.5 Vision-Coder
**Built by MINDIGENOUS.AI – Faaz, Mumbai, India**
## Model Description
MINDI 1.5 is a multimodal agentic AI coding model that generates frontend code
(HTML/CSS/JS, Next.js, React, Tailwind) from text prompts and UI screenshots.
## Architecture
| Component | Details |
|-----------|---------|
| Base LLM | Qwen/Qwen2.5-Coder-7B-Instruct (7.62B params) |
| Vision Encoder | CLIP ViT-L/14 (frozen, 304M params) |
| LoRA Adapters | r=64, alpha=128 (161.5M trainable params) |
| Fusion | VisionLanguageFusion with text_gate (16.8M params) |
| Total | 8.1B params, 182.5M trainable (2.25%) |
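The LoRA setup from the table can be reproduced with `peft`. A minimal sketch, assuming the adapters target the usual attention projections (the exact `target_modules` list is not documented on this card):
```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct", torch_dtype=torch.bfloat16
)
lora_config = LoraConfig(
    r=64,             # rank, per the table above
    lora_alpha=128,   # scaling alpha, per the table above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # trainable count should be near the reported 161.5M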
## Training
- **Regime**: three-phase progressive training on an AMD MI300X (192 GB) and a Modal A100 (40 GB); a sketch of the phase idea follows this list
- **Dataset**: 1.45M examples, 860M tokens
- **Final loss**: 0.25–0.40
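The card does not spell out what each phase trains. Purely as an illustration of progressive training, one common pattern widens the trainable parameter set phase by phase; the phase-to-component mapping below is an assumption, not MINDI's documented schedule:
```python
# Illustrative only: which components train in each phase is an assumption.
PHASES = {
    1: ["vision"],                    # e.g. align the vision projection first
    2: ["vision", "fusion"],          # then train the fusion layer
    3: ["vision", "fusion", "lora"],  # finally include the LoRA adapters
}

def set_trainable(model, components):
    """Enable gradients only for parameters whose name matches a component."""
    for name, param in model.named_parameters():
        param.requires_grad = any(c in name for c in components)

# set_trainable(model, PHASES[3])  # phase 3: all adapter weights trainable
```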
## Checkpoint Structure
```
checkpoints/
├── phase3_final/                 ← Best checkpoint for inference
│   ├── lora/                     ← LoRA adapter weights
│   ├── vision/                   ← Vision projection weights
│   └── fusion/                   ← Fusion layer weights
├── phase3_all_step2500_final/
├── phase3_all_step2000/
├── phase3_all_step1500/
└── ... (earlier phases)
```
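If you only need the language-side adapters, the `lora/` folder presumably holds a standard PEFT adapter directory. A minimal sketch under that assumption (the `vision/` and `fusion/` weights still require the `MINDI15` wrapper shown under Usage):
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Assumption: checkpoints/phase3_final/lora contains adapter_config.json + weights.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct", torch_dtype=torch.bfloat16
)
lora_only = PeftModel.from_pretrained(base, "checkpoints/phase3_final/lora")
```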
## Usage
```python
import torch

from src.model.mindi_model import MINDI15

# Build the base LLM + CLIP encoder, then attach the trained adapters.
model = MINDI15(
    model_name="Qwen/Qwen2.5-Coder-7B-Instruct",
    clip_model="openai/clip-vit-large-patch14",
    hidden_size=3584,
    num_visual_tokens=256,
    torch_dtype=torch.bfloat16,
)
model.load("checkpoints/phase3_final")  # LoRA + vision + fusion weights
model.eval()

response = model.generate(
    prompt="Build a Next.js landing page",
    max_new_tokens=2048,
    temperature=0.7,
)
```
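For screenshot-to-code generation, an image would be passed alongside the prompt. The sketch below assumes an `image` keyword on `generate`; that kwarg is hypothetical, so verify the actual signature in `src/model/mindi_model.py`:
```python
from PIL import Image

screenshot = Image.open("landing_page_mock.png").convert("RGB")
response = model.generate(
    prompt="Recreate this UI as a React + Tailwind component",
    image=screenshot,  # hypothetical kwarg; check MINDI15.generate
    max_new_tokens=2048,
    temperature=0.7,
)
```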
## Special Tokens (22 total, 11 pairs)
`think`, `code`, `file`, `critique`, `suggest`, `search`, `error`, `fix`, `vision`, `sandbox`, `context` (one opening and one closing token per name).
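If you extend the tokenizer yourself, the 22 tokens can be registered as 11 open/close pairs. A minimal sketch, assuming the tokens follow a `<name>` / `</name>` convention (unverified against the shipped tokenizer):
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
names = ["think", "code", "file", "critique", "suggest", "search",
         "error", "fix", "vision", "sandbox", "context"]
special = [f"<{n}>" for n in names] + [f"</{n}>" for n in names]
tokenizer.add_special_tokens({"additional_special_tokens": special})
# After adding tokens, resize the model's embeddings to match:
# model.resize_token_embeddings(len(tokenizer))
```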
## Built By
Faaz – MINDIGENOUS.AI | Mumbai, India | April–May 2026