# ColQwen3 Architecture
**Created by M&K (c)2025 The LibraxisAI Team**
## Model Origins
ColQwen3 8B is based on the ColBERT late interaction paradigm, adapted for visual document retrieval using Qwen3-VL as the backbone.
### Base Models Merged
1. **tomoro-ai/Colqwen3-8B-base** - Foundation visual-language model
2. **Custom projection layers** - Trained for document embedding
3. **Visual processor** - Qwen3-VL image understanding
## Late Interaction (MaxSim)
Unlike dense retrievers that produce single vectors, ColBERT-style models produce **token-level embeddings**:
```
Query: "financial report"
        ↓
[emb_financial, emb_report]            # N query tokens

Document Page:
        ↓
[emb_Q3, emb_revenue, emb_chart, ...]  # M document tokens

MaxSim Score = Σ_i max_j sim(q_i, d_j)
             = for each query token, take its best match among
               document tokens, then sum over query tokens
```
This enables:
- **Fine-grained matching** - individual terms matter
- **Passage-level relevance** - not just document-level
- **Interpretable scores** - which terms matched
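The MaxSim score above can be sketched in a few lines of NumPy. The shapes and toy vectors are illustrative only, and the token embeddings are assumed L2-normalized so cosine similarity reduces to a dot product:

```python
import numpy as np

def maxsim(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """MaxSim: for each query token, take its best cosine match among
    document tokens, then sum over query tokens.

    query_emb: [N, dim] L2-normalized query token embeddings
    doc_emb:   [M, dim] L2-normalized document token embeddings
    """
    sim = query_emb @ doc_emb.T          # [N, M] pairwise similarities
    return float(sim.max(axis=1).sum())  # best doc match per query token, summed

# Toy example: 2 query tokens, 3 document tokens, dim=4 (all unit-norm)
q = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
d = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.6, 0.8, 0.0]])
print(maxsim(q, d))  # 1.0 (exact match) + 0.6 (best partial match) = 1.6
```

Because each query token is scored independently, you can read off which document tokens drove the score, which is what makes the matching interpretable.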
## Projection Layers
Raw embeddings from Qwen3-VL are 4096-dimensional. We project them down for efficiency:
| Layer | Input Dim | Output Dim | Parameters |
|-------|-----------|------------|------------|
| 128D | 4096 | 128 | 524K |
| 320D | 4096 | 320 | 1.3M |
### When to Use Each
- **128D**: Real-time search, memory-constrained
- **320D**: Batch indexing, quality-critical applications
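A minimal NumPy stand-in for the projection step (the real layer is trained; random weights and L2-normalized outputs are assumptions here). The parameter counts in the table fall directly out of the shapes: 4096 × 128 = 524,288 ≈ 524K and 4096 × 320 = 1,310,720 ≈ 1.3M.

```python
import numpy as np

IN_DIM = 4096  # Qwen3-VL hidden size

def make_projection(out_dim: int) -> np.ndarray:
    """Stand-in for the trained projection: one [IN_DIM, out_dim] linear map."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((IN_DIM, out_dim)).astype(np.float32)

def project(token_embs: np.ndarray, W: np.ndarray) -> np.ndarray:
    """[num_tokens, 4096] -> [num_tokens, out_dim], L2-normalized so
    MaxSim reduces to dot products."""
    out = token_embs @ W
    return out / np.linalg.norm(out, axis=-1, keepdims=True)

W128 = make_projection(128)   # 4096 * 128 = 524,288 params (~524K)
hidden = np.random.default_rng(1).standard_normal((10, IN_DIM)).astype(np.float32)
print(project(hidden, W128).shape)  # (10, 128)
```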
## Image Processing Pipeline
```
PDF Page / Image
       │
       ▼
┌─────────────────────────────┐
│ Resize to 1024×1024 max     │
│ (preserve aspect ratio)     │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│ Qwen3-VL Vision Encoder     │
│ Patch embedding + attention │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│ <|image_pad|> token expand  │
│ → Token-level embeddings    │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│ Projection Layer            │
│ 4096D → 128D/320D           │
└──────────────┬──────────────┘
               │
      Document Embedding
      [num_patches × dim]
```
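The first stage of the pipeline, resizing to at most 1024×1024 while preserving aspect ratio, can be sketched as below; the exact resampling and rounding behavior of the real processor is an assumption:

```python
def fit_within(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_side,
    preserving aspect ratio; never upscale."""
    scale = min(1.0, max_side / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))

print(fit_within(2048, 1536))  # (1024, 768)
print(fit_within(800, 600))    # (800, 600) - already within limits
```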
## Query Processing
Text queries go through the language model only:
```
Query Text
       │
       ▼
┌─────────────────────────────┐
│ Tokenizer                   │
│ → Token IDs                 │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│ Qwen3-VL Text Encoder       │
│ → Hidden states             │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│ Projection Layer            │
│ 4096D → 128D/320D           │
└──────────────┬──────────────┘
               │
       Query Embedding
       [num_tokens × dim]
```
## Memory Layout
On Apple Silicon (MLX):
```
┌──────────────────────────────────────┐
│            Unified Memory            │
├──────────────────────────────────────┤
│ Model weights        ~17GB           │
│ KV Cache             ~1-2GB          │
│ Projection layers    ~5MB            │
│ Working memory       ~1GB            │
├──────────────────────────────────────┤
│ Total                ~18-20GB        │
└──────────────────────────────────────┘
```
## Indexing Strategy
For production deployment:
1. **Pre-compute document embeddings** (offline)
2. **Store in vector database** (LanceDB, Qdrant, etc.)
3. **Online query embedding** (real-time)
4. **MaxSim scoring** (can be batched)
```python
# Indexing (offline)
for page in pdf_pages:
    embedding = embedder.embed_image(page)
    vector_db.insert(doc_id, page_num, embedding)

# Search (online)
query_emb = embedder.embed_query(query_text)
candidates = vector_db.search(query_emb, k=100)
scores = [embedder.maxsim(query_emb, doc_emb) for doc_emb in candidates]
```
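The scores from the search step above are then used to rerank the ANN candidates before returning results; a self-contained toy example (the IDs and scores are made up):

```python
# Rerank ANN candidates by their exact MaxSim scores, best first
candidates = ["doc1_p3", "doc2_p1", "doc1_p7"]  # (doc, page) hits from the vector DB
scores = [12.4, 18.9, 7.2]                      # MaxSim score per candidate

ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
top_ids = [doc for doc, _ in ranked]
print(top_ids)  # ['doc2_p1', 'doc1_p3', 'doc1_p7']
```

This two-stage pattern (cheap ANN retrieval of ~100 candidates, then exact MaxSim reranking) keeps latency low without giving up late-interaction quality on the final ranking.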
## File Format
Model weights use MLX-compatible safetensors:
```
model-00001-of-00007.safetensors # 5.0GB
model-00002-of-00007.safetensors # 4.9GB
model-00003-of-00007.safetensors # 4.8GB
model-00004-of-00007.safetensors # 4.8GB
model-00005-of-00007.safetensors # 5.0GB
model-00006-of-00007.safetensors # 5.0GB
model-00007-of-00007.safetensors # 3.2GB
--------
Total: ~33GB
```
Projection layers are separate safetensors files for flexibility.
---
**Co-Authored-By: [Maciej](void@div0.space) & [Klaudiusz](the1st@whoai.am)**