# ColQwen3 Architecture

**Created by M&K (c)2025 The LibraxisAI Team**

## Model Origins

ColQwen3 8B is based on the ColBERT late-interaction paradigm, adapted for visual document retrieval with Qwen3-VL as the backbone.

### Base Models Merged

1. **tomoro-ai/Colqwen3-8B-base** - Foundation vision-language model
2. **Custom projection layers** - Trained for document embedding
3. **Visual processor** - Qwen3-VL image understanding

## Late Interaction (MaxSim)

Unlike dense retrievers, which produce a single vector per document, ColBERT-style models produce **token-level embeddings**:

```
Query: "financial report"
  ↓ [emb_financial, emb_report]            # N query tokens

Document Page:
  ↓ [emb_Q3, emb_revenue, emb_chart, ...]  # M document tokens

MaxSim Score = Σ_i max_j sim(q_i, d_j)
             = for each query token i, take its best match over
               all document tokens j, then sum
```

This enables:

- **Fine-grained matching** - individual terms matter
- **Passage-level relevance** - not just document-level
- **Interpretable scores** - you can see which terms matched

## Projection Layers

Raw embeddings from Qwen3-VL are 4096-dimensional. We project them down for efficiency:

| Layer | Input Dim | Output Dim | Parameters |
|-------|-----------|------------|------------|
| 128D  | 4096      | 128        | 524K       |
| 320D  | 4096      | 320        | 1.3M       |

### When to Use Each

- **128D**: Real-time search, memory-constrained deployments
- **320D**: Batch indexing, quality-critical applications

## Image Processing Pipeline

```
PDF Page / Image
       │
       ▼
┌─────────────────────────────┐
│ Resize to 1024×1024 max     │
│ (preserve aspect ratio)     │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│ Qwen3-VL Vision Encoder     │
│ Patch embedding + attention │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│ <|image_pad|> token expand  │
│ → Token-level embeddings    │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│ Projection Layer            │
│ 4096D → 128D/320D           │
└──────────────┬──────────────┘
               │
Document Embedding [num_patches × dim]
```

## Query Processing

Text queries go through the language model only:

```
Query Text
       │
       ▼
┌─────────────────────────────┐
│ Tokenizer                   │
│ → Token IDs                 │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│ Qwen3-VL Text Encoder       │
│ → Hidden states             │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│ Projection Layer            │
│ 4096D → 128D/320D           │
└──────────────┬──────────────┘
               │
Query Embedding [num_tokens × dim]
```

## Memory Layout

On Apple Silicon (MLX):

```
┌─────────────────────────────────────┐
│           Unified Memory            │
├─────────────────────────────────────┤
│ Model weights       ~17GB           │
│ KV Cache            ~1-2GB          │
│ Projection layers   ~5MB            │
│ Working memory      ~1GB            │
├─────────────────────────────────────┤
│ Total               ~19-20GB        │
└─────────────────────────────────────┘
```

## Indexing Strategy

For production deployment:

1. **Pre-compute document embeddings** (offline)
2. **Store in a vector database** (LanceDB, Qdrant, etc.)
3. **Online query embedding** (real-time)
4. **MaxSim scoring** (can be batched; see the sketches below)
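Step 4 is the only part not shown in the pipelines above, so here is a minimal sketch of MaxSim scoring. Treat the details as assumptions for illustration: NumPy in place of MLX, rows assumed L2-normalized (so dot products are cosine similarities), and a hypothetical `maxsim` helper shaped to match the `embedder.maxsim` call in the snippet that follows.

```python
import numpy as np

def maxsim(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction (MaxSim) score between one query and one page.

    query_emb: [num_query_tokens, dim]
    doc_emb:   [num_doc_patches, dim]
    Assumes rows are L2-normalized, so dot products are cosine similarities.
    """
    sim = query_emb @ doc_emb.T          # [num_query_tokens, num_doc_patches]
    return float(sim.max(axis=1).sum())  # best doc match per query token, summed
```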
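Because the score is one matrix product followed by a row-wise max and a sum, it batches naturally across candidate pages, which is what "can be batched" refers to. Another sketch, under the added assumption that candidates are zero-padded to the same patch count and stacked into a single array:

```python
import numpy as np

def maxsim_batch(query_emb: np.ndarray, doc_embs: np.ndarray) -> np.ndarray:
    """Score one query against a stack of candidate pages at once.

    query_emb: [num_query_tokens, dim]
    doc_embs:  [num_docs, num_doc_patches, dim], zero-padded to equal length
    Returns:   [num_docs] array of MaxSim scores.
    """
    # sims: [num_docs, num_query_tokens, num_doc_patches]
    sims = np.einsum("qd,npd->nqp", query_emb, doc_embs)
    # Caveat: a zero-padding column can win the max if every real
    # similarity for a query token is negative.
    return sims.max(axis=2).sum(axis=1)
```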
The end-to-end flow, with `embedder` and `vector_db` as illustrative stand-ins for a real model wrapper and vector store:

```python
# Indexing (offline)
for page in pdf_pages:
    embedding = embedder.embed_image(page)
    vector_db.insert(doc_id, page_num, embedding)

# Search (online)
query_emb = embedder.embed_query(query_text)
candidates = vector_db.search(query_emb, k=100)
scores = [embedder.maxsim(query_emb, doc_emb) for doc_emb in candidates]
```

## File Format

Model weights use MLX-compatible safetensors:

```
model-00001-of-00007.safetensors  # 5.0GB
model-00002-of-00007.safetensors  # 4.9GB
model-00003-of-00007.safetensors  # 4.8GB
model-00004-of-00007.safetensors  # 4.8GB
model-00005-of-00007.safetensors  # 5.0GB
model-00006-of-00007.safetensors  # 5.0GB
model-00007-of-00007.safetensors  # 3.2GB
--------
Total: ~33GB
```

Projection layers are stored as separate safetensors files for flexibility.

---

**Co-Authored-By: [Maciej](void@div0.space) & [Klaudiusz](the1st@whoai.am)**