---
license: apache-2.0
language:
- en
tags:
- multimodal
- vision-language
- openvino
- optimum-intel
- testing
- tiny-model
- minicpmo
base_model: openbmb/MiniCPM-o-2_6
library_name: transformers
pipeline_tag: image-text-to-text
---

# Tiny Random MiniCPM-o-2_6

A tiny (~42 MB) randomly initialized version of [MiniCPM-o-2.6](https://huggingface.co/openbmb/MiniCPM-o-2_6) designed for **testing purposes** in the [optimum-intel](https://github.com/huggingface/optimum-intel) library.

## Purpose

This model was created to replace the existing test model at `optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6` (185 MB) with a smaller alternative for CI/CD testing. Smaller test models reduce:

- Download times in CI pipelines
- Storage requirements
- Test execution time

## Size Comparison

| Model | Total Size | Model Weights |
|-------|------------|---------------|
| [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) (Original) | 17.4 GB | ~17 GB |
| [optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6](https://huggingface.co/optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6) (Current Test Model) | 185 MB | 169 MB |
| **hrithik-dev8/tiny-random-MiniCPM-o-2_6** (This Model) | **~42 MB** | **41.55 MB** |

**Result: ~4.4× smaller than the current optimum-intel test model (185 MB → ~42 MB)**

## Model Configuration

| Component | This Model | Original |
|-----------|------------|----------|
| **Vocabulary** | 5,000 tokens | 151,700 tokens |
| **LLM Hidden Size** | 128 | 3,584 |
| **LLM Layers** | 1 | 40 |
| **LLM Attention Heads** | 8 | 28 |
| **Vision Hidden Size** | 128 | 1,152 |
| **Vision Layers** | 1 | 27 |
| **Image Size** | 980 (preserved) | 980 |
| **Patch Size** | 14 (preserved) | 14 |
| **Audio d_model** | 64 | 1,280 |
| **TTS Hidden Size** | 128 | - |
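
As a rough sketch, the reductions in the table correspond to config overrides like the following. The field names follow common `transformers` config conventions and are illustrative assumptions, not MiniCPM-o's exact config schema:

```python
# Illustrative only -- field names are assumptions based on typical
# transformers config layouts, not the model's exact schema.
tiny_overrides = {
    "vocab_size": 5_000,             # was 151,700
    "hidden_size": 128,              # was 3,584
    "num_hidden_layers": 1,          # was 40
    "num_attention_heads": 8,        # was 28
    "vision_config": {
        "hidden_size": 128,          # was 1,152
        "num_hidden_layers": 1,      # was 27
        "image_size": 980,           # preserved, so the patch count is unchanged
        "patch_size": 14,            # preserved
    },
    "audio_config": {"d_model": 64},      # was 1,280
    "tts_config": {"hidden_size": 128},
}
```

Keeping the image and patch sizes at their original values means the vision tower still produces the same number of patches, so the resampler geometry matches the original architecture.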

## Parameter Breakdown

| Component | Parameters | Size (MB) |
|-----------|------------|-----------|
| TTS/DVAE | 19,339,766 | 36.89 |
| LLM | 1,419,840 | 2.71 |
| Vision | 835,328 | 1.59 |
| Resampler | 91,392 | 0.17 |
| Audio | 56,192 | 0.11 |
| Other | 20,736 | 0.04 |
| **Total** | **21,763,254** | **~41.5** |
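
A per-component breakdown like the table above can be produced by grouping `named_parameters()` by top-level submodule name. A minimal sketch (the `Toy` module and its component names are hypothetical stand-ins for the real model's submodules):

```python
from collections import defaultdict

import torch.nn as nn


def params_by_component(model: nn.Module) -> dict:
    """Group parameter counts by the top-level submodule prefix."""
    counts = defaultdict(int)
    for name, p in model.named_parameters():
        counts[name.split(".")[0]] += p.numel()
    return dict(counts)


# Demo on a toy module; the "llm"/"vision" names are illustrative.
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.llm = nn.Linear(128, 128)    # 128*128 + 128 = 16,512 params
        self.vision = nn.Linear(64, 64)   # 64*64 + 64 = 4,160 params


print(params_by_component(Toy()))  # {'llm': 16512, 'vision': 4160}
```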

## Technical Details

### Why Keep TTS/DVAE Components?

The TTS (Text-to-Speech) component, which includes the DVAE (Discrete Variational Auto-Encoder), accounts for approximately 37 MB (~89%) of the model size. While the optimum-intel tests do **not** exercise TTS functionality (they only test image+text → text generation), we retain this component because:

1. **Structural Consistency**: Removing TTS via `init_tts=False` causes structural differences in the model that lead to numerical divergence between PyTorch and OpenVINO outputs
2. **Test Compatibility**: The `test_compare_to_transformers` test compares PyTorch vs OpenVINO outputs and requires exact structural matching
3. **Architecture Integrity**: The MiniCPM-o architecture expects TTS weights to be present during model loading

### Tokenizer Shrinking

The vocabulary was reduced from 151,700 to 5,000 tokens:

- **Base tokens**: IDs 0-4899 (first 4,900 most common tokens)
- **Special tokens**: IDs 4900-4949 (remapped from original high IDs)
- **BPE merges**: Filtered from 151,387 to 4,644 (only merges involving retained tokens)

Key token mappings:
| Token | ID |
|-------|-----|
| `<unk>` | 4900 |
| `<\|endoftext\|>` | 4901 |
| `<\|im_start\|>` | 4902 |
| `<\|im_end\|>` | 4903 |
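
The merge-filtering rule above can be sketched as: keep a merge `(a, b)` only when `a`, `b`, and the merged token `a + b` all survive in the shrunken vocabulary. A toy illustration (not the actual shrinking script):

```python
def filter_merges(merges, vocab):
    """Keep a BPE merge (a, b) only if both parts and the merged
    token are present in the retained vocabulary."""
    kept = []
    for a, b in merges:
        if a in vocab and b in vocab and (a + b) in vocab:
            kept.append((a, b))
    return kept


# Toy 4-token vocabulary: "low" was dropped, so merges producing it go too.
vocab = {"l", "o", "lo", "w"}
merges = [("l", "o"), ("lo", "w"), ("o", "w")]
print(filter_merges(merges, vocab))  # [('l', 'o')]
```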

### Reproducibility

Model weights are initialized with a fixed random seed (42) to ensure:
- Reproducible outputs between runs
- Consistent behavior between PyTorch and OpenVINO
- Passing of `test_compare_to_transformers` which compares framework outputs
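
In PyTorch terms, a fixed seed makes every random draw during initialization repeatable:

```python
import torch

torch.manual_seed(42)   # same fixed seed used for this model's init
a = torch.randn(3)      # first draw after seeding

torch.manual_seed(42)   # re-seed ...
b = torch.randn(3)      # ... and the draw repeats exactly

assert torch.equal(a, b)
```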

## Test Results

Tested with `pytest tests/openvino/test_seq2seq.py -k "minicpmo" -v`:

| Test | Status | Notes |
|------|--------|-------|
| `test_compare_to_transformers` | ✅ PASSED | PyTorch/OpenVINO outputs match |
| `test_generate_utils` | ✅ PASSED | Generation pipeline works |
| `test_model_can_be_loaded_after_saving` | ⚠️ FAILED | Windows file locking issue (not model-related) |

The third test failure is a **Windows-specific issue** where OpenVINO keeps file handles open, preventing cleanup of temporary directories. This is a known platform limitation, not a model defect. The test passes on Linux/macOS.

## Usage

### For optimum-intel Testing

```python
# In optimum-intel/tests/openvino/utils_tests.py, update MODEL_NAMES:
MODEL_NAMES = {
    # ... other models ...
    "minicpmo": "hrithik-dev8/tiny-random-MiniCPM-o-2_6",
}
```

Then run tests:
```bash
pytest tests/openvino/test_seq2seq.py -k "minicpmo" -v
```

### Basic Model Loading

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "hrithik-dev8/tiny-random-MiniCPM-o-2_6",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "hrithik-dev8/tiny-random-MiniCPM-o-2_6",
    trust_remote_code=True
)
```

## Files Included

| File | Size | Description |
|------|------|-------------|
| `model.safetensors` | 41.55 MB | Model weights (bfloat16) |
| `config.json` | 5.33 KB | Model configuration |
| `tokenizer.json` | 338.27 KB | Shrunk tokenizer (5,000 tokens) |
| `tokenizer_config.json` | 12.78 KB | Tokenizer settings |
| `vocab.json` | 85.70 KB | Vocabulary mapping |
| `merges.txt` | 36.58 KB | BPE merge rules |
| `preprocessor_config.json` | 1.07 KB | Image processor config |
| `generation_config.json` | 121 B | Generation settings |
| `added_tokens.json` | 1.13 KB | Special tokens |
| `special_tokens_map.json` | 1.24 KB | Special token mappings |

## Requirements

- Python 3.8+
- transformers >= 4.45.0, < 4.52.0
- torch
- For OpenVINO testing: optimum-intel with OpenVINO backend

## Limitations

⚠️ **This model is for testing only** - it produces random/meaningless outputs and should not be used for inference.