---
license: apache-2.0
language:
- en
tags:
- multimodal
- vision-language
- openvino
- optimum-intel
- testing
- tiny-model
- minicpmo
base_model: openbmb/MiniCPM-o-2_6
library_name: transformers
pipeline_tag: image-text-to-text
---
# Tiny Random MiniCPM-o-2_6
A tiny (~42 MB) randomly-initialized version of [MiniCPM-o-2.6](https://huggingface.co/openbmb/MiniCPM-o-2_6) designed for **testing purposes** in the [optimum-intel](https://github.com/huggingface/optimum-intel) library.
## Purpose
This model was created to replace the existing test model at `optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6` (185 MB) with a smaller alternative for CI/CD testing. Smaller test models reduce:
- Download times in CI pipelines
- Storage requirements
- Test execution time
## Size Comparison
| Model | Total Size | Model Weights |
|-------|------------|---------------|
| [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) (Original) | 17.4 GB | ~17 GB |
| [optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6](https://huggingface.co/optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6) (Current Test Model) | 185 MB | 169 MB |
| **hrithik-dev8/tiny-random-MiniCPM-o-2_6** (This Model) | **~42 MB** | **41.55 MB** |
**Result: over 4× smaller than the current test model**
## Model Configuration
| Component | This Model | Original |
|-----------|------------|----------|
| **Vocabulary** | 5,000 tokens | 151,700 tokens |
| **LLM Hidden Size** | 128 | 3,584 |
| **LLM Layers** | 1 | 40 |
| **LLM Attention Heads** | 8 | 28 |
| **Vision Hidden Size** | 128 | 1,152 |
| **Vision Layers** | 1 | 27 |
| **Image Size** | 980 (preserved) | 980 |
| **Patch Size** | 14 (preserved) | 14 |
| **Audio d_model** | 64 | 1,280 |
| **TTS Hidden Size** | 128 | - |
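The shrunken dimensions above can be summarized as a set of configuration overrides. The sketch below is illustrative: the key names mirror the table rather than the exact fields in this model's `config.json`.

```python
# Hypothetical overrides mirroring the table above; the real keys in
# config.json may be named differently or nested per sub-config.
tiny_overrides = {
    "vocab_size": 5_000,          # down from 151,700
    "hidden_size": 128,           # LLM hidden size, down from 3,584
    "num_hidden_layers": 1,       # LLM layers, down from 40
    "num_attention_heads": 8,     # down from 28
    "vision_hidden_size": 128,    # down from 1,152
    "vision_layers": 1,           # down from 27
    "image_size": 980,            # preserved from the original
    "patch_size": 14,             # preserved from the original
    "audio_d_model": 64,          # down from 1,280
    "tts_hidden_size": 128,
}

# A valid attention configuration requires the hidden size to divide
# evenly across heads (here 128 / 8 = 16 dims per head).
assert tiny_overrides["hidden_size"] % tiny_overrides["num_attention_heads"] == 0
```

Note that the image and patch sizes are kept at their original values so the image preprocessor produces the same token grid as the full model.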
## Parameter Breakdown
| Component | Parameters | Size (MB) |
|-----------|------------|-----------|
| TTS/DVAE | 19,339,766 | 36.89 |
| LLM | 1,419,840 | 2.71 |
| Vision | 835,328 | 1.59 |
| Resampler | 91,392 | 0.17 |
| Audio | 56,192 | 0.11 |
| Other | 20,736 | 0.04 |
| **Total** | **21,763,254** | **~41.5** |
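Counts like the breakdown above can be reproduced by grouping parameters by the first segment of their names. A minimal sketch follows; the `Toy` module and its submodule names are stand-ins for illustration, not the real architecture:

```python
import torch.nn as nn


def params_by_component(model: nn.Module) -> dict:
    """Group parameter counts by the first segment of each parameter name."""
    counts = {}
    for name, param in model.named_parameters():
        component = name.split(".")[0]
        counts[component] = counts.get(component, 0) + param.numel()
    return counts


# Toy module standing in for the real model (names are illustrative only).
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.llm = nn.Linear(128, 128)  # 128*128 + 128 = 16,512 params
        self.vpm = nn.Linear(128, 64)   # 128*64  + 64  =  8,256 params


counts = params_by_component(Toy())
```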
## Technical Details
### Why Keep TTS/DVAE Components?
The TTS (Text-to-Speech) component, which includes the DVAE (Discrete Variational Auto-Encoder), accounts for approximately 37 MB (~89%) of the model size. While the optimum-intel tests do **not** exercise TTS functionality (they only test image+text → text generation), we retain this component because:
1. **Structural Consistency**: Removing TTS via `init_tts=False` causes structural differences in the model that lead to numerical divergence between PyTorch and OpenVINO outputs
2. **Test Compatibility**: The `test_compare_to_transformers` test compares PyTorch vs OpenVINO outputs and requires exact structural matching
3. **Architecture Integrity**: The MiniCPM-o architecture expects TTS weights to be present during model loading
### Tokenizer Shrinking
The vocabulary was reduced from 151,700 to 5,000 tokens:
- **Base tokens**: IDs 0-4899 (first 4,900 most common tokens)
- **Special tokens**: IDs 4900-4949 (remapped from original high IDs)
- **BPE merges**: Filtered from 151,387 to 4,644 (only merges involving retained tokens)
Key token mappings:
| Token | ID |
|-------|-----|
| `<unk>` | 4900 |
| `<\|endoftext\|>` | 4901 |
| `<\|im_start\|>` | 4902 |
| `<\|im_end\|>` | 4903 |
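The shrinking rules above can be sketched in a few lines. This is an illustration of the procedure described, not the exact script used to build the model:

```python
# A BPE merge "a b" is kept only when both sides and the merged result
# survive in the reduced vocabulary; everything else is dropped.
def filter_merges(merges, kept_vocab):
    kept = []
    for merge in merges:
        left, right = merge.split(" ")
        if left in kept_vocab and right in kept_vocab and (left + right) in kept_vocab:
            kept.append(merge)
    return kept


# Special tokens are remapped from their original high IDs into the new range.
special_remap = {
    "<unk>": 4900,
    "<|endoftext|>": 4901,
    "<|im_start|>": 4902,
    "<|im_end|>": 4903,
}

vocab = {"he", "ll", "o", "hell", "hello"}
print(filter_merges(["he ll", "hell o", "he x"], vocab))  # ['he ll', 'hell o']
```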
### Reproducibility
Model weights are initialized with a fixed random seed (42) to ensure:
- Reproducible outputs between runs
- Consistent behavior between PyTorch and OpenVINO
- Passing of `test_compare_to_transformers` which compares framework outputs
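The seeding pattern can be illustrated with plain PyTorch. This is a minimal sketch; the real build script seeds once before initializing the full model:

```python
import torch


def init_weights(seed: int = 42) -> torch.Tensor:
    # Fixing the seed immediately before initialization makes the random
    # weights identical across runs, so the PyTorch model and its OpenVINO
    # export start from exactly the same numbers.
    torch.manual_seed(seed)
    return torch.randn(4, 4)


assert torch.equal(init_weights(), init_weights())  # bit-identical across runs
```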
## Test Results
Tested with `pytest tests/openvino/test_seq2seq.py -k "minicpmo" -v`:
| Test | Status | Notes |
|------|--------|-------|
| `test_compare_to_transformers` | ✅ PASSED | PyTorch/OpenVINO outputs match |
| `test_generate_utils` | ✅ PASSED | Generation pipeline works |
| `test_model_can_be_loaded_after_saving` | ⚠️ FAILED | Windows file locking issue (not model-related) |
The third test failure is a **Windows-specific issue** where OpenVINO keeps file handles open, preventing cleanup of temporary directories. This is a known platform limitation, not a model defect. The test passes on Linux/macOS.
## Usage
### For optimum-intel Testing
```python
# In optimum-intel/tests/openvino/utils_tests.py, update MODEL_NAMES:
MODEL_NAMES = {
    # ... other models ...
    "minicpmo": "hrithik-dev8/tiny-random-MiniCPM-o-2_6",
}
```
Then run tests:
```bash
pytest tests/openvino/test_seq2seq.py -k "minicpmo" -v
```
### Basic Model Loading
```python
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained(
    "hrithik-dev8/tiny-random-MiniCPM-o-2_6",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "hrithik-dev8/tiny-random-MiniCPM-o-2_6",
    trust_remote_code=True,
)
```
## Files Included
| File | Size | Description |
|------|------|-------------|
| `model.safetensors` | 41.55 MB | Model weights (bfloat16) |
| `config.json` | 5.33 KB | Model configuration |
| `tokenizer.json` | 338.27 KB | Shrunk tokenizer (5,000 tokens) |
| `tokenizer_config.json` | 12.78 KB | Tokenizer settings |
| `vocab.json` | 85.70 KB | Vocabulary mapping |
| `merges.txt` | 36.58 KB | BPE merge rules |
| `preprocessor_config.json` | 1.07 KB | Image processor config |
| `generation_config.json` | 121 B | Generation settings |
| `added_tokens.json` | 1.13 KB | Special tokens |
| `special_tokens_map.json` | 1.24 KB | Special token mappings |
## Requirements
- Python 3.8+
- transformers >= 4.45.0, < 4.52.0
- torch
- For OpenVINO testing: optimum-intel with OpenVINO backend
## Limitations
⚠️ **This model is for testing only** - it produces random/meaningless outputs and should not be used for inference.