---
license: apache-2.0
library_name: transformers
tags:
  - vision
  - image-text-to-text
  - multimodal
  - test-model
  - tiny-model
  - openvino
  - optimum-intel
pipeline_tag: image-text-to-text
---

# Tiny Random MiniCPM-o-2_6

## Model Description

This is a **tiny random-initialized version** of the [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) multimodal vision-language model, designed specifically for **testing and CI/CD purposes** in the [optimum-intel](https://github.com/huggingface/optimum-intel) library.

**⚠️ Important**: This model has randomly initialized weights and is NOT intended for actual inference. It is designed solely for:
- Testing model loading and export functionality
- CI/CD pipeline validation
- OpenVINO conversion testing
- Quantization workflow testing

## Model Specifications

- **Architecture**: MiniCPM-o-2_6 (multimodal: vision + text + audio + TTS)
- **Parameters**: 1,477,376 (~1.48M parameters)
- **Model Binary Size**: 5.64 MB
- **Total Repository Size**: ~21 MB
- **Original Model**: [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) (~18 GB)
- **Size Reduction**: ~853× smaller than the full model
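The binary size follows directly from the parameter count: with float32 weights, each parameter occupies 4 bytes. A quick, purely illustrative sanity check:

```python
# Sanity check (illustrative): float32 weights use 4 bytes per parameter,
# so the checkpoint size is determined by the parameter count alone.
num_params = 1_477_376
size_bytes = num_params * 4           # float32 = 4 bytes/parameter
size_mb = size_bytes / (1024 ** 2)
print(f"{size_mb:.2f} MB")            # 5.64 MB, matching pytorch_model.bin
```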

## Architecture Details

### Language Model (LLM) Component
- `num_hidden_layers`: 2 (reduced from 40)
- `hidden_size`: 256 (reduced from 2048)
- `intermediate_size`: 512 (reduced from 8192)
- `num_attention_heads`: 4 (reduced from 32)
- `vocab_size`: 320 (reduced from 151,700)
- `max_position_embeddings`: 128 (reduced from 8192)

### Vision Component (SigLIP-based)
- `hidden_size`: 8
- `num_hidden_layers`: 1

### Audio Component (Whisper-based)
- `d_model`: 64
- `encoder_layers`: 1
- `decoder_layers`: 1

### TTS Component
- `hidden_size`: 8
- `num_layers`: 1

All architectural components are present but miniaturized to ensure API compatibility while drastically reducing compute requirements.
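For reference, the miniaturized LLM hyperparameters above can be summarized as a plain dictionary (an illustration only, not the actual config classes); note that `hidden_size` still divides evenly across the attention heads, as the real config requires:

```python
# Illustrative summary of the reduced LLM hyperparameters listed above
# (a plain dict, not the actual configuration class).
llm = {
    "num_hidden_layers": 2,
    "hidden_size": 256,
    "intermediate_size": 512,
    "num_attention_heads": 4,
    "vocab_size": 320,
    "max_position_embeddings": 128,
}

# hidden_size must split evenly across attention heads
assert llm["hidden_size"] % llm["num_attention_heads"] == 0
head_dim = llm["hidden_size"] // llm["num_attention_heads"]
print(f"per-head dimension: {head_dim}")  # per-head dimension: 64
```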

## Usage

### Loading with Transformers

```python
from transformers import AutoModelForCausalLM, AutoProcessor
import torch

model_id = "arashkermani/tiny-random-MiniCPM-o-2_6"

# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float32,
    device_map="cpu"
)

# Load processor
processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True
)

# Test forward pass
input_ids = torch.randint(0, 320, (1, 5))
position_ids = torch.arange(5).unsqueeze(0)

data = {
    "input_ids": input_ids,
    "pixel_values": [[]],
    "tgt_sizes": [[]],
    "image_bound": [[]],
    "position_ids": position_ids,
}

with torch.no_grad():
    outputs = model(data=data)

print(f"Logits shape: {outputs.logits.shape}")  # (1, 5, 320)
```

### Using with Optimum-Intel (OpenVINO)

```python
from optimum.intel.openvino import OVModelForVisualCausalLM
from transformers import AutoProcessor

model_id = "arashkermani/tiny-random-MiniCPM-o-2_6"

# Load model for OpenVINO
model = OVModelForVisualCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True
)

processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True
)
```

### Export to OpenVINO

```bash
optimum-cli export openvino \
  -m arashkermani/tiny-random-MiniCPM-o-2_6 \
  minicpm-o-openvino \
  --task=image-text-to-text \
  --trust-remote-code
```

## Intended Use

This model is intended **exclusively** for:
- ✅ Testing optimum-intel OpenVINO export functionality
- ✅ CI/CD pipeline validation
- ✅ Model loading and compatibility testing
- ✅ Quantization workflow testing
- ✅ Fast prototyping and debugging

**Not intended for**:
- ❌ Production inference
- ❌ Actual image-text-to-text tasks
- ❌ Model quality evaluation
- ❌ Benchmarking performance metrics

## Training Details

This model was generated by:
1. Loading the config from `optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6`
2. Reducing all dimensions to minimal viable values
3. Initializing weights randomly using `AutoModelForCausalLM.from_config()`
4. Copying all necessary tokenizer, processor, and custom code files

**No training was performed** - all weights are randomly initialized.
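Step 2 above amounts to overriding the full-size config values with tiny ones before random initialization. A minimal sketch, assuming a dict-based config; the helper name and the override table are illustrative, not the actual generation script:

```python
# Hypothetical sketch of step 2: substitute tiny values into a full-size
# config dict before random initialization. The function name and the
# override table are illustrative, not the actual generation script.
TINY_OVERRIDES = {
    "num_hidden_layers": 2,
    "hidden_size": 256,
    "intermediate_size": 512,
    "num_attention_heads": 4,
    "vocab_size": 320,
    "max_position_embeddings": 128,
}

def shrink_config(full_config: dict) -> dict:
    """Return a copy of the config with tiny values substituted;
    all other fields (model_type, custom-code mappings, ...) pass through."""
    tiny = dict(full_config)
    tiny.update(TINY_OVERRIDES)
    return tiny

full = {"model_type": "minicpmo", "hidden_size": 2048, "num_hidden_layers": 40}
tiny = shrink_config(full)
print(tiny["hidden_size"], tiny["num_hidden_layers"])  # 256 2
```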

## Validation Results

The model has been validated to ensure:
- ✅ Loads with `trust_remote_code=True`
- ✅ Compatible with transformers AutoModel APIs
- ✅ Supports forward pass with expected input format
- ✅ Compatible with OpenVINO export via optimum-intel
- ✅ Includes all required custom modules and artifacts

See the [validation report](https://github.com/arashkermani/tiny-minicpm-o) for detailed technical analysis.

## Files Included

- `config.json` - Model configuration
- `pytorch_model.bin` - Model weights (5.64 MB)
- `generation_config.json` - Generation parameters
- `preprocessor_config.json` - Preprocessor configuration
- `processor_config.json` - Processor configuration
- `tokenizer_config.json` - Tokenizer configuration
- `tokenizer.json` - Fast tokenizer
- `vocab.json` - Vocabulary
- `merges.txt` - BPE merges
- Custom Python modules:
  - `modeling_minicpmo.py`
  - `configuration_minicpm.py`
  - `processing_minicpmo.py`
  - `image_processing_minicpmv.py`
  - `tokenization_minicpmo_fast.py`
  - `modeling_navit_siglip.py`
  - `resampler.py`
  - `utils.py`

## Related Models

- Original model: [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6)
- Previous test model: [optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6](https://huggingface.co/optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6)

## License

This model follows the same license as the original MiniCPM-o-2_6 model (Apache 2.0).

## Citation

If you use this test model in your CI/CD or testing infrastructure, please reference:

```bibtex
@misc{tiny-minicpm-o-2_6,
  author = {Arash Kermani},
  title = {Tiny Random MiniCPM-o-2_6 for Testing},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/arashkermani/tiny-random-MiniCPM-o-2_6}}
}
```

## Contact

For issues or questions about this test model, please open an issue in the [optimum-intel repository](https://github.com/huggingface/optimum-intel/issues).