|
|
--- |
|
|
license: apache-2.0 |
|
|
language: |
|
|
- en |
|
|
tags: |
|
|
- multimodal |
|
|
- vision-language |
|
|
- openvino |
|
|
- optimum-intel |
|
|
- testing |
|
|
- tiny-model |
|
|
- minicpmo |
|
|
base_model: openbmb/MiniCPM-o-2_6 |
|
|
library_name: transformers |
|
|
pipeline_tag: image-text-to-text |
|
|
--- |
|
|
|
|
|
# Tiny Random MiniCPM-o-2_6 |
|
|
|
|
|
A tiny (~42 MB) randomly initialized version of [MiniCPM-o-2.6](https://huggingface.co/openbmb/MiniCPM-o-2_6) designed for **testing purposes** in the [optimum-intel](https://github.com/huggingface/optimum-intel) library.
|
|
|
|
|
## Purpose |
|
|
|
|
|
This model was created to replace the existing test model at `optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6` (185 MB) with a smaller alternative for CI/CD testing. Smaller test models reduce: |
|
|
|
|
|
- Download times in CI pipelines |
|
|
- Storage requirements |
|
|
- Test execution time |
|
|
|
|
|
## Size Comparison |
|
|
|
|
|
| Model | Total Size | Model Weights | |
|
|
|-------|------------|---------------| |
|
|
| [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) (Original) | 17.4 GB | ~17 GB | |
|
|
| [optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6](https://huggingface.co/optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6) (Current Test Model) | 185 MB | 169 MB | |
|
|
| **hrithik-dev8/tiny-random-MiniCPM-o-2_6** (This Model) | **~42 MB** | **41.55 MB** | |
|
|
|
|
|
**Result: more than 4× smaller than the current test model (185 MB → ~42 MB)**
|
|
|
|
|
## Model Configuration |
|
|
|
|
|
| Component | This Model | Original | |
|
|
|-----------|------------|----------| |
|
|
| **Vocabulary** | 5,000 tokens | 151,700 tokens | |
|
|
| **LLM Hidden Size** | 128 | 3,584 | |
|
|
| **LLM Layers** | 1 | 40 | |
|
|
| **LLM Attention Heads** | 8 | 28 | |
|
|
| **Vision Hidden Size** | 128 | 1,152 | |
|
|
| **Vision Layers** | 1 | 27 | |
|
|
| **Image Size** | 980 (preserved) | 980 | |
|
|
| **Patch Size** | 14 (preserved) | 14 | |
|
|
| **Audio d_model** | 64 | 1,280 | |
|
|
| **TTS Hidden Size** | 128 | - | |
|
|
|
|
|
## Parameter Breakdown |
|
|
|
|
|
| Component | Parameters | Size (MB) | |
|
|
|-----------|------------|-----------| |
|
|
| TTS/DVAE | 19,339,766 | 36.89 | |
|
|
| LLM | 1,419,840 | 2.71 | |
|
|
| Vision | 835,328 | 1.59 | |
|
|
| Resampler | 91,392 | 0.17 | |
|
|
| Audio | 56,192 | 0.11 | |
|
|
| Other | 20,736 | 0.04 | |
|
|
| **Total** | **21,763,254** | **~41.5** | |
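A breakdown like the one above can be produced by grouping a model's parameters by their top-level module name. The sketch below uses a hand-written list of `(name, element_count)` pairs as a stand-in for iterating `model.named_parameters()` in PyTorch; the component names are illustrative, not the model's exact module names:

```python
from collections import defaultdict

def parameter_breakdown(named_params):
    """Group (parameter_name, element_count) pairs by top-level module."""
    totals = defaultdict(int)
    for name, count in named_params:
        component = name.split(".")[0]  # e.g. "llm.layers.0.weight" -> "llm"
        totals[component] += count
    return dict(totals)

# Stand-in data; with a real model use:
#   [(n, p.numel()) for n, p in model.named_parameters()]
named_params = [
    ("tts.dvae.encoder.weight", 19_339_766),
    ("llm.embed_tokens.weight", 1_419_840),
    ("vpm.encoder.weight", 835_328),
]
print(parameter_breakdown(named_params))
```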
|
|
|
|
|
## Technical Details |
|
|
|
|
|
### Why Keep TTS/DVAE Components? |
|
|
|
|
|
The TTS (Text-to-Speech) component, which includes the DVAE (Discrete Variational Auto-Encoder), accounts for approximately 37 MB (~89%) of the model size. While the optimum-intel tests do **not** exercise TTS functionality (they only test image+text → text generation), we retain this component because:
|
|
|
|
|
1. **Structural Consistency**: Removing TTS via `init_tts=False` causes structural differences in the model that lead to numerical divergence between PyTorch and OpenVINO outputs |
|
|
2. **Test Compatibility**: The `test_compare_to_transformers` test compares PyTorch vs OpenVINO outputs and requires exact structural matching |
|
|
3. **Architecture Integrity**: The MiniCPM-o architecture expects TTS weights to be present during model loading |
|
|
|
|
|
### Tokenizer Shrinking |
|
|
|
|
|
The vocabulary was reduced from 151,700 to 5,000 tokens: |
|
|
|
|
|
- **Base tokens**: IDs 0-4899 (first 4,900 most common tokens) |
|
|
- **Special tokens**: IDs 4900-4949 (remapped from original high IDs) |
|
|
- **BPE merges**: Filtered from 151,387 to 4,644 (only merges involving retained tokens) |
|
|
|
|
|
Key token mappings: |
|
|
| Token | ID | |
|
|
|-------|-----| |
|
|
| `<unk>` | 4900 | |
|
|
| `<\|endoftext\|>` | 4901 | |
|
|
| `<\|im_start\|>` | 4902 | |
|
|
| `<\|im_end\|>` | 4903 | |
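The shrinking procedure above can be sketched as follows. This is an illustrative reconstruction, not the exact script used: it keeps tokens whose IDs fall below 4,900, reassigns special tokens to consecutive IDs starting at 4,900 (matching the mapping table), and drops BPE merges whose halves or merged result are no longer in the vocabulary:

```python
def shrink_vocab(vocab, special_tokens, merges, base_size=4900):
    """vocab: {token: id}; special_tokens: ordered list of token strings;
    merges: list of (left, right) BPE pairs. Returns (new_vocab, new_merges)."""
    # Keep only tokens with the lowest base_size IDs.
    new_vocab = {tok: i for tok, i in vocab.items() if i < base_size}
    # Remap special tokens to IDs base_size, base_size + 1, ...
    for offset, tok in enumerate(special_tokens):
        new_vocab[tok] = base_size + offset
    # Keep a merge only if both halves and their concatenation survive.
    new_merges = [
        (a, b) for a, b in merges
        if a in new_vocab and b in new_vocab and a + b in new_vocab
    ]
    return new_vocab, new_merges

vocab = {"a": 0, "b": 1, "ab": 2, "rare": 151_000}
new_vocab, new_merges = shrink_vocab(
    vocab, ["<unk>", "<|endoftext|>"], [("a", "b"), ("a", "rare")]
)
# "<unk>" -> 4900, "<|endoftext|>" -> 4901; the merge touching "rare" is dropped.
```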
|
|
|
|
|
### Reproducibility |
|
|
|
|
|
Model weights are initialized with a fixed random seed (42) to ensure: |
|
|
- Reproducible outputs between runs |
|
|
- Consistent behavior between PyTorch and OpenVINO |
|
|
- Passing of `test_compare_to_transformers`, which compares framework outputs
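Seeded initialization can be illustrated with Python's `random` module as a stand-in for the `torch.manual_seed(42)` call a PyTorch weight-initialization script would make before building the model; the distribution parameters here are arbitrary:

```python
import random

def init_weights(n, seed=42):
    """Draw n pseudo-random 'weights' reproducibly from a fixed seed."""
    rng = random.Random(seed)  # stand-in for torch.manual_seed(seed)
    return [rng.gauss(0.0, 0.02) for _ in range(n)]

# Two runs with the same seed yield identical weights.
assert init_weights(4) == init_weights(4)
```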
|
|
|
|
|
## Test Results |
|
|
|
|
|
Tested with `pytest tests/openvino/test_seq2seq.py -k "minicpmo" -v`: |
|
|
|
|
|
| Test | Status | Notes | |
|
|
|------|--------|-------| |
|
|
| `test_compare_to_transformers` | ✅ PASSED | PyTorch/OpenVINO outputs match | |
|
|
| `test_generate_utils` | ✅ PASSED | Generation pipeline works | |
|
|
| `test_model_can_be_loaded_after_saving` | ⚠️ FAILED | Windows file locking issue (not model-related) | |
|
|
|
|
|
The third test failure is a **Windows-specific issue** where OpenVINO keeps file handles open, preventing cleanup of temporary directories. This is a known platform limitation, not a model defect. The test passes on Linux/macOS. |
|
|
|
|
|
## Usage |
|
|
|
|
|
### For optimum-intel Testing |
|
|
|
|
|
```python |
|
|
# In optimum-intel/tests/openvino/utils_tests.py, update MODEL_NAMES: |
|
|
MODEL_NAMES = { |
|
|
# ... other models ... |
|
|
"minicpmo": "hrithik-dev8/tiny-random-MiniCPM-o-2_6", |
|
|
} |
|
|
``` |
|
|
|
|
|
Then run tests: |
|
|
```bash |
|
|
pytest tests/openvino/test_seq2seq.py -k "minicpmo" -v |
|
|
``` |
|
|
|
|
|
### Basic Model Loading |
|
|
|
|
|
```python |
|
|
from transformers import AutoModel, AutoTokenizer |
|
|
|
|
|
model = AutoModel.from_pretrained( |
|
|
"hrithik-dev8/tiny-random-MiniCPM-o-2_6", |
|
|
trust_remote_code=True |
|
|
) |
|
|
tokenizer = AutoTokenizer.from_pretrained( |
|
|
"hrithik-dev8/tiny-random-MiniCPM-o-2_6", |
|
|
trust_remote_code=True |
|
|
) |
|
|
``` |
|
|
|
|
|
## Files Included |
|
|
|
|
|
| File | Size | Description | |
|
|
|------|------|-------------| |
|
|
| `model.safetensors` | 41.55 MB | Model weights (bfloat16) | |
|
|
| `config.json` | 5.33 KB | Model configuration | |
|
|
| `tokenizer.json` | 338.27 KB | Shrunk tokenizer (5,000 tokens) | |
|
|
| `tokenizer_config.json` | 12.78 KB | Tokenizer settings | |
|
|
| `vocab.json` | 85.70 KB | Vocabulary mapping | |
|
|
| `merges.txt` | 36.58 KB | BPE merge rules | |
|
|
| `preprocessor_config.json` | 1.07 KB | Image processor config | |
|
|
| `generation_config.json` | 121 B | Generation settings | |
|
|
| `added_tokens.json` | 1.13 KB | Special tokens | |
|
|
| `special_tokens_map.json` | 1.24 KB | Special token mappings | |
|
|
|
|
|
## Requirements |
|
|
|
|
|
- Python 3.8+ |
|
|
- transformers >= 4.45.0, < 4.52.0 |
|
|
- torch |
|
|
- For OpenVINO testing: optimum-intel with OpenVINO backend |
|
|
|
|
|
## Limitations |
|
|
|
|
|
⚠️ **This model is for testing only** - it produces random/meaningless outputs and should not be used for inference. |