---
license: apache-2.0
library_name: transformers
tags:
- vision
- image-text-to-text
- multimodal
- test-model
- tiny-model
- openvino
- optimum-intel
pipeline_tag: image-text-to-text
---
# Tiny Random MiniCPM-o-2_6

## Model Description

This is a **tiny random-initialized version** of the [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) multimodal vision-language model, built specifically for **testing and CI/CD purposes** in the [optimum-intel](https://github.com/huggingface/optimum-intel) library.

**⚠️ Important**: This model has randomly initialized weights and is NOT intended for actual inference. It is designed solely for:

- Testing model loading and export functionality
- CI/CD pipeline validation
- OpenVINO conversion testing
- Quantization workflow testing
## Model Specifications

- **Architecture**: MiniCPM-o-2_6 (multimodal: vision + text + audio + TTS)
- **Parameters**: 1,477,376 (~1.48M)
- **Model Binary Size**: 5.64 MB
- **Total Repository Size**: ~21 MB
- **Original Model**: [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) (~18 GB)
- **Size Reduction**: ~853× smaller than the full model
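The reported binary size follows directly from the parameter count: 1,477,376 float32 parameters at 4 bytes each come to roughly 5.64 MB. A quick arithmetic check:

```python
# Sanity-check the reported sizes: float32 weights use 4 bytes per parameter.
num_params = 1_477_376
size_mb = num_params * 4 / (1024 ** 2)
print(f"{size_mb:.2f} MB")  # 5.64 MB, matching the reported model binary size
```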
## Architecture Details

### Language Model (LLM) Component

- `num_hidden_layers`: 2 (reduced from 40)
- `hidden_size`: 256 (reduced from 2048)
- `intermediate_size`: 512 (reduced from 8192)
- `num_attention_heads`: 4 (reduced from 32)
- `vocab_size`: 320 (reduced from 151,700)
- `max_position_embeddings`: 128 (reduced from 8192)

### Vision Component (SigLIP-based)

- `hidden_size`: 8
- `num_hidden_layers`: 1

### Audio Component (Whisper-based)

- `d_model`: 64
- `encoder_layers`: 1
- `decoder_layers`: 1

### TTS Component

- `hidden_size`: 8
- `num_layers`: 1

All architectural components are present but miniaturized, preserving API compatibility while drastically reducing compute requirements.
## Usage

### Loading with Transformers

```python
from transformers import AutoModelForCausalLM, AutoProcessor
import torch

model_id = "arashkermani/tiny-random-MiniCPM-o-2_6"

# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float32,
    device_map="cpu",
)

# Load processor
processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True,
)

# Test forward pass with dummy text-only inputs (no images)
input_ids = torch.randint(0, 320, (1, 5))
position_ids = torch.arange(5).unsqueeze(0)
data = {
    "input_ids": input_ids,
    "pixel_values": [[]],
    "tgt_sizes": [[]],
    "image_bound": [[]],
    "position_ids": position_ids,
}
with torch.no_grad():
    outputs = model(data=data)
print(f"Logits shape: {outputs.logits.shape}")  # (1, 5, 320)
```
### Using with Optimum-Intel (OpenVINO)

```python
from optimum.intel.openvino import OVModelForVisualCausalLM
from transformers import AutoProcessor

model_id = "arashkermani/tiny-random-MiniCPM-o-2_6"

# Load the model for OpenVINO inference
model = OVModelForVisualCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True,
)
```
### Export to OpenVINO

```bash
optimum-cli export openvino \
    -m arashkermani/tiny-random-MiniCPM-o-2_6 \
    minicpm-o-openvino \
    --task=image-text-to-text \
    --trust-remote-code
```
## Intended Use

This model is intended **exclusively** for:

- ✅ Testing optimum-intel OpenVINO export functionality
- ✅ CI/CD pipeline validation
- ✅ Model loading and compatibility testing
- ✅ Quantization workflow testing
- ✅ Fast prototyping and debugging

**Not intended for**:

- ❌ Production inference
- ❌ Actual image-text-to-text tasks
- ❌ Model quality evaluation
- ❌ Benchmarking performance metrics
## Training Details

This model was generated by:

1. Loading the config from `optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6`
2. Reducing all dimensions to minimal viable values
3. Initializing weights randomly via `AutoModelForCausalLM.from_config()`
4. Copying all necessary tokenizer, processor, and custom code files

**No training was performed**; all weights are randomly initialized.
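The recipe above can be sketched with a standard architecture. GPT-2 is used here purely as a stand-in, since reproducing the MiniCPM-o custom config requires the original remote code; the dimensions are illustrative, not the card's exact values.

```python
from transformers import AutoModelForCausalLM, GPT2Config

# Stand-in sketch of the recipe: shrink a config, then instantiate the
# model from it with random weights (no checkpoint is ever loaded).
tiny_config = GPT2Config(
    n_layer=2, n_embd=64, n_head=4, vocab_size=320, n_positions=128
)
model = AutoModelForCausalLM.from_config(tiny_config)  # randomly initialized

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")
```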
## Validation Results

The model has been validated to ensure it:

- ✅ Loads with `trust_remote_code=True`
- ✅ Is compatible with the transformers AutoModel APIs
- ✅ Supports a forward pass with the expected input format
- ✅ Is compatible with OpenVINO export via optimum-intel
- ✅ Includes all required custom modules and artifacts

See the [validation report](https://github.com/arashkermani/tiny-minicpm-o) for a detailed technical analysis.
## Files Included

- `config.json` - Model configuration
- `pytorch_model.bin` - Model weights (5.64 MB)
- `generation_config.json` - Generation parameters
- `preprocessor_config.json` - Preprocessor configuration
- `processor_config.json` - Processor configuration
- `tokenizer_config.json` - Tokenizer configuration
- `tokenizer.json` - Fast tokenizer
- `vocab.json` - Vocabulary
- `merges.txt` - BPE merges
- Custom Python modules:
  - `modeling_minicpmo.py`
  - `configuration_minicpm.py`
  - `processing_minicpmo.py`
  - `image_processing_minicpmv.py`
  - `tokenization_minicpmo_fast.py`
  - `modeling_navit_siglip.py`
  - `resampler.py`
  - `utils.py`
## Related Models

- Original model: [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6)
- Previous test model: [optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6](https://huggingface.co/optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6)

## License

This model follows the same license as the original MiniCPM-o-2_6 model (Apache 2.0).
## Citation

If you use this test model in your CI/CD or testing infrastructure, please reference:

```bibtex
@misc{tiny-minicpm-o-2_6,
  author       = {Arash Kermani},
  title        = {Tiny Random MiniCPM-o-2_6 for Testing},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/arashkermani/tiny-random-MiniCPM-o-2_6}}
}
```
## Contact

For issues or questions about this test model, please open an issue in the [optimum-intel repository](https://github.com/huggingface/optimum-intel/issues).