---
license: apache-2.0
library_name: transformers
tags:
- vision
- image-text-to-text
- multimodal
- test-model
- tiny-model
- openvino
- optimum-intel
pipeline_tag: image-text-to-text
---

# Tiny Random MiniCPM-o-2_6

## Model Description

This is a **tiny random-initialized version** of the [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) multimodal vision-language model, designed specifically for **testing and CI/CD purposes** in the [optimum-intel](https://github.com/huggingface/optimum-intel) library.

**⚠️ Important**: This model has randomly initialized weights and is NOT intended for actual inference. It is designed solely for:

- Testing model loading and export functionality
- CI/CD pipeline validation
- OpenVINO conversion testing
- Quantization workflow testing

## Model Specifications

- **Architecture**: MiniCPM-o-2_6 (multimodal: vision + text + audio + TTS)
- **Parameters**: 1,477,376 (~1.48M parameters)
- **Model Binary Size**: 5.64 MB
- **Total Repository Size**: ~21 MB
- **Original Model**: [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) (~18 GB)
- **Size Reduction**: 853× smaller than the full model

## Architecture Details

### Language Model (LLM) Component

- `num_hidden_layers`: 2 (reduced from 40)
- `hidden_size`: 256 (reduced from 2048)
- `intermediate_size`: 512 (reduced from 8192)
- `num_attention_heads`: 4 (reduced from 32)
- `vocab_size`: 320 (reduced from 151,700)
- `max_position_embeddings`: 128 (reduced from 8192)

### Vision Component (SigLIP-based)

- `hidden_size`: 8
- `num_hidden_layers`: 1

### Audio Component (Whisper-based)

- `d_model`: 64
- `encoder_layers`: 1
- `decoder_layers`: 1

### TTS Component

- `hidden_size`: 8
- `num_layers`: 1

All architectural components are present but miniaturized to ensure API compatibility while drastically reducing compute requirements.
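The LLM dimensions above account for most of the stated parameter count. A rough back-of-the-envelope sketch (assuming a LLaMA-style block with q/k/v/o attention projections and gate/up/down MLP projections — an assumption, not confirmed by the config — and ignoring the vision, audio, TTS, and normalization weights that make up the rest of the ~1.48M total):

```python
# Rough lower-bound estimate of the LLM component's parameter count,
# computed from the tiny config values listed above.
vocab_size, hidden, intermediate, layers = 320, 256, 512, 2

embedding = vocab_size * hidden      # token embedding table
attention = 4 * hidden * hidden      # q, k, v, o projections (assumed layout)
mlp = 3 * hidden * intermediate      # gate, up, down projections (assumed layout)

llm_total = embedding + layers * (attention + mlp)
print(llm_total)  # 1392640 -- most of the stated 1,477,376 total
```

The remaining ~85K parameters would come from the miniaturized vision, audio, and TTS components plus normalization weights.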
## Usage

### Loading with Transformers

```python
from transformers import AutoModelForCausalLM, AutoProcessor
import torch

model_id = "arashkermani/tiny-random-MiniCPM-o-2_6"

# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float32,
    device_map="cpu",
)

# Load processor
processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True,
)

# Test forward pass
input_ids = torch.randint(0, 320, (1, 5))
position_ids = torch.arange(5).unsqueeze(0)
data = {
    "input_ids": input_ids,
    "pixel_values": [[]],
    "tgt_sizes": [[]],
    "image_bound": [[]],
    "position_ids": position_ids,
}

with torch.no_grad():
    outputs = model(data=data)

print(f"Logits shape: {outputs.logits.shape}")  # (1, 5, 320)
```

### Using with Optimum-Intel (OpenVINO)

```python
from optimum.intel.openvino import OVModelForVisualCausalLM
from transformers import AutoProcessor

model_id = "arashkermani/tiny-random-MiniCPM-o-2_6"

# Load model for OpenVINO
model = OVModelForVisualCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
)

processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True,
)
```

### Export to OpenVINO

```bash
optimum-cli export openvino \
  -m arashkermani/tiny-random-MiniCPM-o-2_6 \
  minicpm-o-openvino \
  --task=image-text-to-text \
  --trust-remote-code
```

## Intended Use

This model is intended **exclusively** for:

- ✅ Testing optimum-intel OpenVINO export functionality
- ✅ CI/CD pipeline validation
- ✅ Model loading and compatibility testing
- ✅ Quantization workflow testing
- ✅ Fast prototyping and debugging

**Not intended for**:

- ❌ Production inference
- ❌ Actual image-text-to-text tasks
- ❌ Model quality evaluation
- ❌ Benchmarking performance metrics

## Training Details

This model was generated by:

1. Loading the config from `optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6`
2. Reducing all dimensions to minimal viable values
3. Initializing weights randomly using `AutoModelForCausalLM.from_config()`
4. Copying all necessary tokenizer, processor, and custom code files

**No training was performed**: all weights are randomly initialized.

## Validation Results

The model has been validated to ensure:

- ✅ Loads with `trust_remote_code=True`
- ✅ Compatible with transformers AutoModel APIs
- ✅ Supports forward pass with expected input format
- ✅ Compatible with OpenVINO export via optimum-intel
- ✅ Includes all required custom modules and artifacts

See the [validation report](https://github.com/arashkermani/tiny-minicpm-o) for detailed technical analysis.

## Files Included

- `config.json` - Model configuration
- `pytorch_model.bin` - Model weights (5.64 MB)
- `generation_config.json` - Generation parameters
- `preprocessor_config.json` - Preprocessor configuration
- `processor_config.json` - Processor configuration
- `tokenizer_config.json` - Tokenizer configuration
- `tokenizer.json` - Fast tokenizer
- `vocab.json` - Vocabulary
- `merges.txt` - BPE merges
- Custom Python modules:
  - `modeling_minicpmo.py`
  - `configuration_minicpm.py`
  - `processing_minicpmo.py`
  - `image_processing_minicpmv.py`
  - `tokenization_minicpmo_fast.py`
  - `modeling_navit_siglip.py`
  - `resampler.py`
  - `utils.py`

## Related Models

- Original model: [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6)
- Previous test model: [optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6](https://huggingface.co/optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6)

## License

This model follows the same license as the original MiniCPM-o-2_6 model (Apache 2.0).
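The dimension-reduction step described under Training Details can be sketched as a simple config rewrite applied before random initialization. A hedged illustration (key names and original values follow the Architecture Details tables above; this is not the actual generation script, which is not part of this repository):

```python
# Illustrative sketch of the "reduce all dimensions" step: overwrite the
# full-size LLM config values with tiny ones, then a model would be built
# from the result via AutoModelForCausalLM.from_config().
full_config = {
    "num_hidden_layers": 40,
    "hidden_size": 2048,
    "intermediate_size": 8192,
    "num_attention_heads": 4 * 8,        # 32 heads in the full model
    "vocab_size": 151700,
    "max_position_embeddings": 8192,
}

tiny_overrides = {
    "num_hidden_layers": 2,
    "hidden_size": 256,
    "intermediate_size": 512,
    "num_attention_heads": 4,
    "vocab_size": 320,
    "max_position_embeddings": 128,
}

tiny_config = {**full_config, **tiny_overrides}
print(tiny_config["hidden_size"])  # 256
```

Every key keeps its original name, so downstream loading code sees the same config schema as the full model.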
## Citation

If you use this test model in your CI/CD or testing infrastructure, please reference:

```bibtex
@misc{tiny-minicpm-o-2_6,
  author = {Arash Kermani},
  title = {Tiny Random MiniCPM-o-2_6 for Testing},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/arashkermani/tiny-random-MiniCPM-o-2_6}}
}
```

## Contact

For issues or questions about this test model, please open an issue in the [optimum-intel repository](https://github.com/huggingface/optimum-intel/issues).