---
license: apache-2.0
library_name: transformers
tags:
- vision
- image-text-to-text
- multimodal
- test-model
- tiny-model
- openvino
- optimum-intel
pipeline_tag: image-text-to-text
---

# Tiny Random MiniCPM-o-2_6

## Model Description

This is a **tiny random-initialized version** of the [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) multimodal vision-language model, designed specifically for **testing and CI/CD purposes** in the [optimum-intel](https://github.com/huggingface/optimum-intel) library.

**⚠️ Important**: This model has randomly initialized weights and is NOT intended for actual inference. It is designed solely for:
- Testing model loading and export functionality
- CI/CD pipeline validation
- OpenVINO conversion testing
- Quantization workflow testing

## Model Specifications

- **Architecture**: MiniCPM-o-2_6 (multimodal: vision + text + audio + TTS)
- **Parameters**: 1,477,376 (~1.48M parameters)
- **Model Binary Size**: 5.64 MB
- **Total Repository Size**: ~21 MB
- **Original Model**: [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) (~18 GB)
- **Size Reduction**: 853× smaller than the full model

## Architecture Details

### Language Model (LLM) Component
- `num_hidden_layers`: 2 (reduced from 40)
- `hidden_size`: 256 (reduced from 2048)
- `intermediate_size`: 512 (reduced from 8192)
- `num_attention_heads`: 4 (reduced from 32)
- `vocab_size`: 320 (reduced from 151,700)
- `max_position_embeddings`: 128 (reduced from 8192)
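
As a quick consistency check on the attention dimensions above, `hidden_size` must divide evenly by `num_attention_heads`. A minimal sketch using the values from the list above:

```python
# Attention head dimension implied by the tiny LLM config above.
hidden_size = 256
num_attention_heads = 4

assert hidden_size % num_attention_heads == 0, "hidden_size must be divisible by head count"
head_dim = hidden_size // num_attention_heads
print(head_dim)  # 64
```

Note that the full-size model preserves the same relationship (2048 / 32 = 64), so attention shapes stay structurally comparable to the original.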

### Vision Component (SigLIP-based)
- `hidden_size`: 8
- `num_hidden_layers`: 1

### Audio Component (Whisper-based)
- `d_model`: 64
- `encoder_layers`: 1
- `decoder_layers`: 1

### TTS Component
- `hidden_size`: 8
- `num_layers`: 1

All architectural components are present but miniaturized to ensure API compatibility while drastically reducing compute requirements.
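
As a rough sanity check, the LLM dimensions above account for most of the ~1.48M total. The sketch below is an approximation, not an exact breakdown: it assumes a LLaMA-style gated MLP and tied input/output embeddings, and it ignores biases, norm weights, and the vision/audio/TTS towers.

```python
# Rough LLM parameter estimate from the config values above.
vocab_size, hidden, inter, layers = 320, 256, 512, 2

embed = vocab_size * hidden            # token embedding (assumed tied with lm_head)
attn_per_layer = 4 * hidden * hidden   # q, k, v, o projections
mlp_per_layer = 3 * hidden * inter     # gate, up, down (LLaMA-style, an assumption)

llm_total = embed + layers * (attn_per_layer + mlp_per_layer)
print(llm_total)  # 1392640 -- close to the reported 1,477,376 total
```

The remaining ~85K parameters would then sit in the miniaturized vision, audio, and TTS components plus the terms the estimate ignores.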

## Usage

### Loading with Transformers

```python
from transformers import AutoModelForCausalLM, AutoProcessor
import torch

model_id = "arashkermani/tiny-random-MiniCPM-o-2_6"

# Load model (random weights, so CPU is sufficient)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float32,
    device_map="cpu"
)

# Load processor
processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True
)

# Test forward pass with dummy token ids (vocab_size is 320)
input_ids = torch.randint(0, 320, (1, 5))
position_ids = torch.arange(5).unsqueeze(0)

data = {
    "input_ids": input_ids,
    "pixel_values": [[]],
    "tgt_sizes": [[]],
    "image_bound": [[]],
    "position_ids": position_ids,
}

with torch.no_grad():
    outputs = model(data=data)

print(f"Logits shape: {outputs.logits.shape}")  # (1, 5, 320)
```

### Using with Optimum-Intel (OpenVINO)

```python
from optimum.intel.openvino import OVModelForVisualCausalLM
from transformers import AutoProcessor

model_id = "arashkermani/tiny-random-MiniCPM-o-2_6"

# Load model for OpenVINO
model = OVModelForVisualCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True
)

processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True
)
```

### Export to OpenVINO

```bash
optimum-cli export openvino \
  -m arashkermani/tiny-random-MiniCPM-o-2_6 \
  minicpm-o-openvino \
  --task=image-text-to-text \
  --trust-remote-code
```
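
Since quantization workflow testing is one of this card's stated purposes, the same export can request weight-only quantization. A hedged sketch that assembles such a command for `subprocess` (the `--weight-format int8` flag and the `minicpm-o-openvino-int8` output directory are assumptions; check `optimum-cli export openvino --help` for the flags your version supports):

```python
import shlex

# Build the export command with weight-only INT8 quantization requested.
cmd = [
    "optimum-cli", "export", "openvino",
    "-m", "arashkermani/tiny-random-MiniCPM-o-2_6",
    "minicpm-o-openvino-int8",   # hypothetical output directory
    "--task", "image-text-to-text",
    "--trust-remote-code",
    "--weight-format", "int8",   # verify against your optimum-cli version
]
print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run()
```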

## Intended Use

This model is intended **exclusively** for:
- ✅ Testing optimum-intel OpenVINO export functionality
- ✅ CI/CD pipeline validation
- ✅ Model loading and compatibility testing
- ✅ Quantization workflow testing
- ✅ Fast prototyping and debugging

**Not intended for**:
- ❌ Production inference
- ❌ Actual image-text-to-text tasks
- ❌ Model quality evaluation
- ❌ Benchmarking performance metrics

## Training Details

This model was generated by:
1. Loading the config from `optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6`
2. Reducing all dimensions to minimal viable values
3. Initializing weights randomly using `AutoModelForCausalLM.from_config()`
4. Copying all necessary tokenizer, processor, and custom code files
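
Step 2 above amounts to overriding a handful of config fields before re-initializing. A hypothetical sketch of that override with plain dicts (the field names and values mirror the Architecture Details section; the real script operates on a `PretrainedConfig` object rather than a dict):

```python
# Step 2 in miniature: shrink the full-size LLM config values to tiny ones.
full_llm_config = {
    "num_hidden_layers": 40,
    "hidden_size": 2048,
    "intermediate_size": 8192,
    "num_attention_heads": 32,
    "vocab_size": 151700,
    "max_position_embeddings": 8192,
}

tiny_overrides = {
    "num_hidden_layers": 2,
    "hidden_size": 256,
    "intermediate_size": 512,
    "num_attention_heads": 4,
    "vocab_size": 320,
    "max_position_embeddings": 128,
}

# Later keys win, so every dimension is replaced by its tiny value.
tiny_llm_config = {**full_llm_config, **tiny_overrides}
print(tiny_llm_config["hidden_size"])  # 256
```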

**No training was performed**; all weights are randomly initialized.

## Validation Results

The model has been validated to ensure that it:
- ✅ Loads with `trust_remote_code=True`
- ✅ Is compatible with transformers AutoModel APIs
- ✅ Supports a forward pass with the expected input format
- ✅ Is compatible with OpenVINO export via optimum-intel
- ✅ Includes all required custom modules and artifacts

See the [validation report](https://github.com/arashkermani/tiny-minicpm-o) for detailed technical analysis.

## Files Included

- `config.json` - Model configuration
- `pytorch_model.bin` - Model weights (5.64 MB)
- `generation_config.json` - Generation parameters
- `preprocessor_config.json` - Preprocessor configuration
- `processor_config.json` - Processor configuration
- `tokenizer_config.json` - Tokenizer configuration
- `tokenizer.json` - Fast tokenizer
- `vocab.json` - Vocabulary
- `merges.txt` - BPE merges
- Custom Python modules:
  - `modeling_minicpmo.py`
  - `configuration_minicpm.py`
  - `processing_minicpmo.py`
  - `image_processing_minicpmv.py`
  - `tokenization_minicpmo_fast.py`
  - `modeling_navit_siglip.py`
  - `resampler.py`
  - `utils.py`

## Related Models

- Original model: [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6)
- Previous test model: [optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6](https://huggingface.co/optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6)

## License

This model follows the same license as the original MiniCPM-o-2_6 model (Apache 2.0).

## Citation

If you use this test model in your CI/CD or testing infrastructure, please reference:

```bibtex
@misc{tiny-minicpm-o-2_6,
  author = {Arash Kermani},
  title = {Tiny Random MiniCPM-o-2_6 for Testing},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/arashkermani/tiny-random-MiniCPM-o-2_6}}
}
```

## Contact

For issues or questions about this test model, please open an issue in the [optimum-intel repository](https://github.com/huggingface/optimum-intel/issues).