---
license: apache-2.0
library_name: transformers
tags:
  - vision
  - image-text-to-text
  - multimodal
  - test-model
  - tiny-model
  - openvino
  - optimum-intel
pipeline_tag: image-text-to-text
---

# Tiny Random MiniCPM-o-2_6

## Model Description

This is a tiny, randomly initialized version of the openbmb/MiniCPM-o-2_6 multimodal vision-language model, built specifically for testing and CI/CD purposes in the `optimum-intel` library.

**⚠️ Important:** This model has randomly initialized weights and is NOT intended for actual inference. It is designed solely for:

- Testing model loading and export functionality
- CI/CD pipeline validation
- OpenVINO conversion testing
- Quantization workflow testing

## Model Specifications

- **Architecture:** MiniCPM-o-2_6 (multimodal: vision + text + audio + TTS)
- **Parameters:** 1,477,376 (~1.48M)
- **Model Binary Size:** 5.64 MB
- **Total Repository Size:** ~21 MB
- **Original Model:** openbmb/MiniCPM-o-2_6 (~18 GB)
- **Size Reduction:** 853× smaller than the full model

## Architecture Details

### Language Model (LLM) Component

- `num_hidden_layers`: 2 (reduced from 40)
- `hidden_size`: 256 (reduced from 2048)
- `intermediate_size`: 512 (reduced from 8192)
- `num_attention_heads`: 4 (reduced from 32)
- `vocab_size`: 320 (reduced from 151,700)
- `max_position_embeddings`: 128 (reduced from 8192)

### Vision Component (SigLIP-based)

- `hidden_size`: 8
- `num_hidden_layers`: 1

### Audio Component (Whisper-based)

- `d_model`: 64
- `encoder_layers`: 1
- `decoder_layers`: 1

### TTS Component

- `hidden_size`: 8
- `num_layers`: 1

All architectural components are present but miniaturized to ensure API compatibility while drastically reducing compute requirements.
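As a quick arithmetic sanity check on the reduced LLM dimensions above (pure illustration, not part of the model's code): `hidden_size` must divide evenly across the attention heads, and it does, giving a per-head dimension of 64.

```python
# Values quoted in the "Language Model (LLM) Component" section above.
hidden_size = 256
num_attention_heads = 4

# Multi-head attention requires hidden_size to split evenly across heads.
assert hidden_size % num_attention_heads == 0
head_dim = hidden_size // num_attention_heads
print(head_dim)  # 64
```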

## Usage

### Loading with Transformers

```python
from transformers import AutoModelForCausalLM, AutoProcessor
import torch

model_id = "arashkermani/tiny-random-MiniCPM-o-2_6"

# Load model (custom code lives in the repo, hence trust_remote_code)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float32,
    device_map="cpu",
)

# Load processor
processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True,
)

# Test forward pass (vocab_size is 320, so token ids must stay below 320)
input_ids = torch.randint(0, 320, (1, 5))
position_ids = torch.arange(5).unsqueeze(0)

# The custom MiniCPM-o forward takes a single `data` dict; empty image
# fields make this a text-only pass.
data = {
    "input_ids": input_ids,
    "pixel_values": [[]],
    "tgt_sizes": [[]],
    "image_bound": [[]],
    "position_ids": position_ids,
}

with torch.no_grad():
    outputs = model(data=data)

print(f"Logits shape: {outputs.logits.shape}")  # (1, 5, 320)
```

### Using with Optimum-Intel (OpenVINO)

```python
from optimum.intel.openvino import OVModelForVisualCausalLM
from transformers import AutoProcessor

model_id = "arashkermani/tiny-random-MiniCPM-o-2_6"

# Load model for OpenVINO
model = OVModelForVisualCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
)

processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True,
)
```

### Export to OpenVINO

```bash
optimum-cli export openvino \
  -m arashkermani/tiny-random-MiniCPM-o-2_6 \
  minicpm-o-openvino \
  --task=image-text-to-text \
  --trust-remote-code
```
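Since this model also serves quantization workflow testing, the same export can be combined with weight compression. A hedged variant, assuming an `optimum-cli` version that supports the `--weight-format` option (the output directory name is illustrative):

```bash
optimum-cli export openvino \
  -m arashkermani/tiny-random-MiniCPM-o-2_6 \
  minicpm-o-openvino-int8 \
  --task=image-text-to-text \
  --weight-format int8 \
  --trust-remote-code
```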

## Intended Use

This model is intended exclusively for:

- ✅ Testing `optimum-intel` OpenVINO export functionality
- ✅ CI/CD pipeline validation
- ✅ Model loading and compatibility testing
- ✅ Quantization workflow testing
- ✅ Fast prototyping and debugging

Not intended for:

- ❌ Production inference
- ❌ Actual image-text-to-text tasks
- ❌ Model quality evaluation
- ❌ Benchmarking performance metrics

## Training Details

This model was generated by:

1. Loading the config from `optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6`
2. Reducing all dimensions to minimal viable values
3. Initializing weights randomly using `AutoModelForCausalLM.from_config()`
4. Copying all necessary tokenizer, processor, and custom code files

No training was performed; all weights are randomly initialized.
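Steps 2–3 above amount to overriding the relevant config fields with smaller values before random initialization. A minimal sketch of just the dimension-reduction step, using plain dicts (the override values are the ones quoted in this card; `shrink_config` and the sample base config are illustrative, not the actual generation script):

```python
# LLM overrides quoted in this card's "Architecture Details" section.
TINY_LLM_OVERRIDES = {
    "num_hidden_layers": 2,
    "hidden_size": 256,
    "intermediate_size": 512,
    "num_attention_heads": 4,
    "vocab_size": 320,
    "max_position_embeddings": 128,
}

def shrink_config(config: dict) -> dict:
    """Return a copy of a model config dict with tiny-model overrides applied."""
    out = dict(config)
    out.update(TINY_LLM_OVERRIDES)
    return out

# Illustrative base config: only the dimensions change, everything else is kept.
full = {"hidden_size": 2048, "num_hidden_layers": 40, "model_type": "minicpmo"}
tiny = shrink_config(full)
print(tiny["hidden_size"], tiny["model_type"])  # 256 minicpmo
```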

## Validation Results

The model has been validated to ensure:

- ✅ Loads with `trust_remote_code=True`
- ✅ Compatible with `transformers` AutoModel APIs
- ✅ Supports a forward pass with the expected input format
- ✅ Compatible with OpenVINO export via `optimum-intel`
- ✅ Includes all required custom modules and artifacts

See the validation report for detailed technical analysis.

## Files Included

- `config.json` - Model configuration
- `pytorch_model.bin` - Model weights (5.64 MB)
- `generation_config.json` - Generation parameters
- `preprocessor_config.json` - Preprocessor configuration
- `processor_config.json` - Processor configuration
- `tokenizer_config.json` - Tokenizer configuration
- `tokenizer.json` - Fast tokenizer
- `vocab.json` - Vocabulary
- `merges.txt` - BPE merges
- Custom Python modules:
  - `modeling_minicpmo.py`
  - `configuration_minicpm.py`
  - `processing_minicpmo.py`
  - `image_processing_minicpmv.py`
  - `tokenization_minicpmo_fast.py`
  - `modeling_navit_siglip.py`
  - `resampler.py`
  - `utils.py`

## License

This model follows the same license as the original MiniCPM-o-2_6 model (Apache 2.0).

## Citation

If you use this test model in your CI/CD or testing infrastructure, please reference:

```bibtex
@misc{tiny-minicpm-o-2_6,
  author = {Arash Kermani},
  title = {Tiny Random MiniCPM-o-2_6 for Testing},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/arashkermani/tiny-random-MiniCPM-o-2_6}}
}
```

## Contact

For issues or questions about this test model, please open an issue in the optimum-intel repository.