---
library_name: transformers
license: apache-2.0
tags:
- vision
- multimodal
- tiny-model
- minicpm
pipeline_tag: image-to-text
---

# Tiny MiniCPM-o-2_6 Model

A minimal, optimized version of MiniCPM-o-2_6 for testing and development purposes.

## Model Details

- **Model Size**: ~54 MB (PyTorch safetensors format)
- **Format**: PyTorch safetensors (not OpenVINO IR)
- **Vocabulary Size**: 50,000 tokens (reduced from 151,700)
- **Architecture**: MiniCPM-o-2_6 with optimized dimensions

## Model Configuration

- **hidden_size**: 128 (reduced from 168)
- **intermediate_size**: 8 (reduced from 16)
- **num_hidden_layers**: 2
- **num_attention_heads**: 2 (reduced from 28)
- **query_num**: 64

## Usage

```python
from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image

MODEL_ID = "M-Ziyo/tiny-random-MiniCPM-o-2_6-mini"

# Load processor and model (the repository ships custom code, hence trust_remote_code)
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

# Prepare inputs; "(./)" marks where the image is placed in the prompt
prompt = "<|im_start|>user\n(./)\nWhat is in the image?<|im_end|>\n<|im_start|>assistant\n"
image = Image.open("your_image.jpg")
inputs = processor([prompt], [image], return_tensors="pt")

# Generate, then decode only the newly generated tokens (prompt tokens are sliced off)
result = model.generate(**inputs, max_new_tokens=50)
decoded = processor.tokenizer.batch_decode(result[:, inputs["input_ids"].shape[1]:])
print(decoded)
```

## Model Features

- ✅ **PyTorch format** with safetensors (not OpenVINO IR)
- ✅ **Optimized size** (~54 MB vs original)
- ✅ **Weight copying** from original model for better output quality
- ✅ **Diverse output** (not just repetitive characters)

## Notes

- This is a minimal test model for development purposes
- Model weights are copied from the original model for better initialization
- Designed for testing Optimum-Intel integration

## Citation

Based on MiniCPM-o-2_6 from OpenBMB.
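
## Size Estimate (Sketch)

The vocabulary size and `hidden_size` listed under Model Details and Model Configuration can be turned into a rough estimate of how much of the ~54 MB is taken up by the token-embedding tables. This is a back-of-the-envelope sketch, not a measurement: it assumes float32 weights and separate (untied) input-embedding and output-projection matrices, neither of which is stated in this card. The helper `embedding_bytes` is hypothetical and introduced only for illustration.

```python
# Rough size estimate for the token-embedding tables only.
# Assumptions (not stated in the model card): float32 weights and
# untied input-embedding and output-projection matrices.
VOCAB_SIZE = 50_000        # from "Model Details"
HIDDEN_SIZE = 128          # from "Model Configuration"
BYTES_PER_PARAM = 4        # assumption: float32
NUM_EMBEDDING_TABLES = 2   # assumption: untied input and output embeddings


def embedding_bytes(vocab_size, hidden_size, tables, bytes_per_param):
    """Bytes needed to store `tables` matrices of shape (vocab_size, hidden_size)."""
    return vocab_size * hidden_size * tables * bytes_per_param


total_bytes = embedding_bytes(
    VOCAB_SIZE, HIDDEN_SIZE, NUM_EMBEDDING_TABLES, BYTES_PER_PARAM
)
print(f"Embedding tables: {total_bytes / 1e6:.1f} MB")
```

If the assumptions hold, the embedding tables would account for most of the file size, which suggests that a smaller vocabulary would likely shrink the checkpoint more than a further cut to `hidden_size` or `intermediate_size`. With half-precision weights or tied embeddings the estimate drops accordingly, so treat the number as an upper-end illustration.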