---

library_name: transformers
license: apache-2.0
tags:
- vision
- multimodal
- tiny-model
- minicpm
pipeline_tag: image-to-text
---


# Tiny MiniCPM-o-2_6 Model



A minimal, size-reduced version of MiniCPM-o-2_6 intended for testing and development.

## Model Details

- **Model Size**: ~54 MB (PyTorch safetensors format)
- **Format**: PyTorch safetensors (not OpenVINO IR)
- **Vocabulary Size**: 50,000 tokens (reduced from 151,700)
- **Architecture**: MiniCPM-o-2_6 with optimized dimensions
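As a rough sanity check on the reported size, the sketch below estimates how much storage the token embeddings alone would need. It uses the vocabulary size above and the `hidden_size` of 128 listed under Model Configuration. The float32 precision and the tied/untied embedding variants are assumptions made for illustration, not facts confirmed by this model card, and the helper name `embedding_megabytes` is hypothetical.

```python
# Back-of-the-envelope estimate of embedding storage.
# Values taken from this model card:
VOCAB_SIZE = 50_000
HIDDEN_SIZE = 128
# Assumption (not stated in the card): weights are stored as float32.
BYTES_PER_PARAM = 4


def embedding_megabytes(vocab_size, hidden_size, bytes_per_param, num_matrices):
    """Return the size in MB (10**6 bytes) of `num_matrices` embedding tables."""
    params = vocab_size * hidden_size * num_matrices
    return params * bytes_per_param / 1e6


# One table if input and output embeddings are tied, two if they are not.
tied_mb = embedding_megabytes(VOCAB_SIZE, HIDDEN_SIZE, BYTES_PER_PARAM, 1)
untied_mb = embedding_megabytes(VOCAB_SIZE, HIDDEN_SIZE, BYTES_PER_PARAM, 2)

print(f"tied embeddings:   {tied_mb:.1f} MB")
print(f"untied embeddings: {untied_mb:.1f} MB")
```

Under these assumptions the embedding tables would account for a large share of the ~54 MB total, which is typical for tiny test models whose transformer layers are very small.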



## Model Configuration



- **hidden_size**: 128 (reduced from 168)

- **intermediate_size**: 8 (reduced from 16)

- **num_hidden_layers**: 2

- **num_attention_heads**: 2 (reduced from 28)

- **query_num**: 64
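The values above can be checked for internal consistency. The sketch below derives the per-head dimension, assuming standard multi-head attention in which `hidden_size` is split evenly across `num_attention_heads`. The `config` dictionary is a hand-written stand-in for the listed values, not something loaded from the repository, and `head_dim` is a hypothetical helper.

```python
# Hand-written copy of the values listed in this model card
# (a stand-in for illustration, not loaded from the model repository).
config = {
    "hidden_size": 128,
    "intermediate_size": 8,
    "num_hidden_layers": 2,
    "num_attention_heads": 2,
    "query_num": 64,
}


def head_dim(cfg):
    """Per-head dimension, assuming hidden_size is split evenly across heads."""
    hidden = cfg["hidden_size"]
    heads = cfg["num_attention_heads"]
    if hidden % heads != 0:
        raise ValueError(f"hidden_size {hidden} is not divisible by {heads} heads")
    return hidden // heads


print(head_dim(config))
```

A check like this is useful when shrinking a configuration by hand, since a `hidden_size` that is not divisible by the number of heads usually fails at model construction time.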



## Usage



```python
from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image

# Load processor and model
processor = AutoProcessor.from_pretrained("M-Ziyo/tiny-random-MiniCPM-o-2_6-mini", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("M-Ziyo/tiny-random-MiniCPM-o-2_6-mini", trust_remote_code=True)

# Prepare inputs
prompt = "<|im_start|>user\n(<image>./</image>)\nWhat is in the image?<|im_end|>\n<|im_start|>assistant\n"
image = Image.open("your_image.jpg")

inputs = processor([prompt], [image], return_tensors="pt")

# Generate
result = model.generate(**inputs, max_new_tokens=50)

# Decode only the newly generated tokens (skip the prompt tokens)
decoded = processor.tokenizer.batch_decode(result[:, inputs["input_ids"].shape[1]:])
print(decoded)
```



## Model Features



- ✅ **PyTorch format** with safetensors (not OpenVINO IR)
- ✅ **Optimized size** (~54 MB, far smaller than the original model)
- ✅ **Weight copying** from the original model for better output quality
- ✅ **Diverse output** (not just repetitive characters)

## Notes

- This is a minimal test model for development purposes.
- Model weights are copied from the original model for better initialization.
- Designed for testing Optimum-Intel integration.

## Citation

Based on MiniCPM-o-2_6 from OpenBMB.