---
license: apache-2.0
library_name: transformers
tags:
- vision
- image-text-to-text
- multimodal
- test-model
- tiny-model
- openvino
- optimum-intel
pipeline_tag: image-text-to-text
---
# Tiny Random MiniCPM-o-2_6
## Model Description
This is a **tiny random-initialized version** of the [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) multimodal vision-language model, designed specifically for **testing and CI/CD purposes** in the [optimum-intel](https://github.com/huggingface/optimum-intel) library.
**⚠️ Important**: This model has randomly initialized weights and is NOT intended for actual inference. It is designed solely for:
- Testing model loading and export functionality
- CI/CD pipeline validation
- OpenVINO conversion testing
- Quantization workflow testing
## Model Specifications
- **Architecture**: MiniCPM-o-2_6 (multimodal: vision + text + audio + TTS)
- **Parameters**: 1,477,376 (~1.48M parameters)
- **Model Binary Size**: 5.64 MB
- **Total Repository Size**: ~21 MB
- **Original Model**: [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) (~18 GB)
- **Size Reduction**: 853× smaller than the full model
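As a quick sanity check, the parameter count lines up with the reported binary size, assuming the weights are stored as float32 (4 bytes per parameter):

```python
# Both figures are taken from the specifications above; assumes
# float32 storage, i.e. 4 bytes per parameter.
params = 1_477_376
size_mib = params * 4 / (1024 ** 2)
print(f"{size_mib:.2f} MiB")  # ≈ 5.64, matching the reported binary size
```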
## Architecture Details
### Language Model (LLM) Component
- `num_hidden_layers`: 2 (reduced from 40)
- `hidden_size`: 256 (reduced from 2048)
- `intermediate_size`: 512 (reduced from 8192)
- `num_attention_heads`: 4 (reduced from 32)
- `vocab_size`: 320 (reduced from 151,700)
- `max_position_embeddings`: 128 (reduced from 8192)
### Vision Component (SigLIP-based)
- `hidden_size`: 8
- `num_hidden_layers`: 1
### Audio Component (Whisper-based)
- `d_model`: 64
- `encoder_layers`: 1
- `decoder_layers`: 1
### TTS Component
- `hidden_size`: 8
- `num_layers`: 1
All architectural components are present but miniaturized to ensure API compatibility while drastically reducing compute requirements.
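The shrinking pattern can be illustrated with the LLM dimensions listed above: every config key is kept (so the API surface is unchanged) while each value is capped at a tiny size. The `shrink` helper below is hypothetical, not part of this repository:

```python
# Full-size vs. reduced LLM dimensions, as listed in the table above.
FULL_LLM = {
    "num_hidden_layers": 40,
    "hidden_size": 2048,
    "intermediate_size": 8192,
    "num_attention_heads": 32,
    "vocab_size": 151_700,
    "max_position_embeddings": 8192,
}
TINY_OVERRIDES = {
    "num_hidden_layers": 2,
    "hidden_size": 256,
    "intermediate_size": 512,
    "num_attention_heads": 4,
    "vocab_size": 320,
    "max_position_embeddings": 128,
}

def shrink(config: dict, overrides: dict) -> dict:
    # Same keys, smaller values: the shape of the config is preserved,
    # so any code written against the full model's config still works.
    return {**config, **overrides}

tiny = shrink(FULL_LLM, TINY_OVERRIDES)
assert set(tiny) == set(FULL_LLM)
assert all(tiny[k] <= FULL_LLM[k] for k in FULL_LLM)
```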
## Usage
### Loading with Transformers
```python
from transformers import AutoModelForCausalLM, AutoProcessor
import torch

model_id = "arashkermani/tiny-random-MiniCPM-o-2_6"

# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float32,
    device_map="cpu",
)

# Load processor
processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True,
)

# Test forward pass
input_ids = torch.randint(0, 320, (1, 5))
position_ids = torch.arange(5).unsqueeze(0)
data = {
    "input_ids": input_ids,
    "pixel_values": [[]],
    "tgt_sizes": [[]],
    "image_bound": [[]],
    "position_ids": position_ids,
}
with torch.no_grad():
    outputs = model(data=data)
print(f"Logits shape: {outputs.logits.shape}")  # (1, 5, 320)
```
### Using with Optimum-Intel (OpenVINO)
```python
from optimum.intel.openvino import OVModelForVisualCausalLM
from transformers import AutoProcessor

model_id = "arashkermani/tiny-random-MiniCPM-o-2_6"

# Load model for OpenVINO
model = OVModelForVisualCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
)

processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True,
)
```
### Export to OpenVINO
```bash
optimum-cli export openvino \
    -m arashkermani/tiny-random-MiniCPM-o-2_6 \
    minicpm-o-openvino \
    --task=image-text-to-text \
    --trust-remote-code
```
## Intended Use
This model is intended **exclusively** for:
- ✅ Testing optimum-intel OpenVINO export functionality
- ✅ CI/CD pipeline validation
- ✅ Model loading and compatibility testing
- ✅ Quantization workflow testing
- ✅ Fast prototyping and debugging
**Not intended for**:
- ❌ Production inference
- ❌ Actual image-text-to-text tasks
- ❌ Model quality evaluation
- ❌ Benchmarking performance metrics
## Training Details
This model was generated by:
1. Loading the config from `optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6`
2. Reducing all dimensions to minimal viable values
3. Initializing weights randomly using `AutoModelForCausalLM.from_config()`
4. Copying all necessary tokenizer, processor, and custom code files
**No training was performed**; all weights are randomly initialized.
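The random-initialization step (step 3 above) can be sketched with a standard config class as a stand-in. The actual model uses its own custom MiniCPM-o configuration and modeling code, so the `Llama*` classes and exact dimensions below are illustrative only:

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny dimensions mirroring the LLM component of this repository;
# LlamaForCausalLM stands in for the custom MiniCPM-o classes.
config = LlamaConfig(
    num_hidden_layers=2,
    hidden_size=256,
    intermediate_size=512,
    num_attention_heads=4,
    num_key_value_heads=4,
    vocab_size=320,
    max_position_embeddings=128,
)
# Building the model from a config (rather than from_pretrained)
# draws all weights from the default random initialization.
model = LlamaForCausalLM(config)
print(f"{model.num_parameters():,} parameters")  # on the order of 1.5M
```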
## Validation Results
The model has been validated to ensure:
- ✅ Loads with `trust_remote_code=True`
- ✅ Compatible with transformers AutoModel APIs
- ✅ Supports forward pass with expected input format
- ✅ Compatible with OpenVINO export via optimum-intel
- ✅ Includes all required custom modules and artifacts
See the [validation report](https://github.com/arashkermani/tiny-minicpm-o) for detailed technical analysis.
## Files Included
- `config.json` - Model configuration
- `pytorch_model.bin` - Model weights (5.64 MB)
- `generation_config.json` - Generation parameters
- `preprocessor_config.json` - Preprocessor configuration
- `processor_config.json` - Processor configuration
- `tokenizer_config.json` - Tokenizer configuration
- `tokenizer.json` - Fast tokenizer
- `vocab.json` - Vocabulary
- `merges.txt` - BPE merges
- Custom Python modules:
- `modeling_minicpmo.py`
- `configuration_minicpm.py`
- `processing_minicpmo.py`
- `image_processing_minicpmv.py`
- `tokenization_minicpmo_fast.py`
- `modeling_navit_siglip.py`
- `resampler.py`
- `utils.py`
## Related Models
- Original model: [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6)
- Previous test model: [optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6](https://huggingface.co/optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6)
## License
This model follows the same license as the original MiniCPM-o-2_6 model (Apache 2.0).
## Citation
If you use this test model in your CI/CD or testing infrastructure, please reference:
```bibtex
@misc{tiny-minicpm-o-2_6,
author = {Arash Kermani},
title = {Tiny Random MiniCPM-o-2_6 for Testing},
year = {2026},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/arashkermani/tiny-random-MiniCPM-o-2_6}}
}
```
## Contact
For issues or questions about this test model, please open an issue in the [optimum-intel repository](https://github.com/huggingface/optimum-intel/issues).