---
license: apache-2.0
library_name: transformers
tags:
  - vision
  - image-text-to-text
  - multimodal
  - test-model
  - tiny-model
  - openvino
  - optimum-intel
pipeline_tag: image-text-to-text
---

# Tiny Random MiniCPM-o-2_6

## Model Description

This is a **tiny random-initialized version** of the [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) multimodal vision-language model, designed specifically for **testing and CI/CD purposes** in the [optimum-intel](https://github.com/huggingface/optimum-intel) library.

**⚠️ Important**: This model has randomly initialized weights and is NOT intended for actual inference. It is designed solely for:
- Testing model loading and export functionality
- CI/CD pipeline validation
- OpenVINO conversion testing
- Quantization workflow testing

## Model Specifications

- **Architecture**: MiniCPM-o-2_6 (multimodal: vision + text + audio + TTS)
- **Parameters**: 1,477,376 (~1.48M parameters)
- **Model Binary Size**: 5.64 MB
- **Total Repository Size**: ~21 MB
- **Original Model**: [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) (~18 GB)
- **Size Reduction**: ~853× smaller than the full model
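The binary size follows directly from the parameter count: with float32 weights, each parameter occupies 4 bytes. A quick, purely illustrative sanity check:

```python
# Sanity check (illustrative): float32 weights use 4 bytes per parameter,
# so the checkpoint size is determined by the parameter count alone.
num_params = 1_477_376
size_bytes = num_params * 4           # float32 = 4 bytes/parameter
size_mb = size_bytes / (1024 ** 2)
print(f"{size_mb:.2f} MB")            # 5.64 MB, matching pytorch_model.bin
```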

## Architecture Details

### Language Model (LLM) Component
- `num_hidden_layers`: 2 (reduced from 40)
- `hidden_size`: 256 (reduced from 2048)
- `intermediate_size`: 512 (reduced from 8192)
- `num_attention_heads`: 4 (reduced from 32)
- `vocab_size`: 320 (reduced from 151,700)
- `max_position_embeddings`: 128 (reduced from 8192)

### Vision Component (SigLIP-based)
- `hidden_size`: 8
- `num_hidden_layers`: 1

### Audio Component (Whisper-based)
- `d_model`: 64
- `encoder_layers`: 1
- `decoder_layers`: 1

### TTS Component
- `hidden_size`: 8
- `num_layers`: 1

All architectural components are present but miniaturized to ensure API compatibility while drastically reducing compute requirements.
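For reference, the miniaturized LLM hyperparameters above can be summarized as a plain dictionary (an illustration only, not the actual config classes); note that `hidden_size` still divides evenly across the attention heads, as the real config requires:

```python
# Illustrative summary of the reduced LLM hyperparameters listed above
# (a plain dict, not the actual configuration class).
llm = {
    "num_hidden_layers": 2,
    "hidden_size": 256,
    "intermediate_size": 512,
    "num_attention_heads": 4,
    "vocab_size": 320,
    "max_position_embeddings": 128,
}

# hidden_size must split evenly across attention heads
assert llm["hidden_size"] % llm["num_attention_heads"] == 0
head_dim = llm["hidden_size"] // llm["num_attention_heads"]
print(f"per-head dimension: {head_dim}")  # per-head dimension: 64
```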

## Usage

### Loading with Transformers

```python
from transformers import AutoModelForCausalLM, AutoProcessor
import torch

model_id = "arashkermani/tiny-random-MiniCPM-o-2_6"

# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float32,
    device_map="cpu"
)

# Load processor
processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True
)

# Test forward pass
input_ids = torch.randint(0, 320, (1, 5))
position_ids = torch.arange(5).unsqueeze(0)

data = {
    "input_ids": input_ids,
    "pixel_values": [[]],
    "tgt_sizes": [[]],
    "image_bound": [[]],
    "position_ids": position_ids,
}

with torch.no_grad():
    outputs = model(data=data)

print(f"Logits shape: {outputs.logits.shape}")  # (1, 5, 320)
```

### Using with Optimum-Intel (OpenVINO)

```python
from optimum.intel.openvino import OVModelForVisualCausalLM
from transformers import AutoProcessor

model_id = "arashkermani/tiny-random-MiniCPM-o-2_6"

# Load model for OpenVINO
model = OVModelForVisualCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True
)

processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True
)
```

### Export to OpenVINO

```bash
optimum-cli export openvino \
  -m arashkermani/tiny-random-MiniCPM-o-2_6 \
  minicpm-o-openvino \
  --task=image-text-to-text \
  --trust-remote-code
```

## Intended Use

This model is intended **exclusively** for:
- ✅ Testing optimum-intel OpenVINO export functionality
- ✅ CI/CD pipeline validation
- ✅ Model loading and compatibility testing
- ✅ Quantization workflow testing
- ✅ Fast prototyping and debugging

**Not intended for**:
- ❌ Production inference
- ❌ Actual image-text-to-text tasks
- ❌ Model quality evaluation
- ❌ Benchmarking performance metrics

## Training Details

This model was generated by:
1. Loading the config from `optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6`
2. Reducing all dimensions to minimal viable values
3. Initializing weights randomly using `AutoModelForCausalLM.from_config()`
4. Copying all necessary tokenizer, processor, and custom code files

**No training was performed** - all weights are randomly initialized.
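Step 2 above amounts to overriding the full-size config values with tiny ones before random initialization. A minimal sketch, assuming a dict-based config; the helper name and the override table are illustrative, not the actual generation script:

```python
# Hypothetical sketch of step 2: substitute tiny values into a full-size
# config dict before random initialization. The function name and the
# override table are illustrative, not the actual generation script.
TINY_OVERRIDES = {
    "num_hidden_layers": 2,
    "hidden_size": 256,
    "intermediate_size": 512,
    "num_attention_heads": 4,
    "vocab_size": 320,
    "max_position_embeddings": 128,
}

def shrink_config(full_config: dict) -> dict:
    """Return a copy of the config with tiny values substituted;
    all other fields (model_type, custom-code mappings, ...) pass through."""
    tiny = dict(full_config)
    tiny.update(TINY_OVERRIDES)
    return tiny

full = {"model_type": "minicpmo", "hidden_size": 2048, "num_hidden_layers": 40}
tiny = shrink_config(full)
print(tiny["hidden_size"], tiny["num_hidden_layers"])  # 256 2
```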

## Validation Results

The model has been validated to ensure:
- ✅ Loads with `trust_remote_code=True`
- ✅ Compatible with transformers AutoModel APIs
- ✅ Supports forward pass with expected input format
- ✅ Compatible with OpenVINO export via optimum-intel
- ✅ Includes all required custom modules and artifacts

See the [validation report](https://github.com/arashkermani/tiny-minicpm-o) for detailed technical analysis.

## Files Included

- `config.json` - Model configuration
- `pytorch_model.bin` - Model weights (5.64 MB)
- `generation_config.json` - Generation parameters
- `preprocessor_config.json` - Preprocessor configuration
- `processor_config.json` - Processor configuration
- `tokenizer_config.json` - Tokenizer configuration
- `tokenizer.json` - Fast tokenizer
- `vocab.json` - Vocabulary
- `merges.txt` - BPE merges
- Custom Python modules:
  - `modeling_minicpmo.py`
  - `configuration_minicpm.py`
  - `processing_minicpmo.py`
  - `image_processing_minicpmv.py`
  - `tokenization_minicpmo_fast.py`
  - `modeling_navit_siglip.py`
  - `resampler.py`
  - `utils.py`

## Related Models

- Original model: [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6)
- Previous test model: [optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6](https://huggingface.co/optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6)

## License

This model follows the same license as the original MiniCPM-o-2_6 model (Apache 2.0).

## Citation

If you use this test model in your CI/CD or testing infrastructure, please reference:

```bibtex
@misc{tiny-minicpm-o-2_6,
  author = {Arash Kermani},
  title = {Tiny Random MiniCPM-o-2_6 for Testing},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/arashkermani/tiny-random-MiniCPM-o-2_6}}
}
```

## Contact

For issues or questions about this test model, please open an issue in the [optimum-intel repository](https://github.com/huggingface/optimum-intel/issues).