Handle truncated image boundaries in `_convert` to avoid tensor size mismatch
## Summary
This PR changes `_convert` to handle the case where truncation to `max_inp_length`
cuts off an image span, leaving an `<im_start>` (or `<slice_start>`) token without its closing `<im_end>` / `<slice_end>`.
When this happens, `image_start_idx` and `image_end_idx` have different lengths,
and the `torch.hstack` call at line 274 raises a runtime error:
```
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size x but got size x-1 for tensor number 1 in the list.
```
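The mismatch described above can be illustrated without the model. The following is a minimal sketch in plain Python; the token ids (`IM_START`, `IM_END`, `TEXT`, `PATCH`) and the helper `count_markers` are hypothetical and stand in for the processor's real tokenization, which this page does not show.

```python
# Hypothetical token ids, for illustration only (not the tokenizer's real ids).
IM_START, IM_END, TEXT, PATCH = 101, 102, 0, 1


def count_markers(input_ids, max_inp_length):
    """Truncate the sequence, then count image start and end markers."""
    truncated = input_ids[:max_inp_length]
    starts = [i for i, t in enumerate(truncated) if t == IM_START]
    ends = [i for i, t in enumerate(truncated) if t == IM_END]
    return len(starts), len(ends)


# Two images, each wrapped as <im_start> patch patch <im_end>.
ids = [TEXT, IM_START, PATCH, PATCH, IM_END, TEXT, IM_START, PATCH, PATCH, IM_END]

full = count_markers(ids, max_inp_length=len(ids))  # no truncation
cut = count_markers(ids, max_inp_length=8)          # cuts inside the second image

print(full, cut)
```

When the cut falls inside the second image, the sequence keeps two start markers but only one end marker, which is the situation that leads to the size mismatch.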
## Changes
- Changed `valid_image_nums` from `max(len(start), len(end))` to `min(len(start), len(end))`,
so that only complete start–end pairs are kept
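The effect of switching from `max` to `min` can be sketched as follows. This is an illustration under stated assumptions, not the processor's actual code: the token ids and the helper `image_bounds_for` are hypothetical, and the slicing of both index tensors to `valid_image_nums` before `torch.hstack` is assumed, since the diff only shows the opening of that call.

```python
import torch

# Hypothetical token ids, for illustration only.
IM_START, IM_END = 101, 102


def image_bounds_for(input_ids, pair_count):
    """Build (start, end) bounds, choosing the pair count with `pair_count`."""
    start_idx = torch.where(input_ids == IM_START)[0] + 1  # first token after the marker
    end_idx = torch.where(input_ids == IM_END)[0]
    n = pair_count(len(start_idx), len(end_idx))
    # Assumed shape of the original call: both tensors sliced to n, stacked as columns.
    return torch.hstack([start_idx[:n].unsqueeze(-1), end_idx[:n].unsqueeze(-1)])


# Truncated sequence: the second image has a start marker but no end marker.
truncated = torch.tensor([0, 101, 1, 1, 102, 0, 101, 1])

try:
    image_bounds_for(truncated, max)  # old behaviour: 2 starts vs. 1 end
    max_failed = False
except RuntimeError:
    max_failed = True

bounds = image_bounds_for(truncated, min)  # new behaviour: keep complete pairs only
print(max_failed, bounds.tolist())
```

With `min`, the dangling start marker is dropped and only the complete first image remains in the bounds. Whether the truncated image's placeholder tokens need further handling downstream is not covered by this sketch.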
processing_minicpmo.py (+1 -1)

    @@ -269,7 +269,7 @@ class MiniCPMOProcessor(ProcessorMixin):
             image_start_idx += 1
             image_end_idx = torch.where(end_cond)[0]
     
    -        valid_image_nums = max(len(image_start_idx), len(image_end_idx))
    +        valid_image_nums = min(len(image_start_idx), len(image_end_idx))
     
             image_bounds = torch.hstack(
                 [