As a new feature, MiniCPM-o 4.5 can process real-time, continuous video and audio input streams simultaneously while generating concurrent text and speech output streams in an end-to-end fashion, without mutual blocking. This **allows MiniCPM-o 4.5 to see, listen, and speak simultaneously**, creating a fluid, real-time omnimodal conversation experience. Beyond reactive responses, the model can also perform **proactive interaction**, such as initiating reminders or comments based on its continuous understanding of the live scene.

- 💪 **Strong OCR Capability, Efficiency and Others.**

  Advancing the popular visual capabilities of the MiniCPM-V series, MiniCPM-o 4.5 efficiently processes **high-resolution images** (up to 1.8 million pixels) and **high-FPS videos** (up to 10 fps) in any aspect ratio. It achieves **state-of-the-art performance for end-to-end English document parsing** on OmniDocBench, outperforming proprietary models such as Gemini-3 Flash and GPT-5, and specialized tools such as DeepSeek-OCR 2. It also features **trustworthy behaviors**, matching Gemini 2.5 Flash on MMHal-Bench, and supports **multilingual capabilities** in more than 30 languages.

- 💫 **Easy Usage.**

  MiniCPM-o 4.5 can be easily used in various ways. **Basic usage (recommended for 100% precision):** PyTorch inference with an Nvidia GPU. **Other adaptations** include (1) llama.cpp and Ollama support for efficient CPU inference on local devices, (2) int4- and GGUF-format quantized models in 16 sizes, (3) vLLM and SGLang support for high-throughput, memory-efficient inference, and (4) FlagOS support as a unified multi-chip backend plugin. **We have also open-sourced web demos** that **enable the full-duplex multimodal live streaming experience on local devices** such as GPU servers and PCs (e.g., a MacBook).

**Model Architecture.**

- **End-to-end Omni-modal Architecture.** The modality encoders/decoders and the LLM are densely connected via hidden states in an end-to-end fashion. This enables better information flow and control, and facilitates full exploitation of rich multimodal knowledge during training.
- **Full-Duplex Omni-modal Live Streaming Mechanism.** (1) We turn the offline modality encoders/decoders into online, full-duplex ones for streaming inputs/outputs. The speech token decoder models text and speech tokens in an interleaved fashion to support full-duplex speech generation (i.e., syncing promptly with new input). This also enables more stable long speech generation (e.g., >1 min).

  (2) **We sync all input and output streams on a millisecond-level timeline**, jointly modeled by a time-division multiplexing (TDM) mechanism for omni-modality streaming processing in the LLM backbone. TDM divides the parallel omni-modality streams into sequential information groups within small periodic time slices (see the sketch after this list).
- **Proactive Interaction Mechanism.** The LLM continuously monitors the input video and audio streams and decides, at a frequency of 1 Hz, whether to speak. This high decision-making frequency, together with the full-duplex design, is crucial for the proactive interaction capability.
- **Configurable Speech Modeling Design.** We inherit the multimodal system prompt design of MiniCPM-o 2.6, which includes a traditional text system prompt and a new audio system prompt that determines the assistant's voice. This enables voice cloning and role-play at inference time for speech conversation.
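
To make the TDM idea concrete, here is a minimal illustrative sketch (not the actual implementation; the stream representations, slice length, and token values are assumptions of this sketch) of how parallel modality streams can be divided into sequential per-slice groups:

```python
from typing import Dict, List, Tuple

def tdm_interleave(
    streams: Dict[str, List[Tuple[int, str]]],  # modality -> [(timestamp_ms, token), ...]
    slice_ms: int = 100,                         # assumed time-slice length
) -> List[Tuple[str, List[str]]]:
    """Divide parallel modality streams into sequential info groups
    per periodic time slice, yielding one interleaved sequence."""
    horizon = max(t for tokens in streams.values() for t, _ in tokens)
    sequence = []
    for start in range(0, horizon + slice_ms, slice_ms):
        # Within a slice, group tokens modality by modality.
        for modality, tokens in streams.items():
            group = [tok for t, tok in tokens if start <= t < start + slice_ms]
            if group:
                sequence.append((modality, group))
    return sequence

# Example: video frames at 10 fps, audio tokens every 25 ms.
streams = {
    "video": [(i * 100, f"v{i}") for i in range(5)],
    "audio": [(i * 25, f"a{i}") for i in range(20)],
}
print(tdm_interleave(streams))
```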

<div align="center">
<img src="https://raw.githubusercontent.com/OpenBMB/MiniCPM-o/main/assets/minicpm-o-45-framework.png" width=100%>
</div>

<div align="center">
<img src="https://raw.githubusercontent.com/openbmb/MiniCPM-o/main/assets/minicpm_o_45_main_exp_table.png" width=90%>
</div>

<strong>Note</strong>: Scores marked with ∗ are from our evaluation; others are cited from referenced reports. n/a indicates that the model does not support the corresponding modality. All results are reported in the instruct mode/variant.

<br>

<details>
<summary>Click to view visual understanding results.</summary>
</details>

<details>
<summary>Click to view omni half-duplex results.</summary>

**Omni Half-Duplex**

</details>

<div align="center">
<a href="https://www.youtube.com/watch?v=6UzC-O1Q-1U"><img src="https://raw.githubusercontent.com/openbmb/MiniCPM-o/main/assets/minicpmo4_5/video_play.png" width=70%></a>
</div>

### Examples: Omnimodal Full-Duplex Conversation <!-- omit in toc -->

> [!NOTE]
> For detailed omnimodal full-duplex conversation examples, refer to the [Omni Full-Duplex Casebook](https://openbmb.github.io/minicpm-o-4_5-omni/)

### Examples: 🎙️ Speech Conversation <!-- omit in toc -->

> [!NOTE]
> For detailed speech conversation examples, refer to the [Audio Demo Page](https://openbmb.github.io/minicpm-o-4_5/)

Half-duplex speech conversation with custom reference audio and character prompts.

<details open>
<summary>🚀 <b>Elon Musk</b> - Voice Roleplay (EN)</summary>

</details>

## Offline Inference Examples with Transformers

Inference using Hugging Face Transformers on NVIDIA GPUs. Please ensure `transformers==4.51.0` is installed, as other versions may have compatibility issues (under investigation). Requirements are tested on Python 3.10.
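
A minimal loading sketch (the checkpoint id `openbmb/MiniCPM-o-4_5`, the tokenizer usage, and the dtype below are illustrative assumptions, not confirmed by this README):

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "openbmb/MiniCPM-o-4_5"  # assumed checkpoint id; adjust to the actual repo name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,      # MiniCPM-o ships custom modeling code with the checkpoint
    torch_dtype=torch.bfloat16,  # assumed dtype for GPU inference
)
model.eval().cuda()
```

With the model loaded, the snippet below initializes TTS and switches between half-duplex and duplex modes: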
```python
# Initialize TTS for audio output
model.init_tts()

# Convert half-duplex model to duplex mode
duplex_model = model.as_duplex()

# Convert duplex model back to half-duplex mode
model = duplex_model.as_simplex(reset_session=True)
```
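
`as_duplex()` and `as_simplex()` let the same loaded weights serve both the full-duplex live streaming mode and the turn-based half-duplex modes described below; judging by its name, `reset_session=True` presumably clears any in-flight streaming session state when switching back.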

### Half-Duplex Omni Mode <!-- omit in toc -->

We provide two inference modes: chat and streaming.

#### Chat Inference <!-- omit in toc -->
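
A minimal chat-mode sketch (the mixed image-plus-audio content list, the file paths, and the sampling rate are assumptions of this sketch; the `msgs` structure and the `model.chat` call mirror the image example later in this README):

```python
import librosa
from PIL import Image

# One user turn may mix modalities: images, audio arrays, and text.
image = Image.open("example.jpg").convert("RGB")              # assumed local file
audio, _ = librosa.load("example.wav", sr=16000, mono=True)   # assumed 16 kHz mono input

msgs = [{"role": "user", "content": [image, audio, "Describe what you see and hear."]}]

res = model.chat(msgs=msgs, use_tts_template=False)  # model loaded as in the sketch above
print(res)
```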

### Half-Duplex Realtime Speech Conversation Mode <!-- omit in toc -->

<details>
<summary>Click to show half-duplex mode realtime speech conversation API usage.</summary>

First, make sure you have all the dependencies, especially `"minicpmo-utils[all]>=1.0.5"`:

```bash
pip install "minicpmo-utils[all]>=1.0.5"  # assumed install command for the stated requirement
```

</details>

#### Speech Conversation as a Versatile and Vibe AI Assistant <!-- omit in toc -->

<details>
<summary>Click to show AI assistant conversation code.</summary>

Built on carefully designed post-training data and professional voice-actor recordings, `MiniCPM-o-4.5` can also function as an AI voice assistant. It delivers high-quality spoken interaction out of the box. It produces a sweet and expressive voice with natural prosody, including appropriate rhythm, stress, and pauses, giving a strong sense of liveliness in casual conversation. It also supports storytelling and narrative speech with coherent and engaging delivery. Moreover, it enables advanced voice instruction control, such as emotional tone and word-level emphasis.

```python
import librosa
# ...
```

</details>

#### General Speech Conversation with Custom Voice and Custom System Profile <!-- omit in toc -->

<details>
<summary>Click to show custom voice conversation code.</summary>

MiniCPM-o-4.5 can role-play as a specific character based on an audio prompt and a text profile prompt. It mimics the character's voice and adopts their language style in text responses, and it follows the profile defined in the text prompt. In this mode, MiniCPM-o-4.5 sounds **more natural and human-like**.

```python
import librosa
# ...
```

</details>

#### Zero-shot Text-to-speech (TTS) <!-- omit in toc -->

<details>
<summary>Click to show TTS code.</summary>

`MiniCPM-o-4.5` supports zero-shot text-to-speech (TTS). In this mode, the model functions as a highly natural TTS system that can replicate a reference voice.

```python
import librosa
# ...
```

</details>
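
A zero-shot TTS sketch: the `sys_msg` layout with a reference waveform and the `generate_audio`/`output_audio_path` arguments below are assumptions of this sketch, not the confirmed API:

```python
import librosa

# Reference voice to replicate (the path and 16 kHz rate are assumptions).
ref_audio, _ = librosa.load("assets/ref_voice.wav", sr=16000, mono=True)

# Assumed system-message layout: a text instruction plus the reference waveform.
sys_msg = {"role": "system", "content": ["Use the voice in the reference audio to speak.", ref_audio]}
msgs = [sys_msg, {"role": "user", "content": ["Please read aloud: The quick brown fox jumps over the lazy dog."]}]

res = model.chat(
    msgs=msgs,
    use_tts_template=True,       # enable the TTS-style template
    generate_audio=True,         # assumed flag for returning synthesized speech
    output_audio_path="out.wav", # assumed argument for saving the waveform
)
```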

#### Mimick <!-- omit in toc -->

<details>
<summary>Click to show mimick code.</summary>

The `Mimick` task evaluates a model's end-to-end speech modeling capability. The model takes audio input, transcribes it, and reconstructs the original audio with high fidelity, preserving detailed acoustic, paralinguistic, and semantic information. Higher similarity between the reconstructed and original audio indicates stronger end-to-end speech modeling capability.

```python
import librosa
# ...
```

</details>

#### Addressing Various Audio Understanding Tasks <!-- omit in toc -->

<details>
<summary>Click to show audio understanding code.</summary>

`MiniCPM-o-4.5` can also handle various audio understanding tasks, such as ASR, speaker analysis, general audio captioning, and sound scene tagging.

For audio-to-text tasks, you can use the following prompts:

- General Audio Caption: `Summarize the main content of the audio.`
- Sound Scene Tagging: `Utilize one keyword to convey the audio's content or the associated scene.`
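
A minimal audio-to-text sketch using the prompts above (the audio path and the 16 kHz sampling rate are assumptions; the `msgs` structure mirrors the image example below):

```python
import librosa

# Load a 16 kHz mono waveform (the sampling rate is an assumption of this sketch).
audio, _ = librosa.load("assets/example_audio.wav", sr=16000, mono=True)

task_prompt = "Summarize the main content of the audio."  # General Audio Caption
msgs = [{"role": "user", "content": [audio, task_prompt]}]

res = model.chat(msgs=msgs, use_tts_template=False)  # text-only output for audio-to-text tasks
print(res)
```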

```python
from PIL import Image

image = Image.open("assets/fossil.png").convert("RGB")
question = "What is in the image?"
msgs = [{"role": "user", "content": [image, question]}]

res = model.chat(msgs=msgs, use_tts_template=False)
print(res)
```

</details>

## Deploy a Realtime Web Demo on Your Own Device

### Option A (Recommended): **PyTorch Inference with Nvidia GPU** for 100% model precision with no loss in performance.

We provide a PyTorch-based [simplified yet fully functional web demo](https://github.com/OpenBMB/minicpm-o-4_5-pytorch-simple-demo) that boosts model inference performance and supports:

- full-duplex omnimodal live streaming
- full-duplex speech live streaming
- half-duplex speech live streaming (under development)
- turn-based chat conversation
- customizable system prompts
- customizable reference audio
- a simple and readable codebase for continual development
- serving as an API backend for third-party applications

Requirements:

- Nvidia GPU with at least 28GB of GPU memory. *We are working on optimizing the model for lower GPU memory usage.*

### Option B: **llama.cpp-omni** for end-side inference on PCs (e.g., Mac) and low-resource devices.

With a fully C++ implementation of `MiniCPM-o 4.5` and quantized weights, `llama.cpp-omni` supports:

- half-duplex speech realtime conversation
- full-duplex omnimodal live streaming

We provide [ready-to-run guidance](https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/demo/web_demo/WebRTC_Demo/README.md) for accessing low-latency full-duplex communication directly on your own Mac using our new official Docker image.

Requirements:

- For half-duplex speech realtime conversation: an Apple M3/M4/M5 chip with at least 16GB of RAM, or a low-resource Nvidia GPU with at least 12GB of GPU memory
- For full-duplex omnimodal live streaming: an Apple M4 Max chip with at least 24GB of RAM, or a low-resource Nvidia GPU with at least 12GB of GPU memory

## FlagOS

<details>
<summary>Click to show FlagOS Usage details.</summary>

</details>

### vLLM, SGLang, llama.cpp, Ollama

We support inference with vLLM, SGLang, llama.cpp, and Ollama. Refer to our [Cookbook](https://github.com/OpenSQZ/MiniCPM-V-Cookbook) for more details.

### LLaMA-Factory, SWIFT

We support fine-tuning with LLaMA-Factory and SWIFT. Refer to our [Cookbook](https://github.com/OpenSQZ/MiniCPM-V-Cookbook) for more details.

## MiniCPM-V & o Cookbook

Discover comprehensive, ready-to-deploy solutions for the MiniCPM-V and MiniCPM-o model series in our structured [Cookbook](https://github.com/OpenSQZ/MiniCPM-V-CookBook), which empowers developers to rapidly implement multimodal AI applications with integrated vision, speech, and live-streaming capabilities. Key features include:

**Easy Usage Documentation**

We support a wide range of users, from individuals to enterprises and researchers.

* **Individuals**: Enjoy effortless inference using Ollama ([V4](https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/deployment/ollama/minicpm-v4_ollama.md), [o4.5](https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/deployment/ollama/minicpm-o4_5_ollama.md)) and llama.cpp ([V4](https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/deployment/llama.cpp/minicpm-v4_llamacpp.md), [o4.5](https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/deployment/llama.cpp/minicpm-o4_5_llamacpp.md)) with minimal setup.
* **Enterprises**: Achieve high-throughput, scalable performance with vLLM ([V4](https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/deployment/vllm/minicpm-v4_vllm.md), [o4.5](https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/deployment/vllm/minicpm-o4_5_vllm.md)) and SGLang ([V4](https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/deployment/sglang/MiniCPM-v4_sglang.md), [o4.5](https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/deployment/sglang/MiniCPM-o4_5_sglang.md)).
* **Researchers**: Leverage advanced frameworks including [Transformers](https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/finetune/finetune_full.md), [LLaMA-Factory](https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/finetune/finetune_llamafactory.md), [SWIFT](https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/finetune/swift.md), and [Align-anything](https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/finetune/align_anything.md) to enable flexible model development and cutting-edge experimentation.

**Versatile Deployment Scenarios**

* The MiniCPM-o/V model weights and code are open-sourced under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM-V/blob/main/LICENSE) license.

#### Statement

* As MLLMs, MiniCPM-o/V models generate content by learning from a large number of multimodal corpora, but they cannot comprehend, express personal opinions, or make value judgements. Anything generated by MiniCPM-o/V models does not represent the views and positions of the model developers.
* We will not be liable for any problems arising from the use of MiniCPM-o/V models, including but not limited to data security issues, public opinion risks, or any risks and problems arising from the misdirection, misuse, dissemination, or abuse of the model.

## Key Techniques and Other Multimodal Projects <!-- omit in toc -->