MiniCPM-V-4.6-abliterated-MAX

MiniCPM-V-4.6-abliterated-MAX is an abliterated variant of openbmb/MiniCPM-V-4.6. It applies refusal direction analysis and ablation-based optimization to reduce internal refusal behaviors while preserving the multimodal reasoning and instruction-following strengths of the base architecture. The result is a compact, efficient multimodal language model for image, video, and text understanding with improved instruction adherence.

This model is intended for research and learning purposes only. It reduces internal refusal behaviors, and any content generated by it is used at the user’s own risk. The authors and hosting page disclaim any liability for outputs produced by this model. Users are responsible for ensuring safe, ethical, and lawful usage.

Evals

.eval_results: harm_bench_score.yaml

The evaluation used 2,000 randomly sampled harmful test prompts to measure the model's refusal behavior. These self-reported results are intended only as an overview of the model; scores may vary with the benchmark and the evaluation strategy used.
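As a rough illustration of how a refusal rate over a fixed prompt set can be summarized, the sketch below computes the rate together with a Wilson score interval. The function name, the boolean label format, and the sample labels are assumptions made for this example; they are not the evaluation harness or the results behind the scores reported here.

```python
import math


def refusal_rate_with_interval(refused, z=1.96):
    """Return (rate, low, high): the refusal rate and its Wilson score interval.

    `refused` is a sequence of booleans, one per prompt (True = model refused).
    z=1.96 corresponds to an approximate 95% interval.
    """
    n = len(refused)
    if n == 0:
        raise ValueError("need at least one labeled response")
    p = sum(refused) / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)


# Hypothetical labels for illustration only, not results for this model.
labels = [True] * 3 + [False] * 7
rate, low, high = refusal_rate_with_interval(labels)
print(f"refusal rate: {rate:.2f} (95% interval {low:.2f} to {high:.2f})")
```

Reporting an interval alongside the point estimate makes it easier to compare runs that use different prompt samples or sample sizes.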

Key Highlights

  • Advanced Refusal Direction Analysis: Uses targeted activation analysis to identify and mitigate refusal directions within the model’s latent space.

  • Abliterated MAX Optimization: Fine-tuned to significantly reduce refusal patterns while maintaining coherent, detailed, and instruction-aligned outputs.

  • Efficient Multimodal Architecture: Built on openbmb/MiniCPM-V-4.6, combining SigLIP2-400M vision encoding with Qwen3.5-0.8B language capabilities for compact yet powerful multimodal understanding.

  • Image & Video Understanding: Supports advanced reasoning across text, images, and videos with efficient deployment on edge and mobile-class hardware.

  • 262K Long Context Support: Optimized for extremely long multimodal contexts across text, image, and video inputs.

  • Improved Instruction Adherence: Designed to follow complex prompts with fewer unnecessary refusals while retaining strong conversational capabilities.

  • High-Efficiency Deployment: Suitable for local inference, lightweight multimodal applications, and research experimentation on consumer-grade GPUs.

Quick Start with Transformers

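Install the dependencies:

pip install transformers==5.8.0 gradio==6.14.0

Then run the following script, which launches a Gradio demo for image-and-text inference:
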
import gc
import time
from threading import Thread

import gradio as gr
import torch
from PIL import Image

from transformers import (
    MiniCPMV4_6ForConditionalGeneration,
    AutoProcessor,
    TextIteratorStreamer,
)

MAX_MAX_NEW_TOKENS = 4096
DEFAULT_MAX_NEW_TOKENS = 1024
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

MODEL_ID = "prithivMLmods/MiniCPM-V-4.6-abliterated-MAX"
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = MiniCPMV4_6ForConditionalGeneration.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to(device).eval()


def generate(
    image: Image.Image,
    text: str,
    max_new_tokens: int = DEFAULT_MAX_NEW_TOKENS,
    temperature: float = 0.6,
    top_p: float = 0.9,
    top_k: int = 50,
    repetition_penalty: float = 1.2,
):
    if image is None:
        yield "[ERROR] Please upload an image."
        return
    if not text or not text.strip():
        yield "[ERROR] Please enter your instruction."
        return

    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": text},
            ],
        }
    ]
    prompt_full = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

    inputs = processor(
        text=[prompt_full],
        images=[image],
        return_tensors="pt",
        padding=True,
    ).to(device)

    streamer = TextIteratorStreamer(
        processor.tokenizer if hasattr(processor, "tokenizer") else processor,
        skip_prompt=True,
        skip_special_tokens=True,
    )

    generation_error = {"error": None}
    generation_kwargs = {
        **inputs,
        "streamer": streamer,
        "max_new_tokens": int(max_new_tokens),
        "do_sample": True,
        "temperature": float(temperature),
        "top_p": float(top_p),
        "top_k": int(top_k),
        "repetition_penalty": float(repetition_penalty),
    }

    def _run():
        try:
            model.generate(**generation_kwargs)
        except Exception as e:
            generation_error["error"] = e
            try:
                streamer.end()
            except Exception:
                pass

    thread = Thread(target=_run, daemon=True)
    thread.start()

    buffer = ""
    for new_text in streamer:
        buffer += new_text
        time.sleep(0.01)
        yield buffer

    thread.join(timeout=1.0)

    # Release cached memory before the final yields so cleanup also runs
    # on the error path, which returns early.
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

    if generation_error["error"] is not None:
        err = f"[ERROR] {str(generation_error['error'])}"
        yield (buffer + "\n\n" + err) if buffer.strip() else err
        return

    if not buffer.strip():
        yield "[ERROR] No output was generated."


def run_inference(
    image,
    text,
    max_new_tokens,
    temperature,
    top_p,
    top_k,
    repetition_penalty,
):
    yield from generate(
        image=image,
        text=text,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        repetition_penalty=repetition_penalty,
    )


with gr.Blocks(title="MiniCPM-V-4.6-abliterated-MAX") as demo:
    gr.Markdown(
        "# MiniCPM-V-4.6-abliterated-MAX\n"
        "Upload an image and enter your instruction to run multimodal inference."
    )

    with gr.Row():
        with gr.Column(scale=1):
            image_input = gr.Image(type="pil", label="Input Image")
            text_input = gr.Textbox(
                label="Instruction",
                placeholder="e.g., Describe the image, perform OCR, solve the problem...",
                lines=4,
            )
            run_btn = gr.Button("Run Inference", variant="primary")

            with gr.Accordion("Advanced Settings", open=False):
                max_new_tokens = gr.Slider(
                    minimum=1,
                    maximum=MAX_MAX_NEW_TOKENS,
                    step=1,
                    value=DEFAULT_MAX_NEW_TOKENS,
                    label="Max New Tokens",
                )
                temperature = gr.Slider(
                    minimum=0.1,
                    maximum=4.0,
                    step=0.1,
                    value=0.6,
                    label="Temperature",
                )
                top_p = gr.Slider(
                    minimum=0.05,
                    maximum=1.0,
                    step=0.05,
                    value=0.9,
                    label="Top-p",
                )
                top_k = gr.Slider(
                    minimum=1,
                    maximum=1000,
                    step=1,
                    value=50,
                    label="Top-k",
                )
                repetition_penalty = gr.Slider(
                    minimum=1.0,
                    maximum=2.0,
                    step=0.05,
                    value=1.2,
                    label="Repetition Penalty",
                )

        with gr.Column(scale=1):
            output = gr.Textbox(
                label="Output",
                lines=20,
                placeholder="Output will appear here...",
            )

    run_btn.click(
        fn=run_inference,
        inputs=[
            image_input,
            text_input,
            max_new_tokens,
            temperature,
            top_p,
            top_k,
            repetition_penalty,
        ],
        outputs=[output],
    )

if __name__ == "__main__":
    demo.queue(max_size=10).launch(show_error=True)
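To make the Temperature, Top-k, and Top-p sliders in the script above more concrete, here is a minimal pure-Python sketch of how these three settings reshape a next-token distribution. It is a simplified illustration, not the Transformers implementation: the function name `filter_distribution` and the toy logits are assumptions for this example, and the repetition penalty is left out.

```python
import math


def filter_distribution(logits, temperature=0.6, top_k=50, top_p=0.9):
    """Return {token_index: probability} after temperature, top-k, and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in logits]

    # Softmax with max-subtraction for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep the k most probable tokens, then renormalize.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in ranked)

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:
            break

    # Renormalize the surviving tokens so they sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


# Toy logits for a four-token vocabulary (illustrative values only).
print(filter_distribution([2.0, 1.0, 0.0, -1.0], temperature=1.0, top_k=3, top_p=0.9))
```

Lower temperature, smaller top-k, and smaller top-p all concentrate probability on the most likely tokens, which generally trades output diversity for consistency.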

Base Model Information

openbmb/MiniCPM-V-4.6 is a 1.3B-parameter dense multimodal language model developed by OpenBMB (Tsinghua NLP + ModelBest). It is built using SigLIP2-400M for visual encoding and Qwen3.5-0.8B as the language backbone, optimized for efficient multimodal understanding on edge and mobile hardware while supporting long-context reasoning across text, image, and video modalities.
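For a back-of-the-envelope sense of the footprint, the sketch below estimates the memory needed for the weights alone, assuming the 1.3B parameter count stated above and 2 bytes per parameter (BF16, as in the Quick Start loading code). The helper name is hypothetical, and the estimate excludes activations, the KV cache, and image or video features, so actual usage will be higher.

```python
def weight_memory_gib(num_params, bytes_per_param=2):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# 1.3B parameters at 2 bytes each (BF16) comes to roughly 2.4 GiB of weights.
print(f"{weight_memory_gib(1.3e9):.2f} GiB")
```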

Intended Use

  • Alignment & Refusal Research: Studying refusal behaviors and activation-level alignment modifications in multimodal systems.

  • Multimodal Red-Teaming Experiments: Evaluating robustness across adversarial image, video, and text prompts.

  • Edge & Local AI Deployment: Running compact multimodal AI systems efficiently on consumer hardware and edge devices.

  • Research Prototyping: Experimenting with efficient multimodal transformer architectures and alignment techniques.

Limitations & Risks

Important Note: This model intentionally reduces built-in refusal mechanisms.

  • Sensitive Output Possibility: The model may generate controversial, explicit, or unsafe responses depending on prompts and multimodal inputs.

  • User Responsibility: Outputs must be handled responsibly and within legal and ethical boundaries.

  • Potential Hallucinations: Multimodal reasoning may occasionally produce inaccurate or fabricated interpretations.

  • Deployment Considerations: While optimized for efficiency, high-resolution image and video inference may still require substantial VRAM and optimized runtimes depending on workload complexity.

Model size: 1B params (Safetensors). Tensor type: BF16.