How to use Prevolut/Chapper-MCP-Vision-Slim-IT with MLX:

```python
# Make sure mlx-vlm is installed
# pip install --upgrade mlx-vlm

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model, processor = load("Prevolut/Chapper-MCP-Vision-Slim-IT")
config = load_config("Prevolut/Chapper-MCP-Vision-Slim-IT")

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=1
)

# Generate output
output = generate(model, processor, formatted_prompt, image)
print(output)
```
---
license: apache-2.0
base_model: google/gemma-4-E2B-it
tags:
- fine-tuned
- vision
- multimodal
- reasoning
- mcp
- mlx
- ios
- chapper
- ios-client
- tools
- tool-use
- lm-studio
- prevolut
pipeline_tag: image-text-to-text
language:
- multilingual
- en
- de
- fr
- it
- es
- nl
- pt
- zh
- ja
- ko
datasets:
- openai/gsm8k
- prevolut/chapper-mcp-internal
- codealpaca
---
# Chapper-MCP-Vision-Slim-IT

**Chapper-MCP-Vision-Slim-IT** is a multimodal large language model optimized for edge devices. Built on the vision-capable Gemma architecture, it is fine-tuned for structured **Model Context Protocol (MCP)** outputs, step-by-step Socratic reasoning, and image-to-action analysis.

Developed by **Prevolut Ltd**, this model serves as the local intelligence engine powering **[Chapper – AI & LM Studio Client](https://apps.apple.com/de/app/chapper-ai-lm-studio-client/id6760984679)**, a native iOS application designed for privacy-first LLM inference, either on-device or through a server.

We engineered this model to bridge the gap between lightweight edge computing and advanced structural reasoning. Although it was built to drive the Chapper ecosystem, its strict adherence to JSON formatting and its robust logical foundation make it a capable agent for general-purpose applications that require complex tool orchestration and multimodal analysis.

## Key Features & Enhancements

* **Socratic Reasoning Engine:** Instead of guessing, the model is trained to break complex, multi-stage problems down step by step and to run internal plausibility checks before giving the final result.
* **Format & Syntax Discipline:** Maintains strict output structures, isolates data cleanly, and is stable at generating pure JSON blocks without conversational clutter.
* **MCP & Tool Orchestration Ready:** Because of its strict formatting adherence, the model is well suited to serve as a local agent that interacts with the Model Context Protocol (MCP), executes API calls, and manages local system state.
* **Multimodal & Vision Capable:** Reads and analyzes UI screenshots, diagrams, and other visual inputs, and translates them directly into actionable code or structured tool payloads.
* **Edge Optimized:** Brings desktop-grade tool use to mobile edge devices through mixed-precision quantization via MLX (~6.8 bits per weight on average, with 4-bit text layers and 16-bit vision layers).
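
The "~6.8 bits" figure reads most naturally as a parameter-weighted average over layers stored at different precisions. The sketch below shows that arithmetic only. The 80/20 parameter split and the 4.5-bit effective cost of a 4-bit layer (4 bits plus per-group scale and bias overhead, assuming group quantization with a group size of 64 and 16-bit scales and biases) are illustrative assumptions, not published figures for this model.

```python
def average_bits_per_weight(parts):
    """Parameter-weighted mean of bits per weight.

    `parts` is a list of (parameter_count, bits_per_weight) pairs.
    """
    total_params = sum(count for count, _ in parts)
    if total_params == 0:
        raise ValueError("at least one non-empty part is required")
    total_bits = sum(count * bits for count, bits in parts)
    return total_bits / total_params


# Hypothetical split, NOT the model's real layer sizes:
# 80% of the weights sit in 4-bit layers (about 4.5 bits each once
# per-group scales and biases are counted), 20% stay at 16 bits.
hypothetical_parts = [(800, 4.5), (200, 16.0)]
mixed_precision_bits = average_bits_per_weight(hypothetical_parts)
print(f"{mixed_precision_bits:.1f} bits per weight")
```

Under these assumed numbers the average lands near the quoted value, which shows how a small share of 16-bit vision weights can lift the overall figure well above 4 bits.
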
## Intended Use Cases

* **Local AI Agents:** Powering privacy-first, on-device assistants on iOS, iPadOS, and macOS.
* **System Orchestration:** Translating natural language and visual inputs into structured JSON payloads for tool execution.
* **Complex Logic Tasks:** Solving dynamic UI challenges, mathematical deductions, and multi-variable logic puzzles on the fly.
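
As a rough illustration of the system-orchestration use case, the sketch below routes a JSON tool call to a local Python function. The payload shape (`tool` and `arguments` keys) and the `set_brightness` tool are hypothetical and chosen only for this example; the actual Chapper MCP schema is not documented here.

```python
import json


def set_brightness(level: int) -> str:
    """Hypothetical local tool used only for this example."""
    return f"brightness set to {level}"


# Registry mapping tool names to callables (names are illustrative).
TOOLS = {"set_brightness": set_brightness}


def dispatch(payload_text: str) -> str:
    """Parse a JSON tool call and run the matching local function."""
    payload = json.loads(payload_text)
    if not isinstance(payload, dict):
        raise ValueError("tool call must be a JSON object")
    name = payload.get("tool")
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name!r}")
    arguments = payload.get("arguments", {})
    if not isinstance(arguments, dict):
        raise ValueError("'arguments' must be a JSON object")
    return TOOLS[name](**arguments)


result = dispatch('{"tool": "set_brightness", "arguments": {"level": 40}}')
print(result)
```

Rejecting unknown tool names before calling anything keeps a malformed or unexpected model output from triggering arbitrary code paths.
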
## Multilingual Capabilities

Inheriting the broad linguistic foundation of its base architecture, this model is fluent in **more than 100 languages**. Whether processing inputs or generating complex JSON structures, it maintains high logical fidelity across English, German, French, Spanish, Italian, Dutch, Mandarin, Japanese, Korean, and many more.

## Training Data & Mix

To balance strict syntax discipline against dynamic logic, we curated a multi-tiered dataset:

1. **The Prevolut Custom MCP Dataset:** The core of this model. We trained it on highly specific UI-to-action mappings, teaching the model to output well-formed `<mcp-request>` JSON tags based on user intents and visual inputs.
2. **`openai/gsm8k`:** Included to maintain and sharpen the model's Socratic reasoning and mathematical deduction capabilities.
3. **Code Instruction Data (CodeAlpaca):** Added to preserve structural thinking and coding logic, so that the model understands complex backend tasks.
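
Downstream code usually has to pull the JSON out of the `<mcp-request>` tags mentioned in item 1 and confirm that it parses. A minimal sketch, assuming the payload is wrapped as `<mcp-request>{...}</mcp-request>`; the closing tag and the example payload keys are assumptions, not taken from a Chapper specification:

```python
import json
import re

# Assumed wrapper: <mcp-request> ... </mcp-request> around one JSON document.
MCP_PATTERN = re.compile(r"<mcp-request>(.*?)</mcp-request>", re.DOTALL)


def extract_mcp_requests(model_output: str) -> list:
    """Return every tagged payload that parses as JSON; skip malformed ones."""
    requests = []
    for match in MCP_PATTERN.finditer(model_output):
        try:
            requests.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue  # Malformed JSON: ignore rather than guess at a repair.
    return requests


sample_output = (
    'Sure.\n<mcp-request>{"tool": "open_settings", "arguments": {}}</mcp-request>'
)
parsed = extract_mcp_requests(sample_output)
print(parsed)
```

Skipping unparseable payloads instead of patching them up keeps the caller from executing something the model did not clearly request.
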
## Inference & Prompt Format

This model follows the standard Gemma IT prompt template, in which the reply turn is opened with the `model` role. To use its vision capabilities and MCP formatting, make sure your inputs follow that structure.

To take advantage of the model's structural discipline for tool calls, we recommend enforcing rules in your system prompt (e.g., *"You are a local system agent. If you need to use a tool, output ONLY a valid JSON block. Do not add any conversational text before or after the JSON."*).

```xml
<start_of_turn>user
Analyze this UI screenshot and format the action as a valid Chapper MCP request.<end_of_turn>
<start_of_turn>model
```
| ## π οΈ Usage | |
| Designed for edge inference, this model shines on Apple Silicon (macOS/iOS) and within fast local environments. | |
| ### π± Natively on iOS via Apple MLX | |
| We highly recommend running this via Apple's `mlx-swift` / `mlx-vlm` libraries for direct Neural Engine & GPU acceleration on iPhones and Macs: | |
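```swift
import CoreImage
import MLXLMCommon
import MLXVLM

// Load the bundled model directory (the resource name is app-specific).
// API names follow the MLXVLM / MLXLMCommon libraries and may differ between versions.
let modelPath = Bundle.main.url(forResource: "chapper-ios-model", withExtension: nil)!
let container = try await VLMModelFactory.shared.loadContainer(
    configuration: ModelConfiguration(directory: modelPath))

// `userImage` is a CIImage supplied by the app; the processor applies the chat template.
let userInput = UserInput(
    prompt: "Extract the requested action from this image.",
    images: [.ciImage(userImage)])

let result = try await container.perform { context in
    let input = try await context.processor.prepare(input: userInput)
    return try MLXLMCommon.generate(
        input: input, parameters: GenerateParameters(), context: context
    ) { _ in .more }
}
print(result.output) // Expected to contain an <mcp-request> block
```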
| ### π₯οΈ In LM Studio / llama.cpp | |
| For `.gguf` variants, the model can be natively loaded into LM Studio. **Crucial:** To enable vision capabilities, you must load the accompanying `-mmproj.gguf` Vision Adapter in the hardware settings alongside the main model. | |
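
Once the model is loaded, LM Studio can also serve it through its OpenAI-compatible local server. The sketch below only builds the JSON body for a `/v1/chat/completions` request with an inline base64 image. The model identifier, the default address `http://localhost:1234`, and the placeholder image bytes are assumptions to adapt to your setup.

```python
import base64
import json


def build_vision_request(model_id: str, question: str, image_bytes: bytes,
                         mime_type: str = "image/png") -> dict:
    """Build an OpenAI-style chat request with one text part and one image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model_id,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
            ],
        }],
        "temperature": 0,
    }


# Placeholder bytes stand in for a real screenshot read from disk.
request_body = build_vision_request(
    "chapper-mcp-vision-slim-it",  # hypothetical identifier shown by LM Studio
    "Extract the requested action from this image.",
    b"not-a-real-image",
)
print(json.dumps(request_body, indent=2)[:200])
```

Sending it is a plain HTTP POST of `json.dumps(request_body)` to `http://localhost:1234/v1/chat/completions` with a `Content-Type: application/json` header, for example via `urllib.request`.
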
| ## βοΈ License | |
| This model is released under the **Apache 2.0 License**, inheriting the open and permissive nature of its base architecture. | |
| --- | |
| *Developed with a focus on local AI efficiency by **Prevolut Ltd*** |