Image-Text-to-Text
MLX
Safetensors
fine-tuned
vision
multimodal
reasoning
mcp
ios
chapper
ios-client
tools
tool-use
lm-studio
prevolut
Instructions to use Prevolut/Chapper-MCP-Vision-Slim-IT with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use Prevolut/Chapper-MCP-Vision-Slim-IT with MLX:
    # Make sure mlx-vlm is installed
    # pip install --upgrade mlx-vlm

    from mlx_vlm import load, generate
    from mlx_vlm.prompt_utils import apply_chat_template
    from mlx_vlm.utils import load_config

    # Load the model
    model, processor = load("Prevolut/Chapper-MCP-Vision-Slim-IT")
    config = load_config("Prevolut/Chapper-MCP-Vision-Slim-IT")

    # Prepare input
    image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
    prompt = "Describe this image."

    # Apply chat template
    formatted_prompt = apply_chat_template(
        processor, config, prompt, num_images=1
    )

    # Generate output
    output = generate(model, processor, formatted_prompt, image)
    print(output)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
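For the LM Studio route listed above, a request can also be sent to LM Studio's local server, which exposes an OpenAI-compatible chat endpoint. The sketch below only builds the request body for one prompt plus one inline image. The URL assumes LM Studio's default local port, and `MODEL_ID` is a hypothetical placeholder for whatever identifier LM Studio shows for the loaded model; neither comes from the model card.

```python
import base64
import json

# Assumption: LM Studio's local server is running on its default port.
LM_STUDIO_URL = "http://localhost:1234/v1/chat/completions"

# Hypothetical identifier: replace with the name LM Studio shows for the loaded model.
MODEL_ID = "chapper-mcp-vision-slim-it"


def build_vision_request(image_bytes: bytes, prompt: str, mime: str = "image/jpeg") -> dict:
    """Build an OpenAI-style chat payload carrying one text part and one inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:{mime};base64,{encoded}"},
                    },
                ],
            }
        ],
        "temperature": 0.0,
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real screenshot read from disk.
    payload = build_vision_request(b"\xff\xd8\xff", "Describe this image.")
    print(json.dumps(payload, indent=2))
```

The resulting dictionary can be serialized with `json.dumps` and sent as a POST body to `LM_STUDIO_URL` with any HTTP client. For `.gguf` variants, image input only works once the vision adapter is loaded alongside the main model.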
Update README.md
README.md
CHANGED
|
@@ -1,3 +1,4 @@
|
|
|
|
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
base_model: google/gemma-4-E2B-it
|
|
@@ -12,9 +13,10 @@ tags:
|
|
| 12 |
- chapper
|
| 13 |
- ios-client
|
| 14 |
- tools
|
|
|
|
| 15 |
- lm-studio
|
| 16 |
- prevolut
|
| 17 |
-
pipeline_tag: text-
|
| 18 |
language:
|
| 19 |
- multilingual
|
| 20 |
- en
|
|
@@ -39,15 +41,24 @@ datasets:
|
|
| 39 |
|
| 40 |
Developed by **Prevolut Ltd**, this model serves as the local intelligence engine powering **[Chapper – AI & LM Studio Client](https://apps.apple.com/de/app/chapper-ai-lm-studio-client/id6760984679)**, a native iOS application designed for on-device or server, privacy-first LLM inference.
|
| 41 |
|
| 42 |
-
While purposefully built to drive the Chapper ecosystem, its
|
| 43 |
|
| 44 |
-
##
|
| 45 |
-
|
| 46 |
|
| 47 |
-
##
|
| 48 |
-
Running fully autonomous, vision-capable agents on mobile hardware requires extreme efficiency. We needed a model that understands complex UI screenshots, follows strict JSON formatting rules, and retains general reasoning—all without sacrificing device performance or battery life.
|
| 49 |
|
| 50 |
-
|
| 51 |
|
| 52 |
## 📚 Training Data & Mix
|
| 53 |
To achieve the perfect balance between strict syntax discipline and dynamic logic, we curated a massive, multi-tiered dataset:
|
|
@@ -58,7 +69,9 @@ To achieve the perfect balance between strict syntax discipline and dynamic logi
|
|
| 58 |
|
| 59 |
## ⚡️ Inference & Prompt Format
|
| 60 |
|
| 61 |
-
This model strictly follows the standard Gemma IT prompt template. To utilize its vision capabilities and MCP formatting, ensure your inputs are structured correctly
|
| 62 |
|
| 63 |
```xml
|
| 64 |
<start_of_turn>user
|
|
@@ -66,11 +79,11 @@ Analyze this UI screenshot and format the action as a valid Chapper MCP request.
|
|
| 66 |
<start_of_turn>assistant
|
| 67 |
```
|
| 68 |
|
| 69 |
-
##
|
| 70 |
|
| 71 |
Designed for edge inference, this model shines on Apple Silicon (macOS/iOS) and within fast local environments.
|
| 72 |
|
| 73 |
-
###
|
| 74 |
We highly recommend running this via the MLX ecosystem (`mlx-swift` / `mlx-vlm`) for GPU-accelerated inference on iPhones and Macs:
|
| 75 |
|
| 76 |
```swift
|
|
@@ -91,4 +104,9 @@ print(result.text) // Outputs perfect <mcp-request> syntax!
|
|
| 91 |
For `.gguf` variants, the model can be natively loaded into LM Studio. **Crucial:** To enable vision capabilities, you must load the accompanying `-mmproj.gguf` Vision Adapter in the hardware settings alongside the main model.
|
| 92 |
|
| 93 |
## ⚖️ License
|
| 94 |
-
This model is released under the **Apache 2.0 License**, inheriting the open and permissive nature of its base architecture.
|
| 1 |
+
```markdown
|
| 2 |
---
|
| 3 |
license: apache-2.0
|
| 4 |
base_model: google/gemma-4-E2B-it
|
|
|
|
| 13 |
- chapper
|
| 14 |
- ios-client
|
| 15 |
- tools
|
| 16 |
+
- tool-use
|
| 17 |
- lm-studio
|
| 18 |
- prevolut
|
| 19 |
+
pipeline_tag: image-text-to-text
|
| 20 |
language:
|
| 21 |
- multilingual
|
| 22 |
- en
|
|
|
|
| 41 |
|
| 42 |
Developed by **Prevolut Ltd**, this model serves as the local intelligence engine powering **[Chapper – AI & LM Studio Client](https://apps.apple.com/de/app/chapper-ai-lm-studio-client/id6760984679)**, a native iOS application designed for on-device or server, privacy-first LLM inference.
|
| 43 |
|
| 44 |
+
We engineered this model to bridge the gap between lightweight edge-computing and advanced structural reasoning. While purposefully built to drive the Chapper ecosystem, its strict adherence to JSON formatting and robust logical foundation makes it a highly capable agent for any general-purpose application requiring complex tool orchestration and multimodal analysis.
|
| 45 |
|
| 46 |
+
## 🎯 Key Features & Enhancements
|
| 47 |
+
|
| 48 |
+
* **Socratic Reasoning Engine:** Instead of guessing answers, the model is trained to break down complex, multi-stage system problems step-by-step, running internal plausibility checks before outputting the final result.
|
| 49 |
+
* **Format & Syntax Discipline:** Highly disciplined in maintaining strict output structures. It isolates data cleanly and is exceptionally stable at generating pure JSON blocks without conversational clutter.
|
| 50 |
+
* **MCP & Tool Orchestration Ready:** Due to its strict formatting adherence, this model is an ideal candidate for serving as a local agent interacting with the Model Context Protocol (MCP), executing API calls, and managing local system states.
|
| 51 |
+
* **Multimodal & Vision Capable:** Flawlessly reads, analyzes, and translates UI screenshots, diagrams, and visual inputs directly into actionable code or structured tool payloads.
|
| 52 |
+
* **Edge Optimized:** Achieves desktop-grade tool-use natively on mobile edge devices using advanced quantization techniques (~6.8 bits with 4-bit text layers and 16-bit vision layers via MLX).
|
| 53 |
|
| 54 |
+
## 💻 Intended Use Cases
|
|
|
|
| 55 |
|
| 56 |
+
* **Local AI Agents:** Powering privacy-first, on-device assistants on iOS, iPadOS, and macOS.
|
| 57 |
+
* **System Orchestration:** Translating natural language and visual inputs into structured JSON payloads for tool execution.
|
| 58 |
+
* **Complex Logic Tasks:** Solving dynamic UI challenges, mathematical deductions, and multi-variable logic puzzles on the fly.
|
| 59 |
+
|
| 60 |
+
## 🌍 Multilingual Capabilities
|
| 61 |
+
Inheriting the broad linguistic foundation of its base architecture, this model is fluent in **more than 100 languages**. Whether processing inputs or generating complex JSON structures, it maintains high logical fidelity across English, German, French, Spanish, Italian, Dutch, Mandarin, Japanese, Korean, and many more.
|
| 62 |
|
| 63 |
## 📚 Training Data & Mix
|
| 64 |
To achieve the perfect balance between strict syntax discipline and dynamic logic, we curated a massive, multi-tiered dataset:
|
|
|
|
| 69 |
|
| 70 |
## ⚡️ Inference & Prompt Format
|
| 71 |
|
| 72 |
+
This model strictly follows the standard Gemma IT prompt template. To utilize its vision capabilities and MCP formatting, ensure your inputs are structured correctly.
|
| 73 |
+
|
| 74 |
+
To leverage the model's structural discipline for tool calls, we recommend enforcing rules in your system prompts (e.g., *"You are a local system agent. If you need to use a tool, output ONLY a valid JSON block. Do not add any conversational text before or after the JSON."*).
|
| 75 |
|
| 76 |
```xml
|
| 77 |
<start_of_turn>user
|
|
|
|
| 79 |
<start_of_turn>assistant
|
| 80 |
```
|
| 81 |
|
| 82 |
+
## 🛠️ Usage
|
| 83 |
|
| 84 |
Designed for edge inference, this model shines on Apple Silicon (macOS/iOS) and within fast local environments.
|
| 85 |
|
| 86 |
+
### 📱 Natively on iOS via Apple MLX
|
| 87 |
We highly recommend running this via the MLX ecosystem (`mlx-swift` / `mlx-vlm`) for GPU-accelerated inference on iPhones and Macs:
|
| 88 |
|
| 89 |
```swift
|
|
|
|
| 104 |
For `.gguf` variants, the model can be natively loaded into LM Studio. **Crucial:** To enable vision capabilities, you must load the accompanying `-mmproj.gguf` Vision Adapter in the hardware settings alongside the main model.
|
| 105 |
|
| 106 |
## ⚖️ License
|
| 107 |
+
This model is released under the **Apache 2.0 License**, inheriting the open and permissive nature of its base architecture.
|
| 108 |
+
|
| 109 |
+
---
|
| 110 |
+
*Developed with a focus on local AI efficiency by **Prevolut Ltd***
|
| 111 |
+
|
| 112 |
+
```
|
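The README's recommendation to enforce "output ONLY a valid JSON block" in the system prompt can be paired with a client-side check before any tool is executed. The following is a minimal sketch, not the Chapper implementation: the helper name `parse_tool_call` and the `tool` / `arguments` keys in the usage note are hypothetical, and tolerating a Markdown fence around the JSON is an assumption about how a model may wrap its output.

```python
import json
from typing import Optional

# System prompt quoted from the README's recommendation.
SYSTEM_PROMPT = (
    "You are a local system agent. If you need to use a tool, output ONLY a "
    "valid JSON block. Do not add any conversational text before or after the JSON."
)


def parse_tool_call(reply: str) -> Optional[dict]:
    """Return the parsed object if the reply is exactly one JSON object, else None."""
    text = reply.strip()
    # Assumption: a multi-line Markdown fence around the JSON is tolerated and stripped.
    if text.startswith("```") and text.endswith("```"):
        lines = text.splitlines()
        text = "\n".join(lines[1:-1]).strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Conversational text before or after the JSON ends up here.
        return None
    # Only a JSON object counts as a tool call; arrays and scalars are rejected.
    return parsed if isinstance(parsed, dict) else None
```

A reply such as `{"tool": "open_url", "arguments": {"url": "https://example.com"}}` is returned as a dictionary, while `Sure! {"tool": "open_url"}` yields `None`, so the caller can retry or fall back to treating the reply as plain text.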