Macaron-V1-Preview-749B
Macaron-V1-Preview-749B is a 749B-class Mixture-of-LoRA personal-agent model from MindLab Research, post-trained from GLM-5.1 with MinT. It combines a 744B base model with five specialist LoRA adapters and a router-driven serving design for multi-turn personal-life assistance, tool-grounded planning, coding and terminal workflows, and protocol-grounded Generative UI.
[📖 Release Blog] [💻 Harness Code] [🚀 Live Preview]
Highlights
- 749B-class Mixture-of-LoRA preview model: 744B base + 5 specialist LoRAs.
- Built for personal-agent tasks where user intent, private state, tools, and world state change across turns.
- Uses an explicit router-tool design to route from the default adapter to specialist LoRAs.
- Covers personal planning, search/calendar/tool workflows, coding and terminal tasks, computer-agent workflows, and A2UI Generative UI.
- Ships as a single Hugging Face repository: base model files at root, LoRA adapters in l0/ through l4/.
Model Overview
| Field | Value |
|---|---|
| Model name | Macaron-V1-Preview-749B |
| Organization | MindLab Research |
| Base model | GLM-5.1 |
| Architecture | Mixture-of-LoRA |
| Parameter footprint | 749B-class: 744B base + 5 x ~1B LoRA |
| Post-training system | MinT |
| Primary domain | Personal agents, tool-use agents, Generative UI |
| Release type | Preview |
| Checkpoint format | Single HF repo: base checkpoint at root; LoRAs under l0/-l4/ |
| Context length | 202,752 tokens, from config.json / tokenizer_config.json |
| Precision | bfloat16, from config.json |
| License | MIT; see License |
Repository Layout
The release is intentionally kept in one Hugging Face model repository:
```
.
|-- config.json
|-- generation_config.json
|-- model.safetensors.index.json
|-- model-00001-of-00282.safetensors
|-- ...
|-- model-00282-of-00282.safetensors
|-- tokenizer.json
|-- tokenizer_config.json
|-- l0/
|   |-- adapter_config.json
|   `-- adapter_model.safetensors
|-- l1/
|-- l2/
|-- l3/
`-- l4/
```
Adapter roles:
| Adapter | Role |
|---|---|
| l0 | Default chat, general-purpose behavior, and routing entry point |
| l1 | Personal-agent tasks such as calendar, planning, search, and life automation |
| l2 | Coding, terminal, repository, and shell tasks |
| l3 | A2UI and Generative UI |
| l4 | Computer-agent / OpenClaw-style workflows |
What Macaron Is For
A useful personal agent has to work where the user actually lives. Daily life is full of small contingent decisions: what to eat tonight, where to find a quiet table, how to reroute when traffic changes, how to schedule an errand around family obligations, or how to choose the right UI surface for a task. These tasks become hard because the user, tools, and environment all change while the agent is working.
Macaron-V1-Preview-749B targets three linked abilities:
- Capability: using real tools such as search, maps, restaurants, calendars, coding environments, and task APIs.
- Coherence: tracking a real human across turns, preferences, constraints, and changing intent.
- Expression: choosing the right surface, such as text, card, form, table, slider, or dashboard, and rendering it quickly enough to remain useful.
Architecture
Mixture-of-LoRA
Macaron-V1-Preview-749B keeps divergent skill families in separate LoRAs over a shared base model. This is intended to reduce interference between chat, personal-agent tool use, coding, computer-agent behavior, and Generative UI, while still allowing the system to add new specialist domains without modifying the base model or existing specialists.
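As a rough illustration of this idea, the sketch below applies a standard LoRA update (y = W x + (alpha / r) B A x) on top of one frozen base weight and switches between specialist adapters by name. The adapter names mirror l0-l4, but the `MixtureOfLoRALinear` class, the shapes, the rank, and the scaling are hypothetical and are not Macaron's actual implementation.

```python
import numpy as np


class MixtureOfLoRALinear:
    """Toy linear layer: one frozen base weight plus named LoRA adapters.

    Hypothetical illustration only; the released adapters live in l0/-l4/.
    """

    def __init__(self, base_weight: np.ndarray):
        self.base_weight = base_weight  # shared by every adapter, never modified
        self.adapters: dict[str, tuple[np.ndarray, np.ndarray, float]] = {}

    def add_adapter(self, name: str, a: np.ndarray, b: np.ndarray, alpha: float) -> None:
        # A has shape (r, d_in), B has shape (d_out, r); scale follows the usual alpha / r.
        rank = a.shape[0]
        self.adapters[name] = (a, b, alpha / rank)

    def forward(self, x: np.ndarray, adapter: str) -> np.ndarray:
        a, b, scale = self.adapters[adapter]
        # The base path is shared; only the low-rank delta depends on the active adapter.
        return self.base_weight @ x + scale * (b @ (a @ x))


rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2
layer = MixtureOfLoRALinear(rng.standard_normal((d_out, d_in)))
for name in ["l0", "l1", "l2", "l3", "l4"]:
    layer.add_adapter(
        name,
        rng.standard_normal((r, d_in)),
        rng.standard_normal((d_out, r)),
        alpha=16.0,
    )

x = rng.standard_normal(d_in)
y_chat = layer.forward(x, "l0")
y_code = layer.forward(x, "l2")
```

Adding a new specialist only inserts another entry into `adapters`; the base weight and the existing deltas are untouched, which is the isolation property described in the paragraph above.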
Router Tool
Macaron exposes model selection through a router-tool design rather than a separate opaque router model. Conversations start from l0; when the current request matches a specialist domain, the harness can route to l1-l4 and return to l0 for the next task.
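The routing flow described above can be sketched as a small state machine in which the active adapter emits a router tool call. This is a minimal sketch under stated assumptions: the tool name `route_to_adapter`, the domain keys, and the `RoutedSession` class are hypothetical; the real tool schema and decoding logic live in the reference harness.

```python
from dataclasses import dataclass, field

# Hypothetical domain -> adapter table mirroring the adapter roles listed above.
SPECIALISTS = {
    "personal_agent": "l1",
    "coding": "l2",
    "generative_ui": "l3",
    "computer_agent": "l4",
}
DEFAULT_ADAPTER = "l0"


@dataclass
class RoutedSession:
    active: str = DEFAULT_ADAPTER
    history: list[str] = field(default_factory=list)

    def handle_tool_call(self, name: str, arguments: dict) -> str:
        """Apply a router tool call emitted by the currently active adapter."""
        if name != "route_to_adapter":  # hypothetical router tool name
            raise ValueError(f"not a router call: {name}")
        # Unknown domains fall back to the default adapter instead of failing.
        self.active = SPECIALISTS.get(arguments.get("domain"), DEFAULT_ADAPTER)
        self.history.append(self.active)
        return self.active

    def finish_task(self) -> None:
        # After a specialist finishes, the next task starts from l0 again.
        self.active = DEFAULT_ADAPTER
        self.history.append(self.active)


session = RoutedSession()
session.handle_tool_call("route_to_adapter", {"domain": "coding"})
print(session.active)  # l2
session.finish_task()
print(session.active)  # l0
```

Exposing the switch as an ordinary tool call keeps every routing decision visible in the transcript, rather than hiding it inside a separate classifier.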
Reference harness: MindLab-Research/Mixture-of-LoRA-Harness.
Harness Co-Design
Macaron-V1-Preview-749B is a model-and-harness release. The model was trained and evaluated with a production-style agent harness that manages LoRA routing, tool calls, memory/state exposure, system prompts, and task metadata. Deployments that remove or replace the routed harness should expect behavior and benchmark results to change.
Generative UI and A2UI
Generative UI is a core Macaron capability. For many personal-agent tasks, the best answer is not only text: it may be a comparison card, editable task summary, booking form, route choice, slider, or dashboard.
Macaron-V1-Preview-749B is trained and evaluated with A2UI-style protocol actions. A2UI-Bench scores Generative UI along three layers:
- Protocol correctness: emitted actions are well formed and faithful to protocol semantics.
- Task construction correctness: the generated UI answers the user's request.
- User-experience lift: the UI makes the task easier than a text-only answer.
The evaluation also includes rendered visual checks for failures that text-only judges can miss, such as overflow, broken layouts, hidden controls, and spacing issues.
Evaluation
The headline benchmark suite focuses on personal-agent behavior, daily-life task surfaces, Generative UI, and OpenClaw-style workflows.
| Category | Benchmark | Macaron V1 Preview | GLM 5.1 | GPT 5.4 | Claude Opus 4.6 | Gemini 3.1 Pro | Qwen 3.6 Plus | Minimax 2.7 |
|---|---|---|---|---|---|---|---|---|
| Personal Agent Benchmark | Macaron LivingBench | 75.2 | 63.2 | 66.5 | 68.9 | 57.6 | 59.0 | 58.2 |
| | VitaBench | 59.6 | 56.8 | 48.7 | 53.0 | 55.2 | 47.5 | 52.2 |
| | VitaBench (Delivery) | 67.0 | 64.2 | 50.0 | 65.0 | 63.0 | 58.0 | 63.0 |
| | VitaBench (In-Store) | 75.0 | 70.0 | 55.0 | 66.0 | 68.0 | 58.0 | 58.0 |
| | VitaBench (OTA) | 51.0 | 54.0 | 41.0 | 45.0 | 48.0 | 36.0 | 51.0 |
| | VitaBench (Cross-shop) | 45.3 | 39.0 | -- | 36.0 | 42.0 | 38.0 | 37.0 |
| | A2UI-Bench | 75.6 | 61.7 | 74.1 | 67.6 | 71.0 | 69.8 | 54.4 |
| | A2UI L1 | 89.5 | 72.2 | 82.3 | 81.5 | 85.1 | 84.1 | 75.1 |
| | A2UI L2 | 67.2 | 54.7 | 71.8 | 59.4 | 64.1 | 59.9 | 46.3 |
| | A2UI L3 | 65.7 | 54.5 | 65.4 | 57.5 | 59.2 | 60.7 | 34.8 |
| | PinchBench | 92.5 | 76.6 | 88.4 | 88.9 | 82.9 | 85.9 | 84.5 |
| General Agent Benchmark | Tau3 Bench | 67.6 | 70.6 | 72.9 | 72.4 | 67.1 | 70.7 | 67.6 |
| | SWE-bench Verified | 78.1 | 76.4 | 78.2 | 78.2 | 78.8 | 73.4 | 73.8 |
| | Terminal-Bench 2.0 | 67.4 | 63.5 | 75.1 | 65.4 | 68.5 | 61.6 | 57.0 |
Higher is better for all scores shown in the charts and table.
Evaluation Protocols
Macaron LivingBench. Models are evaluated on 30 multi-turn personal-agent cases with a 10-turn budget. The tested agent may make up to three tool-use decisions per user turn. API calls use a 240-second timeout and up to three request-level retries. The reported mean case score is 0.7 x need score + 0.3 x process score.
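The LivingBench aggregation is a fixed weighted sum per case followed by a mean over cases. A minimal sketch, assuming the judge already returns the need and process sub-scores on the same scale; the `CaseResult` type and helper names are hypothetical.

```python
from dataclasses import dataclass
from statistics import fmean

NEED_WEIGHT = 0.7
PROCESS_WEIGHT = 0.3


@dataclass
class CaseResult:
    need_score: float     # need sub-score as produced by the judge
    process_score: float  # process sub-score as produced by the judge


def case_score(result: CaseResult) -> float:
    # Per-case score: 0.7 x need score + 0.3 x process score.
    return NEED_WEIGHT * result.need_score + PROCESS_WEIGHT * result.process_score


def livingbench_mean(results: list[CaseResult]) -> float:
    # Mean over all evaluated cases (30 in the reported setup).
    return fmean(case_score(r) for r in results)


results = [CaseResult(80.0, 60.0), CaseResult(70.0, 90.0)]
print(livingbench_mean(results))
```

Because the weights are fixed, the mean of per-case scores equals 0.7 times the mean need score plus 0.3 times the mean process score, so the two sub-score averages can be reported alongside the headline number without changing it.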
A2UI-Bench. Macaron-V1-Preview-749B is evaluated without explicit schema hints. Scores include protocol correctness, task construction correctness, and rendered UI quality.
VitaBench. VitaBench is used to stress realistic daily-life workflows. Since the original official judge model is no longer available, GLM-5.1 is used as both the judge and user model. Each query is run three times and the reported value is the average score.
PinchBench. PinchBench is used for search-grounded, high-precision personal-agent tasks. The reported setup uses Claude Haiku 4.5 as the judge model and Perplexity as the search API, and reports the best observed score.
Tau3 Bench. The reported setup uses GPT-5.2 with reasoning_effort=low as the user simulator and reports pass@1.
SWE-Bench Verified. The reported setup allows up to three retries only when an evaluation error occurs and reports the best successful attempt. The overall evaluation-error rate is approximately 0.8%.
Terminal-Bench 2.0. The reported setup uses the Harbor framework to run Macaron with the Pi Coding Agent Harness in sandboxed environments, with a maximum timeout of 4 hours, and reports pass@1.
AIME 2026. The reported score is included as a general-capability reference; the preview release is optimized primarily for personal-agent behavior and Generative UI rather than for maximizing this benchmark.
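The protocols above use three different aggregation rules: VitaBench averages three runs per query, PinchBench reports the best observed score, and Tau3 Bench and Terminal-Bench 2.0 report pass@1. The sketch below contrasts them on made-up numbers; the function names are hypothetical, and these are not the official scoring scripts of any benchmark.

```python
from statistics import fmean


def vitabench_query_score(run_scores: list[float]) -> float:
    # Average of the repeated runs for one query (three in the reported setup).
    return fmean(run_scores)


def best_observed(run_totals: list[float]) -> float:
    # PinchBench-style reporting: keep the highest overall score across runs.
    return max(run_totals)


def pass_at_1(outcomes: list[list[bool]]) -> float:
    # outcomes[i] holds the pass/fail result of every attempt on task i.
    # For k = 1 the unbiased pass@k estimator reduces to c_i / n_i per task.
    return fmean(sum(attempts) / len(attempts) for attempts in outcomes)


print(vitabench_query_score([0.5, 1.0, 0.75]))        # 0.75
print(best_observed([81.0, 84.0, 83.0]))              # 84.0
print(pass_at_1([[True], [False], [True], [True]]))   # 0.75
```

Best-of-runs reporting is systematically higher than a run average on the same data, which is worth keeping in mind when comparing rows that were aggregated differently.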
Installation and Loading
The repository contains both the base checkpoint and LoRA adapters, but full Macaron behavior depends on the router-aware serving harness. The transformers / peft path below is for single-adapter inspection and specialist experiments only; it does not run the full routed personal-agent system.
Single-Adapter Inspection
Install minimal inspection dependencies:
```bash
pip install -U transformers accelerate peft safetensors
```
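Example: load the base checkpoint, attach one specialist LoRA, and run a single chat turn through it:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

repo_id = "mindlab-research/Macaron-V1-Preview-749B"
adapter = "l1"  # one of "l0".."l4"; l1 is the personal-agent specialist

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)

# The base checkpoint lives at the repo root; shard it across all visible devices.
base_model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Attach exactly one specialist LoRA from its subfolder. No routing happens here.
model = PeftModel.from_pretrained(base_model, repo_id, subfolder=adapter)
model.eval()

# Single chat turn through the attached adapter.
messages = [{"role": "user", "content": "Who are you?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```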
Routed Serving
For full routed serving, use the reference harness: MindLab-Research/Mixture-of-LoRA-Harness. The harness is source-only; configure its LoRA Library so source_path points to local copies of this model repository's l0/-l4/ adapter directories, and use the harness README for shadow-LoRA preparation, SGLang launch, and route_decode_v2 routing.
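Since the harness only needs the adapter directories as local paths, one option is to fetch just l0/-l4/ rather than all 282 base-model shards. A minimal sketch using `huggingface_hub.snapshot_download` with `allow_patterns`; the helper functions and the local directory name are hypothetical, and the exact LoRA Library config schema is defined by the harness README, not here.

```python
import os

REPO_ID = "mindlab-research/Macaron-V1-Preview-749B"
ADAPTERS = ("l0", "l1", "l2", "l3", "l4")


def adapter_patterns(adapters: tuple[str, ...] = ADAPTERS) -> list[str]:
    # Glob patterns that match only the adapter folders, not the base shards.
    return [f"{name}/*" for name in adapters]


def adapter_source_paths(local_dir: str, adapters: tuple[str, ...] = ADAPTERS) -> dict[str, str]:
    # One local directory per adapter: the values a source_path entry would point at.
    return {name: os.path.join(local_dir, name) for name in adapters}


def download_adapters(local_dir: str) -> dict[str, str]:
    # Imported lazily so the path helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=adapter_patterns(),
        local_dir=local_dir,
    )
    return adapter_source_paths(local_dir)
```

Calling `download_adapters("./macaron-adapters")` returns a mapping such as `{"l0": "./macaron-adapters/l0", ...}` that can be copied into the harness configuration.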
Tool Use
Macaron-V1-Preview-749B is designed to operate with external tools. Personal-agent deployments may include:
- search
- calendar
- route planning
- restaurant/place lookup
- booking
- messaging
- task-specific APIs
- A2UI rendering actions
- coding, shell, and repository tools
The model should request explicit user confirmation before external write actions such as booking, sending messages, changing calendars, or making purchases.
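One way a harness could enforce this rule is to gate tool execution on a set of write actions. A minimal sketch; the tool names in `WRITE_TOOLS`, the `ToolCall` type, and the `confirm` callback are hypothetical and not part of the released harness.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical names of tools that change external state.
WRITE_TOOLS = {"create_booking", "send_message", "update_calendar", "make_purchase"}


@dataclass
class ToolCall:
    name: str
    arguments: dict


def execute_with_confirmation(
    call: ToolCall,
    run_tool: Callable[[ToolCall], str],
    confirm: Callable[[ToolCall], bool],
) -> str:
    """Run read-only tools directly; gate external writes behind explicit user consent."""
    if call.name in WRITE_TOOLS and not confirm(call):
        return f"cancelled: user did not confirm {call.name}"
    return run_tool(call)


def run(call: ToolCall) -> str:
    return f"ran {call.name}"


print(execute_with_confirmation(ToolCall("search", {"q": "quiet cafe"}), run, lambda c: False))
print(execute_with_confirmation(ToolCall("send_message", {"to": "Sam"}), run, lambda c: False))
```

Keeping the gate in the harness, rather than relying only on the model asking first, means a skipped confirmation turn cannot by itself trigger an external write.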
Safety, Privacy, and Limitations
Macaron-V1-Preview-749B is a research preview for personal-agent settings. Deployments should protect private user state and require explicit confirmation before external write actions such as bookings, messages, purchases, or calendar edits.
Serving disclaimer: This preview is intended to test the chat, tool-use, and generative UI capabilities of Macaron-V1-Preview. Due to compute capacity constraints, it is not intended for long-horizon tasks, and access to certain models may be restricted. We may dynamically adjust serving capacity to manage system load; please understand that response times are subject to compute capacity and real-time traffic.
Full behavior depends on the routed serving harness, tool quality, and A2UI renderer compatibility. Benchmark scores may not transfer directly to deployments with different tools, routing policies, user simulators, or safety constraints.
License
Macaron-V1-Preview-749B is released under the MIT License. Users should also respect any requirements inherited from the GLM-5.1 base model and from dependencies used by the serving harness.
Citation
```bibtex
@misc{mindlab2026macaronv1preview,
  author       = {{Mind Lab}},
  title        = {Macaron-V1-Preview: 749B MoL Agent Model post-trained from GLM5.1},
  year         = {2026},
  howpublished = {Mind Lab: A Lab for Experiential Intelligence},
  note         = {https://macaron.im/mindlab/research/macaron-v1-preview}
}
```
Contact
- Organization: MindLab Research
- Project: Macaron
- Release blog: macaron.im/mindlab/research/macaron-v1-preview