MacArena — Virtual Machine Images

Pre-built macOS virtual machine images for MacArena, a benchmark framework for evaluating computer-use agents on live macOS environments.


Contents

This bucket contains two VM images:

| Folder | VM Name | Use with |
| --- | --- | --- |
| MacArena.utm | macarena | MacArena custom tasks (evaluation_examples/macarena/) |
| osworld.utm | osworld | OSWorld + macOSWorld tasks (evaluation_examples/osworld/, evaluation_examples/macosworld/) |
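
The mapping in the table above can be captured in a small lookup, which is handy when a script needs to pick the VM from a task directory. This is only a sketch: `VM_FOR_SUITE` and `vm_for_task_dir` are hypothetical names, not part of the MacArena repository, and the helper assumes task paths contain an `evaluation_examples/<suite>/` segment as shown in the table.

```python
from pathlib import PurePosixPath

# Task-suite folder -> (VM bundle, VM name), copied from the table above.
VM_FOR_SUITE = {
    "macarena": ("MacArena.utm", "macarena"),
    "osworld": ("osworld.utm", "osworld"),
    "macosworld": ("osworld.utm", "osworld"),
}


def vm_for_task_dir(task_dir: str) -> tuple[str, str]:
    """Return (bundle, vm_name) for a path like evaluation_examples/osworld/."""
    parts = PurePosixPath(task_dir).parts
    if "evaluation_examples" not in parts:
        raise ValueError(f"not an evaluation_examples path: {task_dir!r}")
    idx = parts.index("evaluation_examples")
    if idx + 1 >= len(parts):
        raise ValueError(f"no task suite in path: {task_dir!r}")
    suite = parts[idx + 1]
    if suite not in VM_FOR_SUITE:
        raise ValueError(f"unknown task suite: {suite!r}")
    return VM_FOR_SUITE[suite]
```

For example, `vm_for_task_dir("./evaluation_examples/macosworld/")` resolves to the `osworld.utm` bundle and the `osworld` VM name.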

Requirements


Setup

1. Install UTM

brew install --cask utm

Or download directly from https://github.com/utmapp/UTM.

2. Download a VM image

Download the desired .utm folder from this bucket.

3. Import into UTM

Double-click the downloaded .utm bundle to have UTM import it automatically, or drag it into the UTM window.
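
To confirm the import worked before launching the benchmark, one option is to query UTM's command-line tool, `utmctl`. The sketch below makes several assumptions not stated in this README: that `utmctl` sits at its usual location inside `/Applications/UTM.app`, and that `utmctl list` prints a header row followed by whitespace-separated `UUID`, `Status`, and `Name` columns. The helper names are hypothetical.

```python
import subprocess

# Assumed default location of utmctl; adjust if UTM is installed elsewhere.
UTMCTL = "/Applications/UTM.app/Contents/MacOS/utmctl"


def parse_vm_names(listing: str) -> list[str]:
    """Extract VM names from `utmctl list` output (assumed UUID/Status/Name columns)."""
    names = []
    for line in listing.splitlines():
        cols = line.split(None, 2)  # the name may contain spaces, so split at most twice
        if len(cols) == 3 and cols[0].upper() != "UUID":  # skip the header row
            names.append(cols[2].strip())
    return names


def is_imported(vm_name: str, utmctl: str = UTMCTL) -> bool:
    """Return True if UTM reports a VM with this exact name."""
    result = subprocess.run(
        [utmctl, "list"], capture_output=True, text=True, check=True
    )
    return vm_name in parse_vm_names(result.stdout)
```

Calling `is_imported("macarena")` (or `is_imported("osworld")`) on the host should then report whether the VM name expected by the runner is registered in UTM.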

4. Run the benchmark

Clone the MacArena repository and point the runner at the imported VM:

# Manual mode
python3 macos_test.py --vm_name macarena --dir ./evaluation_examples/macarena/

# Automated mode
python3 -m runners.run_general \
    --path_to_vm "macarena" \
    --test_all_meta_path "evaluation_examples/test_split.json" \
    --model "ByteDance-Seed/UI-TARS-1.5-7B" \
    --base_url "<url_to_model>" \
    --max_steps 15

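Use --vm_name osworld when running OSWorld or macOSWorld tasks against osworld.utm, and point --dir at the matching task folder (evaluation_examples/osworld/ or evaluation_examples/macosworld/). In automated mode, the VM name is passed via --path_to_vm, as in the example above.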
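
When launching runs from a script, the two invocations above can be assembled as argument lists rather than shell strings. This is a minimal sketch: `manual_command` and `automated_command` are hypothetical helpers, the flags and default values are copied from the commands above, and passing a VM name other than "macarena" to `--path_to_vm` is an assumption based on the manual-mode guidance.

```python
import shlex


def manual_command(vm_name: str, task_dir: str) -> list[str]:
    """Argument list for manual mode (macos_test.py)."""
    return ["python3", "macos_test.py", "--vm_name", vm_name, "--dir", task_dir]


def automated_command(
    vm_name: str,
    base_url: str,
    model: str = "ByteDance-Seed/UI-TARS-1.5-7B",
    meta_path: str = "evaluation_examples/test_split.json",
    max_steps: int = 15,
) -> list[str]:
    """Argument list for automated mode (runners.run_general)."""
    return [
        "python3", "-m", "runners.run_general",
        "--path_to_vm", vm_name,
        "--test_all_meta_path", meta_path,
        "--model", model,
        "--base_url", base_url,
        "--max_steps", str(max_steps),
    ]


def as_shell(cmd: list[str]) -> str:
    """Render an argument list as a copy-pasteable shell command."""
    return shlex.join(cmd)
```

The resulting list can be handed to `subprocess.run(cmd, check=True, cwd=<path to the cloned MacArena repository>)`; `as_shell` is only for logging or copy-pasting.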


Which VM do I need?

  • Running MacArena tasks → use MacArena.utm with --vm_name macarena
  • Running OSWorld or macOSWorld tasks → use osworld.utm with --vm_name osworld

Legal

Use of these VM images is subject to Apple's macOS Software License Agreement. You are responsible for ensuring you have a valid license to run macOS on your hardware. Tasks sourced from macOSWorld are licensed under CC BY-NC 4.0 and may not be used for commercial purposes.

See the MacArena repository for the full legal disclaimer.


Citation

If you use these VM images in your research, please cite:

@misc{muryn-etal-2026-aiwild-macarena,
  author      = {Victor Muryn and Maksym Shamrai and Sofiia Mazepa and Yehor Khodysko},
  title       = {MacArena: Benchmarking Computer Use Agents on an Online macOS Environment},
  month       = {June},
  year        = {2026},
  eprint      = {2606.06560},
  eprinttype  = {arxiv},
  eprintclass = {cs.LG},
  url         = {https://arxiv.org/abs/2606.06560},
  urldate     = {2026-06-08},
  note        = {\emph{Accepted to AIWILD @ ICML 2026}},
}