mantis / README.md
mercurialsolo's picture
Upload README.md with huggingface_hub
76c58df verified
|
Raw
History Blame Contribute Delete
4.16 kB
---
license: apache-2.0
base_model:
- Hcompany/Holo3-35B-A3B
library_name: peft
pipeline_tag: image-text-to-text
tags:
- computer-use
- gui-agent
- multimodal
- action-model
- lora
- gguf
- sft
- trl
- mantis
model_name: Mantis
---
# Mantis
Mantis is an open-weights computer-use model checkpoint fine-tuned from
[Hcompany/Holo3-35B-A3B](https://huggingface.co/Hcompany/Holo3-35B-A3B). It is
trained on graded Mantis agent rollouts from Augur, using supervised fine-tuning
over real browser/workflow traces rather than generic chat data.
This release includes both the PEFT LoRA adapter and a ready-to-serve
`merged.Q8_0.gguf` artifact for llama.cpp-style serving.
## Why This Is Better For Mantis
Mantis is not a general chat fine-tune. It is specialized for the slow
improvement loop of a computer-use agent:
- **Agent-native data**: trained from Mantis rollouts with task context,
model I/O, rewards, and action traces.
- **Computer-use alignment**: targets GUI/navigation behavior on realistic
browser tasks instead of instruction-following only.
- **Deployment-ready release**: ships the small adapter plus a merged Q8_0 GGUF,
so downstream serving stacks can either compose with the base model or run the
merged artifact directly.
- **Auditable provenance**: checkpoint id `sft-c3e0d799f432-f00fa0` ties this
release to the training data/config hash used by the Mantis trainer registry.
The current internal frozen holdout gate did not establish a reliable promotion
over the base model, so this page does **not** claim a benchmark win over Holo3.
The value of this release is open access to the specialized Mantis adaptation,
its reproducible training pipeline, and its serving artifact.
## Files
- `adapter_model.safetensors`: PEFT LoRA adapter weights.
- `adapter_config.json`: PEFT adapter configuration.
- `adapter.gguf`: converted adapter artifact.
- `merged.Q8_0.gguf`: merged full-model Q8_0 GGUF for direct serving.
- tokenizer/processor files copied from the training artifact.
- `training_args.bin`: trainer metadata from the SFT run.
## Intended Use
Use this checkpoint for research and development of GUI agents, browser
automation agents, and Mantis-compatible computer-use systems.
This model may be useful when you need:
- a Holo3-derived checkpoint adapted to Mantis rollouts;
- an open adapter for further fine-tuning;
- a ready GGUF artifact for serving experiments;
- a transparent artifact from a champion/challenger training loop.
## Limitations
- This model can make incorrect UI decisions and should not be allowed to take
high-impact actions without supervision.
- The released checkpoint is specialized for Mantis-style workflows; behavior
outside that domain may not improve over the base model.
- The internal gate found no reliable promotion over the base model on the
frozen holdout available at release time.
- Computer-use agents can interact with external systems. Use sandboxing,
allowlists, rate limits, and human approval for sensitive workflows.
## Base Model And License
This model is fine-tuned from `Hcompany/Holo3-35B-A3B`, whose model card declares
the Apache-2.0 license. This release is also published under Apache-2.0 and
retains upstream attribution.
## Training
- Base model: `Hcompany/Holo3-35B-A3B`
- Method: supervised fine-tuning with TRL
- Checkpoint id: `sft-c3e0d799f432-f00fa0`
- Data source: graded Mantis rollouts pulled from Augur
- Training stack: PEFT LoRA + TRL SFT
## Citation
If you use the base model, cite Holo3:
```bibtex
@misc{hai2025holo3modelfamily,
title={Holo3 - Open Foundation Models for Navigation and Computer Use Agents},
author={H Company},
year={2026},
url={https://huggingface.co/Hcompany/Holo3-35B-A3B}
}
```
If you use the trainer stack, cite TRL:
```bibtex
@software{vonwerra2020trl,
title={{TRL: Transformers Reinforcement Learning}},
author={von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
license={Apache-2.0},
url={https://github.com/huggingface/trl},
year={2020}
}
```