mantis / README.md

Upload README.md with huggingface_hub

76c58df verified about 13 hours ago

4.16 kB

	---
	license: apache-2.0
	base_model:
	- Hcompany/Holo3-35B-A3B
	library_name: peft
	pipeline_tag: image-text-to-text
	tags:
	- computer-use
	- gui-agent
	- multimodal
	- action-model
	- lora
	- gguf
	- sft
	- trl
	- mantis
	model_name: Mantis
	---

	# Mantis

	Mantis is an open-weights computer-use model checkpoint fine-tuned from
	[Hcompany/Holo3-35B-A3B](https://huggingface.co/Hcompany/Holo3-35B-A3B). It is
	trained on graded Mantis agent rollouts from Augur, using supervised fine-tuning
	over real browser/workflow traces rather than generic chat data.

	This release includes both the PEFT LoRA adapter and a ready-to-serve
	`merged.Q8_0.gguf` artifact for llama.cpp-style serving.

	## Why This Is Better For Mantis

	Mantis is not a general chat fine-tune. It is specialized for the slow
	improvement loop of a computer-use agent:

	- Agent-native data: trained from Mantis rollouts with task context,
	model I/O, rewards, and action traces.
	- Computer-use alignment: targets GUI/navigation behavior on realistic
	browser tasks instead of instruction-following only.
	- Deployment-ready release: ships the small adapter plus a merged Q8_0 GGUF,
	so downstream serving stacks can either compose with the base model or run the
	merged artifact directly.
	- Auditable provenance: checkpoint id `sft-c3e0d799f432-f00fa0` ties this
	release to the training data/config hash used by the Mantis trainer registry.

	The current internal frozen holdout gate did not establish a reliable promotion
	over the base model, so this page does not claim a benchmark win over Holo3.
	The value of this release is open access to the specialized Mantis adaptation,
	its reproducible training pipeline, and its serving artifact.

	## Files

	- `adapter_model.safetensors`: PEFT LoRA adapter weights.
	- `adapter_config.json`: PEFT adapter configuration.
	- `adapter.gguf`: converted adapter artifact.
	- `merged.Q8_0.gguf`: merged full-model Q8_0 GGUF for direct serving.
	- tokenizer/processor files copied from the training artifact.
	- `training_args.bin`: trainer metadata from the SFT run.

	## Intended Use

	Use this checkpoint for research and development of GUI agents, browser
	automation agents, and Mantis-compatible computer-use systems.

	This model may be useful when you need:

	- a Holo3-derived checkpoint adapted to Mantis rollouts;
	- an open adapter for further fine-tuning;
	- a ready GGUF artifact for serving experiments;
	- a transparent artifact from a champion/challenger training loop.

	## Limitations

	- This model can make incorrect UI decisions and should not be allowed to take
	high-impact actions without supervision.
	- The released checkpoint is specialized for Mantis-style workflows; behavior
	outside that domain may not improve over the base model.
	- The internal gate found no reliable promotion over the base model on the
	frozen holdout available at release time.
	- Computer-use agents can interact with external systems. Use sandboxing,
	allowlists, rate limits, and human approval for sensitive workflows.

	## Base Model And License

	This model is fine-tuned from `Hcompany/Holo3-35B-A3B`, whose model card declares
	the Apache-2.0 license. This release is also published under Apache-2.0 and
	retains upstream attribution.

	## Training

	- Base model: `Hcompany/Holo3-35B-A3B`
	- Method: supervised fine-tuning with TRL
	- Checkpoint id: `sft-c3e0d799f432-f00fa0`
	- Data source: graded Mantis rollouts pulled from Augur
	- Training stack: PEFT LoRA + TRL SFT

	## Citation

	If you use the base model, cite Holo3:

	```bibtex
	@misc{hai2025holo3modelfamily,
	title={Holo3 - Open Foundation Models for Navigation and Computer Use Agents},
	author={H Company},
	year={2026},
	url={https://huggingface.co/Hcompany/Holo3-35B-A3B}
	}
	```

	If you use the trainer stack, cite TRL:

	```bibtex
	@software{vonwerra2020trl,
	title={{TRL: Transformers Reinforcement Learning}},
	author={von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
	license={Apache-2.0},
	url={https://github.com/huggingface/trl},
	year={2020}
	}
	```