Arcee Trinity Large Thinking

Trinity-Large-Thinking-GGUF

Introduction

Trinity-Large-Thinking is the reasoning-optimized variant of Arcee AI's Trinity-Large family: a 398B-parameter sparse Mixture-of-Experts (MoE) model with approximately 13B active parameters per token, post-trained with extended chain-of-thought reasoning and agentic reinforcement learning (RL).

This repository contains GGUF quantized weights of Trinity-Large-Thinking in multiple quantization levels.

For full model details, benchmarks, and usage guidance, see the main Trinity-Large-Thinking model card.

Available Quantizations

| Quant | Type | Use Case |
|---|---|---|
| Q8_0 | 8-bit | Best quality, highest memory |
| Q6_K_L | 6-bit (large) | Near-lossless |
| Q6_K | 6-bit | Near-lossless |
| Q5_K_L | 5-bit (large) | High quality |
| Q5_K_M | 5-bit (medium) | High quality |
| Q5_K_S | 5-bit (small) | High quality |
| Q4_K_L | 4-bit (large) | Recommended balance of quality and size |
| Q4_K_M | 4-bit (medium) | Good balance |
| Q4_K_S | 4-bit (small) | Good balance |
| Q4_1 | 4-bit | Good balance |
| Q4_0 | 4-bit | Good balance |
| Q3_K_XL | 3-bit (extra large) | Lower memory |
| Q3_K_L | 3-bit (large) | Lower memory |
| Q3_K_M | 3-bit (medium) | Lower memory |
| Q3_K_S | 3-bit (small) | Lower memory |
| IQ4_NL | 4-bit (imatrix) | Importance-weighted 4-bit |
| IQ4_XS | 4-bit (imatrix) | Importance-weighted 4-bit, smaller |
| IQ3_M | 3-bit (imatrix) | Importance-weighted 3-bit |
| IQ3_XS | 3-bit (imatrix) | Importance-weighted 3-bit, smaller |
| IQ3_XXS | 3-bit (imatrix) | Importance-weighted 3-bit, smallest |
| IQ2_M | 2-bit (imatrix) | Extreme compression |
| IQ2_S | 2-bit (imatrix) | Extreme compression |
| IQ2_XS | 2-bit (imatrix) | Extreme compression |
| IQ2_XXS | 2-bit (imatrix) | Extreme compression |
| Q2_K_L | 2-bit (large) | Extreme compression |
| Q2_K | 2-bit | Extreme compression |
| IQ1_M | 1-bit (imatrix) | Research / experimental |
| IQ1_S | 1-bit (imatrix) | Research / experimental |
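As a rough guide to which quant fits your hardware, file size scales with bits per weight. The sketch below estimates sizes from the ~399B total parameter count and nominal bits-per-weight figures; the bpw values are approximations (real GGUF files also keep some tensors, such as embeddings, at higher precision), so treat the output as a ballpark only.

```python
# Back-of-the-envelope GGUF size estimate: params * bits-per-weight / 8.
# Assumption: ~399e9 total parameters; bpw values are approximate and
# actual files run larger because some tensors stay at higher precision.
PARAMS = 399e9

NOMINAL_BPW = {  # approximate effective bits per weight
    "Q8_0": 8.5,
    "Q6_K": 6.56,
    "Q4_K_M": 4.85,
    "Q3_K_M": 3.91,
    "IQ2_M": 2.7,
}

def est_size_gb(quant: str) -> float:
    """Estimated file size in GB (1 GB = 1e9 bytes)."""
    return PARAMS * NOMINAL_BPW[quant] / 8 / 1e9

for q in NOMINAL_BPW:
    print(f"{q:8s} ~{est_size_gb(q):6.0f} GB")
```

Even the 2-bit imatrix quants of a 398B-parameter model remain in the hundred-gigabyte range, which is why the lower-bit variants exist at all for this family.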

Usage

llama.cpp

Supported in llama.cpp release b7061+.

```shell
# Recommended quant
llama-server -hf arcee-ai/Trinity-Large-Thinking-GGUF:Q4_K_M

# Higher quality
llama-server -hf arcee-ai/Trinity-Large-Thinking-GGUF:Q6_K

# Lower memory
llama-server -hf arcee-ai/Trinity-Large-Thinking-GGUF:Q3_K_M
```
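Once llama-server is up, it exposes an OpenAI-compatible HTTP API on its default port 8080. A minimal client sketch, assuming the default local endpoint; the prompt and sampling parameters are illustrative only, not tuned recommendations. The network call is commented out so the snippet also runs without a server.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default port 8080.
URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "messages": [
        {"role": "user", "content": "Explain MoE routing in two sentences."}
    ],
    "temperature": 0.6,
    "max_tokens": 512,
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment to send the request once the server is running:
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
print(json.dumps(payload, indent=2))
```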

LM Studio

Search for arcee-ai/Trinity-Large-Thinking-GGUF in Model Search. Select your preferred quantization level.

API

Works out of the box on OpenRouter as arcee-ai/trinity-large-thinking.
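OpenRouter's endpoint is likewise OpenAI-compatible. A sketch of a request using the model slug above, assuming an `OPENROUTER_API_KEY` environment variable; the send itself is commented out so the snippet runs offline.

```python
import json
import os
import urllib.request

# Assumption: an OpenRouter API key is set in the environment.
API_KEY = os.environ.get("OPENROUTER_API_KEY", "")

payload = {
    "model": "arcee-ai/trinity-large-thinking",  # slug from this model card
    "messages": [{"role": "user", "content": "What is 17 * 24?"}],
}

req = urllib.request.Request(
    "https://openrouter.ai/api/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)

# Uncomment to send once an API key is configured:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
print(payload["model"])
```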

License

Trinity-Large-Thinking-GGUF is released under the Apache License, Version 2.0.

Citation

If you use this model, please cite:

```bibtex
@misc{singh2026arceetrinity,
  title         = {Arcee Trinity Large Technical Report},
  author        = {Varun Singh and Lucas Krauss and Sami Jaghouar and Matej Sirovatka and Charles Goddard and Fares Obied and Jack Min Ong and Jannik Straube and Fern and Aria Harley and Conner Stewart and Colin Kealty and Maziyar Panahi and Simon Kirsten and Anushka Deshpande and Anneketh Vij and Arthur Bresnu and Pranav Veldurthi and Raghav Ravishankar and Hardik Bishnoi and DatologyAI Team and Arcee AI Team and Prime Intellect Team and Mark McQuade and Johannes Hagemann and Lucas Atkins},
  year          = {2026},
  eprint        = {2602.17004},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  doi           = {10.48550/arXiv.2602.17004},
  url           = {https://arxiv.org/abs/2602.17004}
}
```
Model size: 399B params
Architecture: afmoe