---
license: apache-2.0
language:
- en
- es
- fr
- de
- it
- pt
- ru
- ar
- hi
- ko
- zh
library_name: transformers
base_model:
- arcee-ai/Trinity-Large-Base
---
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<picture>
<img
src="https://cdn-uploads.huggingface.co/production/uploads/6435718aaaef013d1aec3b8b/i-v1KyAMOW_mgVGeic9WJ.png"
alt="Arcee Trinity Large"
style="max-width: 100%; height: auto;"
>
</picture>
</div>
<hr>
# Trinity-Large-Preview
## Introduction
Trinity-Large-Preview is a 398B-parameter sparse Mixture-of-Experts (MoE) model with approximately 13B active parameters per token. It is the largest model in Arcee AI's Trinity family, trained on more than 17 trillion tokens and delivering frontier-level performance with strong long-context comprehension.
Trinity-Large-Preview is a lightly post-trained model based on Trinity-Large-Base.
Try it at [chat.arcee.ai](http://chat.arcee.ai/).
More details on the training of Trinity Large are available in the [technical report](https://github.com/arcee-ai/trinity-large-tech-report/).
## Model Variants
The Trinity Large family consists of three checkpoints from the same training run:
- **Trinity-Large-Preview** (this release): Lightly post-trained, chat-ready model; reinforcement-learning post-training is still in progress
- **[Trinity-Large-TrueBase](https://huggingface.co/arcee-ai/Trinity-Large-TrueBase)**: 10T-token pre-anneal pretraining checkpoint
- **[Trinity-Large-Base](https://huggingface.co/arcee-ai/Trinity-Large-Base)**: Full 17T-token pretrained foundation model with mid-training anneals
## Architecture
Trinity-Large-Preview uses a sparse MoE configuration designed to maximize efficiency while maintaining large-scale capacity.
| Hyperparameter | Value |
|:---|:---:|
| Total parameters | ~398B |
| Active parameters per token | ~13B |
| Experts | 256 (1 shared) |
| Active experts | 4 |
| Routing strategy | 4-of-256 (1.56% sparsity) |
| Dense layers | 6 |
| Pretraining context length | 8,192 |
| Context length after extension | 512k |
| Architecture | Sparse MoE (AfmoeForCausalLM) |
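The routing row above translates into a standard top-k gating step: a small router scores all 256 experts for each token, and only the 4 highest-scoring experts run, alongside the single always-on shared expert. The snippet below is a minimal, illustrative sketch of that 4-of-256 selection in PyTorch; the sizes and variable names are assumptions for illustration, not the actual Trinity implementation.
```python
# Illustrative sketch of 4-of-256 top-k routing (not the actual Trinity code).
import torch

hidden_size, num_experts, top_k = 64, 256, 4
tokens = torch.randn(10, hidden_size)                  # 10 token representations

router = torch.nn.Linear(hidden_size, num_experts, bias=False)
scores = router(tokens).softmax(dim=-1)                # (10, 256) routing probabilities
weights, expert_ids = scores.topk(top_k, dim=-1)       # pick the 4 best experts per token
weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over the chosen experts

print(expert_ids[0])  # the 4 of 256 experts that process token 0
print(weights[0])     # their mixing weights; the shared expert runs for every token regardless
```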
## Benchmarks
| Benchmark | Llama 4 Maverick | Trinity-Large-Preview |
|-----------|------------------|-----------------------|
| MMLU | 85.5 | 87.2 |
| MMLU-Pro | 80.5 | 75.2 |
| GPQA-Diamond | 69.8 | 63.3 |
| AIME 2025 | 19.3 | 24.0 |
## Training Configuration
### Pretraining
- Training tokens: 17 trillion
- Data partner: [Datology](https://www.datologyai.com/)
<div align="center">
<picture>
<img src="https://cdn-uploads.huggingface.co/production/uploads/6435718aaaef013d1aec3b8b/sSVjGNHfrJKmQ6w8I18ek.png" style="background-color:ghostwhite;padding:5px;" width="17%" alt="Powered by Datology">
</picture>
</div>
### Post-training
- This checkpoint was instruction-tuned on 20B tokens.
### Infrastructure
- Hardware: 2,048 NVIDIA B300 GPUs
- Parallelism: HSDP + Expert Parallelism
- Compute partner: [Prime Intellect](https://www.primeintellect.ai/)
<div align="center">
<picture>
<img src="https://cdn-avatars.huggingface.co/v1/production/uploads/61e020e4a343274bb132e138/H2mcdPRWtl4iKLd-OYYBc.jpeg" style="background-color:ghostwhite;padding:5px;" width="17%" alt="Powered by Prime Intellect">
</picture>
</div>
## Usage
### Running our model
- [Transformers](https://huggingface.co/arcee-ai/Trinity-Large-Preview#transformers)
- [vLLM](https://huggingface.co/arcee-ai/Trinity-Large-Preview#vllm)
- [llama.cpp](https://huggingface.co/arcee-ai/Trinity-Large-Preview#llamacpp)
- [LM Studio](https://huggingface.co/arcee-ai/Trinity-Large-Preview#lm-studio)
- [API](https://huggingface.co/arcee-ai/Trinity-Large-Preview#api)
### Transformers
Use the `main` branch of `transformers`, or pass `trust_remote_code=True` when using a released version.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "arcee-ai/Trinity-Large-Preview"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",       # shard the weights across all available GPUs
    trust_remote_code=True,  # needed with released transformers versions; the `main` branch has native support
)

# Format the conversation with the model's chat template
messages = [
    {"role": "user", "content": "Who are you?"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.8,
    top_k=50,
    top_p=0.8,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
### vLLM
Supported in vLLM release 0.11.1+
```bash
vllm serve arcee-ai/Trinity-Large-Preview \
  --dtype bfloat16 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```
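The served model exposes an OpenAI-compatible API. A minimal client sketch, assuming vLLM's default address (`http://localhost:8000/v1`) and the `openai` Python package:
```python
# Query the vLLM server started above via its OpenAI-compatible endpoint.
# Assumes the default --host/--port (localhost:8000) and `pip install openai`.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="arcee-ai/Trinity-Large-Preview",
    messages=[{"role": "user", "content": "Who are you?"}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```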
### llama.cpp
Supported in llama.cpp release b7061+
```bash
llama-server -hf arcee-ai/Trinity-Large-Preview-GGUF:q4_k_m
```
### LM Studio
Supported in the latest LM Studio runtime. Search for `arcee-ai/Trinity-Large-Preview-GGUF` in Model Search.
### API
Available on OpenRouter:
```bash
curl -X POST "https://openrouter.ai/api/v1/chat/completions" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "arcee-ai/trinity-large-preview",
    "messages": [
      {
        "role": "user",
        "content": "What are some fun things to do in New York?"
      }
    ]
  }'
```
## License
Trinity-Large-Preview is released under the Apache License, Version 2.0.
## Citation
```bibtex
@misc{arcee_trinity_large_preview,
  title  = {Trinity-Large-Preview},
  author = {{Arcee AI}},
  year   = {2026},
  note   = {398B sparse MoE model trained on 17T tokens}
}
```