---
license: apache-2.0
language:
- en
- es
- fr
- de
- it
- pt
- ru
- ar
- hi
- ko
- zh
library_name: transformers
base_model:
- arcee-ai/Trinity-Large-Base
---
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<picture>
<img
src="https://cdn-uploads.huggingface.co/production/uploads/6435718aaaef013d1aec3b8b/i-v1KyAMOW_mgVGeic9WJ.png"
alt="Arcee Trinity Large"
style="max-width: 100%; height: auto;"
>
</picture>
</div>
<hr>
# Trinity-Large-Preview
## Introduction
Trinity-Large-Preview is a 398B-parameter sparse Mixture-of-Experts (MoE) model with approximately 13B active parameters per token. It is the largest model in Arcee AI's Trinity family, trained on more than 17 trillion tokens and delivering frontier-level performance with strong long-context comprehension.
Trinity-Large-Preview is a lightly post-trained model based on Trinity-Large-Base.
Try it at [chat.arcee.ai](http://chat.arcee.ai/).
More details on the training of Trinity Large are available in the [technical report](https://github.com/arcee-ai/trinity-large-tech-report/).
## Model Variants
The Trinity Large family consists of three checkpoints from the same training run:
- **Trinity-Large-Preview** (this release): Lightly post-trained, chat-ready model; reinforcement-learning post-training is still in progress
- **[Trinity-Large-TrueBase](https://huggingface.co/arcee-ai/Trinity-Large-TrueBase)**: 10T-token pre-anneal pretraining checkpoint
- **[Trinity-Large-Base](https://huggingface.co/arcee-ai/Trinity-Large-Base)**: Full 17T-token pretrained foundation model with mid-training anneals
## Architecture
Trinity-Large-Preview uses a sparse MoE configuration designed to maximize efficiency while maintaining large-scale capacity.
| Hyperparameter | Value |
|:---|:---:|
| Total parameters | ~398B |
| Active parameters per token | ~13B |
| Experts | 256 (1 shared) |
| Active experts | 4 |
| Routing strategy | 4-of-256 (1.56% sparsity) |
| Dense layers | 6 |
| Pretraining context length | 8,192 |
| Context length after extension | 512k |
| Architecture | Sparse MoE (AfmoeForCausalLM) |
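The routing row above translates into a standard top-k gating step: a small router scores all 256 experts for each token, and only the 4 highest-scoring experts run, alongside the single always-on shared expert. The snippet below is a minimal, illustrative sketch of that 4-of-256 selection in PyTorch; the sizes and variable names are assumptions for illustration, not the actual Trinity implementation.
```python
# Illustrative sketch of 4-of-256 top-k routing (not the actual Trinity code).
import torch

hidden_size, num_experts, top_k = 64, 256, 4
tokens = torch.randn(10, hidden_size)                  # 10 token representations

router = torch.nn.Linear(hidden_size, num_experts, bias=False)
scores = router(tokens).softmax(dim=-1)                # (10, 256) routing probabilities
weights, expert_ids = scores.topk(top_k, dim=-1)       # pick the 4 best experts per token
weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over the chosen experts

print(expert_ids[0])  # the 4 of 256 experts that process token 0
print(weights[0])     # their mixing weights; the shared expert runs for every token regardless
```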
## Benchmarks
| Benchmark | Llama 4 Maverick | Trinity-Large-Preview |
|-----------|------------------|-----------------------|
| MMLU | 85.5 | 87.2 |
| MMLU-Pro | 80.5 | 75.2 |
| GPQA-Diamond | 69.8 | 63.3 |
| AIME 2025 | 19.3 | 24.0 |
## Training Configuration
### Pretraining
- Training tokens: 17 trillion
- Data partner: [Datology](https://www.datologyai.com/)
<div align="center">
<picture>
<img src="https://cdn-uploads.huggingface.co/production/uploads/6435718aaaef013d1aec3b8b/sSVjGNHfrJKmQ6w8I18ek.png" style="background-color:ghostwhite;padding:5px;" width="17%" alt="Powered by Datology">
</picture>
</div>
### Post-training
- This checkpoint was instruction-tuned on 20B tokens.
### Infrastructure
- Hardware: 2,048 NVIDIA B300 GPUs
- Parallelism: HSDP + Expert Parallelism
- Compute partner: [Prime Intellect](https://www.primeintellect.ai/)
<div align="center">
<picture>
<img src="https://cdn-avatars.huggingface.co/v1/production/uploads/61e020e4a343274bb132e138/H2mcdPRWtl4iKLd-OYYBc.jpeg" style="background-color:ghostwhite;padding:5px;" width="17%" alt="Powered by Prime Intellect">
</picture>
</div>
## Usage
### Running our model
- [Transformers](https://huggingface.co/arcee-ai/Trinity-Large-Preview#transformers)
- [vLLM](https://huggingface.co/arcee-ai/Trinity-Large-Preview#vllm)
- [llama.cpp](https://huggingface.co/arcee-ai/Trinity-Large-Preview#llamacpp)
- [LM Studio](https://huggingface.co/arcee-ai/Trinity-Large-Preview#lm-studio)
- [API](https://huggingface.co/arcee-ai/Trinity-Large-Preview#api)
### Transformers
Use the `main` branch of `transformers`, or pass `trust_remote_code=True` when using a released version.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "arcee-ai/Trinity-Large-Preview"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",       # shard the weights across all available GPUs
    trust_remote_code=True,  # needed with released transformers versions; the `main` branch has native support
)

# Format the conversation with the model's chat template
messages = [
    {"role": "user", "content": "Who are you?"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.8,
    top_k=50,
    top_p=0.8,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
### vLLM
Supported in vLLM release 0.11.1+
```bash
vllm serve arcee-ai/Trinity-Large-Preview \
  --dtype bfloat16 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```
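The served model exposes an OpenAI-compatible API. A minimal client sketch, assuming vLLM's default address (`http://localhost:8000/v1`) and the `openai` Python package:
```python
# Query the vLLM server started above via its OpenAI-compatible endpoint.
# Assumes the default --host/--port (localhost:8000) and `pip install openai`.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="arcee-ai/Trinity-Large-Preview",
    messages=[{"role": "user", "content": "Who are you?"}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```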
### llama.cpp
Supported in llama.cpp release b7061+
```bash
llama-server -hf arcee-ai/Trinity-Large-Preview-GGUF:q4_k_m
```
### LM Studio
Supported in the latest LM Studio runtime. Search for `arcee-ai/Trinity-Large-Preview-GGUF` in Model Search.
### API
Available on OpenRouter:
```bash
curl -X POST "https://openrouter.ai/api/v1/chat/completions" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "arcee-ai/trinity-large-preview",
    "messages": [
      {
        "role": "user",
        "content": "What are some fun things to do in New York?"
      }
    ]
  }'
```
## License
Trinity-Large-Preview is released under the Apache License, Version 2.0.
## Citation
```bibtex
@misc{arcee_trinity_large_preview,
  title  = {Trinity-Large-Preview},
  author = {{Arcee AI}},
  year   = {2026},
  note   = {398B sparse MoE model trained on 17T tokens}
}
```