---
license: apache-2.0
language:
- en
- es
- fr
- de
- it
- pt
- ru
- ar
- hi
- ko
- zh
library_name: transformers
base_model:
- arcee-ai/Trinity-Large-Base
---

# Trinity-Large-Preview

## Introduction

Trinity-Large-Preview is a 398B-parameter sparse Mixture-of-Experts (MoE) model with approximately 13B active parameters per token. It is the largest model in Arcee AI's Trinity family, trained on more than 17 trillion tokens and delivering frontier-level performance with strong long-context comprehension.

Trinity-Large-Preview is a lightly post-trained model based on Trinity-Large-Base.

Try it at [chat.arcee.ai](http://chat.arcee.ai/).

More details on the training of Trinity Large are available in the [technical report](https://github.com/arcee-ai/trinity-large-tech-report/).

## Model Variants

The Trinity Large family consists of three checkpoints from the same training run:

- **Trinity-Large-Preview** (this release): Lightly post-trained, chat-ready model undergoing active RL
- **[Trinity-Large-TrueBase](https://huggingface.co/arcee-ai/Trinity-Large-TrueBase)**: 10T-token pre-anneal pretraining checkpoint
- **[Trinity-Large-Base](https://huggingface.co/arcee-ai/Trinity-Large-Base)**: Full 17T-token pretrained foundation model with mid-training anneals

## Architecture

Trinity-Large-Preview uses a sparse MoE configuration designed to maximize efficiency while maintaining large-scale capacity.

| Hyperparameter | Value |
|:---|:---:|
| Total parameters | ~398B |
| Active parameters per token | ~13B |
| Experts | 256 (1 shared) |
| Active experts | 4 |
| Routing strategy | 4-of-256 (1.56% sparsity) |
| Dense layers | 6 |
| Pretraining context length | 8,192 |
| Context length after extension | 512k |
| Architecture | Sparse MoE (AfmoeForCausalLM) |

## Benchmarks

| Benchmark | Llama 4 Maverick | Trinity-Large-Preview |
|-----------|------------------|-----------------------|
| MMLU | 85.5 | 87.2 |
| MMLU-Pro | 80.5 | 75.2 |
| GPQA-Diamond | 69.8 | 63.3 |
| AIME 2025 | 19.3 | 24.0 |

## Training Configuration

### Pretraining

- Training tokens: 17 trillion
- Data partner: [Datology](https://www.datologyai.com/)
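As a quick sanity check on the sparsity figures in the Architecture table above, the headline numbers can be reproduced with plain arithmetic. This is an illustrative calculation over the reported values, not code from the model's implementation:

```python
# Figures taken from the Architecture table; "~" values are approximate.
total_params = 398e9      # ~398B total parameters
active_params = 13e9      # ~13B active parameters per token
experts = 256             # size of the expert pool (table lists 1 shared expert)
active_experts = 4        # experts selected per token by the router

# Routing sparsity: fraction of the expert pool each token is routed to.
routing_sparsity = active_experts / experts
print(f"Routing sparsity: {routing_sparsity:.2%}")          # ~1.56%, matching the table

# Fraction of all weights active for a given token. This is larger than the
# routing sparsity because components outside the routed experts (attention,
# embeddings, dense layers, the shared expert) are typically always active.
active_fraction = active_params / total_params
print(f"Active parameter fraction: {active_fraction:.1%}")  # ~3.3%
```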
## Posttraining

- This checkpoint was instruction tuned on 20B tokens.

### Infrastructure

- Hardware: 2,048 NVIDIA B300 GPUs
- Parallelism: HSDP + Expert Parallelism
- Compute partner: [Prime Intellect](https://www.primeintellect.ai/)
## Usage

### Running our model

- [Transformers](https://huggingface.co/arcee-ai/Trinity-Large-Preview#transformers)
- [vLLM](https://huggingface.co/arcee-ai/Trinity-Large-Preview#vllm)
- [llama.cpp](https://huggingface.co/arcee-ai/Trinity-Large-Preview#llamacpp)
- [LM Studio](https://huggingface.co/arcee-ai/Trinity-Large-Preview#lm-studio)
- [API](https://huggingface.co/arcee-ai/Trinity-Large-Preview#api)

### Transformers

Use the `main` transformers branch or pass `trust_remote_code=True` with a released version.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "arcee-ai/Trinity-Large-Preview"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

messages = [
    {"role": "user", "content": "Who are you?"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.8,
    top_k=50,
    top_p=0.8
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### vLLM

Supported in vLLM release 0.11.1+.

```bash
vllm serve arcee-ai/Trinity-Large-Preview \
  --dtype bfloat16 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```

### llama.cpp

Supported in llama.cpp release b7061+.

```bash
llama-server -hf arcee-ai/Trinity-Large-Preview-GGUF:q4_k_m
```

### LM Studio

Supported in the latest LM Studio runtime. Search for `arcee-ai/Trinity-Large-Preview-GGUF` in Model Search.

### API

Available on OpenRouter:

```bash
curl -X POST "https://openrouter.ai/api/v1/chat/completions" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "arcee-ai/trinity-large-preview",
    "messages": [
      {
        "role": "user",
        "content": "What are some fun things to do in New York?"
      }
    ]
  }'
```

## License

Trinity-Large-Preview is released under the Apache License, Version 2.0.

## Citation

```bibtex
@misc{arcee_trinity_large_preview,
  title  = {Trinity-Large-Preview},
  author = {{Arcee AI}},
  year   = {2026},
  note   = {398B sparse MoE model trained on 17T tokens}
}
```
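For completeness, the OpenRouter request shown in the API section can also be issued from Python. This is a minimal sketch assuming the `openai` client package (>= 1.0) is installed and `OPENROUTER_API_KEY` is set; the model slug is the one used in the curl example above.

```python
import os

from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint. Pointing base_url at a
# locally served copy of the model (for example a vLLM server, which defaults
# to "http://localhost:8000/v1") works the same way.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

completion = client.chat.completions.create(
    model="arcee-ai/trinity-large-preview",
    messages=[
        {"role": "user", "content": "What are some fun things to do in New York?"},
    ],
)
print(completion.choices[0].message.content)
```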