---
license: apache-2.0
language:
- en
- es
- fr
- de
- it
- pt
- ru
- ar
- hi
- ko
- zh
library_name: transformers
---
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<picture>
<img
src="https://cdn-uploads.huggingface.co/production/uploads/6435718aaaef013d1aec3b8b/i-v1KyAMOW_mgVGeic9WJ.png"
alt="Arcee Trinity Large"
style="max-width: 100%; height: auto;"
>
</picture>
</div>
<hr>
# Trinity-Large-TrueBase
## Introduction
Trinity-Large-TrueBase is a base pretraining checkpoint from Arcee AI's Trinity Large training run. It is a 398B-parameter sparse Mixture-of-Experts (MoE) model with approximately 13B active parameters per token. The checkpoint was captured after 10 trillion tokens of pretraining, prior to learning-rate annealing and before any instruction tuning or reinforcement learning.
This checkpoint is intended for research, probing, ablation studies, and downstream fine-tuning. It has no built-in alignment, instruction formatting, or preference optimization.
More details on the training of Trinity Large are available in the [technical report](https://github.com/arcee-ai/trinity-large-tech-report/).
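Because the checkpoint has no instruction formatting, it is usually prompted as a plain text continuation rather than through a chat template. The sketch below shows one way to build a few-shot prompt; the helper name `build_few_shot_prompt` and the `Q:`/`A:` prefixes are illustrative assumptions, not a format prescribed by this release.

```python
def build_few_shot_prompt(examples, query, q_prefix="Q: ", a_prefix="A: "):
    """Format (question, answer) pairs plus one open question as plain text.

    Hypothetical helper: the prefixes are an arbitrary convention, since a
    base model has no required prompt template.
    """
    blocks = [f"{q_prefix}{q}\n{a_prefix}{a}" for q, a in examples]
    # Leave the final answer open (no trailing space) so the model continues it.
    blocks.append(f"{q_prefix}{query}\n{a_prefix}".rstrip())
    return "\n\n".join(blocks)


prompt = build_few_shot_prompt(
    [("What is the capital of France?", "Paris"),
     ("What is the capital of Japan?", "Tokyo")],
    "What is the capital of Italy?",
)
print(prompt)
```

The resulting string can be tokenized and passed to a standard text-generation call; stopping on the next blank line keeps the model from writing further question/answer pairs.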
## Model Variants
The Trinity Large family consists of three checkpoints from the same training run:
- **Trinity-Large-TrueBase** (this release): 10T-token pre-anneal checkpoint with no instruction data
- **[Trinity-Large-Base](https://huggingface.co/arcee-ai/Trinity-Large-Base)**: Full 17T-token pretrained foundation model with mid-training anneals
- **[Trinity-Large-Preview](https://huggingface.co/arcee-ai/Trinity-Large-Preview)**: Lightly post-trained, chat-ready model undergoing active RL
## Architecture
Trinity-Large-TrueBase uses a sparse MoE configuration designed to maximize efficiency while maintaining large-scale capacity.
| Hyperparameter | Value |
|:---|:---:|
| Total parameters | ~398B |
| Active parameters per token | ~13B |
| Experts | 256 |
| Active experts | 4 |
| Routing strategy | 4-of-256 (1.56% of experts active per token) |
| Dense layers | 6 |
| Pretraining context length | 8,192 |
| Architecture | Sparse MoE (AfmoeForCausalLM) |
Note: Extended context support (e.g., 512k) was introduced after this checkpoint and is not available in TrueBase.
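The ratios implied by the table can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch using the approximate figures above; the variable names are illustrative.

```python
# Approximate figures taken from the architecture table.
TOTAL_PARAMS = 398e9
ACTIVE_PARAMS = 13e9
NUM_EXPERTS = 256
ACTIVE_EXPERTS = 4

# Fraction of experts selected by the router for each token.
expert_fraction = ACTIVE_EXPERTS / NUM_EXPERTS  # 0.015625

# Fraction of all parameters that participate in one token's forward pass.
param_fraction = ACTIVE_PARAMS / TOTAL_PARAMS  # about 0.0327

print(f"Experts active per token: {expert_fraction:.4f}")
print(f"Parameters active per token: {param_fraction:.4f}")
```

The active-parameter fraction is larger than the expert fraction, which is expected in MoE models generally: components outside the routed experts, such as attention and the dense layers, run for every token.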
## Benchmark Results
| Benchmark | N-shot | Metric | Score | Stderr |
|-------------------------------|--------|-------------------------------|--------|---------|
| arc_challenge_0shot | 0 | acc_norm,none | 0.6237 | ±0.0142 |
| bbh_fewshot | 3 | exact_match,remove_whitespace | 0.5784 | ±0.0054 |
| gpqa_diamond_5shot | 5 | acc_norm,none | 0.4091 | ±0.0350 |
| gpqa_diamond_generative_5shot | 5 | exact_match,flexible-extract | 0.3788 | ±0.0346 |
| gsm8k_8shot | 8 | exact_match,flexible-extract | 0.8036 | ±0.0109 |
| gsm8k_cot | 8 | exact_match,flexible-extract | 0.8044 | ±0.0109 |
| hellaswag_5shot | 5 | acc_norm,none | 0.8813 | ±0.0032 |
| humaneval_plus | 0 | pass@1,create_test | 0.5183 | ±0.0391 |
| leaderboard_math_hard | 4 | exact_match,none | 0.2696 | ±0.0113 |
| mbpp_plus | 3 | pass_at_1,none | 0.8095 | ±0.0202 |
| minerva_math500 | 4 | math_verify,none | 0.4820 | ±0.0224 |
| mmlu_5shot | 5 | acc,none | 0.7845 | ±0.0033 |
| mmlu_generative_5shot | 5 | exact_match,get_response | 0.7848 | ±0.0033 |
| mmlu_pro | 5 | exact_match,custom-extract | 0.5160 | ±0.0044 |
| triviaqa_5shot | 5 | exact_match,remove_whitespace | 0.8096 | ±0.0029 |
| winogrande_5shot | 5 | acc,none | 0.8145 | ±0.0109 |
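The Stderr column can be turned into approximate confidence intervals when comparing scores. The sketch below assumes a normal approximation (z = 1.96 for roughly 95% coverage); the helper names are illustrative and not part of any evaluation harness.

```python
def confidence_interval(score, stderr, z=1.96):
    """Return an approximate (low, high) interval under a normal approximation."""
    half_width = z * stderr
    return score - half_width, score + half_width


def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Scores and standard errors copied from the table above.
mmlu = confidence_interval(0.7845, 0.0033)
gsm8k_8shot = confidence_interval(0.8036, 0.0109)
gsm8k_cot = confidence_interval(0.8044, 0.0109)

print(f"mmlu_5shot: {mmlu[0]:.4f} to {mmlu[1]:.4f}")
print("gsm8k variants overlap:", intervals_overlap(gsm8k_8shot, gsm8k_cot))
```

Under this approximation the two GSM8K variants are statistically indistinguishable, while small benchmarks such as GPQA Diamond (stderr around 0.035) carry intervals several points wide.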
## Training Configuration
### Pretraining
- Training tokens: 10 trillion
- Checkpoint type: Pre-anneal
- Instruction data: None
- RLHF or post-training: None
This checkpoint branches from the main Trinity Large run at the 10T-token mark, prior to learning-rate decay or post-training phases.
### Optimizers
Peak learning rates reached after the warm-up phase of the WSD (warmup-stable-decay) schedule:
- Adam learning rate: 2e-4
- Muon learning rate: 8e-4
Muon was used to support larger critical batch sizes in a highly sparse MoE regime.
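To illustrate where a pre-anneal checkpoint sits in a WSD schedule, here is a minimal sketch of a generic warmup-stable-decay learning-rate function. The step counts, the linear warm-up and decay shapes, and the function name `wsd_lr` are assumptions for illustration; only the peak learning rates come from this card.

```python
def wsd_lr(step, peak_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Generic warmup-stable-decay schedule with linear warm-up and linear decay."""
    if step < warmup_steps:
        # Warm-up: ramp linearly up to the peak learning rate.
        return peak_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable phase: hold the peak rate. A pre-anneal checkpoint is taken here.
        return peak_lr
    # Decay (anneal) phase: move linearly from peak_lr toward min_lr.
    progress = min(1.0, (step - warmup_steps - stable_steps) / decay_steps)
    return peak_lr + (min_lr - peak_lr) * progress


ADAM_PEAK_LR = 2e-4  # from the list above
MUON_PEAK_LR = 8e-4  # from the list above

# Hypothetical step counts, chosen only to make the three phases visible.
for step in (0, 99, 500, 1150, 1200):
    print(step, wsd_lr(step, ADAM_PEAK_LR, warmup_steps=100,
                       stable_steps=1000, decay_steps=100))
```

In these terms, TrueBase corresponds to a point in the stable phase, so both optimizers were still running at their peak rates when the checkpoint was captured.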
### Infrastructure
- Hardware: 2,048 NVIDIA B300 GPUs
- Parallelism: HSDP + Expert Parallelism
- Compute partner: [Prime Intellect](https://www.primeintellect.ai/)
- Data partner: [Datology](https://www.datologyai.com/)
<div align="center">
<picture>
<img src="https://cdn-uploads.huggingface.co/production/uploads/6435718aaaef013d1aec3b8b/sSVjGNHfrJKmQ6w8I18ek.png" style="background-color:ghostwhite;padding:5px;" width="17%" alt="Powered by Datology">
</picture>
</div>
<div align="center">
<picture>
<img src="https://cdn-avatars.huggingface.co/v1/production/uploads/61e020e4a343274bb132e138/H2mcdPRWtl4iKLd-OYYBc.jpeg" style="background-color:ghostwhite;padding:5px;" width="17%" alt="Powered by Datology">
</picture>
</div>
## Intended Use
- Studying emergent behavior from large-scale pretraining
- Sparse MoE routing and load-balancing research
- Interpretability, probing, and ablation studies
- Domain-specific fine-tuning from a clean base
- Academic and industrial foundation model research
## Rationale for Release
Most base model releases include instruction data, annealed training dynamics, or early alignment stages. Trinity-Large-TrueBase excludes these, providing an opportunity to study what large-scale models learn from pretraining data alone. This checkpoint is intended as a foundation for research rather than as a finished conversational assistant.
## Known Limitations
- Not aligned for safety, helpfulness, or conversational tone
- Requires substantial compute and expertise to fine-tune
- May exhibit raw or unstable behaviors typical of unaligned models
- No extended-context tuning beyond the 8K pretraining window
## License
Trinity-Large-TrueBase is released under the Apache License, Version 2.0.