---
license: mit
pipeline_tag: text-generation
library_name: transformers
tags:
- reap
- trinity
---

> [!TIP]
> **[Support this work →](https://donate.sybilsolutions.ai)** · [X](https://x.com/0xsero) · [GitHub](https://github.com/0xsero) · [REAP paper](https://arxiv.org/abs/2510.13999) · [Cerebras REAP](https://huggingface.co/collections/cerebras/cerebras-reap)

# Trinity-337B

A REAP-pruned version of the base model.

## At a glance

| | |
|---|---|
| Base model | — |
| Format | BF16 |
| Total params | **337B** |
| Active / token | — |
| Experts / layer | 216 |
| Layers | 60 |
| Hidden size | 3072 |
| Context | 262,144 |
| On-disk size | 675 GB |

## Which variant should I pick?

| Variant | Format | Link |
|---|---|---|
| `Trinity-337B` **(this)** | BF16 | [link](https://huggingface.co/0xSero/Trinity-337B) |
| `Trinity-337B-W4A16` | W4A16 | [link](https://huggingface.co/0xSero/Trinity-337B-W4A16) |
| `Trinity-337B-W4A16-192` | W4A16 | [link](https://huggingface.co/0xSero/Trinity-337B-W4A16-192) |

## License & citation

License inherited from the base model.

```bibtex
@misc{lasby2025reap,
  title         = {REAP the Experts: Why Pruning Prevails for One-Shot MoE Compression},
  author        = {Mike Lasby and Ivan Lazarevich and Nish Sinnadurai and Sean Lie and Yani Ioannou and Vithursan Thangarasa},
  year          = {2025},
  eprint        = {2510.13999},
  archivePrefix = {arXiv}
}
```

## Sponsors

Made possible by **NVIDIA · TNG Technology · Lambda · Prime Intellect · Hot Aisle**.
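
## Estimating the weight footprint

The on-disk size in the "At a glance" table can be sanity-checked from the parameter count and the storage format. The sketch below is only a back-of-the-envelope estimate: it assumes every parameter is stored in BF16 at 2 bytes each and uses decimal gigabytes (1 GB = 10^9 bytes). The helper name `weight_size_gb` is hypothetical and not part of any library.

```python
def weight_size_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes).

    Assumption: all parameters use the same storage width.
    BF16 stores each parameter in 2 bytes.
    """
    return num_params * bytes_per_param / 1e9


# "Total params" from the table, rounded to the nearest billion.
total_params = 337e9

print(f"BF16 weights: ~{weight_size_gb(total_params):.0f} GB")
```

The estimate comes out within about 1 GB of the listed 675 GB, which is the kind of gap expected from rounding the parameter count to 337B and from non-weight files in the repository.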