---
license: apache-2.0
datasets:
- mlfoundations/dclm-baseline-1.0-parquet
language:
- en
pipeline_tag: text-generation
---
# Covenant-72B
## Model Overview
**Covenant-72B** is the largest permissionless, collaboratively trained language
model to date, trained entirely from scratch at the 72-billion-parameter scale
on 1.1 trillion tokens of English text.

For more details, see the [technical report](https://arxiv.org/abs/2603.08163).
This is a base model. See [Covenant-72B-Chat](https://huggingface.co/1Covenant/Covenant-72B-Chat) for the instruction-tuned variant.
**Covenant-72B** was trained with 20+ globally distributed participants
coordinated via decentralized infrastructure on the Bittensor blockchain.
Unlike prior collaborative training efforts that use whitelisted compute,
Covenant-72B is the first to achieve this scale with fully permissionless
participation. Training used the SparseLoCo communication-efficient optimizer
to reduce bandwidth requirements across distributed nodes.
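Communication-efficient optimizers of this family typically run many local optimizer steps per node and then exchange only a heavily compressed "pseudo-gradient" (the parameter delta since the last sync). One common ingredient is top-k sparsification with error feedback, so that dropped entries are carried over rather than lost. The sketch below illustrates that idea only; it is not the actual SparseLoCo implementation, and the function name and `k_frac` value are illustrative:

```python
import torch

def compress_pseudograd(pseudo_grad, error, k_frac=0.05):
    """Top-k sparsification with error feedback (illustrative sketch).

    Only the largest-magnitude ``k_frac`` fraction of entries is
    communicated; the dropped residual accumulates in ``error`` and is
    re-added on the next round, so no signal is lost over time.
    """
    corrected = pseudo_grad + error
    flat = corrected.flatten()
    k = max(1, int(flat.numel() * k_frac))
    _, idx = torch.topk(flat.abs(), k)
    sparse = torch.zeros_like(flat)
    sparse[idx] = flat[idx]
    sparse = sparse.view_as(corrected)
    # return (communicated part, residual carried to the next round)
    return sparse, corrected - sparse

torch.manual_seed(0)
pseudo_grad = torch.randn(1000)  # e.g. params_before - params_after
error = torch.zeros(1000)
sparse, error = compress_pseudograd(pseudo_grad, error)
# only 50 of 1000 entries are transmitted; the rest wait in `error`
```

With 5% density, each sync round moves roughly 20× less data than a dense all-reduce, which is what makes participation over commodity internet links feasible.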
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "1Covenant/Covenant-72B",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard the model across available GPUs
)
tokenizer = AutoTokenizer.from_pretrained("1Covenant/Covenant-72B")

input_text = "The theory of general relativity"
input_ids = tokenizer.encode(input_text, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
## Model Details
- **Compute Participants**: 20+ independent contributors on Bittensor
- **Minimum Compute per Participant**: 8×B200 or equivalent
- **Model License**: Apache 2.0
## Technical Specifications
| Parameter | Value |
| ------------------------- | ------------------------------ |
| Parameter Size | 72B |
| Architecture | LLaMA-style (LlamaForCausalLM) |
| Number of Layers | 80 |
| Number of Attention Heads | 64 (8 KV heads) |
| Hidden Size | 8192 |
| Intermediate Size | 28672 |
| Head Dimension | 128 |
| Vocabulary Size | 262,144 |
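As a sanity check, the specification above roughly reproduces the 72B parameter count. The back-of-the-envelope sketch below assumes a standard LLaMA layout with untied input embeddings and output head, grouped-query attention, and a gate/up/down MLP; layer norms are ignored:

```python
# Values from the specification table above.
hidden, layers, inter = 8192, 80, 28672
heads, kv_heads, head_dim, vocab = 64, 8, 128, 262144

attn = hidden * heads * head_dim            # q_proj
attn += 2 * hidden * kv_heads * head_dim    # k_proj + v_proj (GQA)
attn += heads * head_dim * hidden           # o_proj
mlp = 3 * hidden * inter                    # gate, up, down projections
embed = 2 * vocab * hidden                  # input embedding + lm_head (untied)

total = layers * (attn + mlp) + embed
print(f"{total / 1e9:.1f}B")                # ≈ 72.7B
```

Note that the attention blocks are a small fraction of each layer: with 8 KV heads, the MLP accounts for over 80% of the per-layer parameters.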
**Training Details**:
- **Dataset**: [DCLM-baseline](https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0-parquet)
- **Tokens**: 1.1 Trillion
- **Optimizer**: SparseLoCo (communication-efficient optimizer)
## Performance on Benchmarks
_All results are 0-shot `acc_norm` (%) unless noted._
| Model | Size | Tokens | ARC-C | ARC-E | PIQA | OBQA | HellaSwag | WinoGrande\* | MMLU\* |
| :----------------- | ---: | -----: | ----: | ----: | ----: | ----: | --------: | -----------: | -----: |
| **Covenant-72B** | 72B | 1.1T | 56.83 | 80.93 | 81.56 | 44.00 | 80.61 | 75.85 | 67.11 |
| INTELLECT-1 | 10B | 1T | 44.80 | 71.76 | 77.37 | 43.80 | 70.26 | 63.30 | 32.69 |
| Psyche Consilience | 40B | 1.2T | 31.14 | 55.77 | 76.12 | 35.20 | 63.67 | 56.99 | 24.23 |
| LLM360 K2 ckpt_108 | 65B | 420B | 45.73 | 70.54 | 80.90 | 43.20 | 78.23 | 71.90 | 50.01 |
| LLM360 K2 | 65B | 1.4T | 53.75 | 75.97 | 82.54 | 48.00 | 82.86 | 76.40 | 65.51 |
| LLaMA-2-7B | 7B | 2T | 45.05 | 73.82 | 78.73 | 44.20 | 76.18 | 69.38 | 41.73 |
| LLaMA-2-70B | 70B | 2T | 57.42 | 79.55 | 82.59 | 49.40 | 84.34 | 80.43 | 65.63 |
_\*WinoGrande and MMLU use `acc` rather than `acc_norm`._