---
license: apache-2.0
language:
  - en
pipeline_tag: text-generation
---

# Covenant-72B-Chat

## Model Overview

**Covenant-72B-Chat** is the instruction-tuned variant of [Covenant-72B](https://huggingface.co/1Covenant/Covenant-72B), the largest permissionless, collaboratively trained language model. It was produced by supervised fine-tuning (SFT) of the 72B-parameter base model. For more details, see the [technical report](https://arxiv.org/abs/2603.08163).

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "1Covenant/Covenant-72B-Chat",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("1Covenant/Covenant-72B-Chat")

messages = [
    {"role": "user", "content": "Explain general relativity in simple terms."},
]
input_ids = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

## Model Details

- **Base Model**: [Covenant-72B](https://huggingface.co/1Covenant/Covenant-72B)
- **Fine-tuning**: Supervised fine-tuning (SFT)
- **Model License**: Apache 2.0

## Technical Specifications

| Parameter                 | Value                          |
| ------------------------- | ------------------------------ |
| Parameter Size            | 72B                            |
| Architecture              | LLaMA-style (LlamaForCausalLM) |
| Number of Layers          | 80                             |
| Number of Attention Heads | 64 (8 KV heads)                |
| Hidden Size               | 8192                           |
| Intermediate Size         | 28672                          |
| Head Dimension            | 128                            |
| Vocabulary Size           | 262,144                        |

## Performance on Benchmarks

_All values in (%). ARC-C is 25-shot, HellaSwag is 10-shot, BBH CoT is 3-shot, MATH is 4-shot; all others are 5-shot._

| Model                 | Size | ARC-C | ARC-E | GSM8K\* | HellaSwag | MMLU\*\* |  OBQA |  PIQA | WinoGrande\*\* |
| :-------------------- | ---: | ----: | ----: | ------: | --------: | -------: | ----: | ----: | -------------: |
| **Covenant-72B-Chat** |  72B | 64.16 | 85.52 |   63.91 |     79.15 |    67.35 | 51.80 | 82.81 |          77.27 |
| LLaMA-2-7B-Chat       |   7B | 53.16 | 80.64 |   22.59 |     78.60 |    47.23 | 42.60 | 78.24 |          72.45 |
| LLaMA-2-70B-Chat      |  70B | 65.36 | 85.31 |   52.16 |     85.90 |    63.08 | 47.40 | 81.56 |          79.56 |
| K2-Chat (65B)         |  65B | 61.95 | 85.82 |   79.00 |     79.31 |    67.87 | 48.20 | 83.35 |          79.64 |

_\*strict; \*\*acc. All others use acc_norm._

### Additional Benchmarks

| Model                 | Size | BBH CoT\* | IFEval\*\* | MATH\* | MMLU-Pro\* |  MuSR |
| :-------------------- | ---: | --------: | ---------: | -----: | ---------: | ----: |
| **Covenant-72B-Chat** |  72B |     54.97 |      64.70 |  26.28 |      40.91 | 39.68 |
| LLaMA-2-7B-Chat       |   7B |     40.42 |      30.87 |   4.82 |      22.88 | 40.21 |
| LLaMA-2-70B-Chat      |  70B |     63.22 |      40.67 |  10.66 |      35.20 | 48.68 |
| K2-Chat (65B)         |  65B |     69.79 |      45.47 |  19.06 |      45.36 | 46.56 |

_\*exact_match; \*\*prompt_strict. MuSR uses acc_norm._
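As a rough sanity check, the figures in the Technical Specifications table imply a total close to the stated 72B parameters. The sketch below estimates the count for a generic LLaMA-style architecture; the assumptions (untied input embedding and LM head, no biases, a SwiGLU MLP with gate/up/down projections, norm weights ignored as negligible) are ours and are not confirmed by this card:

```python
# Rough parameter-count estimate from the specification table above.
# Assumes: untied embedding/LM head, no biases, SwiGLU MLP, and
# grouped-query attention with 8 KV heads; RMSNorm weights are
# negligible and omitted.

hidden_size = 8192
intermediate_size = 28672
num_layers = 80
num_kv_heads = 8
head_dim = 128
vocab_size = 262_144

kv_dim = num_kv_heads * head_dim  # 1024 with 8 KV heads of dim 128

# Attention: Q and O projections are hidden x hidden;
# K and V are hidden x kv_dim under grouped-query attention.
attn = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim

# SwiGLU MLP: gate and up (hidden -> intermediate), down (intermediate -> hidden).
mlp = 3 * hidden_size * intermediate_size

per_layer = attn + mlp
embeddings = 2 * vocab_size * hidden_size  # input embedding + LM head

total = num_layers * per_layer + embeddings
print(f"~{total / 1e9:.1f}B parameters")  # ~72.7B, consistent with the 72B label
```

Under these assumptions the large vocabulary (262,144 tokens) alone contributes about 4.3B parameters in the embedding and output layers.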