---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---
# Covenant-72B-Chat
## Model Overview
**Covenant-72B-Chat** is the instruction-tuned variant of
[Covenant-72B](https://huggingface.co/1Covenant/Covenant-72B), the largest
language model trained collaboratively and permissionlessly to date. It was
produced by supervised fine-tuning (SFT) of the 72B-parameter base model.
For more details, see the [technical report](https://arxiv.org/abs/2603.08163).
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in bfloat16 and shard it across available devices.
model = AutoModelForCausalLM.from_pretrained(
    "1Covenant/Covenant-72B-Chat",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("1Covenant/Covenant-72B-Chat")

messages = [
    {"role": "user", "content": "Explain general relativity in simple terms."},
]

# Apply the chat template and append the assistant generation prompt.
input_ids = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
## Model Details
- **Base Model**: [Covenant-72B](https://huggingface.co/1Covenant/Covenant-72B)
- **Fine-tuning**: Supervised fine-tuning (SFT)
- **Model License**: Apache 2.0
## Technical Specifications
| Parameter | Value |
| ------------------------- | ------------------------------ |
| Parameter Count           | 72B                            |
| Architecture | LLaMA-style (LlamaForCausalLM) |
| Number of Layers | 80 |
| Number of Attention Heads | 64 (8 KV heads) |
| Hidden Size | 8192 |
| Intermediate Size | 28672 |
| Head Dimension | 128 |
| Vocabulary Size | 262,144 |
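
As a rough consistency check, the parameter count implied by the table can be estimated by hand. The sketch below assumes untied input/output embeddings and a standard LLaMA-style gated MLP, and ignores normalization weights (negligible at this scale); these are assumptions, not confirmed details of the checkpoint:

```python
# Values taken from the specification table above.
hidden = 8192
layers = 80
heads = 64
kv_heads = 8
head_dim = 128
inter = 28672
vocab = 262_144

# Hidden size must equal heads x head dimension.
assert hidden == heads * head_dim

# Attention: Q and O are hidden x hidden; K and V are smaller due to
# grouped-query attention (8 KV heads instead of 64).
attn = 2 * hidden * hidden + 2 * hidden * (kv_heads * head_dim)

# LLaMA-style gated MLP: gate, up, and down projections.
mlp = 3 * hidden * inter

# Untied input embedding and output head (assumption).
embeddings = 2 * vocab * hidden

total = layers * (attn + mlp) + embeddings
print(f"~{total / 1e9:.1f}B parameters")  # ~72.7B
```

The estimate lands close to the advertised 72B, which suggests the table entries are mutually consistent.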
## Performance on Benchmarks
_All values in (%). ARC-C is 25-shot, HellaSwag is 10-shot, BBH CoT is 3-shot, MATH is 4-shot; all others are 5-shot._
| Model | Size | ARC-C | ARC-E | GSM8K\* | HellaSwag | MMLU\*\* | OBQA | PIQA | WinoGrande\*\* |
| :-------------------- | ---: | ----: | ----: | ------: | --------: | -------: | ----: | ----: | -------------: |
| **Covenant-72B-Chat** | 72B | 64.16 | 85.52 | 63.91 | 79.15 | 67.35 | 51.80 | 82.81 | 77.27 |
| LLaMA-2-7B-Chat | 7B | 53.16 | 80.64 | 22.59 | 78.60 | 47.23 | 42.60 | 78.24 | 72.45 |
| LLaMA-2-70B-Chat | 70B | 65.36 | 85.31 | 52.16 | 85.90 | 63.08 | 47.40 | 81.56 | 79.56 |
| K2-Chat (65B) | 65B | 61.95 | 85.82 | 79.00 | 79.31 | 67.87 | 48.20 | 83.35 | 79.64 |
_\*strict-match accuracy; \*\*accuracy (acc). All others report normalized accuracy (acc_norm)._
### Additional Benchmarks
| Model | Size | BBH CoT\* | IFEval\*\* | MATH\* | MMLU-Pro\* | MuSR |
| :-------------------- | ---: | --------: | ---------: | -----: | ---------: | ----: |
| **Covenant-72B-Chat** | 72B | 54.97 | 64.70 | 26.28 | 40.91 | 39.68 |
| LLaMA-2-7B-Chat | 7B | 40.42 | 30.87 | 4.82 | 22.88 | 40.21 |
| LLaMA-2-70B-Chat | 70B | 63.22 | 40.67 | 10.66 | 35.20 | 48.68 |
| K2-Chat (65B) | 65B | 69.79 | 45.47 | 19.06 | 45.36 | 46.56 |
_\*exact_match; \*\*prompt-level strict accuracy (prompt_strict). MuSR reports acc_norm._