---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---

# Covenant-72B-Chat

## Model Overview

**Covenant-72B-Chat** is the instruction-tuned variant of
[Covenant-72B](https://huggingface.co/1Covenant/Covenant-72B), the largest
permissionless, collaboratively trained language model. It was produced by
supervised fine-tuning (SFT) of the 72B-parameter base model.

For more details, see the [technical report](https://arxiv.org/abs/2603.08163).

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in bfloat16 and spread it across available devices
model = AutoModelForCausalLM.from_pretrained(
    "1Covenant/Covenant-72B-Chat",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("1Covenant/Covenant-72B-Chat")

# Build the prompt with the model's chat template
messages = [
    {"role": "user", "content": "Explain general relativity in simple terms."},
]
input_ids = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

# Generate, then decode only the newly generated tokens
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
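The card does not state hardware requirements. As a rough sizing sketch (weights only, ignoring activations and KV cache, and treating the parameter count as a nominal 72B, which is an approximation):

```python
# Rough weight-memory estimate for a 72B-parameter model.
# Assumes the nominal 72B count; a real checkpoint differs slightly.
PARAMS = 72e9

def weight_gb(bits_per_param: float) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return PARAMS * bits_per_param / 8 / 1e9

print(f"bf16 : {weight_gb(16):.0f} GB")  # prints "bf16 : 144 GB"
print(f"int8 : {weight_gb(8):.0f} GB")   # prints "int8 : 72 GB"
print(f"int4 : {weight_gb(4):.0f} GB")   # prints "int4 : 36 GB"
```

Even at 4-bit precision the weights alone need roughly 36 GB, so multi-GPU or offloaded inference (which `device_map="auto"` enables) is usually required.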

## Model Details

- **Base Model**: [Covenant-72B](https://huggingface.co/1Covenant/Covenant-72B)
- **Fine-tuning**: Supervised fine-tuning (SFT)
- **License**: Apache 2.0

## Technical Specifications

| Parameter                 | Value                          |
| ------------------------- | ------------------------------ |
| Parameter Count           | 72B                            |
| Architecture              | LLaMA-style (LlamaForCausalLM) |
| Number of Layers          | 80                             |
| Number of Attention Heads | 64 (8 KV heads)                |
| Hidden Size               | 8192                           |
| Intermediate Size         | 28672                          |
| Head Dimension            | 128                            |
| Vocabulary Size           | 262,144                        |
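As a sanity check, the dimensions in the table roughly reproduce the headline parameter count. A minimal sketch, ignoring norm layers and assuming untied input/output embeddings (an assumption not stated on this card):

```python
# Back-of-the-envelope parameter count from the specification table.
# Ignores norm weights; assumes untied input/output embeddings.
hidden, layers, inter = 8192, 80, 28672
heads, kv_heads, head_dim = 64, 8, 128
vocab = 262_144

attn = hidden * heads * head_dim          # q_proj
attn += 2 * hidden * kv_heads * head_dim  # k_proj + v_proj (grouped-query)
attn += heads * head_dim * hidden         # o_proj

mlp = 3 * hidden * inter                  # gate, up, down projections

embed = vocab * hidden                    # input embeddings
lm_head = vocab * hidden                  # output projection (untied)

total = layers * (attn + mlp) + embed + lm_head
print(f"{total / 1e9:.1f}B parameters")   # prints "72.7B parameters"
```

The estimate lands at roughly 72.7B, consistent with the headline 72B figure.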

## Performance on Benchmarks

_All values are percentages. ARC-C is 25-shot, HellaSwag is 10-shot, BBH CoT is 3-shot, MATH is 4-shot; all others are 5-shot._

| Model                 | Size | ARC-C | ARC-E | GSM8K\* | HellaSwag | MMLU\*\* | OBQA  | PIQA  | WinoGrande\*\* |
| :-------------------- | ---: | ----: | ----: | ------: | --------: | -------: | ----: | ----: | -------------: |
| **Covenant-72B-Chat** | 72B  | 64.16 | 85.52 |   63.91 |     79.15 |    67.35 | 51.80 | 82.81 |          77.27 |
| LLaMA-2-7B-Chat       | 7B   | 53.16 | 80.64 |   22.59 |     78.60 |    47.23 | 42.60 | 78.24 |          72.45 |
| LLaMA-2-70B-Chat      | 70B  | 65.36 | 85.31 |   52.16 |     85.90 |    63.08 | 47.40 | 81.56 |          79.56 |
| K2-Chat               | 65B  | 61.95 | 85.82 |   79.00 |     79.31 |    67.87 | 48.20 | 83.35 |          79.64 |

_\*strict; \*\*acc. All others use acc_norm._

### Additional Benchmarks

| Model                 | Size | BBH CoT\* | IFEval\*\* | MATH\* | MMLU-Pro\* | MuSR  |
| :-------------------- | ---: | --------: | ---------: | -----: | ---------: | ----: |
| **Covenant-72B-Chat** | 72B  |     54.97 |      64.70 |  26.28 |      40.91 | 39.68 |
| LLaMA-2-7B-Chat       | 7B   |     40.42 |      30.87 |   4.82 |      22.88 | 40.21 |
| LLaMA-2-70B-Chat      | 70B  |     63.22 |      40.67 |  10.66 |      35.20 | 48.68 |
| K2-Chat               | 65B  |     69.79 |      45.47 |  19.06 |      45.36 | 46.56 |

_\*exact_match; \*\*prompt_strict. MuSR uses acc_norm._