---
license: apache-2.0
language:
  - en
pipeline_tag: text-generation
---

# Covenant-72B-Chat

## Model Overview

**Covenant-72B-Chat** is the instruction-tuned variant of [Covenant-72B](https://huggingface.co/1Covenant/Covenant-72B), the largest permissionless, collaboratively trained language model. It was produced by supervised fine-tuning (SFT) of the 72B-parameter base model. For more details, see the [technical report](https://arxiv.org/abs/2603.08163).

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "1Covenant/Covenant-72B-Chat",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("1Covenant/Covenant-72B-Chat")

messages = [
    {"role": "user", "content": "Explain general relativity in simple terms."},
]
input_ids = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

## Model Details

- **Base Model**: [Covenant-72B](https://huggingface.co/1Covenant/Covenant-72B)
- **Fine-tuning**: Supervised fine-tuning (SFT)
- **Model License**: Apache 2.0

## Technical Specifications

| Parameter                 | Value                          |
| ------------------------- | ------------------------------ |
| Parameter Size            | 72B                            |
| Architecture              | LLaMA-style (LlamaForCausalLM) |
| Number of Layers          | 80                             |
| Number of Attention Heads | 64 (8 KV heads)                |
| Hidden Size               | 8192                           |
| Intermediate Size         | 28672                          |
| Head Dimension            | 128                            |
| Vocabulary Size           | 262,144                        |

## Performance on Benchmarks

_All values in (%). ARC-C is 25-shot, HellaSwag is 10-shot, BBH CoT is 3-shot, MATH is 4-shot; all others are 5-shot._

| Model                 | Size | ARC-C | ARC-E | GSM8K\* | HellaSwag | MMLU\*\* |  OBQA |  PIQA | WinoGrande\*\* |
| :-------------------- | ---: | ----: | ----: | ------: | --------: | -------: | ----: | ----: | -------------: |
| **Covenant-72B-Chat** |  72B | 64.16 | 85.52 |   63.91 |     79.15 |    67.35 | 51.80 | 82.81 |          77.27 |
| LLaMA-2-7B-Chat       |   7B | 53.16 | 80.64 |   22.59 |     78.60 |    47.23 | 42.60 | 78.24 |          72.45 |
| LLaMA-2-70B-Chat      |  70B | 65.36 | 85.31 |   52.16 |     85.90 |    63.08 | 47.40 | 81.56 |          79.56 |
| K2-Chat (65B)         |  65B | 61.95 | 85.82 |   79.00 |     79.31 |    67.87 | 48.20 | 83.35 |          79.64 |

_\*strict; \*\*acc. All others use acc_norm._

### Additional Benchmarks

| Model                 | Size | BBH CoT\* | IFEval\*\* | MATH\* | MMLU-Pro\* |  MuSR |
| :-------------------- | ---: | --------: | ---------: | -----: | ---------: | ----: |
| **Covenant-72B-Chat** |  72B |     54.97 |      64.70 |  26.28 |      40.91 | 39.68 |
| LLaMA-2-7B-Chat       |   7B |     40.42 |      30.87 |   4.82 |      22.88 | 40.21 |
| LLaMA-2-70B-Chat      |  70B |     63.22 |      40.67 |  10.66 |      35.20 | 48.68 |
| K2-Chat (65B)         |  65B |     69.79 |      45.47 |  19.06 |      45.36 | 46.56 |

_\*exact_match; \*\*prompt_strict. MuSR uses acc_norm._
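As a rough sanity check, the figures in the Technical Specifications table imply a total close to the stated 72B parameters. The sketch below estimates the count for a generic LLaMA-style architecture; the assumptions (untied input embedding and LM head, no biases, a SwiGLU MLP with gate/up/down projections, norm weights ignored as negligible) are ours and are not confirmed by this card:

```python
# Rough parameter-count estimate from the specification table above.
# Assumes: untied embedding/LM head, no biases, SwiGLU MLP, and
# grouped-query attention with 8 KV heads; RMSNorm weights are
# negligible and omitted.

hidden_size = 8192
intermediate_size = 28672
num_layers = 80
num_kv_heads = 8
head_dim = 128
vocab_size = 262_144

kv_dim = num_kv_heads * head_dim  # 1024 with 8 KV heads of dim 128

# Attention: Q and O projections are hidden x hidden;
# K and V are hidden x kv_dim under grouped-query attention.
attn = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim

# SwiGLU MLP: gate and up (hidden -> intermediate), down (intermediate -> hidden).
mlp = 3 * hidden_size * intermediate_size

per_layer = attn + mlp
embeddings = 2 * vocab_size * hidden_size  # input embedding + LM head

total = num_layers * per_layer + embeddings
print(f"~{total / 1e9:.1f}B parameters")  # ~72.7B, consistent with the 72B label
```

Under these assumptions the large vocabulary (262,144 tokens) alone contributes about 4.3B parameters in the embedding and output layers.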