---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---

# Covenant-72B-Chat

## Model Overview

**Covenant-72B-Chat** is the instruction-tuned variant of
[Covenant-72B](https://huggingface.co/1Covenant/Covenant-72B), the largest
permissionless, collaboratively trained language model. It was produced by
supervised fine-tuning (SFT) of the 72B-parameter base model.

For more details, see the [technical report](https://arxiv.org/abs/2603.08163).

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in bfloat16 and spread it across available devices
model = AutoModelForCausalLM.from_pretrained(
    "1Covenant/Covenant-72B-Chat",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("1Covenant/Covenant-72B-Chat")

# Build the prompt with the model's chat template
messages = [
    {"role": "user", "content": "Explain general relativity in simple terms."},
]
input_ids = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

# Generate, then decode only the newly generated tokens
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
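The card does not state hardware requirements. As a rough sizing sketch (weights only, ignoring activations and KV cache, and treating the parameter count as a nominal 72B, which is an approximation):

```python
# Rough weight-memory estimate for a 72B-parameter model.
# Assumes the nominal 72B count; a real checkpoint differs slightly.
PARAMS = 72e9

def weight_gb(bits_per_param: float) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return PARAMS * bits_per_param / 8 / 1e9

print(f"bf16 : {weight_gb(16):.0f} GB")  # prints "bf16 : 144 GB"
print(f"int8 : {weight_gb(8):.0f} GB")   # prints "int8 : 72 GB"
print(f"int4 : {weight_gb(4):.0f} GB")   # prints "int4 : 36 GB"
```

Even at 4-bit precision the weights alone need roughly 36 GB, so multi-GPU or offloaded inference (which `device_map="auto"` enables) is usually required.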

## Model Details

- **Base Model**: [Covenant-72B](https://huggingface.co/1Covenant/Covenant-72B)
- **Fine-tuning**: Supervised fine-tuning (SFT)
- **License**: Apache 2.0

## Technical Specifications

| Parameter                 | Value                          |
| ------------------------- | ------------------------------ |
| Parameter Count           | 72B                            |
| Architecture              | LLaMA-style (LlamaForCausalLM) |
| Number of Layers          | 80                             |
| Number of Attention Heads | 64 (8 KV heads)                |
| Hidden Size               | 8192                           |
| Intermediate Size         | 28672                          |
| Head Dimension            | 128                            |
| Vocabulary Size           | 262,144                        |
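As a sanity check, the dimensions in the table roughly reproduce the headline parameter count. A minimal sketch, ignoring norm layers and assuming untied input/output embeddings (an assumption not stated on this card):

```python
# Back-of-the-envelope parameter count from the specification table.
# Ignores norm weights; assumes untied input/output embeddings.
hidden, layers, inter = 8192, 80, 28672
heads, kv_heads, head_dim = 64, 8, 128
vocab = 262_144

attn = hidden * heads * head_dim          # q_proj
attn += 2 * hidden * kv_heads * head_dim  # k_proj + v_proj (grouped-query)
attn += heads * head_dim * hidden         # o_proj

mlp = 3 * hidden * inter                  # gate, up, down projections

embed = vocab * hidden                    # input embeddings
lm_head = vocab * hidden                  # output projection (untied)

total = layers * (attn + mlp) + embed + lm_head
print(f"{total / 1e9:.1f}B parameters")   # prints "72.7B parameters"
```

The estimate lands at roughly 72.7B, consistent with the headline 72B figure.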

## Performance on Benchmarks

_All values are percentages. ARC-C is 25-shot, HellaSwag is 10-shot, BBH CoT is 3-shot, MATH is 4-shot; all others are 5-shot._

| Model                 | Size | ARC-C | ARC-E | GSM8K\* | HellaSwag | MMLU\*\* | OBQA  | PIQA  | WinoGrande\*\* |
| :-------------------- | ---: | ----: | ----: | ------: | --------: | -------: | ----: | ----: | -------------: |
| **Covenant-72B-Chat** | 72B  | 64.16 | 85.52 |   63.91 |     79.15 |    67.35 | 51.80 | 82.81 |          77.27 |
| LLaMA-2-7B-Chat       | 7B   | 53.16 | 80.64 |   22.59 |     78.60 |    47.23 | 42.60 | 78.24 |          72.45 |
| LLaMA-2-70B-Chat      | 70B  | 65.36 | 85.31 |   52.16 |     85.90 |    63.08 | 47.40 | 81.56 |          79.56 |
| K2-Chat               | 65B  | 61.95 | 85.82 |   79.00 |     79.31 |    67.87 | 48.20 | 83.35 |          79.64 |

_\*strict; \*\*acc. All others use acc_norm._

### Additional Benchmarks

| Model                 | Size | BBH CoT\* | IFEval\*\* | MATH\* | MMLU-Pro\* | MuSR  |
| :-------------------- | ---: | --------: | ---------: | -----: | ---------: | ----: |
| **Covenant-72B-Chat** | 72B  |     54.97 |      64.70 |  26.28 |      40.91 | 39.68 |
| LLaMA-2-7B-Chat       | 7B   |     40.42 |      30.87 |   4.82 |      22.88 | 40.21 |
| LLaMA-2-70B-Chat      | 70B  |     63.22 |      40.67 |  10.66 |      35.20 | 48.68 |
| K2-Chat               | 65B  |     69.79 |      45.47 |  19.06 |      45.36 | 46.56 |

_\*exact_match; \*\*prompt_strict. MuSR uses acc_norm._