---
license: apache-2.0
datasets:
- mlfoundations/dclm-baseline-1.0-parquet
---
|
|
|
|
|
# Covenant72B |
|
|
|
|
|
**Covenant72B** is the largest permissionless, collaboratively trained language model to date, trained entirely from scratch at the 72-billion-parameter scale.
|
|
|
|
|
It is being trained by 20+ globally distributed participants, coordinated through decentralized infrastructure on the Bittensor blockchain.
|
|
|
|
|
**Checkpoint-Two** marks the second release, corresponding to **420 billion tokens processed**. Model files are available in the [Checkpoint-Two branch](https://huggingface.co/tplr/Covenant72B/tree/Checkpoint-Two). Future checkpoints will be published here.
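Since the checkpoint lives on a named branch rather than `main`, loading it requires passing that branch as the `revision`. A minimal sketch using the standard `transformers` pattern (the exact dtype and device settings here are illustrative assumptions, not recommendations from the card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "Checkpoint-Two" is the branch name; omit `revision` to get the default branch.
tokenizer = AutoTokenizer.from_pretrained("tplr/Covenant72B", revision="Checkpoint-Two")
model = AutoModelForCausalLM.from_pretrained(
    "tplr/Covenant72B",
    revision="Checkpoint-Two",
    torch_dtype="auto",   # assumption: use the dtype stored in the checkpoint
    device_map="auto",    # assumption: shard across available GPUs; 72B will not fit on one
)
```

Note that at 72B parameters the weights alone are on the order of 145 GB in bf16, so multi-GPU sharding (or offloading) is effectively required.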
|
|
|
|
|
 |
|
|
|
|
|
--- |
|
|
|
|
|
## Training Details |
|
|
|
|
|
| Property | Value |
|-----------|--------|
| **Model size** | 72B |
| **Architecture** | LLaMA-style |
| **Target token budget** | 1.2T (420B at current checkpoint) |
| **Compute participants** | 20+ |
| **Minimum compute per participant** | 8×B200 or equivalent |
| **Dataset** | DCLM-baseline |
| **Optimizer** | SparseLoCo (communication-efficient optimizer) |
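SparseLoCo's specifics are beyond the scope of this card, but the communication-efficient family it belongs to follows a recognizable pattern: each participant takes many local optimizer steps, then only a sparsified (top-k) pseudo-gradient is exchanged and averaged each round, cutting bandwidth dramatically. A minimal sketch of that pattern (function names and the plain averaging are illustrative assumptions; the real optimizer adds components such as error feedback that are omitted here):

```python
import numpy as np

def topk_sparsify(delta: np.ndarray, k: int) -> np.ndarray:
    """Keep only the k largest-magnitude entries of a pseudo-gradient."""
    out = np.zeros_like(delta)
    keep = np.argsort(np.abs(delta))[-k:]  # indices of the k largest |values|
    out[keep] = delta[keep]
    return out

def outer_step(params: np.ndarray, local_deltas: list,
               k: int, outer_lr: float = 1.0) -> np.ndarray:
    """One communication round: each participant ships a sparsified
    pseudo-gradient; the sparse updates are averaged and applied globally."""
    agg = np.mean([topk_sparsify(d, k) for d in local_deltas], axis=0)
    return params + outer_lr * agg
```

With k much smaller than the parameter count, each round transmits only the retained indices and values instead of a dense 72B-parameter update, which is what makes training over the public internet feasible.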
|
|
|
|
|
--- |
|
|
|
|
|
## Performance on Benchmarks |
|
|
_All results are 0-shot acc-norm (%) unless noted._ |
|
|
|
|
|
| Model | Compute Environment / Permissions | Size | Tokens | ARC-C | ARC-E | PIQA | OpenBookQA | HellaSwag | Winogrande (acc) | MMLU (acc) |
|:------|:----------------------------------|------:|--------:|------:|------:|------:|------------:|-----------:|-------------:|------:|
| **Intellect-1** | Internet / Whitelist | 10B | 1T | 44.8 | 71.6 | 77.7 | 43.6 | 70.5 | 63.1 | 32.7 |
| **Psyche Consilience-7Y9** | Internet / Whitelist | 40B | 1.2T | 31.1 | 55.8 | 76.1 | 34.8 | 63.7 | 57.0 | 24.2 |
| **Covenant72B (Checkpoint-Two)** | Internet / Permissionless | 72B | **420B** | **53.84** | **77.74** | **80.58** | **44.60** | **77.08** | **71.43** | **47.49** |
| **LLM360 K2 ckpt_108** | Centralized Cluster | 65B | 420B | 45.73 | 70.54 | 80.90 | 43.20 | 78.23 | 71.90 | 50.01 |
| **LLM360 K2 Stage 1** | Centralized Cluster | 65B | 1.4T | 53.84 | 75.93 | 82.48 | 48.00 | 82.81 | 76.64 | 63.90 |
| **LLaMA-2-7B** | Centralized Cluster | 7B | 2T | 45.90 | 74.58 | 75.92 | 44.20 | 75.92 | 68.90 | 40.86 |
| **LLaMA-2-70B** | Centralized Cluster | 70B | 2T | 57.59 | 80.77 | 82.92 | 48.60 | 83.86 | 77.58 | 65.56 |
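Zero-shot scores in this style are typically produced with EleutherAI's lm-evaluation-harness. The harness version and exact settings behind this table are not stated, so the following is only an illustrative invocation, not the card's verified evaluation command:

```shell
pip install lm-eval

lm_eval --model hf \
  --model_args pretrained=tplr/Covenant72B,revision=Checkpoint-Two,dtype=auto \
  --tasks arc_challenge,arc_easy,piqa,openbookqa,hellaswag,winogrande,mmlu \
  --num_fewshot 0 \
  --batch_size auto
```

Minor differences in harness version, prompt formatting, or acc vs. acc-norm selection can shift these numbers by a point or more, so compare against the table with that caveat in mind.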
|
|
|
|
|
--- |
|
|
|
|
|
For more details, refer to [Checkpoint One on Templar Research](https://templarresearch.substack.com/p/checkpoint-one). |
|
|
|