Revamp README model card
- .gitignore +1 -0
- README.md +67 -30
- assets/covenant-72b.webp +3 -0
.gitignore
ADDED

@@ -0,0 +1 @@
+CLAUDE.md
README.md
CHANGED

@@ -2,49 +2,86 @@
 license: apache-2.0
 datasets:
 - mlfoundations/dclm-baseline-1.0-parquet
 ---

 # Covenant-72B

 **Covenant-72B** is the largest permissionless collaboratively trained language
-model trained entirely from scratch at the 72 billion parameter scale.
-decentralized infrastructure on the Bittensor blockchain.
-##
-| **Architecture** | LLaMA-style |
-| **Target token budget** | 1.1T |
-| **Compute participants** | 20+ |
-| **Minimal compute per participant** | 8×B200 or equivalent |
-| **Dataset** | DCLM-baseline |
-| **Optimizer** | SparseLoCo (communication-efficient optimizer) |
-| **Psyche Consilience-7Y9** | Internet / Whitelist | 40B | 1.2T | 31.14 | 55.77 | 76.12 | 35.20 | 63.67 | 56.99 | 24.23 |
-| **Covenant-72B (Checkpoint-Two)** | Internet / Permissionless | 72B | **420B** | **53.84** | **77.74** | **80.58** | **44.60** | **77.08** | **71.43** | **47.49** |
-| **Covenant-72B (base)** | Internet / Permissionless | 72B | **1.1T** | **56.48** | **79.76** | **80.90** | **44.80** | **78.07** | **73.24** | **61.00** |
-| **LLM360 K2 ckpt_108** | Centralized Cluster | 65B | 420B | 45.73 | 70.54 | 80.90 | 43.20 | 78.23 | 71.90 | 50.01 |
-| **LLM360 K2 Stage 1** | Centralized Cluster | 65B | 1.4T | 53.75 | 75.97 | 82.54 | 48.00 | 82.86 | 76.40 | 65.51 |
-| **LLaMA-2-7B** | Centralized Cluster | 7B | 2T | 45.05 | 73.82 | 78.73 | 44.20 | 76.18 | 69.38 | 41.73 |
-| **LLaMA-2-70B** | Centralized Cluster | 70B | 2T | 57.42 | 79.55 | 82.59 | 49.40 | 84.34 | 80.43 | 65.63 |
 license: apache-2.0
 datasets:
 - mlfoundations/dclm-baseline-1.0-parquet
+language:
+- en
+pipeline_tag: text-generation
 ---

 # Covenant-72B

+## Model Overview
+
 **Covenant-72B** is the largest permissionless collaboratively trained language
+model, trained entirely from scratch at the 72 billion parameter scale on 1.1
+trillion tokens of English text.

+![](assets/covenant-72b.webp)

+This is a base model. See [Covenant-72B-Chat](https://huggingface.co/1Covenant/Covenant-72B-Chat) for the instruction-tuned variant.

+**Covenant-72B** was trained with 20+ globally distributed participants
+coordinated via decentralized infrastructure on the Bittensor blockchain.
+Unlike prior collaborative training efforts that use whitelisted compute,
+Covenant-72B is the first to achieve this scale with fully permissionless
+participation. Training used the SparseLoCo communication-efficient optimizer
+to reduce bandwidth requirements across distributed nodes.
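SparseLoCo itself is not specified in this card; as a rough illustration of the top-k gradient sparsification idea that communication-efficient optimizers of this family build on (function names here are hypothetical, not from the Covenant codebase), each node transmits only the largest-magnitude gradient coordinates:

```python
import heapq

def topk_compress(grad, k):
    """Keep only the k largest-magnitude entries of a flat gradient.

    Returns sorted (index, value) pairs -- all that a node would need to
    transmit; the receiver treats every other coordinate as zero.
    """
    top = heapq.nlargest(k, enumerate(grad), key=lambda iv: abs(iv[1]))
    return sorted(top)

def topk_decompress(pairs, n):
    """Rebuild a dense gradient of length n from the transmitted pairs."""
    dense = [0.0] * n
    for i, v in pairs:
        dense[i] = v
    return dense

grad = [0.02, -1.5, 0.001, 0.7, -0.03, 2.1]
pairs = topk_compress(grad, k=2)           # transmit 2 of 6 entries
print(pairs)                               # [(1, -1.5), (5, 2.1)]
print(topk_decompress(pairs, len(grad)))   # [0.0, -1.5, 0.0, 0.0, 0.0, 2.1]
```

Real systems additionally keep the untransmitted residual locally (error feedback) so that dropped coordinates are not lost, only delayed.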

+## Usage

+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model = AutoModelForCausalLM.from_pretrained(
+    "1Covenant/Covenant-72B",
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+)
+tokenizer = AutoTokenizer.from_pretrained("1Covenant/Covenant-72B")
+
+input_text = "The theory of general relativity"
+input_ids = tokenizer.encode(input_text, return_tensors="pt").to(model.device)
+output_ids = model.generate(input_ids, max_new_tokens=100)
+print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
+```
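A back-of-the-envelope check (plain arithmetic, not a measurement) shows why `device_map="auto"` sharding across several GPUs is needed just to hold the weights in bf16:

```python
params = 72e9        # 72B parameters
bytes_per_param = 2  # bfloat16 stores each weight in 2 bytes
weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB of weights")  # ~144 GB, before KV cache and activations
```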

+## Model Details

+- **Compute Participants**: 20+ independent contributors on Bittensor
+- **Minimum Compute per Participant**: 8×B200 or equivalent
+- **Model License**: Apache 2.0

+## Technical Specifications
+
+| Parameter                 | Value                          |
+| ------------------------- | ------------------------------ |
+| Parameter Size            | 72B                            |
+| Architecture              | LLaMA-style (LlamaForCausalLM) |
+| Number of Layers          | 80                             |
+| Number of Attention Heads | 64 (8 KV heads)                |
+| Hidden Size               | 8192                           |
+| Intermediate Size         | 28672                          |
+| Head Dimension            | 128                            |
+| Vocabulary Size           | 262,144                        |
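As a sanity check, these hyperparameters roughly reproduce the 72B headline figure. The sketch below assumes a standard LLaMA layout (GQA attention, SwiGLU MLP) with untied input/output embeddings and ignores biases and norm weights; the card does not state whether embeddings are tied:

```python
vocab, hidden, inter = 262_144, 8192, 28_672
layers, heads, kv_heads, head_dim = 80, 64, 8, 128

embed   = vocab * hidden                       # input embeddings
lm_head = vocab * hidden                       # output projection (if untied)
attn    = (hidden * heads * head_dim           # Q projection
           + 2 * hidden * kv_heads * head_dim  # K and V (grouped-query)
           + heads * head_dim * hidden)        # O projection
mlp     = 3 * hidden * inter                   # gate, up, down (SwiGLU)

total = embed + lm_head + layers * (attn + mlp)
print(f"{total / 1e9:.1f}B parameters")  # 72.7B parameters
```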
+
+**Training Details**:
+
+- **Dataset**: [DCLM-baseline](https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0-parquet)
+- **Tokens**: 1.1 Trillion
+- **Optimizer**: SparseLoCo (communication-efficient optimizer)
+
+## Performance on Benchmarks
+
+_All results are 0-shot acc_norm (%) unless noted._
+
+| Model              | Size | Tokens | ARC-C | ARC-E |  PIQA |  OBQA | HellaSwag | WinoGrande\* | MMLU\* |
+| :----------------- | ---: | -----: | ----: | ----: | ----: | ----: | --------: | -----------: | -----: |
+| **Covenant-72B**   |  72B |   1.1T | 56.83 | 80.93 | 81.56 | 44.00 |     80.61 |        75.85 |  67.11 |
+| INTELLECT-1        |  10B |     1T | 44.80 | 71.76 | 77.37 | 43.80 |     70.26 |        63.30 |  32.69 |
+| Psyche Consilience |  40B |   1.2T | 31.14 | 55.77 | 76.12 | 35.20 |     63.67 |        56.99 |  24.23 |
+| LLM360 K2 ckpt_108 |  65B |   420B | 45.73 | 70.54 | 80.90 | 43.20 |     78.23 |        71.90 |  50.01 |
+| LLM360 K2          |  65B |   1.4T | 53.75 | 75.97 | 82.54 | 48.00 |     82.86 |        76.40 |  65.51 |
+| LLaMA-2-7B         |   7B |     2T | 45.05 | 73.82 | 78.73 | 44.20 |     76.18 |        69.38 |  41.73 |
+| LLaMA-2-70B        |  70B |     2T | 57.42 | 79.55 | 82.59 | 49.40 |     84.34 |        80.43 |  65.63 |

+_\*WinoGrande and MMLU use acc rather than acc_norm._
assets/covenant-72b.webp
ADDED

Git LFS Details