Text Generation
Safetensors
English
llama
joellidin committed on
Commit
2c8e2b0
·
verified ·
1 Parent(s): c64f87b

Revamp README model card

Files changed (3)
  1. .gitignore +1 -0
  2. README.md +67 -30
  3. assets/covenant-72b.webp +3 -0
.gitignore ADDED
@@ -0,0 +1 @@
+ CLAUDE.md
README.md CHANGED
@@ -2,49 +2,86 @@
  license: apache-2.0
  datasets:
  - mlfoundations/dclm-baseline-1.0-parquet
  ---

  # Covenant-72B

  **Covenant-72B** is the largest permissionless collaboratively trained language
- model trained entirely from scratch at the 72 billion parameter scale.
-
- It was trained with 20+ globally distributed participants coordinated via
- decentralized infrastructure on the Bittensor blockchain.
-
- ![Covenant-72B](assets/base.webp)
-
- ---
-
- ## Training Details
-
- | Property | Value |
- | ----------------------------------- | ---------------------------------------------- |
- | **Model size** | 72B |
- | **Architecture** | LLaMA-style |
- | **Target token budget** | 1.1T |
- | **Compute participants** | 20+ |
- | **Minimal compute per participant** | 8×B200 or equivalent |
- | **Dataset** | DCLM-baseline |
- | **Optimizer** | SparseLoCo (communication-efficient optimizer) |
-
- ---
-
- ## Performance on Benchmarks
-
- _All results are 0-shot acc-norm (%) unless noted._
-
- | Model | Compute Environment / Permissions | Size | Tokens | ARC-C | ARC-E | PIQA | OpenBookQA | HellaSwag | Winogrande (acc) | MMLU (acc) |
- | :-------------------------------- | :-------------------------------- | ---: | -------: | --------: | --------: | --------: | ---------: | --------: | ---------------: | ---------: |
- | **Intellect-1** | Internet / Whitelist | 10B | 1T | 44.80 | 71.76 | 77.37 | 43.80 | 70.26 | 63.30 | 32.69 |
- | **Psyche Consilience-7Y9** | Internet / Whitelist | 40B | 1.2T | 31.14 | 55.77 | 76.12 | 35.20 | 63.67 | 56.99 | 24.23 |
- | **Covenant-72B (Checkpoint-Two)** | Internet / Permissionless | 72B | **420B** | **53.84** | **77.74** | **80.58** | **44.60** | **77.08** | **71.43** | **47.49** |
- | **Covenant-72B (base)** | Internet / Permissionless | 72B | **1.1T** | **56.48** | **79.76** | **80.90** | **44.80** | **78.07** | **73.24** | **61.00** |
- | **LLM360 K2 ckpt_108** | Centralized Cluster | 65B | 420B | 45.73 | 70.54 | 80.90 | 43.20 | 78.23 | 71.90 | 50.01 |
- | **LLM360 K2 Stage 1** | Centralized Cluster | 65B | 1.4T | 53.75 | 75.97 | 82.54 | 48.00 | 82.86 | 76.40 | 65.51 |
- | **LLaMA-2-7B** | Centralized Cluster | 7B | 2T | 45.05 | 73.82 | 78.73 | 44.20 | 76.18 | 69.38 | 41.73 |
- | **LLaMA-2-70B** | Centralized Cluster | 70B | 2T | 57.42 | 79.55 | 82.59 | 49.40 | 84.34 | 80.43 | 65.63 |
-
- ---
-
- <!-- For more details, refer to [Checkpoint One on Templar Research](https://templarresearch.substack.com/p/checkpoint-one). -->
 
  license: apache-2.0
  datasets:
  - mlfoundations/dclm-baseline-1.0-parquet
+ language:
+ - en
+ pipeline_tag: text-generation
  ---

  # Covenant-72B

+ ## Model Overview
+
  **Covenant-72B** is the largest permissionless collaboratively trained language
+ model, trained entirely from scratch at the 72 billion parameter scale on 1.1
+ trillion tokens of English text.
+
+ ![Covenant-72B](assets/covenant-72b.webp)
+
+ This is a base model. See [Covenant-72B-Chat](https://huggingface.co/1Covenant/Covenant-72B-Chat) for the instruction-tuned variant.
+
+ **Covenant-72B** was trained with 20+ globally distributed participants
+ coordinated via decentralized infrastructure on the Bittensor blockchain.
+ Unlike prior collaborative training efforts that use whitelisted compute,
+ Covenant-72B is the first to achieve this scale with fully permissionless
+ participation. Training used the SparseLoCo communication-efficient optimizer
+ to reduce bandwidth requirements across distributed nodes.
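(Editorial aside, not part of the commit: the card describes SparseLoCo only as a communication-efficient optimizer. A common ingredient of such optimizers is top-k sparsification of the synchronized pseudo-gradient with error feedback; the toy sketch below illustrates that idea under that assumption and is not the project's actual implementation.)

```python
# Illustrative sketch only: top-k compression with error feedback, one
# assumed ingredient of communication-efficient optimizers such as
# SparseLoCo. Not the project's implementation.

def topk_compress(pseudo_grad, k):
    """Keep the k largest-magnitude entries; zero the rest.

    Returns the sparse update a participant would transmit, plus the
    residual that error feedback carries into the next round.
    """
    ranked = sorted(range(len(pseudo_grad)),
                    key=lambda i: abs(pseudo_grad[i]), reverse=True)
    keep = set(ranked[:k])
    sparse = [g if i in keep else 0.0 for i, g in enumerate(pseudo_grad)]
    residual = [g - s for g, s in zip(pseudo_grad, sparse)]
    return sparse, residual

# Only k values cross the network instead of the full vector.
grad = [0.9, -0.1, 0.05, -1.2, 0.3, 0.02]
sparse, residual = topk_compress(grad, k=2)
# sparse keeps only -1.2 and 0.9; everything else is carried forward
# in the residual rather than discarded.
```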
28
 
29
+ ## Usage
30
 
31
+ ```python
32
+ import torch
33
+ from transformers import AutoModelForCausalLM, AutoTokenizer
 
 
 
 
 
 
34
 
35
+ model = AutoModelForCausalLM.from_pretrained(
36
+ "1Covenant/Covenant-72B",
37
+ torch_dtype=torch.bfloat16,
38
+ device_map="auto",
39
+ )
40
+ tokenizer = AutoTokenizer.from_pretrained("1Covenant/Covenant-72B")
41
 
42
+ input_text = "The theory of general relativity"
43
+ input_ids = tokenizer.encode(input_text, return_tensors="pt").to(model.device)
44
+ output_ids = model.generate(input_ids, max_new_tokens=100)
45
+ print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
46
+ ```
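(Editorial aside, not part of the commit: at bf16 precision, 72B parameters occupy roughly 144 GB before activations or KV cache, which is why the snippet relies on `device_map="auto"` to shard across devices; quantization is the usual alternative. A quick back-of-envelope estimate:)

```python
# Back-of-envelope weight memory for a 72B-parameter model at common
# precisions (weights only; activations and KV cache are extra).
params = 72e9
bytes_per_param = {"bf16": 2, "int8": 1, "int4": 0.5}
for dtype, nbytes in bytes_per_param.items():
    print(f"{dtype}: {params * nbytes / 1e9:.0f} GB of weights")
```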
+ ## Model Details
+
+ - **Compute Participants**: 20+ independent contributors on Bittensor
+ - **Minimum Compute per Participant**: 8×B200 or equivalent
+ - **Model License**: Apache 2.0
+
+ ## Technical Specifications
+
+ | Parameter | Value |
+ | ------------------------- | ------------------------------ |
+ | Parameter Size | 72B |
+ | Architecture | LLaMA-style (LlamaForCausalLM) |
+ | Number of Layers | 80 |
+ | Number of Attention Heads | 64 (8 KV heads) |
+ | Hidden Size | 8192 |
+ | Intermediate Size | 28672 |
+ | Head Dimension | 128 |
+ | Vocabulary Size | 262,144 |
+
+ **Training Details**:
+
+ - **Dataset**: [DCLM-baseline](https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0-parquet)
+ - **Tokens**: 1.1 Trillion
+ - **Optimizer**: SparseLoCo (communication-efficient optimizer)
+
73
+ ## Performance on Benchmarks
74
+
75
+ _All results are 0-shot acc_norm (%) unless noted._
76
+
77
+ | Model | Size | Tokens | ARC-C | ARC-E | PIQA | OBQA | HellaSwag | WinoGrande\* | MMLU\* |
78
+ | :----------------- | ---: | -----: | ----: | ----: | ----: | ----: | --------: | -----------: | -----: |
79
+ | **Covenant-72B** | 72B | 1.1T | 56.83 | 80.93 | 81.56 | 44.00 | 80.61 | 75.85 | 67.11 |
80
+ | INTELLECT-1 | 10B | 1T | 44.80 | 71.76 | 77.37 | 43.80 | 70.26 | 63.30 | 32.69 |
81
+ | Psyche Consilience | 40B | 1.2T | 31.14 | 55.77 | 76.12 | 35.20 | 63.67 | 56.99 | 24.23 |
82
+ | LLM360 K2 ckpt_108 | 65B | 420B | 45.73 | 70.54 | 80.90 | 43.20 | 78.23 | 71.90 | 50.01 |
83
+ | LLM360 K2 | 65B | 1.4T | 53.75 | 75.97 | 82.54 | 48.00 | 82.86 | 76.40 | 65.51 |
84
+ | LLaMA-2-7B | 7B | 2T | 45.05 | 73.82 | 78.73 | 44.20 | 76.18 | 69.38 | 41.73 |
85
+ | LLaMA-2-70B | 70B | 2T | 57.42 | 79.55 | 82.59 | 49.40 | 84.34 | 80.43 | 65.63 |
86
 
87
+ _\*WinoGrande uses acc; MMLU uses acc._
assets/covenant-72b.webp ADDED

Git LFS Details

  • SHA256: eddf89900f8ea3c119fe17dff757f41913cda8ada05af83145c656c888132871
  • Pointer size: 131 Bytes
  • Size of remote file: 884 kB