Revamp README model card
- .gitignore +1 -0
- README.md +67 -30
- assets/covenant-72b.webp +3 -0
.gitignore
ADDED

@@ -0,0 +1 @@
+CLAUDE.md
README.md
CHANGED

@@ -2,49 +2,86 @@
 license: apache-2.0
 datasets:
 - mlfoundations/dclm-baseline-1.0-parquet
 ---

 # Covenant-72B

 **Covenant-72B** is the largest permissionless collaboratively trained language
-model trained entirely from scratch at the 72 billion parameter scale.
-decentralized infrastructure on the Bittensor blockchain.
-##
-| **Architecture** | LLaMA-style |
-| **Target token budget** | 1.1T |
-| **Compute participants** | 20+ |
-| **Minimal compute per participant** | 8×B200 or equivalent |
-| **Dataset** | DCLM-baseline |
-| **Optimizer** | SparseLoCo (communication-efficient optimizer) |
-| **Psyche Consilience-7Y9** | Internet / Whitelist | 40B | 1.2T | 31.14 | 55.77 | 76.12 | 35.20 | 63.67 | 56.99 | 24.23 |
-| **Covenant-72B (Checkpoint-Two)** | Internet / Permissionless | 72B | **420B** | **53.84** | **77.74** | **80.58** | **44.60** | **77.08** | **71.43** | **47.49** |
-| **Covenant-72B (base)** | Internet / Permissionless | 72B | **1.1T** | **56.48** | **79.76** | **80.90** | **44.80** | **78.07** | **73.24** | **61.00** |
-| **LLM360 K2 ckpt_108** | Centralized Cluster | 65B | 420B | 45.73 | 70.54 | 80.90 | 43.20 | 78.23 | 71.90 | 50.01 |
-| **LLM360 K2 Stage 1** | Centralized Cluster | 65B | 1.4T | 53.75 | 75.97 | 82.54 | 48.00 | 82.86 | 76.40 | 65.51 |
-| **LLaMA-2-7B** | Centralized Cluster | 7B | 2T | 45.05 | 73.82 | 78.73 | 44.20 | 76.18 | 69.38 | 41.73 |
-| **LLaMA-2-70B** | Centralized Cluster | 70B | 2T | 57.42 | 79.55 | 82.59 | 49.40 | 84.34 | 80.43 | 65.63 |
 license: apache-2.0
 datasets:
 - mlfoundations/dclm-baseline-1.0-parquet
+language:
+- en
+pipeline_tag: text-generation
 ---

 # Covenant-72B

+## Model Overview
+
 **Covenant-72B** is the largest permissionless collaboratively trained language
+model, trained entirely from scratch at the 72 billion parameter scale on 1.1
+trillion tokens of English text.

+![](assets/covenant-72b.webp)

+This is a base model. See [Covenant-72B-Chat](https://huggingface.co/1Covenant/Covenant-72B-Chat) for the instruction-tuned variant.

+**Covenant-72B** was trained with 20+ globally distributed participants
+coordinated via decentralized infrastructure on the Bittensor blockchain.
+Unlike prior collaborative training efforts that use whitelisted compute,
+Covenant-72B is the first to achieve this scale with fully permissionless
+participation. Training used the SparseLoCo communication-efficient optimizer
+to reduce bandwidth requirements across distributed nodes.
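SparseLoCo itself is not specified in this card; as a rough illustration of the top-k gradient sparsification idea that communication-efficient optimizers of this family build on (function names here are hypothetical, not from the Covenant codebase), each node transmits only the largest-magnitude gradient coordinates:

```python
import heapq

def topk_compress(grad, k):
    """Keep only the k largest-magnitude entries of a flat gradient.

    Returns sorted (index, value) pairs -- all that a node would need to
    transmit; the receiver treats every other coordinate as zero.
    """
    top = heapq.nlargest(k, enumerate(grad), key=lambda iv: abs(iv[1]))
    return sorted(top)

def topk_decompress(pairs, n):
    """Rebuild a dense gradient of length n from the transmitted pairs."""
    dense = [0.0] * n
    for i, v in pairs:
        dense[i] = v
    return dense

grad = [0.02, -1.5, 0.001, 0.7, -0.03, 2.1]
pairs = topk_compress(grad, k=2)           # transmit 2 of 6 entries
print(pairs)                               # [(1, -1.5), (5, 2.1)]
print(topk_decompress(pairs, len(grad)))   # [0.0, -1.5, 0.0, 0.0, 0.0, 2.1]
```

Real systems additionally keep the untransmitted residual locally (error feedback) so that dropped coordinates are not lost, only delayed.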

+## Usage

+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model = AutoModelForCausalLM.from_pretrained(
+    "1Covenant/Covenant-72B",
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+)
+tokenizer = AutoTokenizer.from_pretrained("1Covenant/Covenant-72B")
+
+input_text = "The theory of general relativity"
+input_ids = tokenizer.encode(input_text, return_tensors="pt").to(model.device)
+output_ids = model.generate(input_ids, max_new_tokens=100)
+print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
+```
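A back-of-the-envelope check (plain arithmetic, not a measurement) shows why `device_map="auto"` sharding across several GPUs is needed just to hold the weights in bf16:

```python
params = 72e9        # 72B parameters
bytes_per_param = 2  # bfloat16 stores each weight in 2 bytes
weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB of weights")  # ~144 GB, before KV cache and activations
```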

+## Model Details

+- **Compute Participants**: 20+ independent contributors on Bittensor
+- **Minimum Compute per Participant**: 8×B200 or equivalent
+- **Model License**: Apache 2.0

+## Technical Specifications
+
+| Parameter                 | Value                          |
+| ------------------------- | ------------------------------ |
+| Parameter Size            | 72B                            |
+| Architecture              | LLaMA-style (LlamaForCausalLM) |
+| Number of Layers          | 80                             |
+| Number of Attention Heads | 64 (8 KV heads)                |
+| Hidden Size               | 8192                           |
+| Intermediate Size         | 28672                          |
+| Head Dimension            | 128                            |
+| Vocabulary Size           | 262,144                        |
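As a sanity check, these hyperparameters roughly reproduce the 72B headline figure. The sketch below assumes a standard LLaMA layout (GQA attention, SwiGLU MLP) with untied input/output embeddings and ignores biases and norm weights; the card does not state whether embeddings are tied:

```python
vocab, hidden, inter = 262_144, 8192, 28_672
layers, heads, kv_heads, head_dim = 80, 64, 8, 128

embed   = vocab * hidden                       # input embeddings
lm_head = vocab * hidden                       # output projection (if untied)
attn    = (hidden * heads * head_dim           # Q projection
           + 2 * hidden * kv_heads * head_dim  # K and V (grouped-query)
           + heads * head_dim * hidden)        # O projection
mlp     = 3 * hidden * inter                   # gate, up, down (SwiGLU)

total = embed + lm_head + layers * (attn + mlp)
print(f"{total / 1e9:.1f}B parameters")  # 72.7B parameters
```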
+
+**Training Details**:
+
+- **Dataset**: [DCLM-baseline](https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0-parquet)
+- **Tokens**: 1.1 Trillion
+- **Optimizer**: SparseLoCo (communication-efficient optimizer)
+
+## Performance on Benchmarks
+
+_All results are 0-shot acc_norm (%) unless noted._
+
+| Model              | Size | Tokens | ARC-C | ARC-E |  PIQA |  OBQA | HellaSwag | WinoGrande\* | MMLU\* |
+| :----------------- | ---: | -----: | ----: | ----: | ----: | ----: | --------: | -----------: | -----: |
+| **Covenant-72B**   |  72B |   1.1T | 56.83 | 80.93 | 81.56 | 44.00 |     80.61 |        75.85 |  67.11 |
+| INTELLECT-1        |  10B |     1T | 44.80 | 71.76 | 77.37 | 43.80 |     70.26 |        63.30 |  32.69 |
+| Psyche Consilience |  40B |   1.2T | 31.14 | 55.77 | 76.12 | 35.20 |     63.67 |        56.99 |  24.23 |
+| LLM360 K2 ckpt_108 |  65B |   420B | 45.73 | 70.54 | 80.90 | 43.20 |     78.23 |        71.90 |  50.01 |
+| LLM360 K2          |  65B |   1.4T | 53.75 | 75.97 | 82.54 | 48.00 |     82.86 |        76.40 |  65.51 |
+| LLaMA-2-7B         |   7B |     2T | 45.05 | 73.82 | 78.73 | 44.20 |     76.18 |        69.38 |  41.73 |
+| LLaMA-2-70B        |  70B |     2T | 57.42 | 79.55 | 82.59 | 49.40 |     84.34 |        80.43 |  65.63 |

+_\*WinoGrande and MMLU use acc rather than acc_norm._
assets/covenant-72b.webp
ADDED

Git LFS Details