# Templar-I: Permissionless Distributed Training

> A 1.2B-parameter causal language model trained with **Gauntlet**, an incentive system that rewards permissionless contributors for useful pseudo-gradients on the Bittensor network. [[Paper]](https://arxiv.org/abs/2505.21684)

---

## Overview

* **Setting:** Fully open, permissionless, internet-scale training, with no control over who registers or what hardware they run.
* **Mechanism:** Two stages: peer filtering (uptime, reliability, synchronization), then scoring of each peer's gradient quality.
* **Run:** 20K communication rounds on FineWeb-Edu; up to 250 registered peers, with the top **15** aggregated per round.
* **Result:** Per iteration, convergence outpaced a centralized AdamW baseline; downstream metrics are competitive.

---

## Gauntlet

* **Stage 1:** Filters peers by uptime, reliability, and synchronization.
* **Stage 2:** Estimates the loss before and after applying each peer’s pseudo-gradient and scores the peer by the resulting change.
* **Ratings:** Uses **OpenSkill** to track each peer's competitiveness over time.
* **Aggregation:** Each round aggregates updates from the top-scoring **G = 15** peers.
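The two stages and the top-G selection can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the Gauntlet implementation: the quadratic `loss`, the `Peer` class, the `online` flag standing in for the Stage 1 checks, the plain mean used for aggregation, and all function names are hypothetical. OpenSkill ratings are left out; peers are ranked directly by their single-round loss improvement.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    online: bool               # hypothetical stand-in for uptime/reliability/sync checks
    pseudo_grad: list[float]   # the update this peer submitted for the round


def loss(params):
    # Toy quadratic loss with its minimum at the origin (illustration only).
    return sum(p * p for p in params)


def apply_update(params, grad, lr):
    return [p - lr * g for p, g in zip(params, grad)]


def score_peers(params, peers, lr=0.1):
    """Stage 1 drops offline peers; Stage 2 scores the rest by loss_before - loss_after."""
    before = loss(params)
    scores = {}
    for peer in peers:
        if not peer.online:
            continue
        after = loss(apply_update(params, peer.pseudo_grad, lr))
        scores[peer.name] = before - after
    return scores


def select_top(scores, g):
    # Names of the g peers with the largest loss improvement.
    return sorted(scores, key=scores.get, reverse=True)[:g]


def aggregate(params, peers, selected, lr=0.1):
    # Assumption: a plain mean of the selected updates, chosen for simplicity.
    chosen = [p.pseudo_grad for p in peers if p.name in selected]
    mean_grad = [sum(col) / len(chosen) for col in zip(*chosen)]
    return apply_update(params, mean_grad, lr)


params = [1.0, -2.0]
peers = [
    Peer("helpful", True, [2.0, -4.0]),   # exact gradient of the toy loss
    Peer("noisy", True, [-1.0, 3.0]),     # points uphill, so its score is negative
    Peer("offline", False, [2.0, -4.0]),  # filtered out in Stage 1
    Peer("weak", True, [1.0, -1.0]),      # small but useful step
]

scores = score_peers(params, peers)
top = select_top(scores, g=2)
new_params = aggregate(params, peers, top)
print(top, loss(params), loss(new_params))
```

In the run described here G is 15 and the loss is estimated on training data rather than a closed-form function; the sketch only shows the shape of the filter, score, select, and aggregate loop.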

---

## Training setup

* **Data:** FineWeb-Edu \[11].
* **Rounds:** 20,000 communication rounds (evaluation windows matched the rounds).
* **Tokens:** 100B–200B.
* **Baseline for comparison:** Centralized AdamW trained for 120B tokens.
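For a rough sense of scale, the figures above imply an average token budget per communication round. The snippet below is only back-of-the-envelope arithmetic on the stated totals; it assumes tokens are spread evenly across rounds, which the source does not state, and the function name is hypothetical.

```python
ROUNDS = 20_000                          # communication rounds, from the list above
TOKENS_LOW, TOKENS_HIGH = 100e9, 200e9   # stated token range


def tokens_per_round(total_tokens, rounds=ROUNDS):
    # Average tokens per round, assuming an even split (an assumption).
    return total_tokens / rounds


low = tokens_per_round(TOKENS_LOW)
high = tokens_per_round(TOKENS_HIGH)
print(f"{low:,.0f} to {high:,.0f} tokens per round on average")
```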

---

## Quickstart

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tplr/TEMPLAR-I"

tok = AutoTokenizer.from_pretrained(model_id)
# Load weights in bfloat16; device_map="auto" places them on the available
# devices and requires the `accelerate` package.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
```

---

## Results
### Downstream Benchmarks (zero-shot)
| Model           | Dataset     | Tokens     | HellaSwag (acc_norm) | PIQA (acc_norm) | ARC-E (acc) |
|-----------------|-------------|------------|---------------------:|----------------:|------------:|
| TEMPLAR-1B      | FineWeb-Edu | 100B–200B  |                 51.0 |            71.4 |        59.2 |
| DeMo 1B [12]    | Dolma       | 100B       |                 48.0 |            70.0 |        55.0 |
| AdamW DDP 1B    | FineWeb-Edu | 120B       |                 51.0 |            71.9 |        58.9 |

### Per-Iteration Loss
![Training loss](./figures/per_iteration_loss.png)

---

## Citation

If you use this model or Gauntlet, please cite it as follows:

```
@article{lidin2025incentivizing,
  title={Incentivizing Permissionless Distributed Learning of LLMs},
  author={Lidin, Joel and Sarfi, Amir and Pappas, Evangelos and Dare, Samuel and Belilovsky, Eugene and Steeves, Jacob},
  journal={arXiv preprint arXiv:2505.21684},
  year={2025}
}
```