---
title: README
emoji: 📈
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
---

# 🌼 DaisyChainAI

We build capable systems by *daisy-chaining* a handful of small, sharp specialists behind a learned
router, instead of training one giant model to do everything. Each specialist is cheap, swappable, and
crisp on its own domain; chained together, they behave like one model at a fraction of the active compute.

---

## 🔗 What "daisy-chaining" means

A **daisy chain** links independent units in series so a signal can flow from one to the next,
each unit handling what it's good at and passing the rest along. That's exactly how our systems work:

- **Each link is one small specialist:** a dense ~74M-parameter model trained on a *single* domain. It is
  excellent at its own data and (deliberately) surprised by everything else.
- **The router is the connector between links.** When an input arrives, every specialist reports how
  *surprised* it is (bits/base) and exposes its hidden state, and a tiny learned router hands the work
  to the link that's most at home with it.
- **The chain grows link by link.** Because the specialists are trained *separately*, you can chain a
  new domain on without retraining the others: add a link, extend the router, done.
- **One link runs per query.** Only the routed specialist computes, so a chain of four ~74M experts
  costs ~74M parameters of active compute per token, roughly **7× cheaper** than a 500M monolith of
  comparable scope (500M / 74M ≈ 6.8).

So "DaisyChain" is both the brand and the mechanism: **a chain of specialists, connected by routing,
that you extend one flower at a time.**
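
As a rough illustration of the routing idea, here is a minimal sketch that hands an input to the specialist with the lowest surprise. It is a deliberate simplification: the router described above is *learned* and also sees hidden states, whereas this baseline uses bits/base alone. The function names and the surprise values are hypothetical.

```python
import math


def bits_per_base(log_probs_nats):
    """Mean surprise in bits/base from per-base natural-log probabilities."""
    return -sum(log_probs_nats) / (len(log_probs_nats) * math.log(2))


def route_by_surprise(surprise_by_specialist):
    """Return the name of the specialist with the lowest bits/base.

    Assumption: a pure arg-min baseline, not the learned router.
    """
    return min(surprise_by_specialist, key=surprise_by_specialist.get)


# Hypothetical surprise readings for one input (illustrative numbers only).
surprise = {
    "eukaryote": 1.93,
    "prokaryote": 1.71,
    "mRNA": 1.88,
    "mRNA-splice": 1.95,
}
chosen = route_by_surprise(surprise)

# Active-parameter ratio: a 500M monolith versus one ~74M link per query.
active_ratio = 500e6 / 74e6
```

With these made-up readings the input goes to the `prokaryote` link, and `active_ratio` works out to about 6.8, which is where the "roughly 7×" figure comes from.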

---

## 🛠️ How the models are built

Each specialist is grown by **interleaving two steps**, per domain:

1. **Continued pretraining:** next-token training on *only* that domain's data, so the specialist
   becomes genuinely crisp on its home distribution (and the router can tell the links apart).
2. **Per-domain distillation:** the specialist is distilled from a larger teacher foundation model
   *restricted to its own domain* (soft-target KD, plus a factorized per-nucleotide variant where the
   teacher supports it). It learns the teacher's behavior on its slice without ever becoming a generic
   clone; the specialization is what makes routing work.

We iterate those two steps until each link is as strong as its capacity allows, then train the
**router**. In lineage this is a **cluster Branch-Train-Merge (cBTM)** mixture of domain experts
(independent experts plus perplexity-aware routing) with iterative distillation from a larger teacher.
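
For readers unfamiliar with soft-target KD, the sketch below shows one common formulation: the KL divergence from the teacher's temperature-softened distribution to the student's, scaled by T². The temperature value, the T² scaling, and the function names are assumptions for illustration; the exact loss used for these specialists is not specified here.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: the classic soft-target form; temperature=2.0 is arbitrary.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return kl * temperature ** 2


# Toy 4-way example (real vocabularies are much larger).
teacher = [2.0, 0.5, 0.1, -1.0]
student = [1.0, 1.0, 0.0, 0.0]
loss = soft_target_kd_loss(student, teacher)
```

The loss is zero when the student matches the teacher exactly and positive otherwise, so minimizing it on a single domain's data pulls the student toward the teacher's behavior on that slice only.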

---

## 🧬 Current project: DaisyChain Genomics

Four DNA/RNA specialists (**eukaryote · prokaryote · mRNA · mRNA-splice**, ~74M each, **≈295M total,
under 500M**) sit behind a learned router, each distilled per-domain from
**[Carbon-500M](https://huggingface.co/HuggingFaceBio/Carbon-500M)**. Carbon's domain mixture
(50% eukaryotic / 25% mRNA / 10% splice / 15% bacterial) maps one-to-one onto our four specialists.

### Where it actually stands (measured on Carbon's own base-pair / FNS metric)

We score likelihood the way Carbon does: marginalize each 6-mer into six per-base distributions and
take the mean per-base log-prob (`score_sequence`). Our implementation reproduces Carbon's `compute_bp_probs`
to within **6e-08**, so the comparison is apples-to-apples.
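
The marginalization step can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not Carbon's `compute_bp_probs` or our `score_sequence`: it assumes a 4096-token vocabulary ordered lexicographically over `ACGT` and non-overlapping 6-base windows, and both helper names are hypothetical.

```python
import itertools
import math

BASES = "ACGT"
K = 6
# Assumed token order: lexicographic over ACGT (a real tokenizer may differ).
KMERS = ["".join(p) for p in itertools.product(BASES, repeat=K)]  # 4096 tokens


def per_base_marginals(kmer_probs):
    """Collapse a 4096-way 6-mer distribution into six 4-way per-base ones."""
    marginals = [dict.fromkeys(BASES, 0.0) for _ in range(K)]
    for kmer, prob in zip(KMERS, kmer_probs):
        for position, base in enumerate(kmer):
            marginals[position][base] += prob
    return marginals


def mean_bits_per_base(kmer_prob_rows, sequence):
    """Mean per-base surprise in bits, one 6-mer distribution per window."""
    total_log2 = 0.0
    for i, kmer_probs in enumerate(kmer_prob_rows):
        window = sequence[i * K:(i + 1) * K]
        for marginal, base in zip(per_base_marginals(kmer_probs), window):
            total_log2 += math.log2(marginal[base])
    return -total_log2 / (len(kmer_prob_rows) * K)


# Sanity check: a uniform 6-mer distribution is uniform over bases too.
uniform = [1.0 / len(KMERS)] * len(KMERS)
uniform_bits = mean_bits_per_base([uniform], "ACGTAC")
```

A uniform predictor scores exactly 2 bits/base under this scheme, which gives a useful reference point when reading the bits/base figures.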

| Metric | DaisyChain | Carbon-500M |
|---|---|---|
| **Routing accuracy** (held-out) | **100.0%** | n/a |
| **Likelihood, base-pair bits/base** (↓) | **1.875** | **1.787** |
| Seq-recovery, eukaryote (FNS, ↑) | 31.5% | 38.9% |
| Seq-recovery, bacteria (FNS, ↑) | 40.9% | 54.1% |
| Active params / query | ~74M (one specialist) | 500M |

**Honest standing: ~+0.088 bits/base behind, and no single domain beats Carbon yet.** The gap is
concentrated in mRNA and bacterial DNA (Carbon's strongest domains); eukaryote and splice are closest.
Note that Carbon-500M is itself a *draft model*, explicitly "not designed to be competitive on downstream
benchmarks", so it's a fair, achievable target, not the 3B/8B flagships.

- 📦 **Model:** [`DaisyChainAI/daisychain-genomics`](https://huggingface.co/DaisyChainAI/daisychain-genomics)
- 🎮 **Live demo:** [`Daisychain-Genomics-Demo`](https://huggingface.co/spaces/DaisyChainAI/Daisychain-Genomics-Demo). Paste DNA, watch the chain light up specialist-by-specialist and route in real time, then generate with Carbon's base-pair (FNS) decoder.

---

## 📓 Build log: what we got right, and what we got wrong

We build in the open, mistakes included. This project's honest history:

**What worked**
- **Per-domain specialists + a learned router** reached **100%** held-out routing, with one ~74M model active per query.
- **Snapshot-then-pick-best** distillation: snapshot every few thousand steps, deploy the snapshot with the
  best *held-out* score, never the last one. This caught over-distillation (models that memorize the distill
  cache and regress on held-out data) and made every round regression-guarded.
- **Re-fitting the router after every specialist swap.** Router features are coupled to the checkpoints;
  skipping the re-fit once produced a fake "regression" that was pure routing drift.
- **FNS per-base distillation targets:** distilling the teacher's *base-pair* marginals, not the 4096-way
  6-mer distribution, gave the small students a tractable, base-pair-correct objective.
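
The snapshot-then-pick-best rule from the list above can be sketched in a few lines. The function name, the step numbers, and the scores are hypothetical, and comparing against the currently deployed score is an assumed reading of "regression-guarded".

```python
def pick_best_snapshot(held_out_scores, deployed_score):
    """Choose the snapshot to deploy, or None if nothing beats the current model.

    Scores are held-out bits/base, so lower is better.
    """
    best_step = min(held_out_scores, key=held_out_scores.get)
    best_score = held_out_scores[best_step]
    if best_score < deployed_score:
        return best_step, best_score
    return None  # Regression guard: keep the currently deployed checkpoint.


# Hypothetical held-out scores per training step (illustrative numbers only).
snapshots = {2000: 1.912, 4000: 1.894, 6000: 1.889, 8000: 1.903}
choice = pick_best_snapshot(snapshots, deployed_score=1.900)
```

In this made-up run the last snapshot (step 8000) is worse than step 6000, the pattern over-distillation tends to produce, so the rule deploys step 6000 rather than the final checkpoint.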

**What we got wrong (and corrected)**
- **We reported the wrong metric for days.** We measured likelihood as **6-mer cross-entropy** (a softer proxy)
  instead of Carbon's **base-pair (FNS)** score. The proxy flattered us: it showed ~+0.043 behind and even
  "splice beats Carbon." On Carbon's actual metric the gap is **+0.088 and no domain is ahead.** We re-baselined
  the entire project history on the real metric.
- **We measured sequence recovery with the wrong decoder** (6-mer argmax) instead of Carbon's **FNS base-level
  argmax**. Re-measuring with their decoder changed the numbers (and actually *raised* our bacteria recovery).
- **An early eval had a frame-alignment bug:** feeding a context length not divisible by 6 knocked our 6-mer
  model out of phase and produced an impossible near-zero recovery. Fixed by aligning context to the 6-mer grid.
- **Decoding took several wrong turns** before matching Carbon: greedy with no repetition control (collapsed to
  homopolymers), then top-k sampling (trapped on low-complexity GC/AT loops), before adopting Carbon's actual
  **base-pair FNS decoder** (top-p at the 6-mer level → per-base selection).
- **One training round improved the proxy while regressing the real metric** (an early mRNA distill-only pass).
  The regression was invisible on 6-mer CE and obvious on base-pair. A later base+distill round fixed it.
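
The frame-alignment fix in the list above amounts to making the context length a multiple of 6 before tokenizing. The sketch below is one way to do that, assuming non-overlapping 6-mers read from the start of the string; trimming *leading* bases (so the bases nearest the prediction point are kept) is an assumption, and both helper names are hypothetical.

```python
K = 6


def align_context_to_kmer_grid(context, k=K):
    """Drop leading bases so the context length is a multiple of k."""
    extra = len(context) % k
    return context[extra:]


def split_into_kmers(sequence, k=K):
    """Split an aligned sequence into non-overlapping k-mers."""
    if len(sequence) % k != 0:
        raise ValueError("sequence length must be a multiple of k")
    return [sequence[i:i + k] for i in range(0, len(sequence), k)]


raw_context = "ACGTACGTAC"  # 10 bases: 4 too many for a whole number of 6-mers
aligned = align_context_to_kmer_grid(raw_context)
tokens = split_into_kmers(aligned)
```

Raising on a misaligned length, rather than silently truncating, is a design choice meant to surface this class of bug early instead of letting it show up as a near-zero recovery score.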

**The lesson:** *measure the way the baseline measures, or you aren't comparing anything.* A stricter, honest
evaluation didn't sink the project; it pointed to exactly which domains to attack and which "wins" were illusions.

More links on the chain, and more chains, coming. 🌼

## Citation

**If you use these models, please cite the author, Dean Byrne (Quazim0t0):**

```bibtex
@misc{byrne2026daisychain,
  title        = {DaisyChain Genomics: A Modular Mixture of Per-Domain Distilled Genomic Specialists},
  author       = {Byrne, Dean},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/DaisyChainAI/daisychain-genomics}},
  note         = {DaisyChainAI (Quazim0t0). Four ~74M DNA/RNA specialists distilled per-domain
                  from Carbon-500M behind a learned router}
}
```