File size: 18,661 Bytes
d16f284
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
---
library_name: multimolecule
license: agpl-3.0
pipeline: mean-ribosome-load
pipeline_tag: other
tags:
- Biology
- RNA
- 5' UTR
- Translation
- rna
widget:
- example_title: microRNA 21
  pipeline_tag: mean-ribosome-load
  sequence_type: ncRNA
  task: mean-ribosome-load
  text: UAGCUUAUCAGACUGAUGUUGA
- example_title: microRNA 146a
  pipeline_tag: mean-ribosome-load
  sequence_type: ncRNA
  task: mean-ribosome-load
  text: UGAGAACUGAAUUCCAUGGGUU
- example_title: microRNA 155
  pipeline_tag: mean-ribosome-load
  sequence_type: ncRNA
  task: mean-ribosome-load
  text: UUAAUGCUAAUCGUGAUAGGGGUU
- example_title: RNA component of mitochondrial RNA processing endoribonuclease
  pipeline_tag: mean-ribosome-load
  sequence_type: ncRNA
  task: mean-ribosome-load
  text: GGUUCGUGCUGAAGGCCUGUAUCCUAGGCUACACACUGAGGACUCUGUUCCUCCCCUUUCCGCCUAGGGGAAAGUCCCCGGACCUCGGGCAGAGAGUGCCACGUGCAUACGCACGUAGACAUUCCCCGCUUCCCACUCCAAAGUCCGCCAAGAAGCGUAUCCCGCUGAGCGGCGUGGCGCGGGGGCGUCAUCCGUCAGCUCCCUCUAGUUACGCAGGCAGUGCGUGUCCGCGCACCAACCACACGGGGCUCAUUCUCAGCGCGGCUGUAAAAAAAAA
- example_title: 7SK small nuclear RNA
  pipeline_tag: mean-ribosome-load
  sequence_type: ncRNA
  task: mean-ribosome-load
  text: GGAUGUGAGGGCGAUCUGGCUGCGACAUCUGUCACCCCAUUGAUCGCCAGGGUUGAUUCGGCUGAUCUGGCUGGCUAGGCGGGUGUCCCCUUCCUCCCUCACCGCUCCAUGUGCGUCCCUCCCGAAGCUGCGCGCUCGGUCGAAGAGGACGACCAUCCCCGAUAGAGGAGGACCGGUCUUCGGUCAAGGGUAUACGAGUAGCUGCGCUCCCCUGCUAGAACCUCCAAACAAGCUCUCAAGGUCCAUUUGUAGGAGAACGUAGGGUAGUCAAGCUUCCAAGACUCCAGACACAUCCAAAUGAGGCGCUGCAUGUGGCAGUCUGCCUUUCUUUU
- example_title: telomerase RNA component
  pipeline_tag: mean-ribosome-load
  sequence_type: ncRNA
  task: mean-ribosome-load
  text: GGGUUGCGGAGGGUGGGCCUGGGAGGGGUGGUGGCCAUUUUUUGUCUAACCCUAACUGAGAAGGGCGUAGGCGCCGUGCUUUUGCUCCCCGCGCGCUGUUUUUCUCGCUGACUUUCAGCGGGCGGAAAAGCCUCGGCCUGCCGCCUUCCACCGUUCAUUCUAGAGCAAACAAAAAAUGUCAGCUGCUGGCCCGUUCGCCCCUCCCGGGGACCUGCGGCGGGUCGCCUGCCCAGCCCCCGAACCCCGCCUGGAGGCCGCGGUCGGCCCGGGGCUUCUCCGGAGGCACCCACUGCCACCGCGAAGAGUUGGGCUCUGUCAGCCGCGGGUCUCUCGGGGGCGAGGGCGAGGUUCAGGCCUUUCAGGCCGCAGGAAGAGGAACGGAGCGAGUCCCCGCGCGCGGCGCGAUUCCCUGAGCUGUGGGACGUGCACCCAGGACUCGGCUCACACAUGC
- example_title: vault RNA 2-1
  pipeline_tag: mean-ribosome-load
  sequence_type: ncRNA
  task: mean-ribosome-load
  text: CGGGUCGGAGUUAGCUCAAGCGGUUACCUCCUCAUGCCGGACUUUCUAUCUGUCCAUCUCUGUGCUGGGGUUCGAGACCCGCGGGUGCUUACUGACCCUUUUAUGCAA
- example_title: brain cytoplasmic RNA 1
  pipeline_tag: mean-ribosome-load
  sequence_type: ncRNA
  task: mean-ribosome-load
  text: GGCCGGGCGCGGUGGCUCACGCCUGUAAUCCCAGCUCUCAGGGAGGCUAAGAGGCGGGAGGAUAGCUUGAGCCCAGGAGUUCGAGACCUGCCUGGGCAAUAUAGCGAGACCCCGUUCUCCAGAAAAAGGAAAAAAAAAAACAAAAGACAAAAAAAAAAUAAGCGUAACUUCCCUCAAAGCAACAACCCCCCCCCCCCUUU
- example_title: HIV-1 TAR-WT
  pipeline_tag: mean-ribosome-load
  sequence_type: ncRNA
  task: mean-ribosome-load
  text: GGUCUCUCUGGUUAGACCAGAUCUGAGCCUGGGAGCUCUCUGGCUAACUAGGGAACC
- example_title: prion protein (Kanno blood group)
  pipeline_tag: mean-ribosome-load
  sequence_type: mRNA
  task: mean-ribosome-load
  text: AUGGCGAACCUUGGCUGCUGGAUGCUGGUUCUCUUUGUGGCCACAUGGAGUGACCUGGGCCUCUGC
- example_title: interleukin 10
  pipeline_tag: mean-ribosome-load
  sequence_type: mRNA
  task: mean-ribosome-load
  text: AUGCACAGCUCAGCACUGCUCUGUUGCCUGGUCCUCCUGACUGGGGUGAGGGCC
- example_title: Zaire ebolavirus
  pipeline_tag: mean-ribosome-load
  sequence_type: mRNA
  task: mean-ribosome-load
  text: AAUGUUCAAACACUUUGUGAAGCUCUGUUAGCUGAUGGUCUUGCUAAAGCAUUUCCUAGCAAUAUGAUGGUAGUCACAGAGCGUGAGCAAAAAGAAAGCUUAUUGCAUCAAGCAUCAUGGCACCACACAAGUGAUGAUUUUGGUGAGCAUGCCACAGUUAGAGGGAGUAGCUUUGUAACUGAUUUAGAGAAAUACAAUCUUGCAUUUAGAUAUGAGUUUACAGCACCUUUUAUAGAAUAUUGUAACCGUUGCUAUGGUGUUAAGAAUGUUUUUAAUUGGAUGCAUUAUACAAUCCCACAGUGUUAU
- example_title: SARS coronavirus
  pipeline_tag: mean-ribosome-load
  sequence_type: mRNA
  task: mean-ribosome-load
  text: AUGUUUAUUUUCUUAUUAUUUCUUACUCUCACUAGUGGUAGUGACCUUGACCGGUGCACCACUUUUGAUGAUGUUCAAGCUCCUAAUUACACUCAACAUACUUCAUCUAUGAGGGGGGUUUACUAUCCUGAUGAAAUUUUUAGAUCAGACACUCUUUAUUUAACUCAGGAUUUAUUUCUUCCAUUUUAUUCUAAUGUUACAGGGUUUCAUACUAUUAAUCAUACGUUUGACAACCCUGUCAUACCUUUUAAGGAUGGUAUUUAUUUUGCUGCCACAGAGAAAUCAAAUGUUGUCCGUGGUUGGGUUUUUGGUUCUACCAUGAACAACAAGUCACAGUCGGUGAUUAUUAUUAACAAUUCUACUAAUGUUGUUAUACGAGCAUGUAACUUUGAAUUGUGUGACAACCCUUUCUUUGCUGUUUCUAAACCCAUGGGUACACAGACACAUACUAUGAUAUUCGAUAAUGCAUUUAAAUGCACUUUCGAGUACAUAUCU
- example_title: insulin
  pipeline_tag: mean-ribosome-load
  sequence_type: mRNA
  task: mean-ribosome-load
  text: AUGGCCCUGUGGAUGCGCCUCCUGCCCCUGCUGGCGCUGCUGGCCCUCUGGGGACCUGACCCAGCCGCAGCCUUUGUGAACCAACACCUGUGCGGCUCACACCUGGUGGAAGCUCUCUACCUAGUGUGCGGGGAACGAGGCUUCUUCUACACACCCAAGACCCGCCGGGAGGCAGAGGACCUGCAGGUGGGGCAGGUGGAGCUGGGCGGGGGCCCUGGUGCAGGCAGCCUGCAGCCCUUGGCCCUGGAGGGGUCCCUGCAGAAGCGUGGCAUUGUGGAACAAUGCUGUACCAGCAUCUGCUCCCUCUACCAGCUGGAGAACUACUGCAACUAG
- example_title: cyclin dependent kinase inhibitor 2A
  pipeline_tag: mean-ribosome-load
  sequence_type: mRNA
  task: mean-ribosome-load
  text: AUGGAGCCGGCGGCGGGGAGCAGCAUGGAGCCUUCGGCUGACUGGCUGGCCACGGCCGCGGCCCGGGGUCGGGUAGAGGAGGUGCGGGCGCUGCUGGAGGCGGGGGCGCUGCCCAACGCACCGAAUAGUUACGGUCGGAGGCCGAUCCAGGUCAUGAUGAUGGGCAGCGCCCGAGUGGCGGAGCUGCUGCUGCUCCACGGCGCGGAGCCCAACUGCGCCGACCCCGCCACUCUCACCCGACCCGUGCACGACGCUGCCCGGGAGGGCUUCCUGGACACGCUGGUGGUGCUGCACCGGGCCGGGGCGCGGCUGGACGUGCGCGAUGCCUGGGGCCGUCUGCCCGUGGACCUGGCUGAGGAGCUGGGCCAUCGCGAUGUCGCACGGUACCUGCGCGCGGCUGCGGGGGGCACCAGAGGCAGUAACCAUGCCCGCAUAGAUGCCGCGGAAGGUCCCUCAGACAUCCCCGAUUGA
- example_title: human papillomavirus type 16 E6
  pipeline_tag: mean-ribosome-load
  sequence_type: mRNA
  task: mean-ribosome-load
  text: AUGCACCAAAAGAGAACUGCAAUGUUUCAGGACCCACAGGAGCGACCCAGAAAGUUACCACAGUUAUGCACAGAGCUGCAAACAACUAUACAUGAUAUAAUAUUAGAAUGUGUGUACUGCAAGCAACAGUUACUGCGACGUGAGGUAUAUGACUUUGCUUUUCGGGAUUUAUGCAUAGUAUAUAGAGAUGGGAAUCCAUAUGCUGUAUGUGAUAAAUGUUUAAAGUUUUAUUCUAAAAUUAGUGAGUAUAGACAUUAUUGUUAUAGUUUGUAUGGAACAACAUUAGAACAGCAAUACAACAAACCGUUGUGUGAUUUGUUAAUUAGGUGUAUUAACUGUCAAAAGCCACUGUGUCCUGAAGAAAAGCAAAGACAUCUGGACAAAAAGCAAAGAUUCCAUAAUAUAAGGGGUCGGUGGACCGGUCGAUGUAUGUCUUGUUGCAGAUCAUCAAGAACACGUAGAGAAACCCAGCUGUAA
- example_title: NRAS proto-oncogene
  pipeline_tag: mean-ribosome-load
  sequence_type: 5' UTR
  task: mean-ribosome-load
  text: GGGGCCGGAAGUGCCGCUCCUUGGUGGGGGCUGUUCAUGGCGGUUCCGGGGUCUCCAACAUUUUUCCCGGCUGUGGUCCUAAAUCUGUCCAAAGCAGAGGCAGUGGAGCUUGAGGUUCUUGCUGGUGUGAA
- example_title: amyloid beta precursor protein
  pipeline_tag: mean-ribosome-load
  sequence_type: 5' UTR
  task: mean-ribosome-load
  text: GUCAGUUUCCUCGGCAGCGGUAGGCGAGAGCACGCGGAGGAGCGUGCGCGGGGGCCCCGGGAGACGGCGGCGGUGGCGGCGCGGGCAGAGCAAGGACGCGGCGGAUCCCACUCGCACAGCAGCGCACUCGGUGCCCCGCGCAGGGUCGCG
- example_title: RUNX family transcription factor 1
  pipeline_tag: mean-ribosome-load
  sequence_type: 5' UTR
  task: mean-ribosome-load
  text: ACUUCUUUGGGCCUCAUAAACAACCACAGAACCACAAGUUGGGUAGCCUGGCAGUGUCAGAAGUCUGAACCCAGCAUAGUGGUCAGCAGGCAGGACGAAUCACACUGAAUGCAAACCACAGGGUUUCGCAGCGUGGUAAAAGAAAUCAUUGAGUCCCCCGCCUUCAGAAGAGGGUGCAUUUUCAGGAGGAAGCG
- example_title: fragile X messenger ribonucleoprotein 1
  pipeline_tag: mean-ribosome-load
  sequence_type: 5' UTR
  task: mean-ribosome-load
  text: CUCAGUCAGGCGCUCAGCUCCGUUUCGGUUUCACUUCCGGUGGAGGGCCGCCUCUGAGCGGGCGGCGGGCCGACGGCGAGCGCGGGCGGCGGCGGUGACGGAGGCGCCGCUGCCAGGGGGCGUGCGGCAGCGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGAGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCUGGGCCUCGAGCGCCCGCAGCCCACCUCUCGGGGGCGGGCUCCCGGCGCUAGCAGGGCUGAAGAGAAG
- example_title: MYC proto-oncogene
  pipeline_tag: mean-ribosome-load
  sequence_type: 5' UTR
  task: mean-ribosome-load
  text: AACUCGCUGUAGUAAUUCCAGCGAGAGGCAGAGGGAGCGAGCGGGCGGCCGGCUAGGGUGGAAGAGCCGGGCGAGCAGAGCUGCGCUGCGGGCGUCCUGGGAAGGGAGAUCCGGAGCGAAUAGGGGGCUUCGCCUCUGGCCCAGCCCUCCCGCUGAUCCCCCAGCCAGCGGUCCGCAACCCUUGCCGCAUCCACGAAACUUUGCCCAUAGCAGCGGGCGGGCACUUUGCACUGGAACUUACAACACCCGAGCAAGGACGCGACUCUCCCGACGCGGGGAGGCUAUUCUGCCCAUUUGGGGACACUUCCCCGCCGCUGCCAGGACCCGCUUCUCUGAAAGGCUCUCCUUGCAGCUGCUUAGACG
- example_title: activating transcription factor 4
  pipeline_tag: mean-ribosome-load
  sequence_type: 5' UTR
  task: mean-ribosome-load
  text: CAUUUCUACUUUGCCCGCCCACAGAUGUAGUUUUCUCUGCGCGUGUGCGUUUUCCCUCCUCCCCGCCCUCAGGGUCCACGGCCACCAUGGCGUAUUAGGGGCAGCAGUGCCUGCGGCAGCAUUGGCCUUUGCAGCGGCGGCAGCAGCACCAGGCUCUGCAGCGGCAACCCCCAGCGGCUUAAGCCAUGGCGCUUCUCACGGCAUUCAGCAGCAGCGUUGCUGUAACCGACAAAGACACCUUCGAAUUAAGCACAUUCCUCGAUUCCAGCAAAGCACCGCAAC
- example_title: Human GPI protein p137
  pipeline_tag: mean-ribosome-load
  sequence_type: 3' UTR
  task: mean-ribosome-load
  text: UUUUUAAAAGGAAAAGAUACCAAAUGCCUGCUGCUACCACCCUUUUCAAUUGCUAUGUUUUGAAAGGCACCAGUAUGUGUUUUAGAUUGAUUUAAAUGUUUCAUUUAAAUCACGGACAGUAGUUUCAGUUCUGAUGGUAUAAGCAAAACAAAUAAAACGUUUAUAAAAGUUGUAUCUUGAAACACUGGUGUUCAACAGCUAGCAGCUUAUGUGAUUCACCCCAUGCCACGUUAGUGUCACAAAUUUUAUGGUUUAUCUCCAGCAACAUUUCUCUAGUACUUGCACUUAUUAUCUGAAUUC
- example_title: nucleophosmin 1
  pipeline_tag: mean-ribosome-load
  sequence_type: 3' UTR
  task: mean-ribosome-load
  text: GAAAAUAGUUUAAACAAUUUGUUAAAAAAUUUUCCGUCUUAUUUCAUUUCUGUAACAGUUGAUAUCUGGCUGUCCUUUUUAUAAUGCAGAGUGAGAACUUUCCCUACCGUGUUUGAUAAAUGUUGUCCAGGUUCUAUUGCCAAGAAUGUGUUGUCCAAAAUGCCUGUUUAGUUUUUAAAGAUGGAACUCCACCCUUUGCUUGGUUUUAAGUAUGUAUGGAAUGUUAUGAUAGGACAUAGUAGUAGCGGUGGUCAGACAUGGAAAUGGUGGGGAGACAAAAAUAUACAUGUGAAAUAAAACUCAGUAUUUUAAUAAAGUAGCACGGUUUCUAUUGA
- example_title: superoxide dismutase 1
  pipeline_tag: mean-ribosome-load
  sequence_type: 3' UTR
  task: mean-ribosome-load
  text: ACAUUCCCUUGGAUGUAGUCUGAGGCCCCUUAACUCAUCUGUUAUCCUGCUAGCUGUAGAAAUGUAUCCUGAUAAACAUUAAACACUGUAAUCUUAAAAGUGUAAUUGUGUGACUUUUUCAGAGUUGCUUUAAAGUACCUGUAGUGAGAAACUGAUUUAUGAUCACUUGGAAGAUUUGUAUAGUUUUAUAAAACUCAGUUAAAAUGUCUGUUUCAAUGACCUGUAUUUUGCCAGACUUAAAUCACAGAUGGGUAUUAAACUUGUCAGAAUUUCUUUGUCAUUCAAGCCUGUGAAUAAAAACCCUGUAUGGCACUUAUUAUGAGGCUAUUAAAAGAAUCCAAAUUCAAACUAAA
- example_title: hemoglobin subunit alpha 2
  pipeline_tag: mean-ribosome-load
  sequence_type: 3' UTR
  task: mean-ribosome-load
  text: CUGGAGCCUCGGUAGCCGUUCCUCCUGCCCGCUGGGCCUCCCAACGGGCCCUCCUCCCCUCCUUGCACCGGCCCUUCCUGGUCUUUGAAUAAAGUCUGAGUGGGCAGCA
- example_title: BRAF proto-oncogene
  pipeline_tag: mean-ribosome-load
  sequence_type: 3' UTR
  task: mean-ribosome-load
  text: AACAAAUGAGUGAGAGAGUUCAGGAGAGUAGCAACAAAAGGAAAAUAAAUGAACAUAUGUUUGCUUAUAUGUUAAAUUGAAUAAAAUACUCUCUUUUUUUUUAAGGUGAACCAAAGAACACUUGUGUGGUUAAAGACUAGAUAUAAUUUUUCCCCAAACUAAAAUUUAUACUUAACAUUGGAUUUUUAACAUCCAAGGGUUAAAAUACAUAGACAUUGCUAAAAAUUGGCAGAGCCUCUUCUAGAGGCUUUACUUUCUGUUCCGGGUUUGUAUCAUUCACUUGGUUAUUUUAAGUAGUAAACUUCAGUUUCUCAUGCAACUUUUGUUGCCAGCUAUCACAUGUCCACUAGGGACUCCAGAAGAAGACCCUACCUAUGCCUGUGUUUGCAGGUGAGAAGUUGGCAGUCGGUUAGCCUGGG
- example_title: H3 clustered histone 1
  pipeline_tag: mean-ribosome-load
  sequence_type: 3' UTR
  task: mean-ribosome-load
  text: UUACUGUGGUCUCUCUGACGGUCCAAGCAAAGGCUCUUUUCAGAGCCACCACCUUUUC
---

# Optimus 5-Prime

Convolutional neural network that predicts the mean ribosome load (MRL) of a fixed 50 nt human 5' untranslated region (5'UTR) from sequence alone.

## Disclaimer

This is an UNOFFICIAL implementation of [Human 5' UTR design and variant effect prediction from a massively parallel translation assay](https://doi.org/10.1038/s41587-019-0164-5) by Paul J. Sample, et al.

The OFFICIAL repository of Optimus 5-Prime is at [pjsample/human_5utr_modeling](https://github.com/pjsample/human_5utr_modeling).

> [!TIP]
> The MultiMolecule team has confirmed that the provided model and checkpoints are producing the same intermediate representations as the original implementation.

**The team releasing Optimus 5-Prime did not write this model card for this model so this model card has been written by the MultiMolecule team.**

## Model Details

Optimus 5-Prime is a simple, fully feed-forward 1D convolutional network trained on a massively parallel polysome-profiling assay of ~280,000 random 50 nt 5'UTRs upstream of an eGFP reporter expressed in HEK293T. The network ingests a fixed 50 nt 5'UTR one-hot tensor, applies three stacked `padding="same"` 1D convolutions (120 filters, kernel 8, ReLU) with dropout between the second/third convolutions, flattens the per-position activations channels-last, and emits a single standardized mean ribosome load (MRL) regression score through a 40-unit fully connected layer and a linear regression head. Please refer to the [Training Details](#training-details) section for more information on the training process.

The MRL scalar is the per-sequence mean of polysome-profile-derived ribosome loading and is used by the original authors both to score natural human 5'UTRs and to engineer new sequences with predictable translation efficiency. Variant-effect scoring is performed externally by computing the MRL difference between the reference and alternative sequences; the model itself takes a single sequence as input.

### Model Specification

| Num Layers | Hidden Size | Num Parameters (M) | FLOPs (M) | MACs (M) | Max Num Tokens |
| ---------- | ----------- | ------------------ | --------- | -------- | -------------- |
| 4          | 40          | 0.48               | 24.04     | 12.00    | 50             |

### Links

- **Code**: [multimolecule.optimus5prime](https://github.com/DLS5-Omics/multimolecule/tree/master/multimolecule/models/optimus5prime)
- **Data**: Massively parallel polysome-profiling MRL library on randomized 50 nt 5'UTRs in HEK293T, GEO [GSE114002](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE114002)
- **Paper**: [Human 5' UTR design and variant effect prediction from a massively parallel translation assay](https://doi.org/10.1038/s41587-019-0164-5)
- **Developed by**: Paul J. Sample, Ban Wang, David W. Reid, Vlad Presnyak, Iain J. McFadyen, David R. Morris, Georg Seelig
- **Model type**: 1D CNN for mean ribosome load (MRL) regression from a fixed 50 nt 5'UTR sequence
- **Original Repository**: [pjsample/human_5utr_modeling](https://github.com/pjsample/human_5utr_modeling)

## Usage

The model file depends on the [`multimolecule`](https://multimolecule.danling.org) library. You can install it using pip:

```bash
pip install multimolecule
```

### Direct Use

#### Mean Ribosome Load Prediction

You can use this model directly to predict the mean ribosome load (MRL) of a fixed 50 nt 5'UTR sequence:

```python
>>> from multimolecule import RnaTokenizer, Optimus5PrimeForSequencePrediction

>>> tokenizer = RnaTokenizer.from_pretrained("multimolecule/optimus5prime")
>>> model = Optimus5PrimeForSequencePrediction.from_pretrained("multimolecule/optimus5prime")
>>> output = model(**tokenizer("GGGACAUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGC", return_tensors="pt"))

>>> output.keys()
odict_keys(['logits'])
```

The pre-regression dense representation is exposed on the backbone:

```python
>>> from multimolecule import RnaTokenizer, Optimus5PrimeModel

>>> tokenizer = RnaTokenizer.from_pretrained("multimolecule/optimus5prime")
>>> model = Optimus5PrimeModel.from_pretrained("multimolecule/optimus5prime")
>>> output = model(**tokenizer("GGGACAUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGC", return_tensors="pt"))

>>> output.keys()
odict_keys(['pooler_output'])
```

### Interface

- **Input length**: fixed 50 nt 5'UTR sequence
- **Padding**: shorter sequences are right-padded with zeros to 50 nt; longer sequences are truncated to the first 50 nt
- **Alphabet**: RNA (`A`, `C`, `G`, `U`); `N` is encoded as an all-zero channel
- **Special tokens**: none added; `input_ids` are consumed positionally as one-hot channels
- **Output**: standardized mean ribosome load score (`logits`) of shape `(batch_size, 1)`; raw-MRL calibration requires the external scaler used by the upstream training workflow

### Variant Effect

Optimus 5-Prime is a single-sequence regression model. To score the effect of a variant on translation, run the reference and alternative 5'UTRs through the model independently and compute the difference between their predicted MRL values:

```python
>>> from multimolecule import RnaTokenizer, Optimus5PrimeForSequencePrediction
>>> tokenizer = RnaTokenizer.from_pretrained("multimolecule/optimus5prime")
>>> model = Optimus5PrimeForSequencePrediction.from_pretrained("multimolecule/optimus5prime")
>>> ref = "GGGACAUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGC"
>>> alt = "GGGACAUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGAUAGC"
>>> ref_mrl = model(**tokenizer(ref, return_tensors="pt"))["logits"]
>>> alt_mrl = model(**tokenizer(alt, return_tensors="pt"))["logits"]
>>> delta = (alt_mrl - ref_mrl).item()
```

## Training Details

Optimus 5-Prime was trained to regress the per-sequence mean ribosome load (MRL) derived from polysome profiling on a massively parallel reporter assay.

### Training Data

Optimus 5-Prime was trained on approximately 280,000 randomized 50 nt 5'UTRs placed upstream of an eGFP reporter and expressed in HEK293T cells. Mean ribosome load was computed per sequence from polysome-fractionation read counts. The raw sequencing data are available at GEO accession [GSE114002](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE114002).

### Training Procedure

#### Pre-training

The published `main_MRL_model` was trained with mean-squared-error loss against standardized per-sequence MRL values. The optimizer was Adam with learning rate 1e-3, batch size 128, betas (0.9, 0.999), and epsilon 1e-8.

## Citation

```bibtex
@article{sample2019human,
  author    = {Sample, Paul J. and Wang, Ban and Reid, David W. and Presnyak, Vlad and McFadyen, Iain J. and Morris, David R. and Seelig, Georg},
  title     = {Human 5' UTR design and variant effect prediction from a massively parallel translation assay},
  journal   = {Nature Biotechnology},
  volume    = {37},
  number    = {7},
  pages     = {803--809},
  year      = {2019},
  publisher = {Springer Science and Business Media LLC},
  doi       = {10.1038/s41587-019-0164-5}
}
```

> [!NOTE]
> The artifacts distributed in this repository are part of the MultiMolecule project.
> If MultiMolecule supports your research, please cite the MultiMolecule project as follows:

```bibtex
@software{chen_2024_12638419,
  author    = {Chen, Zhiyuan and Zhu, Sophia Y.},
  title     = {MultiMolecule},
  doi       = {10.5281/zenodo.12638419},
  publisher = {Zenodo},
  url       = {https://doi.org/10.5281/zenodo.12638419},
  year      = 2024,
  month     = may,
  day       = 4
}
```

## Contact

Please use GitHub issues of [MultiMolecule](https://github.com/DLS5-Omics/multimolecule/issues) for any questions or comments on the model card.

Please contact the authors of the [Optimus 5-Prime paper](https://doi.org/10.1038/s41587-019-0164-5) for questions or comments on the paper/model.

## License

This model implementation is licensed under the [GNU Affero General Public License](license.md).

For additional terms and clarifications, please refer to our [License FAQ](license-faq.md).

```spdx
SPDX-License-Identifier: AGPL-3.0-or-later
```