fix: fix typo in description
src/streamlit_app.py +2 -3
src/streamlit_app.py
CHANGED
@@ -138,17 +138,16 @@ and functional-regulatory prediction, which includes diverse experimental tracks
 and translation (Ribo-seq).
 
 Data are drawn from a phylogenetically diverse set of species, including organisms seen during post-training
-(human, chicken,
+(human, chicken, arabidopsis, rice, maize) and entirely unseen species (cattle, tomato), with careful curation
 to avoid data leakage. This design allows the dataset to probe long-range sequence-to-function mapping,
 cross-species generalization, and transfer across heterogeneous regulatory modalities,
 including assays not present in prior multispecies training corpora. By standardizing sequence length,
-resolution, and evaluation metrics across all tracks,
+resolution, and evaluation metrics across all tracks, the NTv3 Benchmark provides a controlled dataset
 for comparing representation quality across genomic foundation models.
 
 The metrics used are:
 - **Pearson correlations (multi-assay)**: per-dataset scores across species and models for functional tracks.
 - **MCC (bed tracks)**: per-track MCC values across species and models for gene annotation tracks.
-
 """
 
 HERE = os.path.dirname(os.path.abspath(__file__))  # /app/src
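
For reference, the two metrics named in the description can be sketched as below. This is a minimal illustration, not the benchmark's evaluation code: the function names `pearson_r` and `mcc` and the toy inputs are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention. The actual pipeline may aggregate or threshold tracks differently.

```python
import numpy as np


def pearson_r(pred, target):
    """Pearson correlation between a predicted and an observed signal track."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    pred_c = pred - pred.mean()
    target_c = target - target.mean()
    denom = np.sqrt((pred_c ** 2).sum() * (target_c ** 2).sum())
    # Assumed convention: a constant track has undefined correlation; report 0.0.
    return float((pred_c * target_c).sum() / denom) if denom > 0 else 0.0


def mcc(pred_labels, true_labels):
    """Matthews correlation coefficient for binary per-position labels."""
    p = np.asarray(pred_labels, dtype=bool)
    t = np.asarray(true_labels, dtype=bool)
    tp = int(np.sum(p & t))
    tn = int(np.sum(~p & ~t))
    fp = int(np.sum(p & ~t))
    fn = int(np.sum(~p & t))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    # Assumed convention: if any marginal count is zero, report 0.0.
    return float((tp * tn - fp * fn) / denom) if denom > 0 else 0.0
```

Both scores lie in [-1, 1]. Pearson suits continuous functional tracks, while MCC suits binary annotation tracks because it stays informative when positive positions are rare.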