Spaces: InstaDeepAI/ntv3_benchmark
MidAtBest committed on Dec 17, 2025

Commit 2cd0864

1 Parent(s): 4c80381

fix: fix typo in description

Files changed (1)

src/streamlit_app.py CHANGED

@@ -138,17 +138,16 @@ and functional-regulatory prediction, which includes diverse experimental tracks
  and translation (Ribo-seq).
 Data are drawn from a phylogenetically diverse set of species, including organisms seen during post-training
-(human, chicken, Arabidopsis, rice, maize) and entirely unseen species (cattle, tomato), with careful curation
+(human, chicken, arabidopsis, rice, maize) and entirely unseen species (cattle, tomato), with careful curation
 to avoid data leakage. This design allows the dataset to probe long-range sequence-to-function mapping,
 cross-species generalization, and transfer across heterogeneous regulatory modalities,
 including assays not present in prior multispecies training corpora. By standardizing sequence length,
-resolution, and evaluation metrics across all tracks, \brandbenchmark provides a controlled dataset
+resolution, and evaluation metrics across all tracks, the NTv3 Benchmark provides a controlled dataset
 for comparing representation quality across genomic foundation models.
 The metrics used are:
 - **Pearson correlations (multi-assay)**: per-dataset scores across species and models for functional tracks.
 - **MCC (bed tracks)**: per-track MCC values across species and models for gene annotation tracks.
 """
 HERE = os.path.dirname(os.path.abspath(__file__))  # /app/src
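
Note on the metrics named in the description: the sketch below shows the textbook definitions of the Pearson correlation and the Matthews correlation coefficient (MCC). It is not taken from `src/streamlit_app.py`. The function names `pearson` and `mcc`, the toy inputs, and the choice to return 0.0 when a denominator is zero are assumptions made for illustration, and the benchmark's own evaluation code may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    # Assumption: a constant track has undefined correlation; report 0.0.
    return cov / denom if denom else 0.0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: when any marginal count is zero, report 0.0.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy values, not benchmark data.
    print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
    print(mcc([1, 1, 0, 0], [1, 0, 1, 0]))
```

Both scores lie in [-1, 1], with 1 meaning perfect agreement between prediction and target. That shared range is what lets the description present them side by side for continuous functional tracks and binary annotation tracks.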