ybornachot committed on
Commit
b0d8db2
·
1 Parent(s): a30b8d8

fix: cleaned cells

Files changed (1)
  1. notebooks/03_fine_tuning.ipynb +9 -406
notebooks/03_fine_tuning.ipynb CHANGED
@@ -8,9 +8,11 @@
   "\n",
   "This notebook demonstrates a **simplified fine-tuning setup** that enables training of a pre-trained Nucleotide Transformer v3 (NTv3) model to predict BigWig signal tracks directly from DNA sequences. The streamlined approach leverages a pre-trained NTv3 backbone as a feature extractor and adds a custom prediction head that outputs single-nucleotide resolution signal values for various genomic tracks (e.g., ChIP-seq, ATAC-seq, RNA-seq).\n",
   "\n",
- "**⚡ Key Advantage**: This simplified pipeline achieves **close performance to more complex training approaches** while enabling **relatively fast fine-tuning in approximately one hour**. The setup is designed for rapid experimentation and iteration, making it ideal for adapting pre-trained models to your specific genomic tracks or experimental conditions without the overhead of complex distributed training infrastructure.\n",
   "\n",
- "**🔧 Main Simplifications**: Compared to the full supervised tracks pipeline, this notebook simplifies several aspects to enable faster iteration:\n",
   "\n",
   "- **Data splits**: Uses simple chromosome-based train/val/test splits (e.g., assigning entire chromosomes to each split) instead of more complex region-based splits\n",
   "- **Random sequence sampling**: The dataset randomly samples sequences from chromosomes/regions on-the-fly, rather than using pre-computed sliding windows\n",
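The setup described in the markdown cell above (frozen backbone as feature extractor, plus a per-nucleotide prediction head for signal tracks) can be sketched roughly as follows. All shapes and names here are hypothetical stand-ins, not the real NTv3 API: the actual embedding size, track count, and model interface differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the real NTv3 backbone uses different sizes.
seq_len, embed_dim, n_tracks = 128, 16, 3

# Stand-in for the frozen backbone's output: one embedding per nucleotide.
embeddings = rng.normal(size=(seq_len, embed_dim))

# Minimal prediction head: a per-position linear projection to track signals.
W = rng.normal(size=(embed_dim, n_tracks)) * 0.1
b = np.zeros(n_tracks)

# Softplus keeps the predicted BigWig-style signal values non-negative.
predictions = np.log1p(np.exp(embeddings @ W + b))

print(predictions.shape)  # one signal value per nucleotide, per track
```

During fine-tuning, only the head's parameters (here `W` and `b`) would receive gradient updates, which is what keeps the iteration loop fast relative to full-model training.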
@@ -42,371 +44,9 @@
   },
   {
   "cell_type": "code",
- "execution_count": 1,
   "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Collecting datasets\n",
- " Downloading datasets-4.4.1-py3-none-any.whl.metadata (19 kB)\n",
- "Requirement already satisfied: transformers in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (4.57.3)\n",
- "Requirement already satisfied: torchmetrics in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (1.8.2)\n",
- "Requirement already satisfied: plotly in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (6.5.0)\n",
- "Requirement already satisfied: filelock in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from datasets) (3.20.0)\n",
- "Requirement already satisfied: numpy>=1.17 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from datasets) (2.3.5)\n",
- "Collecting pyarrow>=21.0.0 (from datasets)\n",
- " Downloading pyarrow-22.0.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (3.2 kB)\n",
- "Collecting dill<0.4.1,>=0.3.0 (from datasets)\n",
- " Downloading dill-0.4.0-py3-none-any.whl.metadata (10 kB)\n",
- "Collecting pandas (from datasets)\n",
- " Downloading pandas-2.3.3-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (91 kB)\n",
- "   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 91.2/91.2 kB 3.0 MB/s eta 0:00:00\n",
- "Requirement already satisfied: requests>=2.32.2 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from datasets) (2.32.5)\n",
- "Collecting httpx<1.0.0 (from datasets)\n",
- " Using cached httpx-0.28.1-py3-none-any.whl.metadata (7.1 kB)\n",
- "Requirement already satisfied: tqdm>=4.66.3 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from datasets) (4.67.1)\n",
- "Collecting xxhash (from datasets)\n",
- " Downloading xxhash-3.6.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (13 kB)\n",
- "Collecting multiprocess<0.70.19 (from datasets)\n",
- " Downloading multiprocess-0.70.18-py312-none-any.whl.metadata (7.5 kB)\n",
- "Collecting fsspec<=2025.10.0,>=2023.1.0 (from fsspec[http]<=2025.10.0,>=2023.1.0->datasets)\n",
- " Downloading fsspec-2025.10.0-py3-none-any.whl.metadata (10 kB)\n",
- "Requirement already satisfied: huggingface-hub<2.0,>=0.25.0 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from datasets) (0.36.0)\n",
- "Requirement already satisfied: packaging in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from datasets) (25.0)\n",
- "Requirement already satisfied: pyyaml>=5.1 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from datasets) (6.0.3)\n",
- "Requirement already satisfied: regex!=2019.12.17 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from transformers) (2025.11.3)\n",
- "Requirement already satisfied: tokenizers<=0.23.0,>=0.22.0 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from transformers) (0.22.1)\n",
- "Requirement already satisfied: safetensors>=0.4.3 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from transformers) (0.7.0)\n",
- "Requirement already satisfied: torch>=2.0.0 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from torchmetrics) (2.9.1)\n",
- "Requirement already satisfied: lightning-utilities>=0.8.0 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from torchmetrics) (0.15.2)\n",
- "Requirement already satisfied: narwhals>=1.15.1 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from plotly) (2.13.0)\n",
- "Collecting aiohttp!=4.0.0a0,!=4.0.0a1 (from fsspec[http]<=2025.10.0,>=2023.1.0->datasets)\n",
- " Downloading aiohttp-3.13.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (8.1 kB)\n",
- "Collecting anyio (from httpx<1.0.0->datasets)\n",
- " Downloading anyio-4.12.0-py3-none-any.whl.metadata (4.3 kB)\n",
- "Requirement already satisfied: certifi in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from httpx<1.0.0->datasets) (2025.11.12)\n",
- "Collecting httpcore==1.* (from httpx<1.0.0->datasets)\n",
- " Using cached httpcore-1.0.9-py3-none-any.whl.metadata (21 kB)\n",
- "Requirement already satisfied: idna in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from httpx<1.0.0->datasets) (3.11)\n",
- "Collecting h11>=0.16 (from httpcore==1.*->httpx<1.0.0->datasets)\n",
- " Using cached h11-0.16.0-py3-none-any.whl.metadata (8.3 kB)\n",
- "Requirement already satisfied: typing-extensions>=3.7.4.3 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from huggingface-hub<2.0,>=0.25.0->datasets) (4.15.0)\n",
- "Requirement already satisfied: hf-xet<2.0.0,>=1.1.3 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from huggingface-hub<2.0,>=0.25.0->datasets) (1.2.0)\n",
- "Requirement already satisfied: setuptools in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from lightning-utilities>=0.8.0->torchmetrics) (80.9.0)\n",
- "Requirement already satisfied: charset_normalizer<4,>=2 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from requests>=2.32.2->datasets) (3.4.4)\n",
- "Requirement already satisfied: urllib3<3,>=1.21.1 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from requests>=2.32.2->datasets) (2.6.1)\n",
- "Requirement already satisfied: sympy>=1.13.3 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from torch>=2.0.0->torchmetrics) (1.14.0)\n",
- "Requirement already satisfied: networkx>=2.5.1 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from torch>=2.0.0->torchmetrics) (3.6.1)\n",
- "Requirement already satisfied: jinja2 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from torch>=2.0.0->torchmetrics) (3.1.6)\n",
- "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.8.93 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from torch>=2.0.0->torchmetrics) (12.8.93)\n",
- "Requirement already satisfied: nvidia-cuda-runtime-cu12==12.8.90 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from torch>=2.0.0->torchmetrics) (12.8.90)\n",
- "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.8.90 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from torch>=2.0.0->torchmetrics) (12.8.90)\n",
- "Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from torch>=2.0.0->torchmetrics) (9.10.2.21)\n",
- "Requirement already satisfied: nvidia-cublas-cu12==12.8.4.1 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from torch>=2.0.0->torchmetrics) (12.8.4.1)\n",
- "Requirement already satisfied: nvidia-cufft-cu12==11.3.3.83 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from torch>=2.0.0->torchmetrics) (11.3.3.83)\n",
- "Requirement already satisfied: nvidia-curand-cu12==10.3.9.90 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from torch>=2.0.0->torchmetrics) (10.3.9.90)\n",
- "Requirement already satisfied: nvidia-cusolver-cu12==11.7.3.90 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from torch>=2.0.0->torchmetrics) (11.7.3.90)\n",
- "Requirement already satisfied: nvidia-cusparse-cu12==12.5.8.93 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from torch>=2.0.0->torchmetrics) (12.5.8.93)\n",
- "Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from torch>=2.0.0->torchmetrics) (0.7.1)\n",
- "Requirement already satisfied: nvidia-nccl-cu12==2.27.5 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from torch>=2.0.0->torchmetrics) (2.27.5)\n",
- "Requirement already satisfied: nvidia-nvshmem-cu12==3.3.20 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from torch>=2.0.0->torchmetrics) (3.3.20)\n",
- "Requirement already satisfied: nvidia-nvtx-cu12==12.8.90 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from torch>=2.0.0->torchmetrics) (12.8.90)\n",
- "Requirement already satisfied: nvidia-nvjitlink-cu12==12.8.93 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from torch>=2.0.0->torchmetrics) (12.8.93)\n",
- "Requirement already satisfied: nvidia-cufile-cu12==1.13.1.3 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from torch>=2.0.0->torchmetrics) (1.13.1.3)\n",
- "Requirement already satisfied: triton==3.5.1 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from torch>=2.0.0->torchmetrics) (3.5.1)\n",
- "Requirement already satisfied: python-dateutil>=2.8.2 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from pandas->datasets) (2.9.0.post0)\n",
- "Collecting pytz>=2020.1 (from pandas->datasets)\n",
- " Downloading pytz-2025.2-py2.py3-none-any.whl.metadata (22 kB)\n",
- "Collecting tzdata>=2022.7 (from pandas->datasets)\n",
- " Downloading tzdata-2025.3-py2.py3-none-any.whl.metadata (1.4 kB)\n",
- "Collecting aiohappyeyeballs>=2.5.0 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.10.0,>=2023.1.0->datasets)\n",
- " Downloading aiohappyeyeballs-2.6.1-py3-none-any.whl.metadata (5.9 kB)\n",
- "Collecting aiosignal>=1.4.0 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.10.0,>=2023.1.0->datasets)\n",
- " Downloading aiosignal-1.4.0-py3-none-any.whl.metadata (3.7 kB)\n",
- "Collecting attrs>=17.3.0 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.10.0,>=2023.1.0->datasets)\n",
- " Downloading attrs-25.4.0-py3-none-any.whl.metadata (10 kB)\n",
- "Collecting frozenlist>=1.1.1 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.10.0,>=2023.1.0->datasets)\n",
- " Downloading frozenlist-1.8.0-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl.metadata (20 kB)\n",
- "Collecting multidict<7.0,>=4.5 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.10.0,>=2023.1.0->datasets)\n",
- " Downloading multidict-6.7.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (5.3 kB)\n",
- "Collecting propcache>=0.2.0 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.10.0,>=2023.1.0->datasets)\n",
- " Downloading propcache-0.4.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (13 kB)\n",
- "Collecting yarl<2.0,>=1.17.0 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.10.0,>=2023.1.0->datasets)\n",
- " Downloading yarl-1.22.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (75 kB)\n",
- "   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 75.1/75.1 kB 23.9 MB/s eta 0:00:00\n",
- "Requirement already satisfied: six>=1.5 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from python-dateutil>=2.8.2->pandas->datasets) (1.17.0)\n",
- "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from sympy>=1.13.3->torch>=2.0.0->torchmetrics) (1.3.0)\n",
- "Requirement already satisfied: MarkupSafe>=2.0 in /home/y-bornachot/venvs/ntv3-env/lib/python3.12/site-packages (from jinja2->torch>=2.0.0->torchmetrics) (3.0.3)\n",
- "Downloading datasets-4.4.1-py3-none-any.whl (511 kB)\n",
- "   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 511.6/511.6 kB 32.3 MB/s eta 0:00:00\n",
- "Downloading dill-0.4.0-py3-none-any.whl (119 kB)\n",
- "   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 119.7/119.7 kB 19.3 MB/s eta 0:00:00\n",
- "Downloading fsspec-2025.10.0-py3-none-any.whl (200 kB)\n",
- "   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 201.0/201.0 kB 16.7 MB/s eta 0:00:00\n",
- "Using cached httpx-0.28.1-py3-none-any.whl (73 kB)\n",
- "Using cached httpcore-1.0.9-py3-none-any.whl (78 kB)\n",
- "Downloading multiprocess-0.70.18-py312-none-any.whl (150 kB)\n",
- "   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 150.3/150.3 kB 18.4 MB/s eta 0:00:00\n",
- "Downloading pyarrow-22.0.0-cp312-cp312-manylinux_2_28_x86_64.whl (47.7 MB)\n",
- "   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.7/47.7 MB 33.9 MB/s eta 0:00:00\n",
- "Downloading pandas-2.3.3-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (12.4 MB)\n",
- "   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.4/12.4 MB 43.0 MB/s eta 0:00:00\n",
- "Downloading xxhash-3.6.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (193 kB)\n",
- "   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 193.9/193.9 kB 19.0 MB/s eta 0:00:00\n",
- "Downloading aiohttp-3.13.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (1.8 MB)\n",
- "   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 38.8 MB/s eta 0:00:00\n",
- "Downloading pytz-2025.2-py2.py3-none-any.whl (509 kB)\n",
- "   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 509.2/509.2 kB 31.7 MB/s eta 0:00:00\n",
- "Downloading tzdata-2025.3-py2.py3-none-any.whl (348 kB)\n",
- "   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 348.5/348.5 kB 40.1 MB/s eta 0:00:00\n",
- "Downloading anyio-4.12.0-py3-none-any.whl (113 kB)\n",
- "   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 113.4/113.4 kB 29.4 MB/s eta 0:00:00\n",
- "Downloading aiohappyeyeballs-2.6.1-py3-none-any.whl (15 kB)\n",
- "Downloading aiosignal-1.4.0-py3-none-any.whl (7.5 kB)\n",
- "Downloading attrs-25.4.0-py3-none-any.whl (67 kB)\n",
- "   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 67.6/67.6 kB 18.3 MB/s eta 0:00:00\n",
- "Downloading frozenlist-1.8.0-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl (242 kB)\n",
- "   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 242.4/242.4 kB 25.8 MB/s eta 0:00:00\n",
- "Using cached h11-0.16.0-py3-none-any.whl (37 kB)\n",
- "Downloading multidict-6.7.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (256 kB)\n",
- "   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 256.1/256.1 kB 23.3 MB/s eta 0:00:00\n",
- "Downloading propcache-0.4.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (221 kB)\n",
- "   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 221.6/221.6 kB 28.9 MB/s eta 0:00:00\n",
- "Downloading yarl-1.22.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (377 kB)\n",
- "   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 377.3/377.3 kB 31.7 MB/s eta 0:00:00\n",
- "Installing collected packages: pytz, xxhash, tzdata, pyarrow, propcache, multidict, h11, fsspec, frozenlist, dill, attrs, anyio, aiohappyeyeballs, yarl, pandas, multiprocess, httpcore, aiosignal, httpx, aiohttp, datasets\n",
- " Attempting uninstall: fsspec\n",
- " Found existing installation: fsspec 2025.12.0\n",
- " Uninstalling fsspec-2025.12.0:\n",
- " Successfully uninstalled fsspec-2025.12.0\n",
- "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
- "genomix-research 0.1.0 requires absl-py==2.1.0, which is not installed.\n",
- "genomix-research 0.1.0 requires aiobotocore==2.21.1, which is not installed.\n",
- "genomix-research 0.1.0 requires aioitertools==0.12.0, which is not installed.\n",
- "genomix-research 0.1.0 requires antlr4-python3-runtime==4.9.3, which is not installed.\n",
- "genomix-research 0.1.0 requires argon2-cffi==23.1.0, which is not installed.\n",
- "genomix-research 0.1.0 requires argon2-cffi-bindings==21.2.0, which is not installed.\n",
- "genomix-research 0.1.0 requires array-record==0.8.1, which is not installed.\n",
- "genomix-research 0.1.0 requires arrow==1.3.0, which is not installed.\n",
- "genomix-research 0.1.0 requires astunparse==1.6.3, which is not installed.\n",
- "genomix-research 0.1.0 requires async-lru==2.0.5, which is not installed.\n",
- "genomix-research 0.1.0 requires babel==2.17.0, which is not installed.\n",
- "genomix-research 0.1.0 requires beautifulsoup4==4.13.3, which is not installed.\n",
- "genomix-research 0.1.0 requires biopython==1.85, which is not installed.\n",
- "genomix-research 0.1.0 requires bleach==6.2.0, which is not installed.\n",
- "genomix-research 0.1.0 requires boto3==1.37.1, which is not installed.\n",
- "genomix-research 0.1.0 requires botocore==1.37.1, which is not installed.\n",
- "genomix-research 0.1.0 requires bravado==11.1.0, which is not installed.\n",
- "genomix-research 0.1.0 requires bravado-core==5.16.1, which is not installed.\n",
- "genomix-research 0.1.0 requires bx-python==0.13.0, which is not installed.\n",
- "genomix-research 0.1.0 requires cachetools==5.5.2, which is not installed.\n",
- "genomix-research 0.1.0 requires cffi==1.17.1, which is not installed.\n",
- "genomix-research 0.1.0 requires cfgv==3.4.0, which is not installed.\n",
- "genomix-research 0.1.0 requires chex==0.1.88, which is not installed.\n",
- "genomix-research 0.1.0 requires click==8.1.8, which is not installed.\n",
- "genomix-research 0.1.0 requires cloudpickle==3.1.1, which is not installed.\n",
- "genomix-research 0.1.0 requires defusedxml==0.7.1, which is not installed.\n",
- "genomix-research 0.1.0 requires distlib==0.3.9, which is not installed.\n",
- "genomix-research 0.1.0 requires distrax>=0.1.5, which is not installed.\n",
- "genomix-research 0.1.0 requires dm-tree==0.1.9, which is not installed.\n",
- "genomix-research 0.1.0 requires etils==1.12.1, which is not installed.\n",
- "genomix-research 0.1.0 requires fastjsonschema==2.21.1, which is not installed.\n",
- "genomix-research 0.1.0 requires flatbuffers==25.2.10, which is not installed.\n",
- "genomix-research 0.1.0 requires flax==0.10.4, which is not installed.\n",
- "genomix-research 0.1.0 requires fqdn==1.5.1, which is not installed.\n",
- "genomix-research 0.1.0 requires future==1.0.0, which is not installed.\n",
- "genomix-research 0.1.0 requires gast==0.6.0, which is not installed.\n",
- "genomix-research 0.1.0 requires gcsfs==2025.3.0, which is not installed.\n",
- "genomix-research 0.1.0 requires gitdb==4.0.12, which is not installed.\n",
- "genomix-research 0.1.0 requires gitpython==3.1.44, which is not installed.\n",
- "genomix-research 0.1.0 requires google-api-core==2.24.1, which is not installed.\n",
- "genomix-research 0.1.0 requires google-api-python-client==2.165.0, which is not installed.\n",
- "genomix-research 0.1.0 requires google-auth==2.38.0, which is not installed.\n",
- "genomix-research 0.1.0 requires google-auth-httplib2==0.2.0, which is not installed.\n",
- "genomix-research 0.1.0 requires google-auth-oauthlib==1.2.1, which is not installed.\n",
- "genomix-research 0.1.0 requires google-cloud-core==2.4.2, which is not installed.\n",
- "genomix-research 0.1.0 requires google-cloud-storage==3.1.0, which is not installed.\n",
- "genomix-research 0.1.0 requires google-crc32c==1.6.0, which is not installed.\n",
- "genomix-research 0.1.0 requires google-pasta==0.2.0, which is not installed.\n",
- "genomix-research 0.1.0 requires google-resumable-media==2.7.2, which is not installed.\n",
- "genomix-research 0.1.0 requires googleapis-common-protos==1.69.1, which is not installed.\n",
- "genomix-research 0.1.0 requires grain==0.2.11, which is not installed.\n",
- "genomix-research 0.1.0 requires grigri==0.0.2, which is not installed.\n",
- "genomix-research 0.1.0 requires grpcio==1.71.0, which is not installed.\n",
- "genomix-research 0.1.0 requires h5py==3.13.0, which is not installed.\n",
- "genomix-research 0.1.0 requires httplib2==0.22.0, which is not installed.\n",
- "genomix-research 0.1.0 requires humanize==4.12.1, which is not installed.\n",
- "genomix-research 0.1.0 requires hydra-core==1.3.2, which is not installed.\n",
- "genomix-research 0.1.0 requires identify==2.6.9, which is not installed.\n",
- "genomix-research 0.1.0 requires importlib-resources==6.5.2, which is not installed.\n",
- "genomix-research 0.1.0 requires iniconfig==2.0.0, which is not installed.\n",
- "genomix-research 0.1.0 requires isoduration==20.11.0, which is not installed.\n",
- "genomix-research 0.1.0 requires jax==0.5.3, which is not installed.\n",
- "genomix-research 0.1.0 requires jaxlib==0.5.3, which is not installed.\n",
- "genomix-research 0.1.0 requires jaxtyping==0.2.38, which is not installed.\n",
- "genomix-research 0.1.0 requires jmespath==1.0.1, which is not installed.\n",
- "genomix-research 0.1.0 requires json5==0.10.0, which is not installed.\n",
- "genomix-research 0.1.0 requires jsonpointer==3.0.0, which is not installed.\n",
- "genomix-research 0.1.0 requires jsonref==1.1.0, which is not installed.\n",
- "genomix-research 0.1.0 requires jsonschema==4.23.0, which is not installed.\n",
- "genomix-research 0.1.0 requires jsonschema-specifications==2024.10.1, which is not installed.\n",
- "genomix-research 0.1.0 requires jupyter==1.1.1, which is not installed.\n",
- "genomix-research 0.1.0 requires jupyter-console==6.6.3, which is not installed.\n",
- "genomix-research 0.1.0 requires jupyter-events==0.12.0, which is not installed.\n",
- "genomix-research 0.1.0 requires jupyter-lsp==2.2.5, which is not installed.\n",
- "genomix-research 0.1.0 requires jupyter-server==2.15.0, which is not installed.\n",
- "genomix-research 0.1.0 requires jupyter-server-terminals==0.5.3, which is not installed.\n",
- "genomix-research 0.1.0 requires jupyterlab==4.3.6, which is not installed.\n",
- "genomix-research 0.1.0 requires jupyterlab-pygments==0.3.0, which is not installed.\n",
- "genomix-research 0.1.0 requires jupyterlab-server==2.27.3, which is not installed.\n",
- "genomix-research 0.1.0 requires keras>=3.11.3, which is not installed.\n",
- "genomix-research 0.1.0 requires libclang==18.1.1, which is not installed.\n",
- "genomix-research 0.1.0 requires markdown==3.7, which is not installed.\n",
- "genomix-research 0.1.0 requires markdown-it-py==3.0.0, which is not installed.\n",
- "genomix-research 0.1.0 requires mdurl==0.1.2, which is not installed.\n",
- "genomix-research 0.1.0 requires mistune==3.1.3, which is not installed.\n",
- "genomix-research 0.1.0 requires ml-dtypes==0.5.1, which is not installed.\n",
- "genomix-research 0.1.0 requires monotonic==1.6, which is not installed.\n",
- "genomix-research 0.1.0 requires more-itertools==10.6.0, which is not installed.\n",
- "genomix-research 0.1.0 requires msgpack==1.1.0, which is not installed.\n",
- "genomix-research 0.1.0 requires namex==0.0.8, which is not installed.\n",
- "genomix-research 0.1.0 requires natsort==8.4.0, which is not installed.\n",
- "genomix-research 0.1.0 requires nbclient==0.10.2, which is not installed.\n",
- "genomix-research 0.1.0 requires nbconvert==7.16.6, which is not installed.\n",
- "genomix-research 0.1.0 requires nbformat==5.10.4, which is not installed.\n",
- "genomix-research 0.1.0 requires ncls==0.0.68, which is not installed.\n",
- "genomix-research 0.1.0 requires neptune==1.13.0, which is not installed.\n",
- "genomix-research 0.1.0 requires nodeenv==1.9.1, which is not installed.\n",
- "genomix-research 0.1.0 requires notebook==7.3.3, which is not installed.\n",
- "genomix-research 0.1.0 requires notebook-shim==0.2.4, which is not installed.\n",
- "genomix-research 0.1.0 requires oauthlib==3.2.2, which is not installed.\n",
- "genomix-research 0.1.0 requires omegaconf==2.3.0, which is not installed.\n",
- "genomix-research 0.1.0 requires opt-einsum==3.4.0, which is not installed.\n",
- "genomix-research 0.1.0 requires optax==0.2.4, which is not installed.\n",
- "genomix-research 0.1.0 requires optree==0.14.1, which is not installed.\n",
- "genomix-research 0.1.0 requires orbax==0.1.9, which is not installed.\n",
- "genomix-research 0.1.0 requires orbax-checkpoint==0.11.8, which is not installed.\n",
- "genomix-research 0.1.0 requires overrides==7.7.0, which is not installed.\n",
- "genomix-research 0.1.0 requires pandocfilters==1.5.1, which is not installed.\n",
- "genomix-research 0.1.0 requires pluggy==1.5.0, which is not installed.\n",
- "genomix-research 0.1.0 requires pre-commit==4.1.0, which is not installed.\n",
- "genomix-research 0.1.0 requires prometheus-client==0.21.1, which is not installed.\n",
- "genomix-research 0.1.0 requires proto-plus==1.26.0, which is not installed.\n",
- "genomix-research 0.1.0 requires protobuf==4.25.7, which is not installed.\n",
- "genomix-research 0.1.0 requires pyasn1==0.6.1, which is not installed.\n",
- "genomix-research 0.1.0 requires pyasn1-modules==0.4.1, which is not installed.\n",
- "genomix-research 0.1.0 requires pycparser==2.22, which is not installed.\n",
- "genomix-research 0.1.0 requires pyjwt==2.10.1, which is not installed.\n",
- "genomix-research 0.1.0 requires pyranges==0.1.4, which is not installed.\n",
- "genomix-research 0.1.0 requires pysam==0.23.0, which is not installed.\n",
- "genomix-research 0.1.0 requires pytest==8.3.5, which is not installed.\n",
- "genomix-research 0.1.0 requires pytest-randomly>=3.16.0, which is not installed.\n",
- "genomix-research 0.1.0 requires pytest-split>=0.10.0, which is not installed.\n",
- "genomix-research 0.1.0 requires python-json-logger==3.3.0, which is not installed.\n",
- "genomix-research 0.1.0 requires ray[default]>=2.49.0, which is not installed.\n",
- "genomix-research 0.1.0 requires referencing==0.36.2, which is not installed.\n",
- "genomix-research 0.1.0 requires requests-oauthlib==2.0.0, which is not installed.\n",
- "genomix-research 0.1.0 requires rfc3339-validator==0.1.4, which is not installed.\n",
- "genomix-research 0.1.0 requires rfc3986-validator==0.1.1, which is not installed.\n",
- "genomix-research 0.1.0 requires rfc3987==1.3.8, which is not installed.\n",
- "genomix-research 0.1.0 requires rich==13.9.4, which is not installed.\n",
- "genomix-research 0.1.0 requires rpds-py==0.23.1, which is not installed.\n",
- "genomix-research 0.1.0 requires rsa==4.7, which is not installed.\n",
- "genomix-research 0.1.0 requires s3fs==2025.3.0, which is not installed.\n",
- "genomix-research 0.1.0 requires s3transfer==0.11.3, which is not installed.\n",
- "genomix-research 0.1.0 requires scikit-learn>=1.6.1, which is not installed.\n",
- "genomix-research 0.1.0 requires scipy==1.15.2, which is not installed.\n",
- "genomix-research 0.1.0 requires seaborn>=0.13.2, which is not installed.\n",
- "genomix-research 0.1.0 requires send2trash==1.8.3, which is not installed.\n",
- "genomix-research 0.1.0 requires simplejson==3.20.1, which is not installed.\n",
- "genomix-research 0.1.0 requires smmap==5.0.2, which is not installed.\n",
- "genomix-research 0.1.0 requires sniffio==1.3.1, which is not installed.\n",
- "genomix-research 0.1.0 requires sorted-nearest==0.0.39, which is not installed.\n",
- "genomix-research 0.1.0 requires soupsieve==2.6, which is not installed.\n",
- "genomix-research 0.1.0 requires swagger-spec-validator==3.0.4, which is not installed.\n",
- "genomix-research 0.1.0 requires tabulate==0.9.0, which is not installed.\n",
- "genomix-research 0.1.0 requires tenacity>=9.1.2, which is not installed.\n",
- "genomix-research 0.1.0 requires tensorboard==2.19.0, which is not installed.\n",
- "genomix-research 0.1.0 requires tensorboard-data-server==0.7.2, which is not installed.\n",
- "genomix-research 0.1.0 requires tensorboard-plugin-profile==2.20.6, which is not installed.\n",
- "genomix-research 0.1.0 requires tensorflow==2.19.0, which is not installed.\n",
- "genomix-research 0.1.0 requires tensorflow-io==0.37.1, which is not installed.\n",
- "genomix-research 0.1.0 requires tensorflow-io-gcs-filesystem==0.37.1, which is not installed.\n",
- "genomix-research 0.1.0 requires tensorstore==0.1.71, which is not installed.\n",
- "genomix-research 0.1.0 requires termcolor==2.5.0, which is not installed.\n",
- "genomix-research 0.1.0 requires terminado==0.18.1, which is not installed.\n",
- "genomix-research 0.1.0 requires tinycss2==1.4.0, which is not installed.\n",
- "genomix-research 0.1.0 requires toolz==1.0.0, which is not installed.\n",
- "genomix-research 0.1.0 requires treescope==0.1.9, which is not installed.\n",
- "genomix-research 0.1.0 requires types-python-dateutil==2.9.0.20241206, which is not installed.\n",
- "genomix-research 0.1.0 requires umap-learn>=0.5.9.post2, which is not installed.\n",
- "genomix-research 0.1.0 requires uri-template==1.3.0, which is not installed.\n",
- "genomix-research 0.1.0 requires uritemplate==4.1.1, which is not installed.\n",
- "genomix-research 0.1.0 requires virtualenv==20.29.3, which is not installed.\n",
- "genomix-research 0.1.0 requires wadler-lindig==0.1.4, which is not installed.\n",
- "genomix-research 0.1.0 requires waffle==0.4.0, which is not installed.\n",
- "genomix-research 0.1.0 requires webcolors==24.11.1, which is not installed.\n",
- "genomix-research 0.1.0 requires webencodings==0.5.1, which is not installed.\n",
- "genomix-research 0.1.0 requires websocket-client==1.8.0, which is not installed.\n",
- "genomix-research 0.1.0 requires werkzeug==3.1.3, which is not installed.\n",
- "genomix-research 0.1.0 requires wheel==0.45.1, which is not installed.\n",
- "genomix-research 0.1.0 requires wrapt==1.17.2, which is not installed.\n",
356
- "genomix-research 0.1.0 requires zipp==3.21.0, which is not installed.\n",
357
- "genomix-research 0.1.0 requires aiohappyeyeballs==2.5.0, but you have aiohappyeyeballs 2.6.1 which is incompatible.\n",
- "genomix-research 0.1.0 requires aiohttp==3.11.13, but you have aiohttp 3.13.2 which is incompatible.\n",
- "genomix-research 0.1.0 requires aiosignal==1.3.2, but you have aiosignal 1.4.0 which is incompatible.\n",
- "genomix-research 0.1.0 requires anyio==4.9.0, but you have anyio 4.12.0 which is incompatible.\n",
- "genomix-research 0.1.0 requires asttokens==3.0.0, but you have asttokens 3.0.1 which is incompatible.\n",
- "genomix-research 0.1.0 requires attrs==25.1.0, but you have attrs 25.4.0 which is incompatible.\n",
- "genomix-research 0.1.0 requires certifi==2025.1.31, but you have certifi 2025.11.12 which is incompatible.\n",
- "genomix-research 0.1.0 requires charset-normalizer==3.4.1, but you have charset-normalizer 3.4.4 which is incompatible.\n",
- "genomix-research 0.1.0 requires comm==0.2.2, but you have comm 0.2.3 which is incompatible.\n",
- "genomix-research 0.1.0 requires debugpy==1.8.13, but you have debugpy 1.8.17 which is incompatible.\n",
- "genomix-research 0.1.0 requires executing==2.2.0, but you have executing 2.2.1 which is incompatible.\n",
- "genomix-research 0.1.0 requires filelock==3.17.0, but you have filelock 3.20.0 which is incompatible.\n",
- "genomix-research 0.1.0 requires frozenlist==1.5.0, but you have frozenlist 1.8.0 which is incompatible.\n",
- "genomix-research 0.1.0 requires fsspec==2025.3.0, but you have fsspec 2025.10.0 which is incompatible.\n",
- "genomix-research 0.1.0 requires h11==0.14.0, but you have h11 0.16.0 which is incompatible.\n",
- "genomix-research 0.1.0 requires httpcore==1.0.7, but you have httpcore 1.0.9 which is incompatible.\n",
- "genomix-research 0.1.0 requires idna==3.10, but you have idna 3.11 which is incompatible.\n",
- "genomix-research 0.1.0 requires ipykernel==6.29.5, but you have ipykernel 7.1.0 which is incompatible.\n",
- "genomix-research 0.1.0 requires ipython==9.0.2, but you have ipython 9.8.0 which is incompatible.\n",
- "genomix-research 0.1.0 requires ipywidgets==8.1.5, but you have ipywidgets 8.1.8 which is incompatible.\n",
- "genomix-research 0.1.0 requires jupyter-core==5.7.2, but you have jupyter-core 5.9.1 which is incompatible.\n",
- "genomix-research 0.1.0 requires jupyterlab-widgets==3.0.13, but you have jupyterlab-widgets 3.0.16 which is incompatible.\n",
- "genomix-research 0.1.0 requires markupsafe==3.0.2, but you have markupsafe 3.0.3 which is incompatible.\n",
- "genomix-research 0.1.0 requires matplotlib-inline==0.1.7, but you have matplotlib-inline 0.2.1 which is incompatible.\n",
- "genomix-research 0.1.0 requires multidict==6.1.0, but you have multidict 6.7.0 which is incompatible.\n",
- "genomix-research 0.1.0 requires numpy==2.1.3, but you have numpy 2.3.5 which is incompatible.\n",
- "genomix-research 0.1.0 requires packaging==24.2, but you have packaging 25.0 which is incompatible.\n",
- "genomix-research 0.1.0 requires pandas==2.2.3, but you have pandas 2.3.3 which is incompatible.\n",
- "genomix-research 0.1.0 requires parso==0.8.4, but you have parso 0.8.5 which is incompatible.\n",
- "genomix-research 0.1.0 requires pillow==11.1.0, but you have pillow 12.0.0 which is incompatible.\n",
- "genomix-research 0.1.0 requires platformdirs==4.3.6, but you have platformdirs 4.5.1 which is incompatible.\n",
- "genomix-research 0.1.0 requires prompt-toolkit==3.0.50, but you have prompt-toolkit 3.0.52 which is incompatible.\n",
- "genomix-research 0.1.0 requires propcache==0.3.0, but you have propcache 0.4.1 which is incompatible.\n",
- "genomix-research 0.1.0 requires psutil==7.0.0, but you have psutil 7.1.3 which is incompatible.\n",
- "genomix-research 0.1.0 requires pygments==2.19.1, but you have pygments 2.19.2 which is incompatible.\n",
- "genomix-research 0.1.0 requires pyparsing==3.2.1, but you have pyparsing 3.2.5 which is incompatible.\n",
- "genomix-research 0.1.0 requires pytz==2025.1, but you have pytz 2025.2 which is incompatible.\n",
- "genomix-research 0.1.0 requires pyyaml==6.0.2, but you have pyyaml 6.0.3 which is incompatible.\n",
- "genomix-research 0.1.0 requires pyzmq==26.3.0, but you have pyzmq 27.1.0 which is incompatible.\n",
- "genomix-research 0.1.0 requires regex==2024.11.6, but you have regex 2025.11.3 which is incompatible.\n",
- "genomix-research 0.1.0 requires requests==2.32.3, but you have requests 2.32.5 which is incompatible.\n",
- "genomix-research 0.1.0 requires setuptools==77.0.1, but you have setuptools 80.9.0 which is incompatible.\n",
- "genomix-research 0.1.0 requires tornado==6.4.2, but you have tornado 6.5.2 which is incompatible.\n",
- "genomix-research 0.1.0 requires typing-extensions==4.12.2, but you have typing-extensions 4.15.0 which is incompatible.\n",
- "genomix-research 0.1.0 requires tzdata==2025.1, but you have tzdata 2025.3 which is incompatible.\n",
- "genomix-research 0.1.0 requires urllib3==2.3.0, but you have urllib3 2.6.1 which is incompatible.\n",
- "genomix-research 0.1.0 requires wcwidth==0.2.13, but you have wcwidth 0.2.14 which is incompatible.\n",
- "genomix-research 0.1.0 requires widgetsnbextension==4.0.13, but you have widgetsnbextension 4.0.15 which is incompatible.\n",
- "genomix-research 0.1.0 requires yarl==1.18.3, but you have yarl 1.22.0 which is incompatible.\u001b[0m\u001b[31m\n",
- "\u001b[0mSuccessfully installed aiohappyeyeballs-2.6.1 aiohttp-3.13.2 aiosignal-1.4.0 anyio-4.12.0 attrs-25.4.0 datasets-4.4.1 dill-0.4.0 frozenlist-1.8.0 fsspec-2025.10.0 h11-0.16.0 httpcore-1.0.9 httpx-0.28.1 multidict-6.7.0 multiprocess-0.70.18 pandas-2.3.3 propcache-0.4.1 pyarrow-22.0.0 pytz-2025.2 tzdata-2025.3 xxhash-3.6.0 yarl-1.22.0\n"
- ]
- }
- ],
  "source": [
  "# Install dependencies\n",
  "!pip install datasets transformers torchmetrics plotly "
@@ -474,7 +114,7 @@
  },
  {
  "cell_type": "code",
- "execution_count": 13,
  "metadata": {},
  "outputs": [
  {
@@ -495,7 +135,6 @@
  " \"data_cache_dir\": \"./data\",\n",
  " \"fasta_url\": \"https://hgdownload.gi.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz\",\n",
  " \"bigwig_url_list\": [\n",
- " # \"https://www.encodeproject.org/files/ENCFF884LDL/@@download/ENCFF884LDL.bigWig\",\n",
  " \"https://www.encodeproject.org/files/ENCFF055QKS/@@download/ENCFF055QKS.bigWig\",\n",
  " \"https://www.encodeproject.org/files/ENCFF214GOQ/@@download/ENCFF214GOQ.bigWig\",\n",
  " \"https://www.encodeproject.org/files/ENCFF592NIB/@@download/ENCFF592NIB.bigWig\",\n",
@@ -672,45 +311,9 @@
  },
  {
  "cell_type": "code",
- "execution_count": 14,
  "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Loading dataset from InstaDeepAI/bigwig_tracks...\n"
- ]
- },
- {
- "data": {
- "application/vnd.jupyter.widget-view+json": {
- "model_id": "d9e36ca0c8e544339833c04f68f485aa",
- "version_major": 2,
- "version_minor": 0
- },
- "text/plain": [
- "README.md: 0%| | 0.00/4.24k [00:00<?, ?B/s]"
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- },
- {
- "ename": "FileNotFoundError",
- "evalue": "Couldn't find any data file at /home/y-bornachot/ntv3/notebooks/InstaDeepAI/bigwig_tracks. Couldn't find 'InstaDeepAI/bigwig_tracks' on the Hugging Face Hub either: FileNotFoundError: Unable to find 'hf://datasets/InstaDeepAI/bigwig_tracks@7fe68eaafda66223c3fe392f5fa2ad81173047a1/./data/chr1' with any supported extension ['.csv', '.tsv', '.json', '.jsonl', '.ndjson', '.parquet', '.geoparquet', '.gpq', '.arrow', '.txt', '.tar', '.xml', '.hdf5', '.h5', '.blp', '.bmp', '.dib', '.bufr', '.cur', '.pcx', '.dcx', '.dds', '.ps', '.eps', '.fit', '.fits', '.fli', '.flc', '.ftc', '.ftu', '.gbr', '.gif', '.grib', '.png', '.apng', '.jp2', '.j2k', '.jpc', '.jpf', '.jpx', '.j2c', '.icns', '.ico', '.im', '.iim', '.tif', '.tiff', '.jfif', '.jpe', '.jpg', '.jpeg', '.mpg', '.mpeg', '.msp', '.pcd', '.pxr', '.pbm', '.pgm', '.ppm', '.pnm', '.psd', '.bw', '.rgb', '.rgba', '.sgi', '.ras', '.tga', '.icb', '.vda', '.vst', '.webp', '.wmf', '.emf', '.xbm', '.xpm', '.BLP', '.BMP', '.DIB', '.BUFR', '.CUR', '.PCX', '.DCX', '.DDS', '.PS', '.EPS', '.FIT', '.FITS', '.FLI', '.FLC', '.FTC', '.FTU', '.GBR', '.GIF', '.GRIB', '.PNG', '.APNG', '.JP2', '.J2K', '.JPC', '.JPF', '.JPX', '.J2C', '.ICNS', '.ICO', '.IM', '.IIM', '.TIF', '.TIFF', '.JFIF', '.JPE', '.JPG', '.JPEG', '.MPG', '.MPEG', '.MSP', '.PCD', '.PXR', '.PBM', '.PGM', '.PPM', '.PNM', '.PSD', '.BW', '.RGB', '.RGBA', '.SGI', '.RAS', '.TGA', '.ICB', '.VDA', '.VST', '.WEBP', '.WMF', '.EMF', '.XBM', '.XPM', '.aiff', '.au', '.avr', '.caf', '.flac', '.htk', '.svx', '.mat4', '.mat5', '.mpc2k', '.ogg', '.paf', '.pvf', '.raw', '.rf64', '.sd2', '.sds', '.ircam', '.voc', '.w64', '.wav', '.nist', '.wavex', '.wve', '.xi', '.mp3', '.opus', '.3gp', '.3g2', '.avi', '.asf', '.flv', '.mp4', '.mov', '.m4v', '.mkv', '.webm', '.f4v', '.wmv', '.wma', '.ogm', '.mxf', '.nut', '.AIFF', '.AU', '.AVR', '.CAF', '.FLAC', '.HTK', '.SVX', '.MAT4', '.MAT5', '.MPC2K', '.OGG', '.PAF', '.PVF', '.RAW', '.RF64', '.SD2', '.SDS', '.IRCAM', '.VOC', '.W64', '.WAV', 
'.NIST', '.WAVEX', '.WVE', '.XI', '.MP3', '.OPUS', '.3GP', '.3G2', '.AVI', '.ASF', '.FLV', '.MP4', '.MOV', '.M4V', '.MKV', '.WEBM', '.F4V', '.WMV', '.WMA', '.OGM', '.MXF', '.NUT', '.pdf', '.PDF', '.nii', '.nii.gz', '.NII', '.NII.GZ', '.zip']",
- "output_type": "error",
- "traceback": [
- "\u001b[31m---------------------------------------------------------------------------\u001b[39m",
- "\u001b[31mFileNotFoundError\u001b[39m Traceback (most recent call last)",
- "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[14]\u001b[39m\u001b[32m, line 16\u001b[39m\n\u001b[32m 9\u001b[39m num_samples = {\n\u001b[32m 10\u001b[39m \u001b[33m\"\u001b[39m\u001b[33mtrain\u001b[39m\u001b[33m\"\u001b[39m: config[\u001b[33m\"\u001b[39m\u001b[33mnum_steps_training\u001b[39m\u001b[33m\"\u001b[39m] * config[\u001b[33m\"\u001b[39m\u001b[33mbatch_size\u001b[39m\u001b[33m\"\u001b[39m],\n\u001b[32m 11\u001b[39m \u001b[33m\"\u001b[39m\u001b[33mval\u001b[39m\u001b[33m\"\u001b[39m: config[\u001b[33m\"\u001b[39m\u001b[33mnum_validation_samples\u001b[39m\u001b[33m\"\u001b[39m],\n\u001b[32m 12\u001b[39m \u001b[33m\"\u001b[39m\u001b[33mtest\u001b[39m\u001b[33m\"\u001b[39m: config[\u001b[33m\"\u001b[39m\u001b[33mnum_test_samples\u001b[39m\u001b[33m\"\u001b[39m],\n\u001b[32m 13\u001b[39m }\n\u001b[32m 15\u001b[39m \u001b[38;5;28mprint\u001b[39m(\u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[33mLoading dataset from \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mconfig[\u001b[33m'\u001b[39m\u001b[33mdataset_name\u001b[39m\u001b[33m'\u001b[39m]\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m...\u001b[39m\u001b[33m\"\u001b[39m)\n\u001b[32m---> \u001b[39m\u001b[32m16\u001b[39m dataset = \u001b[43mload_dataset\u001b[49m\u001b[43m(\u001b[49m\n\u001b[32m 17\u001b[39m \u001b[43m \u001b[49m\u001b[43mconfig\u001b[49m\u001b[43m[\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mdataset_name\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 18\u001b[39m \u001b[43m \u001b[49m\u001b[43mdata_files\u001b[49m\u001b[43m=\u001b[49m\u001b[43mchrom_splits\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 19\u001b[39m \u001b[43m \u001b[49m\u001b[43mnum_samples\u001b[49m\u001b[43m=\u001b[49m\u001b[43mnum_samples\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 20\u001b[39m \u001b[43m 
\u001b[49m\u001b[43mfasta_url\u001b[49m\u001b[43m=\u001b[49m\u001b[43mconfig\u001b[49m\u001b[43m[\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mfasta_url\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 21\u001b[39m \u001b[43m \u001b[49m\u001b[43mbigwig_urls\u001b[49m\u001b[43m=\u001b[49m\u001b[43mconfig\u001b[49m\u001b[43m[\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mbigwig_url_list\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 22\u001b[39m \u001b[43m \u001b[49m\u001b[43msequence_length\u001b[49m\u001b[43m=\u001b[49m\u001b[43mconfig\u001b[49m\u001b[43m[\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43msequence_length\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 23\u001b[39m \u001b[43m \u001b[49m\u001b[43mdata_dir\u001b[49m\u001b[43m=\u001b[49m\u001b[43mconfig\u001b[49m\u001b[43m[\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mdata_cache_dir\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 24\u001b[39m \u001b[43m)\u001b[49m\n",
- "\u001b[36mFile \u001b[39m\u001b[32m~/venvs/ntv3-env/lib/python3.12/site-packages/datasets/load.py:1397\u001b[39m, in \u001b[36mload_dataset\u001b[39m\u001b[34m(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, keep_in_memory, save_infos, revision, token, streaming, num_proc, storage_options, **config_kwargs)\u001b[39m\n\u001b[32m 1392\u001b[39m verification_mode = VerificationMode(\n\u001b[32m 1393\u001b[39m (verification_mode \u001b[38;5;129;01mor\u001b[39;00m VerificationMode.BASIC_CHECKS) \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m save_infos \u001b[38;5;28;01melse\u001b[39;00m VerificationMode.ALL_CHECKS\n\u001b[32m 1394\u001b[39m )\n\u001b[32m 1396\u001b[39m \u001b[38;5;66;03m# Create a dataset builder\u001b[39;00m\n\u001b[32m-> \u001b[39m\u001b[32m1397\u001b[39m builder_instance = \u001b[43mload_dataset_builder\u001b[49m\u001b[43m(\u001b[49m\n\u001b[32m 1398\u001b[39m \u001b[43m \u001b[49m\u001b[43mpath\u001b[49m\u001b[43m=\u001b[49m\u001b[43mpath\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 1399\u001b[39m \u001b[43m \u001b[49m\u001b[43mname\u001b[49m\u001b[43m=\u001b[49m\u001b[43mname\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 1400\u001b[39m \u001b[43m \u001b[49m\u001b[43mdata_dir\u001b[49m\u001b[43m=\u001b[49m\u001b[43mdata_dir\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 1401\u001b[39m \u001b[43m \u001b[49m\u001b[43mdata_files\u001b[49m\u001b[43m=\u001b[49m\u001b[43mdata_files\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 1402\u001b[39m \u001b[43m \u001b[49m\u001b[43mcache_dir\u001b[49m\u001b[43m=\u001b[49m\u001b[43mcache_dir\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 1403\u001b[39m \u001b[43m \u001b[49m\u001b[43mfeatures\u001b[49m\u001b[43m=\u001b[49m\u001b[43mfeatures\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 1404\u001b[39m \u001b[43m \u001b[49m\u001b[43mdownload_config\u001b[49m\u001b[43m=\u001b[49m\u001b[43mdownload_config\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 
1405\u001b[39m \u001b[43m \u001b[49m\u001b[43mdownload_mode\u001b[49m\u001b[43m=\u001b[49m\u001b[43mdownload_mode\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 1406\u001b[39m \u001b[43m \u001b[49m\u001b[43mrevision\u001b[49m\u001b[43m=\u001b[49m\u001b[43mrevision\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 1407\u001b[39m \u001b[43m \u001b[49m\u001b[43mtoken\u001b[49m\u001b[43m=\u001b[49m\u001b[43mtoken\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 1408\u001b[39m \u001b[43m \u001b[49m\u001b[43mstorage_options\u001b[49m\u001b[43m=\u001b[49m\u001b[43mstorage_options\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 1409\u001b[39m \u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mconfig_kwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 1410\u001b[39m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 1412\u001b[39m \u001b[38;5;66;03m# Return iterable dataset in case of streaming\u001b[39;00m\n\u001b[32m 1413\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m streaming:\n",
- "\u001b[36mFile \u001b[39m\u001b[32m~/venvs/ntv3-env/lib/python3.12/site-packages/datasets/load.py:1137\u001b[39m, in \u001b[36mload_dataset_builder\u001b[39m\u001b[34m(path, name, data_dir, data_files, cache_dir, features, download_config, download_mode, revision, token, storage_options, **config_kwargs)\u001b[39m\n\u001b[32m 1135\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m features \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[32m 1136\u001b[39m features = _fix_for_backward_compatible_features(features)\n\u001b[32m-> \u001b[39m\u001b[32m1137\u001b[39m dataset_module = \u001b[43mdataset_module_factory\u001b[49m\u001b[43m(\u001b[49m\n\u001b[32m 1138\u001b[39m \u001b[43m \u001b[49m\u001b[43mpath\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 1139\u001b[39m \u001b[43m \u001b[49m\u001b[43mrevision\u001b[49m\u001b[43m=\u001b[49m\u001b[43mrevision\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 1140\u001b[39m \u001b[43m \u001b[49m\u001b[43mdownload_config\u001b[49m\u001b[43m=\u001b[49m\u001b[43mdownload_config\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 1141\u001b[39m \u001b[43m \u001b[49m\u001b[43mdownload_mode\u001b[49m\u001b[43m=\u001b[49m\u001b[43mdownload_mode\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 1142\u001b[39m \u001b[43m \u001b[49m\u001b[43mdata_dir\u001b[49m\u001b[43m=\u001b[49m\u001b[43mdata_dir\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 1143\u001b[39m \u001b[43m \u001b[49m\u001b[43mdata_files\u001b[49m\u001b[43m=\u001b[49m\u001b[43mdata_files\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 1144\u001b[39m \u001b[43m \u001b[49m\u001b[43mcache_dir\u001b[49m\u001b[43m=\u001b[49m\u001b[43mcache_dir\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 1145\u001b[39m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 1146\u001b[39m \u001b[38;5;66;03m# Get dataset builder class\u001b[39;00m\n\u001b[32m 1147\u001b[39m builder_kwargs = dataset_module.builder_kwargs\n",
- "\u001b[36mFile \u001b[39m\u001b[32m~/venvs/ntv3-env/lib/python3.12/site-packages/datasets/load.py:1032\u001b[39m, in \u001b[36mdataset_module_factory\u001b[39m\u001b[34m(path, revision, download_config, download_mode, data_dir, data_files, cache_dir, **download_kwargs)\u001b[39m\n\u001b[32m 1030\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m e1 \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[32m 1031\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(e1, \u001b[38;5;167;01mFileNotFoundError\u001b[39;00m):\n\u001b[32m-> \u001b[39m\u001b[32m1032\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mFileNotFoundError\u001b[39;00m(\n\u001b[32m 1033\u001b[39m \u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[33mCouldn\u001b[39m\u001b[33m'\u001b[39m\u001b[33mt find any data file at \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mrelative_to_absolute_path(path)\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m. \u001b[39m\u001b[33m\"\u001b[39m\n\u001b[32m 1034\u001b[39m \u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[33mCouldn\u001b[39m\u001b[33m'\u001b[39m\u001b[33mt find \u001b[39m\u001b[33m'\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mpath\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m'\u001b[39m\u001b[33m on the Hugging Face Hub either: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mtype\u001b[39m(e1).\u001b[34m__name__\u001b[39m\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00me1\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m\"\u001b[39m\n\u001b[32m 1035\u001b[39m ) \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[32m 1036\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m e1 \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[32m 1037\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m:\n",
- "\u001b[31mFileNotFoundError\u001b[39m: Couldn't find any data file at /home/y-bornachot/ntv3/notebooks/InstaDeepAI/bigwig_tracks. Couldn't find 'InstaDeepAI/bigwig_tracks' on the Hugging Face Hub either: FileNotFoundError: Unable to find 'hf://datasets/InstaDeepAI/bigwig_tracks@7fe68eaafda66223c3fe392f5fa2ad81173047a1/./data/chr1' with any supported extension ['.csv', '.tsv', '.json', '.jsonl', '.ndjson', '.parquet', '.geoparquet', '.gpq', '.arrow', '.txt', '.tar', '.xml', '.hdf5', '.h5', '.blp', '.bmp', '.dib', '.bufr', '.cur', '.pcx', '.dcx', '.dds', '.ps', '.eps', '.fit', '.fits', '.fli', '.flc', '.ftc', '.ftu', '.gbr', '.gif', '.grib', '.png', '.apng', '.jp2', '.j2k', '.jpc', '.jpf', '.jpx', '.j2c', '.icns', '.ico', '.im', '.iim', '.tif', '.tiff', '.jfif', '.jpe', '.jpg', '.jpeg', '.mpg', '.mpeg', '.msp', '.pcd', '.pxr', '.pbm', '.pgm', '.ppm', '.pnm', '.psd', '.bw', '.rgb', '.rgba', '.sgi', '.ras', '.tga', '.icb', '.vda', '.vst', '.webp', '.wmf', '.emf', '.xbm', '.xpm', '.BLP', '.BMP', '.DIB', '.BUFR', '.CUR', '.PCX', '.DCX', '.DDS', '.PS', '.EPS', '.FIT', '.FITS', '.FLI', '.FLC', '.FTC', '.FTU', '.GBR', '.GIF', '.GRIB', '.PNG', '.APNG', '.JP2', '.J2K', '.JPC', '.JPF', '.JPX', '.J2C', '.ICNS', '.ICO', '.IM', '.IIM', '.TIF', '.TIFF', '.JFIF', '.JPE', '.JPG', '.JPEG', '.MPG', '.MPEG', '.MSP', '.PCD', '.PXR', '.PBM', '.PGM', '.PPM', '.PNM', '.PSD', '.BW', '.RGB', '.RGBA', '.SGI', '.RAS', '.TGA', '.ICB', '.VDA', '.VST', '.WEBP', '.WMF', '.EMF', '.XBM', '.XPM', '.aiff', '.au', '.avr', '.caf', '.flac', '.htk', '.svx', '.mat4', '.mat5', '.mpc2k', '.ogg', '.paf', '.pvf', '.raw', '.rf64', '.sd2', '.sds', '.ircam', '.voc', '.w64', '.wav', '.nist', '.wavex', '.wve', '.xi', '.mp3', '.opus', '.3gp', '.3g2', '.avi', '.asf', '.flv', '.mp4', '.mov', '.m4v', '.mkv', '.webm', '.f4v', '.wmv', '.wma', '.ogm', '.mxf', '.nut', '.AIFF', '.AU', '.AVR', '.CAF', '.FLAC', '.HTK', '.SVX', '.MAT4', '.MAT5', '.MPC2K', '.OGG', '.PAF', '.PVF', '.RAW', '.RF64', '.SD2', '.SDS', '.IRCAM', 
'.VOC', '.W64', '.WAV', '.NIST', '.WAVEX', '.WVE', '.XI', '.MP3', '.OPUS', '.3GP', '.3G2', '.AVI', '.ASF', '.FLV', '.MP4', '.MOV', '.M4V', '.MKV', '.WEBM', '.F4V', '.WMV', '.WMA', '.OGM', '.MXF', '.NUT', '.pdf', '.PDF', '.nii', '.nii.gz', '.NII', '.NII.GZ', '.zip']"
- ]
- }
- ],
  "source": [
  "# Chromosomes split definition\n",
  "chrom_splits = {\n",
 
  "\n",
  "This notebook demonstrates a **simplified fine-tuning setup** that enables training of a pre-trained Nucleotide Transformer v3 (NTv3) model to predict BigWig signal tracks directly from DNA sequences. The streamlined approach leverages a pre-trained NTv3 backbone as a feature extractor and adds a custom prediction head that outputs single-nucleotide resolution signal values for various genomic tracks (e.g., ChIP-seq, ATAC-seq, RNA-seq).\n",
  "\n",
+ "**⚡ Key Advantage**: This simplified pipeline achieves **performance close to that of more complex training approaches** while enabling **fast fine-tuning**. The training speed benefits from the efficient NTv3 model architecture and depends on your hardware capabilities (GPU acceleration and multi-worker data loading significantly reduce training time). With NTv3 models, meaningful Pearson correlations can typically be reached within ~10 minutes of training on a 32 kb functional track prediction task.\n",
  "\n",
+ "While this notebook currently focuses on NTv3 models, the pipeline structure can be extended to work with other foundation models. The setup is designed for rapid experimentation and iteration, making it ideal for adapting pre-trained models to your specific genomic tracks or experimental conditions without the overhead of complex distributed training infrastructure.\n",
+ "\n",
+ "**🔧 Main Simplifications**: Compared to the full supervised track prediction pipeline, this notebook simplifies several aspects to enable faster iteration:\n",
  "\n",
  "- **Data splits**: Uses simple chromosome-based train/val/test splits (e.g., assigning entire chromosomes to each split) instead of more complex region-based splits\n",
  "- **Random sequence sampling**: The dataset randomly samples sequences from chromosomes/regions on-the-fly, rather than using pre-computed sliding windows\n",
 
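The chromosome-based splits and on-the-fly random window sampling described in the bullets above can be sketched as follows. This is a minimal illustration only: the chromosome sizes, the split assignment, and the 32 kb window length are assumptions for the example, not the notebook's actual configuration.

```python
import random

# Hypothetical chromosome sizes (bp); each split owns whole chromosomes.
CHROM_SIZES = {"chr1": 248_956_422, "chr2": 242_193_529, "chr21": 46_709_983}
SPLITS = {"train": ["chr1"], "val": ["chr2"], "test": ["chr21"]}

def sample_window(split: str, sequence_length: int, rng: random.Random):
    """Pick a random chromosome from the split, then a random fixed-length
    window inside it (on-the-fly sampling, no pre-computed sliding windows)."""
    chrom = rng.choice(SPLITS[split])
    start = rng.randrange(0, CHROM_SIZES[chrom] - sequence_length)
    return chrom, start, start + sequence_length

rng = random.Random(0)
chrom, start, end = sample_window("train", 32_768, rng)
assert chrom == "chr1" and end - start == 32_768
```

Because windows are drawn independently at each step, every epoch sees different regions, which is what makes this setup cheaper than enumerating all sliding windows up front.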
  },
  {
  "cell_type": "code",
+ "execution_count": null,
  "metadata": {},
+ "outputs": [],
  "source": [
  "# Install dependencies\n",
  "!pip install datasets transformers torchmetrics plotly "
 
  },
  {
  "cell_type": "code",
+ "execution_count": null,
  "metadata": {},
  "outputs": [
  {
 
135
  " \"data_cache_dir\": \"./data\",\n",
136
  " \"fasta_url\": \"https://hgdownload.gi.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz\",\n",
137
  " \"bigwig_url_list\": [\n",
 
138
  " \"https://www.encodeproject.org/files/ENCFF055QKS/@@download/ENCFF055QKS.bigWig\",\n",
139
  " \"https://www.encodeproject.org/files/ENCFF214GOQ/@@download/ENCFF214GOQ.bigWig\",\n",
140
  " \"https://www.encodeproject.org/files/ENCFF592NIB/@@download/ENCFF592NIB.bigWig\",\n",
 
311
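The config hunk above pairs remote reference files (`fasta_url`, `bigwig_url_list`) with a local `data_cache_dir`. One plausible way such URLs map to cached files is sketched below; the `cache_path` helper is hypothetical and is not the notebook's actual download code.

```python
from pathlib import Path
from urllib.parse import urlparse

def cache_path(url: str, data_cache_dir: str) -> Path:
    """Map a remote FASTA/BigWig URL to a deterministic local file
    under data_cache_dir, keyed by the URL's final path component."""
    filename = Path(urlparse(url).path).name
    return Path(data_cache_dir) / filename

url = "https://www.encodeproject.org/files/ENCFF055QKS/@@download/ENCFF055QKS.bigWig"
assert cache_path(url, "./data").name == "ENCFF055QKS.bigWig"
```

Deterministic cache paths let repeated notebook runs skip re-downloading the multi-gigabyte genome and track files.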
  },
312
  {
313
  "cell_type": "code",
314
+ "execution_count": null,
315
  "metadata": {},
316
+ "outputs": [],
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
317
  "source": [
318
  "# Chromosomes split definition\n",
319
  "chrom_splits = {\n",