Upload folder using huggingface_hub
#1
by
avantikalal
- opened
- 2_train.ipynb +0 -0
- README.md +41 -3
- model.ckpt +3 -0
- output.log +52 -0
2_train.ipynb
ADDED
|
The diff for this file is too large to render.
|
README.md
CHANGED
|
@@ -1,3 +1,41 @@
|
|
| 1 |
-
---
|
| 2 |
-
|
| 3 |
-
|
| 1 |
+
---
|
| 2 |
+
# 1. Metadata Block
|
| 3 |
+
license: mit
|
| 4 |
+
library_name: pytorch-lightning
|
| 5 |
+
pipeline_tag: tabular-classification
|
| 6 |
+
tags:
|
| 7 |
+
- biology
|
| 8 |
+
- genomics
|
| 9 |
+
datasets:
|
| 10 |
+
- Genentech/human-atac-catlas-data
|
| 11 |
+
---
|
| 12 |
+
|
| 13 |
+
# human-atac-catlas-model
|
| 14 |
+
|
| 15 |
+
## Model Description
|
| 16 |
+
This model is a multi-task binary classifier that predicts chromatin accessibility across 204 cell types. It was obtained by fine-tuning the Enformer model on the CATlas human enhancer dataset using the `grelu` library.
|
| 17 |
+
|
| 18 |
+
- **Architecture:** Fine-tuned Enformer
|
| 19 |
+
- **Input:** Genomic sequences (hg38)
|
| 20 |
+
- **Output:** Binary accessibility predictions for 204 cell type tasks.
|
| 21 |
+
|
| 22 |
+
## Repository Content
|
| 23 |
+
1. `model.ckpt`: The trained model weights and hyperparameters (PyTorch Lightning checkpoint).
|
| 24 |
+
2. `2_train.ipynb`: Jupyter notebook containing the training logic, architecture definition, and evaluation loops.
|
| 25 |
+
3. `output.log`: Training logs.
|
| 26 |
+
|
| 27 |
+
## How to use
|
| 28 |
+
To load this model for inference or fine-tuning, use the `grelu` interface:
|
| 29 |
+
|
| 30 |
+
```python
|
| 31 |
+
from grelu.lightning import LightningModel
|
| 32 |
+
from huggingface_hub import hf_hub_download
|
| 33 |
+
|
| 34 |
+
ckpt_path = hf_hub_download(
|
| 35 |
+
repo_id="Genentech/human-atac-catlas-model",
|
| 36 |
+
filename="model.ckpt"
|
| 37 |
+
)
|
| 38 |
+
|
| 39 |
+
model = LightningModel.load_from_checkpoint(ckpt_path)
|
| 40 |
+
model.eval()
|
| 41 |
+
```
|
model.ckpt
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:74e9b1d42b3b61eab7574bd62c170e075b3c87132e060390e29296192988fdc3
|
| 3 |
+
size 344440758
|
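The three lines added for `model.ckpt` are a Git LFS pointer (spec version, SHA-256 object id, and size in bytes), not the checkpoint itself. As a minimal sketch, a downloaded copy of the checkpoint could be checked against that pointer as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of `huggingface_hub` or `grelu`.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and hash named in the pointer."""
    path = Path(path)
    # Cheap check first: a size mismatch means the hash cannot match either.
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so a large checkpoint is never fully loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the model.ckpt diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:74e9b1d42b3b61eab7574bd62c170e075b3c87132e060390e29296192988fdc3
size 344440758
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer(ckpt_path, pointer)` with the path returned by `hf_hub_download` would then report whether the local file is the object this pointer describes.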
output.log
ADDED
|
@@ -0,0 +1,52 @@
|
| 1 |
+
wandb: Downloading large artifact dataset:latest, 179.17MB. 1 files...
|
| 2 |
+
wandb: 1 of 1 files downloaded.
|
| 3 |
+
Done. 0:0:0.3
|
| 4 |
+
/opt/conda/lib/python3.11/site-packages/anndata/_core/aligned_df.py:68: ImplicitModificationWarning: Transforming to str index.
|
| 5 |
+
wandb: WARNING Calling wandb.login() after wandb.init() has no effect.
|
| 6 |
+
wandb: Downloading large artifact human_state_dict:latest, 939.29MB. 1 files...
|
| 7 |
+
wandb: 1 of 1 files downloaded.
|
| 8 |
+
Done. 0:0:0.7
|
| 9 |
+
/opt/conda/lib/python3.11/site-packages/grelu/model/models.py:771: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
|
| 10 |
+
GPU available: True (cuda), used: True
|
| 11 |
+
TPU available: False, using: 0 TPU cores
|
| 12 |
+
HPU available: False, using: 0 HPUs
|
| 13 |
+
/opt/conda/lib/python3.11/site-packages/pytorch_lightning/loggers/wandb.py:397: UserWarning: There is a wandb run already in progress and newly created instances of `WandbLogger` will reuse this run. If this is not desired, call `wandb.finish()` before instantiating `WandbLogger`.
|
| 14 |
+
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [1]
|
| 15 |
+
Validation DataLoader 0: 100%|██████████| 24/24 [00:08<00:00, 2.84it/s]
|
| 16 |
+
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [1]
|
| 17 |
+
|
| 18 |
+
| Name | Type | Params | Mode
|
| 19 |
+
-----------------------------------------------------------------
|
| 20 |
+
0 | model | EnformerPretrainedModel | 72.1 M | train
|
| 21 |
+
1 | loss | BCEWithLogitsLoss | 0 | train
|
| 22 |
+
2 | val_metrics | MetricCollection | 0 | train
|
| 23 |
+
3 | test_metrics | MetricCollection | 0 | train
|
| 24 |
+
4 | transform | Identity | 0 | train
|
| 25 |
+
-----------------------------------------------------------------
|
| 26 |
+
72.1 M Trainable params
|
| 27 |
+
0 Non-trainable params
|
| 28 |
+
72.1 M Total params
|
| 29 |
+
288.279 Total estimated model params size (MB)
|
| 30 |
+
240 Modules in train mode
|
| 31 |
+
0 Modules in eval mode
|
| 32 |
+
Epoch 9: 100%|██████████| 319/319 [03:28<00:00, 1.53it/s, v_num=t24e, train_loss_step=0.118, train_loss_epoch=0.143]
|
| 33 |
+
Testing DataLoader 0: 100%|██████████| 284/284 [00:09<00:00, 28.44it/s]
|
| 34 |
+
`Trainer.fit` stopped: `max_epochs=10` reached.
|
| 35 |
+
wandb: WARNING Calling wandb.login() after wandb.init() has no effect.
|
| 36 |
+
wandb: Downloading large artifact human_state_dict:latest, 939.29MB. 1 files...
|
| 37 |
+
wandb: 1 of 1 files downloaded.
|
| 38 |
+
Done. 0:0:0.7
|
| 39 |
+
/opt/conda/lib/python3.11/site-packages/grelu/model/models.py:771: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
|
| 40 |
+
GPU available: True (cuda), used: True
|
| 41 |
+
TPU available: False, using: 0 TPU cores
|
| 42 |
+
HPU available: False, using: 0 HPUs
|
| 43 |
+
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [1]
|
| 44 |
+
CPU times: user 13.7 s, sys: 1.66 s, total: 15.4 s
|
| 45 |
+
Wall time: 15.7 s
|
| 46 |
+
/opt/conda/lib/python3.11/site-packages/plotnine/stats/stat_bin.py:109: PlotnineWarning: 'stat_bin()' using 'bins = 19'. Pick better value with 'binwidth'.
|
| 47 |
+
GPU available: True (cuda), used: True
|
| 48 |
+
TPU available: False, using: 0 TPU cores
|
| 49 |
+
HPU available: False, using: 0 HPUs
|
| 50 |
+
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [1]
|
| 51 |
+
/opt/conda/lib/python3.11/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:425: PossibleUserWarning: The 'predict_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=255` in the `DataLoader` to improve performance.
|
| 52 |
+
Predicting DataLoader 0: 100%|██████████| 71/71 [00:04<00:00, 14.21it/s]
|
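The model summary in `output.log` reports 72.1 M parameters and an estimated parameter size of 288.279 MB. The short check below shows that the two figures agree if each parameter takes 4 bytes (32-bit floats) and 1 MB is counted as 10^6 bytes. Both are assumptions about how the estimate was produced, since the log does not state the precision.

```python
# Values copied from the model summary in output.log.
reported_size_mb = 288.279   # "Total estimated model params size (MB)"
reported_params_m = 72.1     # "Total params", in millions

# Assumption: 32-bit parameters, i.e. 4 bytes each, with 1 MB = 1e6 bytes.
bytes_per_param = 4

# Size in MB divided by bytes per parameter gives the count in millions.
implied_params_m = reported_size_mb / bytes_per_param
print(f"Implied parameter count: {implied_params_m:.3f} M")

# Size of model.ckpt in bytes, from its Git LFS pointer.
checkpoint_mb = 344440758 / 1e6
print(f"Checkpoint on disk: {checkpoint_mb:.1f} MB")
```

The implied count rounds to the 72.1 M shown in the summary. The checkpoint file is larger than the parameter estimate, which would be expected if it also stores the hyperparameters mentioned in the README alongside the weights.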