Upload folder using huggingface_hub
#1 by avantikalal
- 2_model.ipynb +0 -0
- README.md +41 -3
- model.ckpt +3 -0
- output.log +67 -0
2_model.ipynb ADDED
The diff for this file is too large to render.
README.md CHANGED
@@ -1,3 +1,41 @@
----
-
-
+---
+# 1. Metadata Block
+license: mit
+library_name: pytorch-lightning
+pipeline_tag: tabular-classification
+tags:
+- biology
+- genomics
+datasets:
+- Genentech/human-atac-catlas-data
+---
+
+# human-atac-catlas-model
+
+## Model Description
+This model is a multi-task binary classifier trained to predict chromatin accessibility across 204 cell types. It was obtained by fine-tuning the Enformer model with the `grelu` library on the CATlas human enhancer dataset.
+
+- **Architecture:** Fine-tuned Enformer
+- **Input:** Genomic sequences (hg38)
+- **Output:** Binary accessibility predictions for 204 cell type tasks.
+
+## Repository Content
+1. `model.ckpt`: The trained model weights and hyperparameters (PyTorch Lightning checkpoint).
+2. `2_model.ipynb`: Jupyter notebook containing the training logic, architecture definition, and evaluation loops.
+3. `output.log`: Training logs.
+
+## How to use
+To load this model for inference or fine-tuning, use the `grelu` interface:
+
+```python
+from grelu.lightning import LightningModel
+from huggingface_hub import hf_hub_download
+
+ckpt_path = hf_hub_download(
+    repo_id="Genentech/human-atac-catlas-model",
+    filename="model.ckpt"
+)
+
+model = LightningModel.load_from_checkpoint(ckpt_path)
+model.eval()
+```
model.ckpt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c175cb8c11452062a6892b668774fe75fe18913454ea39368d81ebb784b213aa
+size 324860980
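The three lines added for `model.ckpt` are a Git LFS pointer rather than the checkpoint itself: they record the SHA-256 digest and the byte size of the real file. As a minimal sketch, a downloaded checkpoint could be checked against that pointer roughly as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and only serve this illustration; they are not part of `huggingface_hub` or Git LFS.

```python
import hashlib
from pathlib import Path

# Pointer contents copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c175cb8c11452062a6892b668774fe75fe18913454ea39368d81ebb784b213aa
size 324860980
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer (hypothetical helper)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file has the size and SHA-256 digest named in the pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    path = Path(path)
    # Cheap check first: a size mismatch means there is no need to hash.
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so a checkpoint of a few hundred MB is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

With the `ckpt_path` returned by `hf_hub_download` in the README snippet, `matches_pointer(ckpt_path, pointer)` would be expected to return `True` for an intact download.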
output.log ADDED
@@ -0,0 +1,67 @@
+wandb: 1 of 1 files downloaded.
+2230362 163688 185550
+/opt/conda/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
+  from .autonotebook import tqdm as notebook_tqdm
+Sequences will be extracted from columns ['chrom', 'start', 'end']
+Labels are being treated as class names for multiclass classification.
+Sequences will be extracted from columns ['chrom', 'start', 'end']
+Labels are being treated as class names for multiclass classification.
+2230362 163688
+(2230362, 16, 1)
+[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
+ [0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
+ [0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
+wandb: WARNING Calling wandb.login() after wandb.init() has no effect.
+wandb: Downloading large artifact human_state_dict:latest, 939.29MB. 1 files...
+wandb: 1 of 1 files downloaded.
+Done. 0:0:0.7
+/opt/conda/lib/python3.11/site-packages/grelu/model/models.py:771: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
+  state_dict = torch.load(Path(d) / "human.h5")
+GPU available: True (cuda), used: True
+TPU available: False, using: 0 TPU cores
+HPU available: False, using: 0 HPUs
+/opt/conda/lib/python3.11/site-packages/pytorch_lightning/loggers/wandb.py:397: There is a wandb run already in progress and newly created instances of `WandbLogger` will reuse this run. If this is not desired, call `wandb.finish()` before instantiating `WandbLogger`.
+LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
+Validation DataLoader 0: 100%|██████████| 320/320 [00:46<00:00, 6.87it/s]
+LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
+
+  | Name         | Type                    | Params | Mode
+-----------------------------------------------------------------
+0 | model        | EnformerPretrainedModel | 71.5 M | train
+1 | loss         | CrossEntropyLoss        | 0      | train
+2 | activation   | Softmax                 | 0      | train
+3 | val_metrics  | MetricCollection        | 0      | train
+4 | test_metrics | MetricCollection        | 0      | train
+5 | transform    | Identity                | 0      | train
+-----------------------------------------------------------------
+71.5 M Trainable params
+0 Non-trainable params
+71.5 M Total params
+285.968 Total estimated model params size (MB)
+239 Modules in train mode
+0 Modules in eval mode
+Epoch 9: 100%|██████████| 4357/4357 [30:47<00:00, 2.36it/s, v_num=9dx7, train_loss_step=2.090, train_loss_epoch=2.210]
+/opt/conda/lib/python3.11/site-packages/torchmetrics/utilities/prints.py:43: UserWarning: No positive samples in targets, true positive value should be meaningless. Returning zero tensor in true positive score
+  warnings.warn(*args, **kwargs) # noqa: B028
+/opt/conda/lib/python3.11/site-packages/torchmetrics/utilities/prints.py:43: UserWarning: No negative samples in targets, false positive value should be meaningless. Returning zero tensor in false positive score
+  warnings.warn(*args, **kwargs) # noqa: B028
+Sequences will be extracted from columns ['chrom', 'start', 'end']
+`Trainer.fit` stopped: `max_epochs=10` reached.
+wandb: WARNING Calling wandb.login() after wandb.init() has no effect.
+wandb: Downloading large artifact human_state_dict:latest, 939.29MB. 1 files...
+wandb: 1 of 1 files downloaded.
+Done. 0:0:0.7
+/opt/conda/lib/python3.11/site-packages/grelu/model/models.py:771: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
+  state_dict = torch.load(Path(d) / "human.h5")
+Labels are being treated as class names for multiclass classification.
+GPU available: True (cuda), used: True
+TPU available: False, using: 0 TPU cores
+HPU available: False, using: 0 HPUs
+LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
+Testing DataLoader 0: 100%|██████████| 363/363 [00:52<00:00, 6.89it/s]
+GPU available: True (cuda), used: True
+TPU available: False, using: 0 TPU cores
+HPU available: False, using: 0 HPUs
+LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
+Predicting DataLoader 0: 100%|██████████| 182/182 [01:53<00:00, 1.61it/s]
+wandb: WARNING No relevant files were detected in the specified directory. No code will be logged to your run.
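A few numbers in `output.log` can be cross-checked with simple arithmetic. The sketch below rests on assumptions that the log does not state: that the three counts on its second line (2230362, 163688, 185550) are the training, validation and test set sizes, that the prediction pass ran over the test set, that incomplete final batches were kept, and that the parameter size estimate uses 4 bytes (float32) per parameter. The batch sizes it prints are therefore inferred, not logged.

```python
import math

# Assumed split sizes, read from the log line "2230362 163688 185550".
N_TRAIN, N_VAL, N_TEST = 2_230_362, 163_688, 185_550

# Batch counts shown by the progress bars in the log.
LOGGED_BATCHES = {
    "train": (N_TRAIN, 4357),
    "val": (N_VAL, 320),
    "test": (N_TEST, 363),
    "predict": (N_TEST, 182),  # assumes prediction ran on the test set
}

# Candidate batch sizes to try; the true values are not recorded in the log.
CANDIDATES = (128, 256, 512, 1024)


def consistent_batch_sizes(n_samples: int, n_batches: int) -> list:
    """Return the candidate batch sizes that yield exactly n_batches (last batch kept)."""
    return [b for b in CANDIDATES if math.ceil(n_samples / b) == n_batches]


for stage, (n_samples, n_batches) in LOGGED_BATCHES.items():
    print(stage, consistent_batch_sizes(n_samples, n_batches))
# train [512]
# val [512]
# test [512]
# predict [1024]

# Parameter count implied by the logged size estimate, assuming float32 weights.
BYTES_PER_PARAM = 4
implied_params = 285.968e6 / BYTES_PER_PARAM
print(round(implied_params / 1e6, 1))  # 71.5, matching "71.5 M Total params"
```

Under these assumptions the log is internally consistent: a batch size of 512 reproduces the training, validation and test batch counts, 1024 reproduces the prediction batch count, and the size estimate agrees with the reported 71.5 M parameters.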