Upload folder using huggingface_hub

by avantikalal - opened Jan 29

base: refs/heads/main ← from: refs/pr/1


+3866 -3

Files changed (4)

2_model.ipynb +0 -0
README.md +41 -3
model.ckpt +3 -0
output.log +67 -0

2_model.ipynb ADDED

The diff for this file is too large to render.

README.md CHANGED

@@ -1,3 +1,41 @@
----
-license: mit
----

+---
+# 1. Metadata Block
+license: mit
+library_name: pytorch-lightning
+pipeline_tag: tabular-classification
+tags:
+- biology
+- genomics
+datasets:
+- Genentech/human-atac-catlas-data
+---
+# human-atac-catlas-model
+## Model Description
+This model is a multi-task binary classifier trained to predict chromatin accessibility across 204 cell types. It was trained by fine-tuning the Enformer model using the `grelu` library on top of the CATlas human enhancer dataset.
+- **Architecture:** Fine-tuned Enformer
+- **Input:** Genomic sequences (hg38)
+- **Output:** Binary accessibility predictions for 204 cell type tasks.
+## Repository Content
+1. `model.ckpt`: The trained model weights and hyperparameters (PyTorch Lightning checkpoint).
+2. `2_model.ipynb`: Jupyter notebook containing the training logic, architecture definition, and evaluation loops.
+3. `output.log`: Training logs.
+## How to use
+To load this model for inference or fine-tuning, use the `grelu` interface:
+```python
+from grelu.lightning import LightningModel
+from huggingface_hub import hf_hub_download
+ckpt_path = hf_hub_download(
+    repo_id="Genentech/human-atac-catlas-model",
+    filename="model.ckpt"
+)
+model = LightningModel.load_from_checkpoint(ckpt_path)
+model.eval()
+```

model.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c175cb8c11452062a6892b668774fe75fe18913454ea39368d81ebb784b213aa
+size 324860980
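
The pointer above is all the repository stores for `model.ckpt`; the 324,860,980-byte checkpoint itself lives in LFS storage. As a minimal sketch (not part of this PR), a downloaded copy could be checked against the pointer's `oid` and `size` roughly as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the sketch uses only the Python standard library.

```python
import hashlib
from pathlib import Path

# Pointer text copied from the model.ckpt diff in this PR.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c175cb8c11452062a6892b668774fe75fe18913454ea39368d81ebb784b213aa
size 324860980
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file's byte size and digest both match the pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as fh:
        # Read in 1 MiB chunks so a large checkpoint is never fully in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
```

With `ckpt_path` from the README snippet, `matches_pointer(ckpt_path, pointer)` should return `True` for an intact download.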

output.log ADDED

	@@ -0,0 +1,67 @@

+wandb:   1 of 1 files downloaded.
+2230362 163688 185550
+/opt/conda/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
+  from .autonotebook import tqdm as notebook_tqdm
+Sequences will be extracted from columns ['chrom', 'start', 'end']
+Labels are being treated as class names for multiclass classification.
+Sequences will be extracted from columns ['chrom', 'start', 'end']
+Labels are being treated as class names for multiclass classification.
+2230362 163688
+(2230362, 16, 1)
+[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
+ [0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
+ [0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
+wandb: WARNING Calling wandb.login() after wandb.init() has no effect.
+wandb: Downloading large artifact human_state_dict:latest, 939.29MB. 1 files...
+wandb:   1 of 1 files downloaded.
+Done. 0:0:0.7
+/opt/conda/lib/python3.11/site-packages/grelu/model/models.py:771: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
+  state_dict = torch.load(Path(d) / "human.h5")
+GPU available: True (cuda), used: True
+TPU available: False, using: 0 TPU cores
+HPU available: False, using: 0 HPUs
+/opt/conda/lib/python3.11/site-packages/pytorch_lightning/loggers/wandb.py:397: There is a wandb run already in progress and newly created instances of `WandbLogger` will reuse this run. If this is not desired, call `wandb.finish()` before instantiating `WandbLogger`.
+LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
+Validation DataLoader 0: 100%|██████████| 320/320 [00:46<00:00,  6.87it/s]
+LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
+  | Name         | Type                    | Params | Mode
+-----------------------------------------------------------------
+0 | model        | EnformerPretrainedModel | 71.5 M | train
+1 | loss         | CrossEntropyLoss        | 0      | train
+2 | activation   | Softmax                 | 0      | train
+3 | val_metrics  | MetricCollection        | 0      | train
+4 | test_metrics | MetricCollection        | 0      | train
+5 | transform    | Identity                | 0      | train
+-----------------------------------------------------------------
+71.5 M    Trainable params
+0         Non-trainable params
+71.5 M    Total params
+285.968   Total estimated model params size (MB)
+239       Modules in train mode
+0         Modules in eval mode
+Epoch 9: 100%|██████████| 4357/4357 [30:47<00:00,  2.36it/s, v_num=9dx7, train_loss_step=2.090, train_loss_epoch=2.210]
+/opt/conda/lib/python3.11/site-packages/torchmetrics/utilities/prints.py:43: UserWarning: No positive samples in targets, true positive value should be meaningless. Returning zero tensor in true positive score
+  warnings.warn(*args, **kwargs)  # noqa: B028
+/opt/conda/lib/python3.11/site-packages/torchmetrics/utilities/prints.py:43: UserWarning: No negative samples in targets, false positive value should be meaningless. Returning zero tensor in false positive score
+  warnings.warn(*args, **kwargs)  # noqa: B028
+Sequences will be extracted from columns ['chrom', 'start', 'end']
+`Trainer.fit` stopped: `max_epochs=10` reached.
+wandb: WARNING Calling wandb.login() after wandb.init() has no effect.
+wandb: Downloading large artifact human_state_dict:latest, 939.29MB. 1 files...
+wandb:   1 of 1 files downloaded.
+Done. 0:0:0.7
+/opt/conda/lib/python3.11/site-packages/grelu/model/models.py:771: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
+  state_dict = torch.load(Path(d) / "human.h5")
+Labels are being treated as class names for multiclass classification.
+GPU available: True (cuda), used: True
+TPU available: False, using: 0 TPU cores
+HPU available: False, using: 0 HPUs
+LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
+Testing DataLoader 0: 100%|██████████| 363/363 [00:52<00:00,  6.89it/s]
+GPU available: True (cuda), used: True
+TPU available: False, using: 0 TPU cores
+HPU available: False, using: 0 HPUs
+LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
+Predicting DataLoader 0: 100%|██████████| 182/182 [01:53<00:00,  1.61it/s]
+wandb: WARNING No relevant files were detected in the specified directory. No code will be logged to your run.
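
The progress-bar totals in this log are consistent with the three numbers on its second line being the train, validation, and test split sizes. The sketch below checks that arithmetic. The batch sizes (512 for fit, validation, and test; 1024 for predict), the choice of the test split for the predict pass, and the float32 assumption are all inferred from the counts; none of them is stated in the log.

```python
import math

# Split sizes from the second line of output.log, assumed to be
# (train, validation, test); the log itself does not label them.
N_TRAIN, N_VAL, N_TEST = 2_230_362, 163_688, 185_550


def n_batches(n_examples: int, batch_size: int) -> int:
    """Number of batches when the final, partial batch is kept."""
    return math.ceil(n_examples / batch_size)


# Batch sizes are assumptions chosen because they reproduce the
# progress-bar totals in the log (4357, 320, 363, 182).
steps = {
    "train": n_batches(N_TRAIN, 512),
    "val": n_batches(N_VAL, 512),
    "test": n_batches(N_TEST, 512),
    "predict": n_batches(N_TEST, 1024),
}

# The "estimated model params size (MB)" line is read here as
# parameters x 4 bytes (float32), with 1 MB = 10^6 bytes (an assumption).
params_millions = 285.968 / 4

print(steps)
print(round(params_millions, 1))
```

If these assumptions hold, the step counts match the `4357/4357`, `320/320`, `363/363`, and `182/182` totals above, and the parameter estimate agrees with the `71.5 M` reported in the model summary.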