---
license: mit
library_name: pytorch-lightning
pipeline_tag: tabular-classification
tags:
- biology
- genomics
datasets:
- Genentech/human-atac-catlas-data
base_model:
- Genentech/enformer-model
---
# human-atac-catlas-model
## Model Description
This model is a multi-task classifier that predicts binary chromatin accessibility of genomic DNA sequences in 204 cell types. It was trained by fine-tuning the Enformer model on the human ATAC CATlas dataset using the `grelu` library.
- **Architecture:** Fine-tuned Enformer (EnformerPretrainedModel)
- **Input:** Genomic sequences (hg38)
- **Output:** Probability of accessibility in 204 cell types
- **Parameters:** 72M total (all trainable)
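
Sequence models of this kind operate on a one-hot encoding of the DNA bases. The sketch below is illustrative only: `grelu` handles encoding internally, and the `ACGT` channel order, the all-zero treatment of `N`, and the helper name `one_hot_encode` are assumptions made for this example rather than details taken from this repository.

```python
# Illustrative sketch; not part of the grelu API.
BASES = "ACGT"  # assumed channel order


def one_hot_encode(seq: str) -> list[list[int]]:
    """Return a 4 x len(seq) matrix with one row per base in BASES.

    Bases outside ACGT (for example N) become all-zero columns.
    """
    seq = seq.upper()
    return [[1 if base == channel else 0 for base in seq] for channel in BASES]


encoded = one_hot_encode("ACGTN")
for channel, row in zip(BASES, encoded):
    print(channel, row)
```

Each column of the matrix corresponds to one position in the sequence, so the model sees a fixed-width numeric tensor regardless of which bases are present.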
## Performance
Metrics are computed per cell type and averaged across all 204 cell types.
### Test Set
| Metric | Mean | Std | Min | Max |
|--------|------|-----|-----|-----|
| Accuracy | 0.9416 | 0.0175 | 0.8959 | 0.9743 |
| AUROC | 0.9053 | 0.0167 | 0.8634 | 0.9467 |
| Average Precision | 0.6097 | 0.0374 | 0.4545 | 0.7008 |
| Best F1 | 0.5716 | 0.0289 | 0.4704 | 0.6395 |
### Validation Set
| Metric | Mean | Std | Min | Max |
|--------|------|-----|-----|-----|
| Accuracy | 0.9482 | 0.0172 | 0.9071 | 0.9789 |
| AUROC | 0.8935 | 0.0190 | 0.8350 | 0.9379 |
| Average Precision | 0.5524 | 0.0370 | 0.4168 | 0.6888 |
| Best F1 | 0.5253 | 0.0299 | 0.4285 | 0.6309 |
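
The aggregation used in the tables above (compute each metric per cell type, then summarize across cell types) can be sketched as follows. This is a minimal pure-Python illustration on toy data, not the evaluation code from `2_train.ipynb`. It assumes a 0.5 threshold for accuracy, reads "Best F1" as the maximum F1 over all candidate thresholds, and uses the population standard deviation; all three are assumptions, and the cell type names are hypothetical.

```python
from statistics import mean, pstdev


def accuracy(labels, probs, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    return mean(int((p >= threshold) == bool(y)) for y, p in zip(labels, probs))


def best_f1(labels, probs):
    """Maximum F1 over thresholds placed at each distinct predicted probability."""
    best = 0.0
    for threshold in sorted(set(probs)):
        tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
        fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
        fn = sum(1 for y, p in zip(labels, probs) if p < threshold and y == 1)
        if tp:
            best = max(best, 2 * tp / (2 * tp + fp + fn))
    return best


def summarize(values):
    """Mean, standard deviation, minimum and maximum across cell types."""
    return {
        "mean": mean(values),
        "std": pstdev(values),  # population std; an assumption
        "min": min(values),
        "max": max(values),
    }


# Toy labels and predicted probabilities for two hypothetical cell types
labels_by_cell = {"cell_a": [1, 0, 1, 0], "cell_b": [1, 1, 0, 0]}
probs_by_cell = {"cell_a": [0.9, 0.2, 0.4, 0.6], "cell_b": [0.8, 0.6, 0.3, 0.1]}

acc_summary = summarize(
    [accuracy(labels_by_cell[c], probs_by_cell[c]) for c in labels_by_cell]
)
f1_summary = summarize(
    [best_f1(labels_by_cell[c], probs_by_cell[c]) for c in labels_by_cell]
)
print("Accuracy:", acc_summary)
print("Best F1:", f1_summary)
```

Summarizing per-cell-type values this way gives every cell type equal weight, regardless of how many accessible regions it contains.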
## Training Details
| Parameter | Value |
|-----------|-------|
| Task | Binary classification |
| Loss | Binary Cross-Entropy |
| Optimizer | Adam |
| Learning rate | 0.0001 |
| Batch size | 3072 |
| Max epochs | 10 |
| n_transformers | 1 |
| crop_len | 0 |
| grelu version | 1.0.4.post1.dev39 |
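
For reference, the multi-task binary cross-entropy objective listed above can be sketched in plain Python. This is an illustration of the loss itself, not the `grelu` implementation: it assumes the loss is applied to raw logits and averaged over every (sequence, cell type) entry, which mirrors the default `mean` reduction of PyTorch's `BCEWithLogitsLoss`. The function names are hypothetical.

```python
import math


def bce_with_logits(logit: float, label: float) -> float:
    """Numerically stable binary cross-entropy for one logit/label pair."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def multitask_bce(logits, labels):
    """Mean BCE over all entries of two equally shaped 2-D lists.

    Rows are sequences and columns are cell types (tasks).
    """
    losses = [
        bce_with_logits(x, y)
        for row_x, row_y in zip(logits, labels)
        for x, y in zip(row_x, row_y)
    ]
    return sum(losses) / len(losses)


# Toy batch: 2 sequences x 3 cell types (the real model has 204 tasks)
toy_logits = [[2.0, -1.0, 0.0], [-3.0, 0.5, 1.5]]
toy_labels = [[1.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
print(f"mean BCE: {multitask_bce(toy_logits, toy_labels):.4f}")
```

Averaging over all entries treats each cell type as an independent binary task that shares the same sequence representation.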
## Repository Content
1. `model.ckpt`: The trained model weights and hyperparameters (PyTorch Lightning checkpoint).
2. `2_train.ipynb`: Jupyter notebook containing the training logic, architecture definition, and evaluation loops.
3. `output.log`: Training logs.
## How to use
To load this model for inference or fine-tuning, use the `grelu` interface:
```python
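from grelu.lightning import LightningModel
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hugging Face Hub (cached after the first call)
ckpt_path = hf_hub_download(
    repo_id="Genentech/human-atac-catlas-model",
    filename="model.ckpt",
)

# Load the Lightning checkpoint and switch to evaluation mode
model = LightningModel.load_from_checkpoint(ckpt_path, weights_only=False)
model.eval()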
```