---
license: mit
library_name: pytorch-lightning
pipeline_tag: tabular-classification
tags:
- biology
- genomics
datasets:
- Genentech/human-atac-catlas-data
base_model:
- Genentech/enformer-model
---

# human-atac-catlas-model

## Model Description
This model is a multi-task classifier trained to predict the binary accessibility of genomic DNA sequences in 204 cell types. It was trained by fine-tuning the Enformer model using the `grelu` library on the human ATAC CATlas dataset.

- **Architecture:** Fine-tuned Enformer (EnformerPretrainedModel)
- **Input:** Genomic sequences (hg38)
- **Output:** Probability of accessibility in 204 cell types
- **Parameters:** 72M total (all trainable)

## Performance

Metrics are computed separately for each cell type. The tables below report the mean, standard deviation, minimum, and maximum of each metric across all 204 cell types.

### Test Set
| Metric | Mean | Std | Min | Max |
|--------|------|-----|-----|-----|
| Accuracy | 0.9416 | 0.0175 | 0.8959 | 0.9743 |
| AUROC | 0.9053 | 0.0167 | 0.8634 | 0.9467 |
| Average Precision | 0.6097 | 0.0374 | 0.4545 | 0.7008 |
| Best F1 | 0.5716 | 0.0289 | 0.4704 | 0.6395 |

### Validation Set
| Metric | Mean | Std | Min | Max |
|--------|------|-----|-----|-----|
| Accuracy | 0.9482 | 0.0172 | 0.9071 | 0.9789 |
| AUROC | 0.8935 | 0.0190 | 0.8350 | 0.9379 |
| Average Precision | 0.5524 | 0.0370 | 0.4168 | 0.6888 |
| Best F1 | 0.5253 | 0.0299 | 0.4285 | 0.6309 |
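
The per-cell-type summary used in the tables above can be sketched as follows. This is a minimal pure-Python illustration, not the evaluation code from `2_train.ipynb`: the 0.5 accuracy threshold, the use of the population standard deviation, the definition of "Best F1" as the highest F1 over all score thresholds, and the toy labels and scores are all assumptions made for the example.

```python
from statistics import mean, pstdev

def accuracy(labels, scores, threshold=0.5):
    """Fraction of sequences classified correctly at a fixed threshold (assumed 0.5)."""
    correct = sum(1 for y, s in zip(labels, scores) if (s >= threshold) == (y == 1))
    return correct / len(labels)

def best_f1(labels, scores):
    """Highest F1 obtained by sweeping the threshold over every observed score."""
    best = 0.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
        denom = 2 * tp + fp + fn
        if denom > 0:
            best = max(best, 2 * tp / denom)
    return best

def summarize(values):
    """Collapse per-cell-type values into the Mean/Std/Min/Max columns."""
    return {
        "mean": mean(values),
        "std": pstdev(values),  # population std; the original may use the sample std
        "min": min(values),
        "max": max(values),
    }

# Hypothetical toy data: 2 cell types x 4 sequences (the real model has 204 cell types).
labels = [[1, 0, 0, 1], [0, 0, 1, 0]]
scores = [[0.9, 0.2, 0.6, 0.7], [0.1, 0.4, 0.8, 0.3]]

per_type_acc = [accuracy(y, s) for y, s in zip(labels, scores)]
per_type_f1 = [best_f1(y, s) for y, s in zip(labels, scores)]
acc_summary = summarize(per_type_acc)
f1_summary = summarize(per_type_f1)
```

Because accessible regions are typically a small minority in each cell type, accuracy can stay high even when the positive class is hard to predict, which is why the threshold-free and precision-oriented metrics in the tables are worth reading alongside it.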

## Training Details

| Parameter | Value |
|-----------|-------|
| Task | Binary classification |
| Loss | Binary Cross-Entropy |
| Optimizer | Adam |
| Learning rate | 0.0001 |
| Batch size | 3072 |
| Max epochs | 10 |
| n_transformers | 1 |
| crop_len | 0 |
| grelu version | 1.0.4.post1.dev39 |
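
To make the loss row concrete, here is a minimal pure-Python sketch of binary cross-entropy applied independently to each cell type and averaged over a batch. It assumes the loss is computed on raw logits with mean reduction; the function names and toy values are hypothetical, and the actual training code in `2_train.ipynb` may differ.

```python
import math

def bce_with_logits(logit, label):
    """Numerically stable binary cross-entropy for one (sequence, cell type) pair.

    Equivalent to -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(logit).
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def multitask_bce(logits, labels):
    """Average the loss over every sequence and every cell type (assumed mean reduction)."""
    losses = [
        bce_with_logits(z, y)
        for row_z, row_y in zip(logits, labels)
        for z, y in zip(row_z, row_y)
    ]
    return sum(losses) / len(losses)

# Hypothetical toy batch: 2 sequences x 3 cell types (the real model has 204 outputs).
logits = [[2.0, -1.0, 0.0], [-3.0, 0.5, 1.5]]
labels = [[1, 0, 1], [0, 1, 1]]
loss = multitask_bce(logits, labels)
```

Each cell type is treated as its own yes/no question, so a single sequence can be labeled accessible in any number of the 204 cell types at once.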

## Repository Content
1. `model.ckpt`: The trained model weights and hyperparameters (PyTorch Lightning checkpoint).
2. `2_train.ipynb`: Jupyter notebook containing the training logic, architecture definition, and evaluation loops.
3. `output.log`: Training logs.

## How to use
To load this model for inference or fine-tuning, use the `grelu` interface:

```python
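from grelu.lightning import LightningModel
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hugging Face Hub (cached locally after the first call).
ckpt_path = hf_hub_download(
    repo_id="Genentech/human-atac-catlas-model",
    filename="model.ckpt"
)

# weights_only=False unpickles the full checkpoint (weights plus hyperparameters),
# so only load checkpoints from a source you trust.
model = LightningModel.load_from_checkpoint(ckpt_path, weights_only=False)
model.eval()  # inference mode; call model.train() instead before fine-tuning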
```