Committed by Lal and Claude Opus 4.6
Commit cef8a30 · 1 parent: 9d25926

Add performance metrics and training details


- Fix dataset description (CATlas ATAC, not ChromHMM)
- Add test and validation metrics (accuracy, AUROC, avg precision, F1)
- Add training hyperparameters (lr, batch size, epochs, loss, optimizer)
- Add parameter count (72M)
- Add model architecture details (EnformerPretrainedModel)
- Add grelu version used for training

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Files changed (1)
  1. README.md +40 -5
README.md CHANGED
@@ -14,11 +14,46 @@ base_model:
  # human-atac-catlas-model

  ## Model Description
- This model is a multi-task classifier trained to predict the binary accessibility of genomic DNA sequences in 204 cell types. It was trained by fine-tuning the Enformer model using the `grelu` library on top of the human ChromHMM fullstack annotation dataset.
+ This model is a multi-task classifier trained to predict the binary accessibility of genomic DNA sequences in 204 cell types. It was trained by fine-tuning the Enformer model using the `grelu` library on the human ATAC CATlas dataset.

- - **Architecture:** Fine-tuned Enformer
+ - **Architecture:** Fine-tuned Enformer (EnformerPretrainedModel)
  - **Input:** Genomic sequences (hg38)
- - **Output:** Probability of accessibility in 204 cell types.
+ - **Output:** Probability of accessibility in 204 cell types
+ - **Parameters:** 72M total (all trainable)
+
+ ## Performance
+
+ Metrics are computed per cell type and averaged across all 204 cell types.
+
+ ### Test Set
+ | Metric | Mean | Std | Min | Max |
+ |--------|------|-----|-----|-----|
+ | Accuracy | 0.9416 | 0.0175 | 0.8959 | 0.9743 |
+ | AUROC | 0.9053 | 0.0167 | 0.8634 | 0.9467 |
+ | Average Precision | 0.6097 | 0.0374 | 0.4545 | 0.7008 |
+ | Best F1 | 0.5716 | 0.0289 | 0.4704 | 0.6395 |
+
+ ### Validation Set
+ | Metric | Mean | Std | Min | Max |
+ |--------|------|-----|-----|-----|
+ | Accuracy | 0.9482 | 0.0172 | 0.9071 | 0.9789 |
+ | AUROC | 0.8935 | 0.0190 | 0.8350 | 0.9379 |
+ | Average Precision | 0.5524 | 0.0370 | 0.4168 | 0.6888 |
+ | Best F1 | 0.5253 | 0.0299 | 0.4285 | 0.6309 |
+
+ ## Training Details
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Task | Binary classification |
+ | Loss | Binary Cross-Entropy |
+ | Optimizer | Adam |
+ | Learning rate | 0.0001 |
+ | Batch size | 3072 |
+ | Max epochs | 10 |
+ | n_transformers | 1 |
+ | crop_len | 0 |
+ | grelu version | 1.0.4.post1.dev39 |

  ## Repository Content
  1. `model.ckpt`: The trained model weights and hyperparameters (PyTorch Lightning checkpoint).
@@ -33,10 +68,10 @@ from grelu.lightning import LightningModel
  from huggingface_hub import hf_hub_download

  ckpt_path = hf_hub_download(
- repo_id="Genentech/human-atac-catlas-model",
+ repo_id="Genentech/human-atac-catlas-model",
  filename="model.ckpt"
  )

  model = LightningModel.load_from_checkpoint(ckpt_path, weights_only=False)
  model.eval()
- ```
+ ```
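The Performance section added in this commit reports each metric per cell type and then summarizes it across the 204 cell types as mean, standard deviation, minimum and maximum. The following is a plain-Python sketch of that aggregation, for illustration only. It rests on assumptions the commit does not state: accuracy uses a 0.5 probability threshold, average precision follows the step-wise (scikit-learn style) definition, "Best F1" is the highest F1 over all score thresholds, and "Std" is the population standard deviation. All function names are hypothetical and are not part of `grelu`.

```python
import statistics
from typing import Dict, Iterator, Sequence, Tuple

# Hypothetical helpers. Each task (cell type) is assumed to have at least
# one positive and one negative label.


def _counts_by_threshold(labels: Sequence[int], scores: Sequence[float]) -> Iterator[Tuple[int, int]]:
    """Yield cumulative (true positives, false positives) at each distinct score, highest first."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        yield tp, fp


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded prediction matches the label (0.5 cutoff is an assumption)."""
    hits = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return hits / len(labels)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Precision at each threshold, weighted by the recall gained at that threshold."""
    n_pos = sum(labels)
    ap, prev_recall = 0.0, 0.0
    for tp, fp in _counts_by_threshold(labels, scores):
        recall = tp / n_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


def best_f1(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Highest F1 = 2TP / (2TP + FP + FN) over all score thresholds."""
    n_pos = sum(labels)
    return max(2 * tp / (2 * tp + fp + (n_pos - tp)) for tp, fp in _counts_by_threshold(labels, scores))


METRICS = {
    "Accuracy": accuracy,
    "AUROC": auroc,
    "Average Precision": average_precision,
    "Best F1": best_f1,
}


def summarize(labels_per_task, scores_per_task) -> Dict[str, Dict[str, float]]:
    """Score each task (cell type) separately, then aggregate across tasks."""
    table = {}
    for name, metric in METRICS.items():
        values = [metric(y, s) for y, s in zip(labels_per_task, scores_per_task)]
        table[name] = {
            "Mean": statistics.fmean(values),
            "Std": statistics.pstdev(values),  # population std (assumption)
            "Min": min(values),
            "Max": max(values),
        }
    return table


if __name__ == "__main__":
    labels = [[1, 0, 1, 0], [1, 0, 0, 1]]  # two toy "cell types"
    scores = [[0.9, 0.2, 0.6, 0.4], [0.8, 0.7, 0.3, 0.4]]
    for metric, row in summarize(labels, scores).items():
        print(metric, {k: round(v, 4) for k, v in row.items()})
```

The pairwise AUROC loop is quadratic in the number of sequences, which is fine for a toy check; for evaluation sets of real size, a rank-based or library implementation would be the practical choice.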
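The Training Details table lists binary cross-entropy as the loss for a model that outputs one accessibility probability per cell type. Under the usual multi-label reading of that setup, each of the 204 outputs is an independent logit passed through a sigmoid (no softmax across cell types), and the loss is the mean BCE over every (sequence, cell type) pair. This is an illustrative sketch under that assumption, not `grelu`'s implementation; `multitask_bce` and the other helpers are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Logistic function, split by sign so math.exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce_with_logit(logit: float, label: int) -> float:
    """Binary cross-entropy of one logit against a 0/1 label.

    Equal to -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))], rewritten as
    max(x, 0) - x*y + log(1 + exp(-|x|)) so it stays finite for large |x|.
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def multitask_bce(
    logits: Sequence[Sequence[float]], labels: Sequence[Sequence[int]]
) -> float:
    """Mean BCE over an (n_sequences x n_cell_types) batch; each cell type is its own binary task."""
    total, count = 0.0, 0
    for row_logits, row_labels in zip(logits, labels):
        for x, y in zip(row_logits, row_labels):
            total += bce_with_logit(x, y)
            count += 1
    return total / count


if __name__ == "__main__":
    # Two toy sequences scored against three hypothetical cell types.
    logits = [[2.0, -1.0, 0.5], [-3.0, 0.0, 4.0]]
    labels = [[1, 0, 1], [0, 0, 1]]
    print("loss:", multitask_bce(logits, labels))
    # Independent sigmoids: probabilities in a row need not sum to 1.
    print("probs:", [[round(sigmoid(x), 3) for x in row] for row in logits])
```

Because the sigmoids are independent, a single sequence can be predicted accessible in many cell types at once, which matches an output described as a per-cell-type probability rather than a distribution over cell types.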