	---
	license: mit
	library_name: pytorch-lightning
	pipeline_tag: tabular-regression
	tags:
	- biology
	- genomics
	datasets:
	- Genentech/GM12878_dnase-data
	---

	# GM12878_dnase-model

	## Model Description
	This is a single-task regression model that takes 2,114 bp genomic intervals as input and predicts the total GM12878 DNase-seq coverage in the central 1,000 bp of each interval. It is described in [Lal et al. 2025](https://www.nature.com/articles/s41592-025-02868-z).

	- Architecture: DilatedConvModel (gReLU)
	- Input: 2,114 bp genomic sequences (hg38)
	- Output: Total DNase-seq coverage in the central 1,000 bp
	- Parameters: 6.3M

	## Performance

	| Split | MSE | Pearson |
	|-------|-----|---------|
	| Validation | 0.4458 | 0.7524 |
	| Test | 0.4113 | 0.8056 |
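
For readers who want to compare their own predictions against the table, the two metrics can be sketched as below. This is a minimal illustration of the standard definitions of mean squared error and Pearson correlation using NumPy, not the evaluation code from `3_evaluate_model.ipynb`; the helper names and toy values are hypothetical, and any label transformation applied before scoring is not specified here.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted coverage."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted coverage."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal entry
    return float(np.corrcoef(y_true, y_pred)[0, 1])


# Toy values for illustration only (not data from this model)
labels = [1.0, 2.0, 3.0, 4.0]
preds = [1.5, 2.5, 2.5, 4.5]

print("MSE:", mse(labels, preds))
print("Pearson:", pearson(labels, preds))
```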

	## Training Details

	| Parameter | Value |
	|-----------|-------|
	| Task | Regression |
	| Loss | MSE |
	| Optimizer | Adam |
	| Learning rate | 0.0001 |
	| Batch size | 512 |
	| Max epochs | 15 |
	| Channels | 512 |
	| n_conv | 9 |
	| crop_len | 557 |
	| grelu version | 1.0.4.post1.dev39 |

	## Repository Content
	1. `model.ckpt`: The trained model weights and hyperparameters (PyTorch Lightning checkpoint).
	2. `2_train_GM12878_DNase.ipynb`: Jupyter notebook for training the model.
	3. `3_evaluate_model.ipynb`: Jupyter notebook for evaluating the trained model.
	4. `output.log`: Training logs.

	## How to use
	To load this model for inference or fine-tuning, use the `grelu` interface:

	```python
	from grelu.lightning import LightningModel
	from huggingface_hub import hf_hub_download

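	# Download the checkpoint from the Hugging Face Hub (cached locally after the first call)
	ckpt_path = hf_hub_download(
	    repo_id="Genentech/GM12878_dnase-model",
	    filename="model.ckpt",
	)

	# weights_only=False lets the loader unpickle non-tensor objects (e.g. hyperparameters) in the checkpoint
	model = LightningModel.load_from_checkpoint(ckpt_path, weights_only=False)
	model.eval()  # switch to evaluation mode before running inference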
	```
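
Before running inference, each site of interest has to be expanded into a 2,114 bp input interval, with the prediction covering the central 1,000 bp. The sketch below shows one way to derive both windows around a center position using plain Python. It assumes 0-based, half-open coordinates and an arbitrary example position; the helper names are hypothetical and are not part of the `grelu` API.

```python
INPUT_LEN = 2114   # bp of sequence the model takes as input
OUTPUT_LEN = 1000  # bp of the central window whose coverage is predicted


def centered_window(center, length):
    """Return (start, end) of a half-open window of the given length around center."""
    start = center - length // 2
    return start, start + length


def model_intervals(chrom, center):
    """Input interval and predicted (central) interval for one site."""
    in_start, in_end = centered_window(center, INPUT_LEN)
    out_start, out_end = centered_window(center, OUTPUT_LEN)
    return {
        "input": (chrom, in_start, in_end),
        "output": (chrom, out_start, out_end),
    }


# Arbitrary example position, chosen only for illustration
intervals = model_intervals("chr1", 1_000_000)
print(intervals["input"])   # ('chr1', 998943, 1001057)
print(intervals["output"])  # ('chr1', 999500, 1000500)
```

The output window starts 557 bp after the input window, which matches the `crop_len` listed in the training details.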