---
license: apache-2.0
language:
- en
pipeline_tag: token-classification
tags:
- RepresentationLearning
- Genomics
- Variant
- Classification
- Mutations
- Embedding
- VariantClassification
---

# Model - GvEM (Genomic Variant Embedding Model)

**GvEM** is a PyTorch-based deep learning model designed to embed and model genomic mutation data from VCF (Variant Call Format) files using a biologically-informed hierarchy:

**Pathway → Chromosome → Gene → Mutations**

---

## Hierarchy of input data

    example_data = {
        'sample1': {
            'pathway1': {
                'chr1': {
                    'gene1': [
                        {
                            'impact': 'HIGH',
                            'reference': 'A',
                            'alternate': 'T'
                        }
                    ]
                }
            }
        }
    }
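
As a rough illustration of how a structure like this might be produced, the sketch below groups annotated VCF lines into the same nesting. It is not GvEM's own parser. It assumes an `ANN` entry in the INFO column whose pipe-separated subfields carry the impact in position 3 and the gene name in position 4, and it assumes a caller-supplied `gene_to_pathway` dictionary. The function name `parse_vcf_lines` and the `"unassigned"` fallback are hypothetical.

```python
def parse_vcf_lines(lines, sample, gene_to_pathway):
    """Build {sample: {pathway: {chrom: {gene: [mutation, ...]}}}} from VCF text lines.

    Assumptions (not taken from the GvEM source): variants carry an ANN entry
    in the INFO column, and gene_to_pathway maps gene names to pathway names.
    """
    data = {sample: {}}
    for line in lines:
        # Skip blank lines, "##" meta lines and the "#CHROM" header line.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
        chrom, ref, alt, info = fields[0], fields[3], fields[4], fields[7]
        # Locate the ANN entry in the semicolon-separated INFO column.
        ann = next((kv[4:] for kv in info.split(";") if kv.startswith("ANN=")), None)
        if ann is None:
            continue  # unannotated variant: it cannot be placed in the hierarchy
        # Use only the first annotation; its subfields are pipe-separated.
        parts = ann.split(",")[0].split("|")
        impact, gene = parts[2], parts[3]
        pathway = gene_to_pathway.get(gene, "unassigned")  # hypothetical fallback
        genes = data[sample].setdefault(pathway, {}).setdefault(chrom, {})
        genes.setdefault(gene, []).append(
            {"impact": impact, "reference": ref, "alternate": alt}
        )
    return data


# Illustrative input; the identifiers are placeholders, not real annotations.
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tT\t50\tPASS\tANN=T|stop_gained|HIGH|gene1|gene1_id",
]
parsed = parse_vcf_lines(vcf_lines, "sample1", {"gene1": "pathway1"})
```

With this input, `parsed` has the same shape as `example_data` above. The sketch keeps multi-allelic ALT values (such as `T,G`) as a single string and reads only the first annotation per variant, so a real parser would need to handle both cases.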

---

## Features

* **VCF Parser**: Converts standard VCF files into a hierarchical JSON-like structure.
* **MutationEmbedder**: Learns embeddings for categorical mutation features (scalable).
* **GeneEncoder**: Processes lists of mutations using a Transformer and hierarchical attention to produce gene-level representations.
* **ChromosomeEncoder**: Aggregates gene encodings into chromosome-level representations.
* **PathwayEncoder**: Aggregates chromosome encodings to yield the final sample representation.
* **Scalable**: Easily extensible to new fields or biological groupings.
* **HuggingFace Compatible**: Designed for sharing and experimentation on the 🤗 Hub.
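
To make the flow through these components concrete, here is a minimal PyTorch sketch of embedding mutations and pooling them up the hierarchy. It is a simplified stand-in and not the released architecture: it replaces the Transformer and hierarchical attention with a single learned attention-pooling layer per level, sums one embedding per categorical field, and averages pathway vectors to get the sample vector. The vocabularies, the `AttentionPool` and `HierarchicalEncoder` names, and the embedding size are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical vocabularies; the categories GvEM actually uses are not documented here.
IMPACT = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}
BASE = {"A": 0, "C": 1, "G": 2, "T": 3}


class MutationEmbedder(nn.Module):
    """Sums one learned embedding per categorical field (assumed design)."""

    def __init__(self, dim=16):
        super().__init__()
        self.impact = nn.Embedding(len(IMPACT), dim)
        self.ref = nn.Embedding(len(BASE), dim)
        self.alt = nn.Embedding(len(BASE), dim)

    def forward(self, mutations):
        imp = torch.tensor([IMPACT[m["impact"]] for m in mutations])
        ref = torch.tensor([BASE[m["reference"]] for m in mutations])
        alt = torch.tensor([BASE[m["alternate"]] for m in mutations])
        return self.impact(imp) + self.ref(ref) + self.alt(alt)  # (n_mutations, dim)


class AttentionPool(nn.Module):
    """Softmax-weighted average over a set of vectors."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):
        weights = torch.softmax(self.score(x), dim=0)  # (n, 1), sums to 1 over n
        return (weights * x).sum(dim=0)  # (dim,)


class HierarchicalEncoder(nn.Module):
    """Mutations -> gene -> chromosome -> pathway -> sample (simplified sketch)."""

    def __init__(self, dim=16):
        super().__init__()
        self.embed = MutationEmbedder(dim)
        self.gene_pool = AttentionPool(dim)
        self.chrom_pool = AttentionPool(dim)
        self.pathway_pool = AttentionPool(dim)

    def forward(self, sample):
        pathway_vecs = []
        for chromosomes in sample.values():
            chrom_vecs = []
            for genes in chromosomes.values():
                # Assumes every gene has at least one mutation.
                gene_vecs = [self.gene_pool(self.embed(m)) for m in genes.values()]
                chrom_vecs.append(self.chrom_pool(torch.stack(gene_vecs)))
            pathway_vecs.append(self.pathway_pool(torch.stack(chrom_vecs)))
        # Assumption: the sample vector is the mean of its pathway vectors.
        return torch.stack(pathway_vecs).mean(dim=0)


example_sample = {
    "pathway1": {
        "chr1": {"gene1": [{"impact": "HIGH", "reference": "A", "alternate": "T"}]}
    }
}
model = HierarchicalEncoder(dim=16)
embedding = model(example_sample)  # 1-D tensor with 16 entries
```

Looping over the nested dictionaries keeps the sketch readable, but it processes one sample at a time. A batched implementation would need padding and masking at each level.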

---

## Uses

### Direct Use

* Obtain sample-level embeddings
* Mutation pattern learning
* Transfer learning across genomic datasets
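
One common way to work with sample-level embeddings is to compare them pairwise. The snippet below is a generic sketch of cosine similarity in plain Python. The two vectors are made-up values, not outputs of GvEM, and the helper name `cosine_similarity` is chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 4-dimensional embeddings, for illustration only.
emb_a = [0.2, -0.1, 0.7, 0.4]
emb_b = [0.1, -0.2, 0.6, 0.5]
similarity = cosine_similarity(emb_a, emb_b)
```

Values close to 1 indicate embeddings that point in nearly the same direction. Whether that corresponds to biologically similar samples depends on how the model was trained.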

### Downstream Use

* Variant-based disease prediction (e.g., cancer, rare diseases, ASD)
* Multi-omics fusion models (tabular + image + VCF)
* Cohort-level mutation analysis
* Fine-tuning for prognosis, drug response prediction, or variant effect interpretation

### Limitations

* Not intended for clinical decision-making without expert oversight.
* Input variants must already be annotated.
* Not intended for non-human genomes unless explicitly fine-tuned for those organisms.
* High-resolution functional variant prediction is not yet supported; it is planned for future development.

---

## MODEL STILL UNDER DEVELOPMENT |