README updated
README.md
CHANGED
|
@@ -1,12 +1,3 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: mit
|
| 3 |
-
base_model:
|
| 4 |
-
- aletlvl/Nicheformer
|
| 5 |
-
tags:
|
| 6 |
-
- single-cell
|
| 7 |
-
- transcriptomics
|
| 8 |
-
- biology
|
| 9 |
-
---
|
| 10 |
# Nicheformer
|
| 11 |
|
| 12 |
Nicheformer is a transformer-based model designed for understanding and predicting cellular niches and their interactions. The model uses masked language modeling to learn representations of cellular contexts and their relationships.
|
|
@@ -41,8 +32,13 @@ from transformers import AutoModelForMaskedLM, AutoTokenizer
|
|
| 41 |
import anndata as ad
|
| 42 |
|
| 43 |
# Load model and tokenizer
|
| 44 |
-
model = AutoModelForMaskedLM.from_pretrained("aletlvl/Nicheformer")
|
| 45 |
-
tokenizer = AutoTokenizer.from_pretrained("aletlvl/Nicheformer")
|
| 46 |
|
| 47 |
# Load your single-cell data
|
| 48 |
adata = ad.read_h5ad("your_data.h5ad")
|
|
@@ -50,8 +46,13 @@ adata = ad.read_h5ad("your_data.h5ad")
|
|
| 50 |
# Tokenize the data
|
| 51 |
inputs = tokenizer(adata)
|
| 52 |
|
| 53 |
-
# Get
|
| 54 |
-
|
| 55 |
```
|
| 56 |
|
| 57 |
## Training Data
|
|
@@ -74,7 +75,6 @@ The model was trained on single-cell gene expression data from various tissues a
|
|
| 74 |
- Performance may vary depending on the quality and type of input data
|
| 75 |
- The model works best with data from supported species and technologies
|
| 76 |
|
| 77 |
-
|
| 78 |
## License
|
| 79 |
|
| 80 |
This model is released under the MIT License. See the LICENSE file for more details.
|
|
@@ -89,7 +89,6 @@ This is the official repository for **Nicheformer: a foundation model for single
|
|
| 89 |
|
| 90 |
[](https://www.biorxiv.org/content/10.1101/2024.04.15.589472v1)
|
| 91 |
|
| 92 |
-
|
| 93 |
## Citation
|
| 94 |
|
| 95 |
If you use our tool or build upon our concepts in your own work, please cite it as
|
|
@@ -130,4 +129,4 @@ We provide the Nicheformer pretraining weights on Mendeley data, they can be dow
|
|
| 130 |
For questions and help requests, you can reach out (preferably) on GitHub or email to the corresponding author.
|
| 131 |
|
| 132 |
|
| 133 |
-
[issue-tracker]: https://github.com/theislab/nicheformer/issues
|
| 1 |
# Nicheformer
|
| 2 |
|
| 3 |
Nicheformer is a transformer-based model designed for understanding and predicting cellular niches and their interactions. The model uses masked language modeling to learn representations of cellular contexts and their relationships.
|
|
|
|
| 32 |
import anndata as ad
|
| 33 |
|
| 34 |
# Load model and tokenizer
|
| 35 |
+
model = AutoModelForMaskedLM.from_pretrained("aletlvl/Nicheformer", trust_remote_code=True)
|
| 36 |
+
tokenizer = AutoTokenizer.from_pretrained("aletlvl/Nicheformer", trust_remote_code=True)
|
| 37 |
+
|
| 38 |
+
# Set technology mean for HF tokenizer
|
| 39 |
+
technology_mean_path = 'technology_mean.npy'
|
| 40 |
+
technology_mean = np.load(technology_mean_path)
|
| 41 |
+
tokenizer._load_technology_mean(technology_mean)
|
| 42 |
|
| 43 |
# Load your single-cell data
|
| 44 |
adata = ad.read_h5ad("your_data.h5ad")
|
|
|
|
| 46 |
# Tokenize the data
|
| 47 |
inputs = tokenizer(adata)
|
| 48 |
|
| 49 |
+
# Get embeddings
|
| 50 |
+
embeddings = model.get_embeddings(
|
| 51 |
+
input_ids=inputs["input_ids"],
|
| 52 |
+
attention_mask=inputs["attention_mask"],
|
| 53 |
+
layer=-1,
|
| 54 |
+
with_context=False
|
| 55 |
+
)
|
| 56 |
```
|
| 57 |
|
| 58 |
## Training Data
|
|
|
|
| 75 |
- Performance may vary depending on the quality and type of input data
|
| 76 |
- The model works best with data from supported species and technologies
|
| 77 |
|
|
|
|
| 78 |
## License
|
| 79 |
|
| 80 |
This model is released under the MIT License. See the LICENSE file for more details.
|
|
|
|
| 89 |
|
| 90 |
[](https://www.biorxiv.org/content/10.1101/2024.04.15.589472v1)
|
| 91 |
|
|
|
|
| 92 |
## Citation
|
| 93 |
|
| 94 |
If you use our tool or build upon our concepts in your own work, please cite it as
|
|
|
|
| 129 |
For questions and help requests, you can reach out (preferably) on GitHub or email to the corresponding author.
|
| 130 |
|
| 131 |
|
| 132 |
+
[issue-tracker]: https://github.com/theislab/nicheformer/issues
|
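
The added README lines load `technology_mean.npy` with `np.load` and pass the result to `tokenizer._load_technology_mean(...)`. The hunk does not show whether the README imports NumPy, nor what shape the array should have. As a rough sketch only, a guard around that loading step could look like the following. The helper name `load_technology_mean` and the synthetic demo values are hypothetical and are not part of Nicheformer or this commit.

```python
import os
import tempfile

import numpy as np  # needed for np.load; the diff hunk does not show this import


def load_technology_mean(path):
    """Load a technology-mean array from a .npy file with basic sanity checks.

    Hypothetical helper for illustration; not part of the Nicheformer code.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"technology mean file not found: {path}")
    arr = np.load(path)
    if not np.issubdtype(arr.dtype, np.number):
        raise TypeError(f"expected a numeric array, got dtype {arr.dtype}")
    if not np.all(np.isfinite(arr)):
        raise ValueError("technology mean contains NaN or infinite values")
    return arr


# Demo with a synthetic file so the sketch runs without the real asset.
# The values below are made up and carry no biological meaning.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "technology_mean.npy")
    np.save(demo_path, np.array([0.5, 1.0, 2.0]))
    technology_mean = load_technology_mean(demo_path)

# With the tokenizer from the README snippet, the next step would be:
# tokenizer._load_technology_mean(technology_mean)
```

Failing early on a missing or non-numeric file keeps the error close to its cause instead of surfacing later during tokenization.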