aletlvl committed on
Commit e264b31 · verified · 1 Parent(s): a24db0c

Upload Nicheformer model

Files changed (1)
  1. README.md +75 -17
README.md CHANGED
@@ -29,33 +29,41 @@ Nicheformer is built on a transformer architecture with the following key featur
 
  ```python
  from transformers import AutoModelForMaskedLM, AutoTokenizer
 
  # Load model and tokenizer
- model = AutoModelForMaskedLM.from_pretrained("your-username/nicheformer")
- tokenizer = AutoTokenizer.from_pretrained("your-username/nicheformer")
 
- # Example 1: Manual masking
- masked_text = "The [MASK] cell is an important immune cell type."
- inputs = tokenizer(masked_text, return_tensors="pt")
- outputs = model(**inputs)
 
- # Example 2: Automatic masking (typically used during training)
- text = "The T cell is an important immune cell type."
- inputs = tokenizer(text, return_tensors="pt")
- outputs = model(**inputs, apply_masking=True) # This will automatically mask tokens
  ```
 
  ## Training Data
 
- [Describe the training data used for the model]
-
- ## Evaluation Results
 
- [Include evaluation metrics and results]
 
  ## Limitations
 
- [Describe any known limitations or biases of the model]
 
  ## Citation
 
@@ -67,8 +75,58 @@ If you use this model in your research, please cite:
 
  ## License
 
- This model is released under [specify license]
 
  ## Contact
 
- [Add contact information for questions and issues]
 
  ```python
  from transformers import AutoModelForMaskedLM, AutoTokenizer
+ import anndata as ad
 
  # Load model and tokenizer
+ model = AutoModelForMaskedLM.from_pretrained("aletlvl/Nicheformer", trust_remote_code=True)  # custom model code hosted on the Hub
+ tokenizer = AutoTokenizer.from_pretrained("aletlvl/Nicheformer", trust_remote_code=True)  # custom tokenizer code hosted on the Hub
 
+ # Load your single-cell data
+ adata = ad.read_h5ad("your_data.h5ad")
+
+ # Tokenize the data
+ inputs = tokenizer(adata)
 
+ # Get predictions
+ outputs = model(**inputs)
  ```
 
  ## Training Data
 
+ The model was trained on single-cell and spatial gene expression data from a range of human and mouse tissues. It supports:
 
+ - **Modalities**: spatial and dissociated
+ - **Species**: human and mouse
+ - **Technologies**:
+   - MERFISH
+   - CosMx
+   - Xenium
+   - 10x Genomics (various versions)
+   - CITE-seq
+   - Smart-seq v4
 
  ## Limitations
 
+ - The model is specifically designed for gene expression data and may not generalize to other types of biological data
+ - Performance may vary depending on the quality and type of input data
+ - The model works best with data from the supported species and technologies listed under Training Data
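The last limitation can be checked before running the model. The sketch below is a hypothetical helper, not part of the `nicheformer` package or its Hub code: it encodes the species, modalities, and technologies from the Training Data list and flags dataset metadata that falls outside them. The function name `check_support`, the lower-cased spellings, and the rule that any technology string starting with "10x" counts as supported are all assumptions made for illustration.

```python
# Hypothetical pre-flight check: NOT part of the nicheformer package.
# Supported values are copied from the README's Training Data list.
SUPPORTED_SPECIES = {"human", "mouse"}
SUPPORTED_MODALITIES = {"spatial", "dissociated"}
SUPPORTED_TECHNOLOGIES = {"merfish", "cosmx", "xenium", "cite-seq", "smart-seq v4"}


def _normalize(value: str) -> str:
    """Lower-case and trim a metadata string so comparisons ignore case and padding."""
    return value.strip().lower()


def check_support(species: str, modality: str, technology: str) -> list:
    """Return human-readable warnings; an empty list means all three fields are supported."""
    warnings = []
    if _normalize(species) not in SUPPORTED_SPECIES:
        warnings.append(f"species {species!r} not in {sorted(SUPPORTED_SPECIES)}")
    if _normalize(modality) not in SUPPORTED_MODALITIES:
        warnings.append(f"modality {modality!r} not in {sorted(SUPPORTED_MODALITIES)}")
    tech = _normalize(technology)
    # Assumption: "10x Genomics (various versions)" is matched by prefix, so any
    # technology string starting with "10x" is treated as supported.
    if tech not in SUPPORTED_TECHNOLOGIES and not tech.startswith("10x"):
        warnings.append(f"technology {technology!r} is not in the supported list")
    return warnings


if __name__ == "__main__":
    for warning in check_support("rat", "spatial", "Visium"):
        print("warning:", warning)
```

Treat a warning as a signal to validate results more carefully rather than as a hard error, since the Limitations section only says the model works *best* on supported data.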
 
  ## Citation
 
 
  ## License
 
+ This model is released under the MIT License. See the LICENSE file for more details.
 
  ## Contact
 
+ For questions and issues, please open an issue on the GitHub repository or contact the maintainers.
+
+ # nicheformer
+
+ This is the official repository for **Nicheformer: a foundation model for single-cell and spatial omics**
+
+ [![Preprint](https://img.shields.io/badge/preprint-available-brightgreen)](https://www.biorxiv.org/content/10.1101/2024.04.15.589472v1)  
+
+ A rendered Jupyter book version of this repository will be available soon.
+
+ ## Citation
+
+ If you use our tool or build upon our concepts in your own work, please cite it as
+
+ ```
+ Schaar, A.C., Tejada-Lapuerta, A., et al. Nicheformer: a foundation model for single-cell and spatial omics. bioRxiv (2024). doi: https://doi.org/10.1101/2024.04.15.589472
+ ```
+
+ ## Installation
+
+ You need to have Python 3.9 or newer installed on your system. If you don't have
+ Python installed, we recommend installing [Mambaforge](https://github.com/conda-forge/miniforge#mambaforge).
+
+
+ <!--
+ 1) Install the latest release of `nicheformer` from `PyPI <https://pypi.org/project/nicheformer/>`_:
+
+ ```bash
+ pip install nicheformer
+ ```
+ -->
+
+ Install the latest development version:
+
+ ```bash
+ git clone https://github.com/theislab/nicheformer.git
+ cd nicheformer
+ pip install -e .
+ ```
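The Installation section requires Python 3.9 or newer. A generic standard-library check like the one below can confirm that the interpreter you are about to install into qualifies before running the commands above. It is not something the repository ships; `python_is_supported` and `MIN_PYTHON` are illustrative names.

```python
import sys

# Minimum version stated in the Installation section.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor, ...) version meets MIN_PYTHON."""
    # Compare only (major, minor); patch releases do not affect the requirement.
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if python_is_supported():
        print(f"Python {sys.version.split()[0]} is supported.")
    else:
        raise SystemExit(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required.")
```

Run it with the same `python` that `pip` will use, for example inside your Mambaforge environment.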
+ ## Nicheformer data
+ We provide example data loading scripts in the `data` subdirectory that can serve as templates for loading spatial omics datasets and datasets retrieved from GEO.
+
+ ## Pretraining weights
+ The Nicheformer pretraining weights are available on Mendeley Data and can be downloaded [here](https://data.mendeley.com/preview/87gm9hrgm8?a=d95a6dde-e054-4245-a7eb-0522d6ea7dff).
+
+ ## Contact
+
+ For questions and help requests, please reach out on [GitHub][issue-tracker] (preferred) or email the corresponding author.
+
+
+ [issue-tracker]: https://github.com/theislab/nicheformer/issues