About in silico perturbation error

#63
by Renqing - opened

Hello, when I run the in silico perturbation code, I encounter the following error.
The code is:
from geneformer import InSilicoPerturber
from geneformer import InSilicoPerturberStats

isp = InSilicoPerturber(perturb_type="delete",
                        perturb_rank_shift=None,
                        genes_to_perturb="all",
                        combos=0,
                        anchor_gene=None,
                        model_type="Pretrained",
                        num_classes=3,
                        emb_mode="cell",
                        cell_emb_style="mean_pool",
                        filter_data={"cell_type":["Cardiomyocyte1","Cardiomyocyte2","Cardiomyocyte3"]},
                        cell_states_to_model={"disease":(["dcm"],["nf"],["hcm"])},
                        max_ncells=2000,
                        emb_layer=0,
                        forward_batch_size=4,
                        nproc=16,
                        save_raw_data=True)
isp.perturb_data("/path_to_pretrained/Geneformer/",
                 "/path_to/example_input_file/cell_classification/disease_classification/human_dcm_hcm_nf.dataset/",
                 "/path_to/cell_classification/disease_classification/",
                 "test")

The error is:
File ~/anaconda3/envs/geneformer_py38/lib/python3.8/site-packages/geneformer/in_silico_perturber.py:232, in quant_cos_sims(model, perturbation_batch, forward_batch_size, layer_to_quant, original_emb, indices_to_perturb, cell_states_to_model, state_embs_dict)
230 else:
231 for state in possible_states:
--> 232 cos_sims_vs_alt_dict[state] += cos_sim_shift(original_emb, minibatch_emb, state_embs_dict[state])
233 del outputs
234 del minibatch_emb

File ~/anaconda3/envs/geneformer_py38/lib/python3.8/site-packages/geneformer/in_silico_perturber.py:251, in cos_sim_shift(original_emb, minibatch_emb, alt_emb)
249 original_emb = torch.mean(original_emb,dim=0,keepdim=True)[None, :]
250 alt_emb = alt_emb[None, None, :]
--> 251 origin_v_end = cos(original_emb,alt_emb)
252 perturb_v_end = cos(torch.mean(minibatch_emb,dim=1,keepdim=True),alt_emb)
253 return [(perturb_v_end-origin_v_end).to("cpu")]

File ~/anaconda3/envs/geneformer_py38/lib/python3.8/site-packages/torch/nn/modules/module.py:1102, in Module._call_impl(self, *input, **kwargs)
1098 # If we don't have any hooks, we want to skip the rest of the logic in
1099 # this function, and just call forward.
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~/anaconda3/envs/geneformer_py38/lib/python3.8/site-packages/torch/nn/modules/distance.py:77, in CosineSimilarity.forward(self, x1, x2)
76 def forward(self, x1: Tensor, x2: Tensor) -> Tensor:
---> 77 return F.cosine_similarity(x1, x2, self.dim, self.eps)

RuntimeError: cosine_similarity requires both inputs to have the same number of dimensions, but x1 has 3 and x2 has 5
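
The message can be read off the two indexing lines in the traceback: `[None, :]` prepends one axis and `[None, None, :]` prepends two. Working backwards from "x1 has 3 and x2 has 5", `original_emb` had 2 dimensions and the state embedding arrived with 3, where a 1-D vector would have produced a matching 3. Below is a minimal sketch of that bookkeeping. It tracks shapes as plain tuples rather than real tensors, and the helper names and concrete sizes (2048 tokens, 256 hidden units) are illustrative assumptions, not values from this run.

```python
def prepend_axes(shape, n):
    """Shape after indexing with n leading `None`s, e.g. t[None, None, :]."""
    return (1,) * n + tuple(shape)


def mean_keepdim(shape, dim):
    """Shape after torch.mean(t, dim=dim, keepdim=True): that axis becomes 1."""
    shape = list(shape)
    shape[dim] = 1
    return tuple(shape)


# Hypothetical sizes, for illustration only.
original_emb = (2048, 256)   # (tokens, hidden)
expected_alt = (256,)        # a 1-D state embedding
observed_alt = (1, 1, 256)   # a 3-D state embedding, as the error implies

# Mirrors the two lines shown in cos_sim_shift in the traceback.
x1 = prepend_axes(mean_keepdim(original_emb, 0), 1)
x2_expected = prepend_axes(expected_alt, 2)
x2_observed = prepend_axes(observed_alt, 2)

print(len(x1), len(x2_expected), len(x2_observed))  # 3 3 5
```

Printing `original_emb.shape` and `state_embs_dict[state].shape` just before the failing call would confirm whether this is what happens in practice.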

I pulled the repository on Jun 19.

Thank you for your interest in Geneformer and for your question. I am not encountering this error when I run it with your options. Could you please pull the current version and retry?

Also, I wanted to note that when using in silico perturbation to test for perturbations that shift cells between two very similar states, it will likely be more effective if you first fine-tune the model to distinguish between the states so that they are better separated within the embedding space before testing what perturbation shifts between them. Specifically, the end-stage failing heart states are similar between the dilated and hypertrophic cardiomyopathy samples in Chaffin et al. Nature 2022, so it will likely be more effective to first fine-tune the model to distinguish them before running the in silico perturbation/treatment analysis. Fine-tuning the model is less necessary when testing the shift between two states that are already very separable by the pretrained model (e.g. fibroblasts vs. cardiomyocytes). However, fine-tuning with relevant data may still be helpful to orient the model's weights towards the specific downstream objective.

ctheodoris changed discussion status to closed

Hey, I pulled the repository again, but the error is still there.
`
File ~/anaconda3/envs/geneformer_py38/lib/python3.8/site-packages/geneformer/in_silico_perturber.py:251, in quant_cos_sims(model, perturb_type, perturbation_batch, forward_batch_size, layer_to_quant, original_emb, indices_to_perturb, cell_states_to_model, state_embs_dict)
249 elif cell_states_to_model is not None:
250 for state in possible_states:
--> 251 cos_sims_vs_alt_dict[state] += cos_sim_shift(original_emb, minibatch_emb, state_embs_dict[state])
252 del outputs
253 del minibatch_emb

File ~/anaconda3/envs/geneformer_py38/lib/python3.8/site-packages/geneformer/in_silico_perturber.py:270, in cos_sim_shift(original_emb, minibatch_emb, alt_emb)
268 original_emb = torch.mean(original_emb,dim=0,keepdim=True)[None, :]
269 alt_emb = alt_emb[None, None, :]
--> 270 origin_v_end = cos(original_emb,alt_emb)
271 perturb_v_end = cos(torch.mean(minibatch_emb,dim=1,keepdim=True),alt_emb)
272 return [(perturb_v_end-origin_v_end).to("cpu")]

File ~/anaconda3/envs/geneformer_py38/lib/python3.8/site-packages/torch/nn/modules/module.py:1102, in Module._call_impl(self, *input, **kwargs)
1098 # If we don't have any hooks, we want to skip the rest of the logic in
1099 # this function, and just call forward.
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~/anaconda3/envs/geneformer_py38/lib/python3.8/site-packages/torch/nn/modules/distance.py:77, in CosineSimilarity.forward(self, x1, x2)
76 def forward(self, x1: Tensor, x2: Tensor) -> Tensor:
---> 77 return F.cosine_similarity(x1, x2, self.dim, self.eps)

RuntimeError: cosine_similarity requires both inputs to have the same number of dimensions, but x1 has 3 and x2 has 5
`

Load dataset logging:
Loading cached processed dataset at /Geneformer/example_input_file/cell_classification/disease_classification/human_dcm_hcm_nf.dataset/cache-fb03eeb02b1434f4_*_of_00016.arrow
Loading cached shuffled indices for dataset at /Geneformer/example_input_file/cell_classification/disease_classification/human_dcm_hcm_nf.dataset/cache-a0e2d2a3c71a319a.arrow
Loading cached sorted indices for dataset at /Geneformer/example_input_file/cell_classification/disease_classification/human_dcm_hcm_nf.dataset/cache-463255c0abb2b70b.arrow


I fixed this error; it was related to torch 1.10.0.
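
Since the problem turned out to be tied to the installed torch version, it can help to record package versions alongside any traceback. Here is a small sketch using only the standard library. The helper name `installed_version` is made up for this example, `geneformer` is assumed to be the installed distribution name, and the thread does not state which torch version is known to work.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the versions most relevant to this traceback.
for name in ("torch", "geneformer"):
    print(name, installed_version(name))
```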

By the way, I have a question about line 490 of the script in_silico_perturber.py (the screenshot of the code is not preserved here).

My understanding is that the model can only handle perturbation across three states, right? If I want to distinguish between two states, normal and tumor, that is not possible with the current perturbation code.

Thank you for your question. The code you highlighted just checks if the values within the three lists are unique. For modeling 2 states, you can just provide an empty list as the third element (i.e. []). I added this to the documentation to be extra clear and updated the stats script to account for that option.
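
For a two-state comparison such as normal vs. tumor, the argument described above would look roughly like the sketch below. The key `"disease"` and the labels `"tumor"` and `"normal"` are placeholders for whatever the dataset uses, and the order of the lists is not asserted here (see the documentation for which position means what). `states_are_unique` is a hypothetical stand-in that illustrates the kind of uniqueness check described; it is not the code in `in_silico_perturber.py`.

```python
# Two states: the third list is left empty.
cell_states_to_model = {"disease": (["tumor"], ["normal"], [])}


def states_are_unique(states_to_model):
    """True if no label appears more than once across the three lists."""
    (groups,) = states_to_model.values()  # exactly one key is expected
    labels = [label for group in groups for label in group]
    return len(labels) == len(set(labels))


print(states_are_unique(cell_states_to_model))             # True
print(states_are_unique({"disease": (["a"], ["a"], [])}))  # False
```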

Also, it would be great if you could start a new discussion if there are questions that are unrelated to the initial question in a discussion. It would also be great to be descriptive in the discussion title to indicate what the question is specifically about in each discussion. That would be very helpful for others if they are looking for the answer to the same question. Thank you for your collaboration on that!

Thank you for your response.
