use geneformer for mouse data

#574
by jenny143 - opened

Hello Geneformer team, thank you very much for your great work on Geneformer.

I would like to apply Geneformer to mouse data. Since Geneformer is trained on human genes (ENSG), my understanding is that mouse gene symbols need to be mapped to human ENSG IDs via ortholog relationships.

Currently, I plan to use Ensembl BioMart (https://www.ensembl.org/biomart/martview/ac66a58f8ff326bdc17be2a80d653340) by selecting Mouse genes (GRCm39) and retrieving the corresponding human ortholog ENSG IDs.

Could you please confirm whether this is the recommended approach, or if Geneformer provides any official utilities or best practices for handling mouse-to-human gene mapping?
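For reference, the mapping step described above could be sketched roughly as follows. This is only a minimal illustration, not an official Geneformer utility: it assumes a tab-separated BioMart export with the column headers "Gene stable ID" and "Human gene stable ID" (adjust these to whatever attributes were actually exported), and the function names are hypothetical. Keeping only one-to-one orthologs is a design choice made here for simplicity, not something Geneformer prescribes.

```python
import csv
import io
from collections import defaultdict


def read_biomart_tsv(handle, mouse_col="Gene stable ID",
                     human_col="Human gene stable ID"):
    """Read (mouse_id, human_id) pairs from a tab-separated BioMart export.

    The column names are assumptions; change them to match the export.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row[mouse_col], row[human_col]) for row in reader]


def build_one_to_one_map(pairs):
    """Keep only mouse genes with exactly one human ortholog, where that
    human gene in turn has exactly one mouse ortholog."""
    mouse_to_human = defaultdict(set)
    human_to_mouse = defaultdict(set)
    for mouse_id, human_id in pairs:
        if mouse_id and human_id:  # skip rows with no ortholog
            mouse_to_human[mouse_id].add(human_id)
            human_to_mouse[human_id].add(mouse_id)
    mapping = {}
    for mouse_id, human_ids in mouse_to_human.items():
        if len(human_ids) != 1:
            continue
        (human_id,) = human_ids
        if len(human_to_mouse[human_id]) == 1:
            mapping[mouse_id] = human_id
    return mapping


def remap_counts(mouse_counts, mapping):
    """Re-key one cell's counts from mouse IDs to human ENSG IDs,
    dropping genes that have no one-to-one ortholog."""
    return {mapping[m]: c for m, c in mouse_counts.items() if m in mapping}
```

With a file exported from BioMart, usage would look like `mapping = build_one_to_one_map(read_biomart_tsv(open("mouse_human_orthologs.tsv")))`, where the file name is a placeholder. Genes with one-to-many or many-to-one orthologs are simply dropped in this sketch; summing or otherwise resolving them is an alternative to consider.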

Thank you very much for your time and guidance.

jenny143 changed discussion title from use geneformer for mouse to use geneformer for mouse data

Thank you for your question - we have not tested Geneformer on mouse data but agree that mapping mouse genes to their human orthologs, using Ensembl or another established tool, is a reasonable approach.

ctheodoris changed discussion status to closed

Thank you for your reply.
I also have a question regarding the file ensembl_mapping_dict_gc30M.pkl.
Could you please clarify which Ensembl human gene annotation version was used to generate the ENSG identifiers in this mapping dictionary?
This information would be very helpful for ensuring consistent cross-species gene mapping and compatibility with the Geneformer tokenizer.
Thank you very much for your help.
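
Independent of the annotation version, one practical compatibility check is to measure how many of the mapped human ENSG IDs actually appear in the dictionary. The sketch below is an assumption-laden illustration: it assumes the pickle deserializes to a dict, and because the exact key/value layout is not stated in this thread, it simply collects every string beginning with "ENSG" from both keys and values. The function names are hypothetical, and pickle files should only be loaded from a trusted source.

```python
import pickle


def collect_ensg_ids(mapping_dict):
    """Gather every string starting with 'ENSG' among a dict's keys and values."""
    ids = set()
    for key, value in mapping_dict.items():
        for item in (key, value):
            if isinstance(item, str) and item.startswith("ENSG"):
                ids.add(item)
    return ids


def load_ensg_ids(pickle_path):
    """Load a pickled dict (assumed structure) and return the ENSG IDs in it."""
    with open(pickle_path, "rb") as fh:
        return collect_ensg_ids(pickle.load(fh))


def ortholog_coverage(human_ids, known_ids):
    """Return (fraction of human_ids found in known_ids, sorted missing IDs)."""
    human_ids = set(human_ids)
    missing = human_ids - set(known_ids)
    if not human_ids:
        return 0.0, []
    return (len(human_ids) - len(missing)) / len(human_ids), sorted(missing)
```

A low coverage fraction, or a list of missing IDs dominated by recently added or retired genes, would suggest a mismatch between the Ensembl release used for the ortholog export and the one behind the dictionary.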
