Embedding bulk RNA-seq
Hello, I have read the papers and found them very useful and inspiring, but now I have a theoretical question.
Single-cell data is not always available in public repositories. For example, the Human Protein Atlas has bulk RNA-seq (I think) for cell lines, and one might be interested in embedding those cell lines to see where they fall in the embedding space. The data has one TPM value per gene for each cell line. How would you suggest embedding them with the model? Or do you think this does not make sense at all?
Thanks for the help!
Thank you for your question. We have not tested Geneformer with bulk RNA-seq, but one could certainly try. The main aspect to consider is that the input size is 4096 expressed genes per example, so it will not fully encompass a bulk RNA-seq profile; the model was designed for single-cell data, where fewer genes are detected per cell. You could do some feature selection or just proceed with the data as is. It may also be useful to fine-tune with the bulk data to account for the differences between the data types.
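As a rough illustration of the feature-selection option, here is a minimal sketch that reduces one bulk sample (a gene-to-TPM mapping) to an ordered list of at most 4096 expressed genes. It assumes the model input is a list of genes ordered by expression, optionally scaled by a per-gene normalization factor. The function name `rank_encode_bulk_sample`, the `norm_factors` argument, and the toy values are hypothetical and are not part of the Geneformer package; this has not been tested against the actual tokenizer.

```python
def rank_encode_bulk_sample(tpm_by_gene, max_genes=4096, norm_factors=None):
    """Order expressed genes by (optionally normalized) TPM and keep the top ones.

    tpm_by_gene:  dict mapping gene ID -> TPM for a single bulk sample.
    max_genes:    maximum number of genes to keep (the model input size).
    norm_factors: hypothetical dict mapping gene ID -> scaling factor;
                  genes without a factor are left unscaled.
    """
    norm_factors = norm_factors or {}
    scored = []
    for gene, tpm in tpm_by_gene.items():
        if tpm <= 0:
            continue  # drop genes that are not expressed in this sample
        scored.append((tpm / norm_factors.get(gene, 1.0), gene))
    # Highest score first; ties are broken by gene ID so the result is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [gene for _, gene in scored[:max_genes]]


# Toy sample with made-up values, for illustration only.
sample = {"GAPDH": 5000.0, "ACTB": 3000.0, "TP53": 40.0, "MYC": 120.0, "OLIG2": 0.0}
factors = {"GAPDH": 1000.0, "ACTB": 1000.0, "TP53": 4.0, "MYC": 10.0}

print(rank_encode_bulk_sample(sample, max_genes=3, norm_factors=factors))
# ['MYC', 'TP53', 'GAPDH']
```

Truncating by rank like this discards the lower-ranked genes of a bulk profile, which is one reason fine-tuning on the bulk data may still be worthwhile.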