Genecorpus-30M annotation
Hello,
Sorry to bother you again, but I was wondering whether it is possible to retrieve annotations for the cells used to train the model, i.e. the paper each "single cell" comes from, its cell type and tissue, and the origin of the sample (in vivo or ex vivo, primary or cell line).
Is there a way to obtain this information?
Thanks again for the help!
Thank you for reaching out. Please see the supplementary table from our manuscript (Theodoris et al., Nature 2023) for the list of studies the cells were sourced from. Unfortunately, we do not have the metadata you mention annotated at the individual cell level in the pretraining corpus, and many primary studies do not provide cell type annotations. One of the benefits of the self-supervised learning approach is that it does not require labeled data, so one can train on a much larger body of available data rather than being restricted to datasets that carry annotations such as cell type.