License
This model is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. It may be used, shared, and adapted for non-commercial research purposes only.
Qwen3-Based Rare Disease Biomedical Embedding Model
This repository provides a fine-tuned embedding model designed for biomedical and rare disease text representation. The model is based on Qwen3-Embedding-8B and was fine-tuned using 1,320 Claude Sonnet 4–generated disease summaries derived from the National Organization for Rare Disorders (NORD) disease catalog. The source code for model training and downstream ranking is available in the GEN-KnowRD GitHub repository.
This repository hosts the version fine-tuned on knowledge generated by Claude Sonnet 4. Researchers are welcome to fine-tune it further, or to reproduce the training workflow with disease summaries generated by other LLMs or drawn from alternative rare disease knowledge sources.
Intended use
This model is intended for research use in rare disease phenotype representation, disease-profile retrieval, and embedding-based candidate ranking.
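As a rough illustration of embedding-based candidate ranking, the sketch below scores a query vector against a set of disease-profile vectors by cosine similarity and keeps the top matches. It is a minimal sketch and not the GEN-KnowRD ranking code: the helper names `cosine` and `rank_candidates`, the profile labels, and the 3-dimensional toy vectors are all invented for illustration. In practice the vectors would come from `model.encode(...)`. The example uses only the standard library so it runs without downloading the model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(query_vec, candidates, top_k=3):
    """Return up to top_k (name, score) pairs, highest similarity first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors, invented for illustration only (not model output).
query = [0.9, 0.1, 0.0]
disease_profiles = {
    "profile_a": [1.0, 0.0, 0.0],
    "profile_b": [0.0, 1.0, 0.0],
    "profile_c": [0.7, 0.7, 0.0],
}

ranking = rank_candidates(query, disease_profiles, top_k=2)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With real embeddings, `query` would be the encoded phenotype description and `disease_profiles` would map each disease name to its encoded summary.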
Usage
from sentence_transformers import SentenceTransformer

# Load the fine-tuned embedding model from the Hugging Face Hub.
model = SentenceTransformer("r76941156/rare-disease-embedding-model")

texts = [
    "progressive dyspnea and pulmonary fibrosis",
    "idiopathic pulmonary fibrosis with cough and exertional shortness of breath",
    "seizures, developmental delay, and skin lesions suggestive of tuberous sclerosis",
    "recurrent infections and low immunoglobulin levels",
]

# Encode the texts; normalize_embeddings=True returns unit-length vectors.
embeddings = model.encode(texts, normalize_embeddings=True)

# Pairwise similarity between every pair of texts.
similarities = model.similarity(embeddings, embeddings)

print("Embedding shape:", embeddings.shape)
print("Similarity matrix shape:", similarities.shape)
print(similarities)
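The usage code passes `normalize_embeddings=True`, which scales each embedding to unit length. One consequence is that a plain dot product between two such vectors equals their cosine similarity. The sketch below checks this on two small made-up vectors; the helper names and the numbers are illustrative assumptions, not model output.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]


def cosine(u, v):
    """Cosine similarity computed from the raw (unnormalized) vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Made-up 2-dimensional vectors, for illustration only.
raw_a = [3.0, 4.0]
raw_b = [4.0, 3.0]

dot_of_units = dot(l2_normalize(raw_a), l2_normalize(raw_b))
cosine_of_raw = cosine(raw_a, raw_b)

print(math.isclose(dot_of_units, cosine_of_raw))
```

This is why normalized embeddings can be stored once and compared later with a simple matrix product when ranking many candidates.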
Citation
Yan C, Su WC, Xin Y, Grabowska ME, Kerchberger VE, Borza VA, Wang J, Wang L, Li R, Lynn J, Dickson AL. Reframing AI for Rare Disease Recognition. Research Square. 2026 Apr 2:rs-3.