License

This model is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. It may be used, shared, and adapted for non-commercial research purposes only.

Qwen3-Based Rare Disease Biomedical Embedding Model

This repository provides a fine-tuned embedding model for representing biomedical and rare disease text. The model is based on Qwen3-Embedding-8B and was fine-tuned on 1,320 disease summaries generated by Claude Sonnet 4 and derived from the National Organization for Rare Disorders (NORD) disease catalog. The source code for model training and downstream ranking is available in the GEN-KnowRD GitHub repository.

This repository hosts the version fine-tuned on knowledge generated by Claude Sonnet 4. Researchers are welcome to fine-tune it further, or to reproduce the training workflow with disease summaries generated by other LLMs or drawn from alternative rare disease knowledge sources.

Intended use

This model is intended for research use in rare disease phenotype representation, disease-profile retrieval, and embedding-based candidate ranking.

Usage

from sentence_transformers import SentenceTransformer

# Load the fine-tuned embedding model from the Hugging Face Hub.
model = SentenceTransformer("r76941156/rare-disease-embedding-model")

# Example phenotype and disease descriptions to embed.
texts = [
    "progressive dyspnea and pulmonary fibrosis",
    "idiopathic pulmonary fibrosis with cough and exertional shortness of breath",
    "seizures, developmental delay, and skin lesions suggestive of tuberous sclerosis",
    "recurrent infections and low immunoglobulin levels",
]

# Encode to L2-normalized vectors, one row per input text.
embeddings = model.encode(texts, normalize_embeddings=True)

# Pairwise similarity between all texts (a len(texts) x len(texts) matrix).
similarities = model.similarity(embeddings, embeddings)
print("Embedding shape:", embeddings.shape)
print("Similarity matrix shape:", similarities.shape)
print(similarities)
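
The embedding-based candidate ranking mentioned under Intended use can be sketched as follows. This is a minimal illustration, not the ranking code from the GEN-KnowRD repository: the helper `rank_candidates` and the names `profile_texts` and `query_text` are hypothetical. The toy 2-D vectors stand in for real embeddings so the snippet runs without downloading the model. It assumes the vectors are L2-normalized, as they are when `normalize_embeddings=True` is passed to `encode`.

```python
import numpy as np


def rank_candidates(query_emb, profile_embs, top_k=3):
    """Return (index, score) pairs for the top_k profiles closest to the query.

    Hypothetical helper. Assumes every vector is L2-normalized, so the
    dot product equals cosine similarity.
    """
    query_emb = np.asarray(query_emb, dtype=np.float32)
    profile_embs = np.asarray(profile_embs, dtype=np.float32)
    scores = profile_embs @ query_emb        # one cosine score per profile
    order = np.argsort(-scores)[:top_k]      # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]


# With the model loaded as in the Usage code, the inputs would come from
# (not run here; profile_texts and query_text are placeholder names):
#   profile_embs = model.encode(profile_texts, normalize_embeddings=True)
#   query_emb = model.encode([query_text], normalize_embeddings=True)[0]

# Toy unit vectors standing in for real embeddings (illustrative only).
profiles = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
query = np.array([0.8, 0.6])

ranked = rank_candidates(query, profiles, top_k=2)
print(ranked)
```

Each returned pair gives the position of a candidate in the profile list and its similarity to the query, so the indices can be mapped back to disease names held in a parallel list.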

Citation

Yan C, Su WC, Xin Y, Grabowska ME, Kerchberger VE, Borza VA, Wang J, Wang L, Li R, Lynn J, Dickson AL. Reframing AI for Rare Disease Recognition. Research Square. 2026 Apr 2:rs-3.
