biomedical_ner_roberta_base is a token classification model specifically fine-tuned for Named Entity Recognition (NER) in the biomedical domain. It is designed to extract entities from scientific abstracts, clinical notes, and medical literature.
The model identifies three primary entity types, tagged with the BIO labeling scheme: CHEMICAL, DISEASE, and GENE.
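To make the BIO scheme concrete, here is a minimal sketch of how per-token tags can be grouped back into entity spans. The function name `bio_to_spans`, the whitespace tokenization, and the tag names are assumptions for illustration only; the model's own tokenizer works on subword pieces, and the pipeline's `aggregation_strategy` handles this grouping for you.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    B-X opens a new entity of type X, I-X continues the open entity of
    type X, and O (or a stray I- tag that does not continue the open
    entity) closes whatever entity is open.
    """
    spans: List[Tuple[str, str]] = []
    current: Optional[Tuple[List[str], str]] = None  # (tokens so far, type)

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current is not None:
                spans.append((" ".join(current[0]), current[1]))
            current = ([token], tag[2:])
        elif tag.startswith("I-") and current is not None and current[1] == tag[2:]:
            current[0].append(token)
        else:
            if current is not None:
                spans.append((" ".join(current[0]), current[1]))
            current = None

    if current is not None:
        spans.append((" ".join(current[0]), current[1]))
    return spans


# Hypothetical, hand-tagged example (not model output).
tokens = ["treated", "with", "metformin", "for", "type", "2", "diabetes"]
tags = ["O", "O", "B-CHEMICAL", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE"]
print(bio_to_spans(tokens, tags))
# [('metformin', 'CHEMICAL'), ('type 2 diabetes', 'DISEASE')]
```

Note that this sketch silently drops an `I-` tag that has no matching `B-` before it; other decoders treat such a tag as the start of a new entity.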
This model is based on the roberta-base architecture, fine-tuned using RobertaForTokenClassification. It was trained on a composite dataset including BC5CDR (BioCreative V CDR task corpus) and the NCBI Disease corpus.
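A token classification head needs a fixed label inventory. The sketch below builds one plausible `id2label` / `label2id` pair for three entity types under BIO tagging (one `O` label plus a `B-` and `I-` label per type, seven in total). The entity names are taken from the usage example's output, and the ordering is an assumption; the label order actually stored in this model's `config.json` may differ.

```python
from typing import Dict, List, Tuple

# Assumed entity names, based on the labels shown in the usage example.
ENTITY_TYPES = ["CHEMICAL", "DISEASE", "GENE"]


def build_label_maps(entity_types: List[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Build BIO label maps: 'O' first, then B-/I- pairs per entity type."""
    labels = ["O"]
    for entity_type in entity_types:
        labels += [f"B-{entity_type}", f"I-{entity_type}"]
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return id2label, label2id


id2label, label2id = build_label_maps(ENTITY_TYPES)
print(len(id2label))  # 7
print(id2label)
```

Dictionaries of this shape can be passed as `num_labels`, `id2label`, and `label2id` when loading a base checkpoint such as `roberta-base` with `AutoModelForTokenClassification.from_pretrained` before fine-tuning.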
This model is intended for researchers and developers working with biomedical text data.
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
model_name = "your_username/biomedical_ner_roberta_base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
text = "The patient was treated with metformin for type 2 diabetes, but showed resistance related to the SLC22A1 gene variant."
results = nlp(text)
for entity in results:
    print(f"Entity: {entity['word']}, Label: {entity['entity_group']}, Score: {entity['score']:.4f}")
# Expected Output structure:
# Entity: metformin, Label: CHEMICAL, Score: 0.99...
# Entity: type 2 diabetes, Label: DISEASE, Score: 0.98...
# Entity: SLC22A1, Label: GENE, Score: 0.97...
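Downstream code often wants the entities grouped by type and filtered by confidence. The following is a minimal sketch that works on a list of dictionaries with the `word`, `entity_group`, and `score` keys printed above. The helper `group_entities`, the `min_score` threshold, and the `mock_results` values are all hypothetical placeholders, not real model output.

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(results: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Group entity strings by label, keeping only those at or above min_score."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in results:
        if entity["score"] >= min_score:
            # Aggregated words can carry a leading space from the tokenizer.
            grouped[entity["entity_group"]].append(entity["word"].strip())
    return dict(grouped)


# Hand-written stand-in for pipeline output; scores are illustrative only.
mock_results = [
    {"entity_group": "CHEMICAL", "word": " metformin", "score": 0.95},
    {"entity_group": "DISEASE", "word": " type 2 diabetes", "score": 0.90},
    {"entity_group": "GENE", "word": " SLC22A1", "score": 0.40},
]

print(group_entities(mock_results, min_score=0.5))
# {'CHEMICAL': ['metformin'], 'DISEASE': ['type 2 diabetes']}
```

The right threshold depends on the application; a curation workflow might keep low-scoring entities for manual review instead of discarding them.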