Shoriful025
/

biomedical_ner_roberta_base

Token Classification

Shoriful025 committed on Dec 22, 2025

Commit a2331f8 · verified · 1 Parent(s): 28cfb2f

Create README.md

Files changed (1)

README.md +62 -0

README.md ADDED

	@@ -0,0 +1,62 @@

+---
+language:
+- en
+tags:
+- ner
+- biomedical
+- token-classification
+- roberta
+license: apache-2.0
+datasets:
+- bc5cdr
+- ncbi_disease
+---
+# biomedical_ner_roberta_base
+## Overview
+`biomedical_ner_roberta_base` is a token classification model fine-tuned for Named Entity Recognition (NER) in the biomedical domain. It extracts entities from scientific abstracts, clinical notes, and medical literature.
+The model identifies three primary entity types using the BIO labeling scheme:
+* **DISEASE**: Pathological conditions, signs, and symptoms.
+* **CHEMICAL**: Drugs, medications, and chemical compounds.
+* **GENE**: Genes, proteins, and related molecular structures.
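To illustrate the BIO scheme mentioned above, here is a minimal sketch of how per-token tags can be grouped into entity spans. The helper name `bio_to_spans` and the sample tokens and tags are hypothetical and hand-written for illustration; they are not output from this model, and the model's own post-processing may differ.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    spans = []
    current_tokens: List[str] = []
    current_type = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity; close any open one first.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag with no matching open entity, closes the span.
            # Stray I- tags are simply dropped in this sketch.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hand-written example tags (an assumption, not model output).
tokens = ["metformin", "for", "type", "2", "diabetes"]
tags = ["B-CHEMICAL", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE"]
spans = bio_to_spans(tokens, tags)
print(spans)
```

A `B-` tag marks the first token of an entity, `I-` marks a continuation of the same entity, and `O` marks tokens outside any entity.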
+## Model Architecture
+This model fine-tunes `roberta-base` with a `RobertaForTokenClassification` head. It was trained on a composite dataset that includes BC5CDR (the BioCreative V CDR task corpus) and the NCBI Disease corpus.
+- **Base Model:** RoBERTa Base (12 layers, hidden size 768, 12 attention heads, 125M parameters).
+- **Task:** Token Classification (7 labels: O, B-DISEASE, I-DISEASE, B-CHEMICAL, I-CHEMICAL, B-GENE, I-GENE).
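As a rough sketch, the seven labels listed above can be turned into the `id2label` / `label2id` mappings that token classification configs typically carry. The integer ordering here is an assumption that simply follows the order in which the labels are listed in this card; the `config.json` shipped with the model is the authoritative source.

```python
# Entity types in the order listed in the model card.
ENTITY_TYPES = ["DISEASE", "CHEMICAL", "GENE"]

# "O" first, then a B-/I- pair per entity type (ordering is an assumption).
labels = ["O"] + [
    f"{prefix}-{entity_type}"
    for entity_type in ENTITY_TYPES
    for prefix in ("B", "I")
]

id2label = dict(enumerate(labels))
label2id = {label: idx for idx, label in id2label.items()}

print(id2label)
```

Mappings of this shape are commonly passed as `id2label` and `label2id` (together with `num_labels=7`) when loading a base checkpoint for fine-tuning.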
+## Intended Use
+This model is intended for researchers and developers working with biomedical text data.
+- **Information Extraction:** Automated parsing of PubMed abstracts to identify key biomedical concepts.
+- **Knowledge Graph Construction:** Linking genes, drugs, and diseases discovered in text to structured knowledge bases.
+- **Clinical Text Mining:** Assisting in extracting relevant information from unstructured electronic health records (EHRs).
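For the extraction and knowledge-graph use cases above, a downstream step usually groups recognized entities by type before linking them. The sketch below assumes records shaped like the aggregated output of the `transformers` NER pipeline (keys `entity_group`, `word`, `score`). The helper `group_entities`, the `min_score` threshold, and the sample records are hypothetical and hand-written, not real model predictions.

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(results: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Collect unique entity strings per label, dropping low-confidence hits."""
    grouped = defaultdict(set)
    for record in results:
        if record["score"] >= min_score:
            # Strip whitespace that sub-word tokenizers may leave on the span.
            grouped[record["entity_group"]].add(record["word"].strip())
    return {label: sorted(words) for label, words in grouped.items()}


# Hand-written sample records (an assumption, not model output).
sample_results = [
    {"entity_group": "CHEMICAL", "word": " metformin", "score": 0.99},
    {"entity_group": "DISEASE", "word": "type 2 diabetes", "score": 0.98},
    {"entity_group": "GENE", "word": "SLC22A1", "score": 0.97},
    {"entity_group": "GENE", "word": "variant", "score": 0.20},
]

grouped = group_entities(sample_results)
print(grouped)
```

The resulting dictionary maps each entity type to a sorted list of distinct mentions, which can then be matched against identifiers in a structured knowledge base.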
+### How to use
+```python
+from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
+
+# Hub repository ID, taken from this repository's page header.
+model_name = "Shoriful025/biomedical_ner_roberta_base"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForTokenClassification.from_pretrained(model_name)
+
+# aggregation_strategy="simple" groups consecutive tokens of the same entity into one span.
+nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
+
+text = "The patient was treated with metformin for type 2 diabetes, but showed resistance related to the SLC22A1 gene variant."
+results = nlp(text)
+for entity in results:
+    print(f"Entity: {entity['word']}, Label: {entity['entity_group']}, Score: {entity['score']:.4f}")
+
+# Expected output structure (scores are illustrative):
+# Entity: metformin, Label: CHEMICAL, Score: 0.99...
+# Entity: type 2 diabetes, Label: DISEASE, Score: 0.98...
+# Entity: SLC22A1, Label: GENE, Score: 0.97...