biomedical_ner_roberta_base

Overview

biomedical_ner_roberta_base is a token classification model fine-tuned for Named Entity Recognition (NER) on biomedical text. It extracts entities from scientific abstracts, clinical notes, and medical literature.

The model identifies three primary entity types using the BIO labeling scheme:

DISEASE: Pathological conditions, signs, and symptoms.
CHEMICAL: Drugs, medications, and chemical compounds.
GENE: Genes, proteins, and related molecular structures.
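To make the BIO scheme concrete, the following is a minimal sketch of how per-token tags can be collapsed into entity spans. The helper name `bio_to_spans` and the hand-written token/tag lists are illustrative assumptions, not part of this model's code; in practice the pipeline's aggregation step does this work.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse per-token BIO tags into (entity_text, entity_type) pairs.

    Hypothetical helper for illustration; tokens are joined with spaces.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_type):
            # A B- tag (or a stray I- tag of a different type) opens a new span.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label
        elif prefix == "I":
            # An I- tag of the same type continues the open span.
            current_tokens.append(token)
        else:
            # An O tag closes any open span.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hand-written example tags, not actual model output.
tokens = ["metformin", "for", "type", "2", "diabetes"]
tags = ["B-CHEMICAL", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE"]
print(bio_to_spans(tokens, tags))
# [('metformin', 'CHEMICAL'), ('type 2 diabetes', 'DISEASE')]
```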

Model Architecture

This model fine-tunes the roberta-base checkpoint with a RobertaForTokenClassification head. It was trained on a composite dataset including BC5CDR (BioCreative V CDR task corpus) and the NCBI Disease corpus.

Base Model: RoBERTa Base (12 layers, 768 hidden dimension, 12 heads, 125M parameters).
Task: Token Classification (7 labels: O, B-DISEASE, I-DISEASE, B-CHEMICAL, I-CHEMICAL, B-GENE, I-GENE).
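The seven labels follow directly from three entity types under BIO tagging (one `O` label plus a `B-` and `I-` label per type). The sketch below builds the kind of `id2label` / `label2id` mappings that a Transformers token classification config carries. The index order shown is an assumption taken from the listing above; the order stored in the published checkpoint may differ.

```python
# Entity types as listed in this model card.
ENTITY_TYPES = ["DISEASE", "CHEMICAL", "GENE"]

# "O" plus a B-/I- pair per entity type: 1 + 2 * 3 = 7 labels.
labels = ["O"] + [f"{prefix}-{etype}" for etype in ENTITY_TYPES for prefix in ("B", "I")]

# Assumed index order; verify against the checkpoint's own config before relying on it.
id2label = dict(enumerate(labels))
label2id = {label: idx for idx, label in id2label.items()}

print(labels)
```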

Intended Use

This model is intended for researchers and developers working with biomedical text data.

Information Extraction: Automated parsing of PubMed abstracts to identify key biomedical concepts.
Knowledge Graph Construction: Linking genes, drugs, and diseases discovered in text to structured knowledge bases.
Clinical Text Mining: Assisting in extracting relevant information from unstructured electronic health records (EHRs).
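As a rough illustration of the knowledge graph use case, the sketch below turns a list of recognized entities from one sentence into candidate co-occurrence edges between entities of different types. The function name `cooccurrence_edges` is hypothetical, the sample entities are hand-written rather than real model output, and co-occurrence is only a starting heuristic: it does not establish that a relation actually holds.

```python
from itertools import combinations


def cooccurrence_edges(entities):
    """Pair up entities of different types that appear in the same text.

    `entities` is a list of dicts with "entity_group" and "word" keys,
    the shape produced by an aggregated NER pipeline.
    """
    # Deduplicate and sort so the output order is deterministic.
    nodes = sorted({(e["entity_group"], e["word"].strip()) for e in entities})
    return [(a, b) for a, b in combinations(nodes, 2) if a[0] != b[0]]


# Hand-written sample in the pipeline's output shape (not real predictions).
sample_entities = [
    {"entity_group": "CHEMICAL", "word": "metformin"},
    {"entity_group": "DISEASE", "word": "type 2 diabetes"},
    {"entity_group": "GENE", "word": "SLC22A1"},
]

for source, target in cooccurrence_edges(sample_entities):
    print(source, "<->", target)
```

Each edge would still need entity linking against a structured knowledge base before it could be merged into a graph.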

How to use

from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

model_name = "Shoriful025/biomedical_ner_roberta_base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

text = "The patient was treated with metformin for type 2 diabetes, but showed resistance related to the SLC22A1 gene variant."
results = nlp(text)

for entity in results:
    print(f"Entity: {entity['word']}, Label: {entity['entity_group']}, Score: {entity['score']:.4f}")

# Expected Output structure:
# Entity: metformin, Label: CHEMICAL, Score: 0.99...
# Entity: type 2 diabetes, Label: DISEASE, Score: 0.98...
# Entity: SLC22A1, Label: GENE, Score: 0.97...
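For downstream use it is often helpful to drop low-confidence predictions and to recover the exact surface text from character offsets, since the `word` field of byte-level BPE tokenizers such as RoBERTa's can carry stray whitespace. The sketch below assumes the aggregated pipeline output includes `start` and `end` offsets alongside `score` and `entity_group`; the helper name `confident_mentions`, the 0.9 threshold, and the sample scores are illustrative assumptions.

```python
def confident_mentions(text, entities, min_score=0.9):
    """Keep entities scoring at least `min_score`, slicing their text from `text`."""
    mentions = []
    for entity in entities:
        if entity["score"] < min_score:
            continue  # Skip predictions below the (arbitrary) threshold.
        mentions.append(
            {
                "text": text[entity["start"]:entity["end"]],
                "label": entity["entity_group"],
                "start": entity["start"],
                "end": entity["end"],
            }
        )
    return mentions


# Hand-written sample in the pipeline's output shape; scores are made up.
sample_text = "metformin for type 2 diabetes"
sample_results = [
    {"entity_group": "CHEMICAL", "word": " metformin", "score": 0.99, "start": 0, "end": 9},
    {"entity_group": "DISEASE", "word": " type 2 diabetes", "score": 0.42, "start": 14, "end": 29},
]

print(confident_mentions(sample_text, sample_results))
```

The threshold trades recall for precision and should be tuned on held-out data for the target corpus.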

Model repository: Shoriful025/biomedical_ner_roberta_base