Shoriful025 committed on
Commit a2331f8 · verified · 1 Parent(s): 28cfb2f

Create README.md

Files changed (1)
  1. README.md +62 -0
README.md ADDED
@@ -0,0 +1,62 @@
+ ---
+ language:
+ - en
+ tags:
+ - ner
+ - biomedical
+ - token-classification
+ - roberta
+ license: apache-2.0
+ datasets:
+ - bc5cdr
+ - ncbi_disease
+ ---
+
+ # biomedical_ner_roberta_base
+
+ ## Overview
+
+ `biomedical_ner_roberta_base` is a token classification model specifically fine-tuned for Named Entity Recognition (NER) in the biomedical domain. It is designed to extract entities from scientific abstracts, clinical notes, and medical literature.
+
+ The model identifies three primary entity types using the BIO labeling scheme:
+ * **DISEASE**: Pathological conditions, signs, and symptoms.
+ * **CHEMICAL**: Drugs, medications, and chemical compounds.
+ * **GENE**: Genes, proteins, and related molecular structures.
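As a rough illustration of the BIO scheme named above, the sketch below merges `B-`/`I-` tags on a hand-written token sequence back into entity spans. The tokens, the tags, and the helper `bio_to_spans` are illustrative assumptions and not output from this model.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge BIO-tagged tokens into (entity_text, entity_type) pairs."""
    spans = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity, closing any open one first.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hand-labelled example (assumption, not model output).
tokens = ["metformin", "for", "type", "2", "diabetes"]
tags = ["B-CHEMICAL", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE"]
print(bio_to_spans(tokens, tags))
# [('metformin', 'CHEMICAL'), ('type 2 diabetes', 'DISEASE')]
```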
+
+ ## Model Architecture
+
+ This model is based on the `roberta-base` architecture, fine-tuned using `RobertaForTokenClassification`. It was trained on a composite dataset including BC5CDR (BioCreative V CDR task corpus) and the NCBI Disease corpus.
+
+ - **Base Model:** RoBERTa Base (12 layers, 768 hidden dimension, 12 heads, 125M parameters).
+ - **Task:** Token Classification (7 labels: O, B-DISEASE, I-DISEASE, B-CHEMICAL, I-CHEMICAL, B-GENE, I-GENE).
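The seven labels listed above imply an `id2label` / `label2id` mapping in the model configuration. A minimal sketch of how such a mapping could be built follows; the index order is an assumption, since the actual order is whatever the checkpoint's configuration stores.

```python
# Hypothetical label order: "O" first, then a B-/I- pair per entity type.
# The real checkpoint may use a different index assignment.
ENTITY_TYPES = ["DISEASE", "CHEMICAL", "GENE"]

labels = ["O"] + [
    f"{prefix}-{entity_type}"
    for entity_type in ENTITY_TYPES
    for prefix in ("B", "I")
]

id2label = dict(enumerate(labels))
label2id = {label: idx for idx, label in id2label.items()}

print(len(labels))  # 7
print(id2label)
```

Mappings of this shape are commonly passed as `id2label=` and `label2id=` keyword arguments to `from_pretrained` when setting up a token classification head for fine-tuning.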
+
+ ## Intended Use
+
+ This model is intended for researchers and developers working with biomedical text data.
+
+ - **Information Extraction:** Automated parsing of PubMed abstracts to identify key biomedical concepts.
+ - **Knowledge Graph Construction:** Linking genes, drugs, and diseases discovered in text to structured knowledge bases.
+ - **Clinical Text Mining:** Assisting in extracting relevant information from unstructured electronic health records (EHRs).
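For the extraction and knowledge-graph use cases above, predicted entities usually need to be filtered and grouped before they are linked to anything. The sketch below does this on hand-written records that mimic a subset of the keys (`entity_group`, `word`, `score`) of an aggregated token-classification pipeline result. The records, the scores, the `min_score` threshold, and the helper `group_entities` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Made-up records in the shape of aggregated NER pipeline output (not real model scores).
results = [
    {"entity_group": "CHEMICAL", "word": " metformin", "score": 0.99},
    {"entity_group": "DISEASE", "word": " type 2 diabetes", "score": 0.98},
    {"entity_group": "GENE", "word": " SLC22A1", "score": 0.97},
    {"entity_group": "DISEASE", "word": " resistance", "score": 0.42},
]


def group_entities(records: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Keep confident predictions and group their surface forms by label."""
    grouped = defaultdict(set)
    for record in records:
        if record["score"] >= min_score:
            # Strip whitespace that byte-level BPE tokenizers can leave on words.
            grouped[record["entity_group"]].add(record["word"].strip())
    return {label: sorted(words) for label, words in grouped.items()}


print(group_entities(results))
# {'CHEMICAL': ['metformin'], 'DISEASE': ['type 2 diabetes'], 'GENE': ['SLC22A1']}
```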
+ ```python
+ from transformers import AutoTokenizer, AutoModelForTokenClassification
+ from transformers import pipeline
+
+ model_name = "your_username/biomedical_ner_roberta_base"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForTokenClassification.from_pretrained(model_name)
+
+ nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
+
+ text = "The patient was treated with metformin for type 2 diabetes, but showed resistance related to the SLC22A1 gene variant."
+ results = nlp(text)
+
+ for entity in results:
+     print(f"Entity: {entity['word']}, Label: {entity['entity_group']}, Score: {entity['score']:.4f}")
+
+ # Expected Output structure:
+ # Entity: metformin, Label: CHEMICAL, Score: 0.99...
+ # Entity: type 2 diabetes, Label: DISEASE, Score: 0.98...
+ # Entity: SLC22A1, Label: GENE, Score: 0.97...