# Gene Extraction Model

This model is fine-tuned for gene extraction using a BERT-CRF architecture.
## Model Description

This model uses a custom BERT-CRF architecture for token classification, designed for gene entity recognition. It combines a BERT encoder with a Conditional Random Field (CRF) layer, which improves sequence labeling by scoring whole label sequences rather than each token independently.
| ## Usage | |
| ```python | |
| from transformers import AutoTokenizer, AutoModelForTokenClassification | |
| from transformers import pipeline | |
| model_name = "RaduGabriel/gene-entity-recognition" | |
| hf_token = None | |
| tokenizer = AutoTokenizer.from_pretrained(model_name, token=hf_token) | |
| model = AutoModelForTokenClassification.from_pretrained(model_name, token=hf_token) | |
| text = "TIF1gamma, a novel member of the transcriptional intermediary factor 1 family, plays a crucial role in gene regulation." | |
| # Create NER pipeline | |
| ner_pipeline = pipeline( | |
| "ner", | |
| model=model, | |
| tokenizer=tokenizer, | |
| aggregation_strategy="simple" | |
| ) | |
| results = ner_pipeline(text) | |
| print(results) | |
| ``` | |
## Labels

The model uses BIO tagging:

- O
- B-GENE
- I-GENE
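The B-GENE/I-GENE labels mark the beginning and continuation of a gene mention; contiguous runs are grouped into entity spans. A minimal decoding sketch (`bio_to_spans` is a hypothetical helper for illustration, not part of the model's API):

```python
# Convert a BIO-tagged token sequence into gene mention strings.
def bio_to_spans(tokens, tags):
    """Group tokens tagged B-GENE / I-GENE into contiguous gene mentions."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-GENE":                 # a new gene mention starts here
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-GENE" and current:   # continue the open mention
            current.append(token)
        else:                               # "O" (or a stray I-GENE) closes it
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

tokens = ["the", "TIF1", "gamma", "gene"]
tags = ["O", "B-GENE", "I-GENE", "O"]
print(bio_to_spans(tokens, tags))  # ['TIF1 gamma']
```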
## Model Details

- Architecture: BERT-CRF
- Base Model: dmis-lab/biobert-v1.1
- Number of Labels: 3
- CRF Layer: Enabled
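At inference time, a CRF layer picks the highest-scoring label *sequence* via Viterbi decoding over per-token emission scores (from BERT) and learned label-transition scores. A pure-Python sketch over the three labels above; all scores here are made-up demo numbers, not the model's actual weights:

```python
# Minimal Viterbi decode illustrating what the CRF layer does at inference.
LABELS = ["O", "B-GENE", "I-GENE"]

def viterbi(emissions, transitions):
    """emissions: [T][L] per-token label scores; transitions: [L][L] scores."""
    T, L = len(emissions), len(LABELS)
    score = list(emissions[0])   # best path score ending in each label
    back = []                    # backpointers, one row per later step
    for t in range(1, T):
        new_score, ptrs = [], []
        for j in range(L):
            # best previous label i for current label j
            cand = [score[i] + transitions[i][j] for i in range(L)]
            i_best = max(range(L), key=lambda i: cand[i])
            new_score.append(cand[i_best] + emissions[t][j])
            ptrs.append(i_best)
        score, back = new_score, back + [ptrs]
    # trace the best path backwards
    j = max(range(L), key=lambda i: score[i])
    path = [j]
    for ptrs in reversed(back):
        j = ptrs[j]
        path.append(j)
    return [LABELS[j] for j in reversed(path)]

# A transition matrix that penalizes O -> I-GENE (I must follow B or I).
trans = [[0.0, 0.0, -10.0],
         [0.0, 0.0,   1.0],
         [0.0, 0.0,   1.0]]
emis = [[0.1, 2.0, 0.0],   # token 1: looks like B-GENE
        [0.0, 0.2, 1.5],   # token 2: looks like I-GENE
        [2.0, 0.0, 0.1]]   # token 3: looks like O
print(viterbi(emis, trans))  # ['B-GENE', 'I-GENE', 'O']
```

The transition scores are what a per-token softmax classifier lacks: they let the model rule out invalid tag sequences such as an I-GENE that does not follow a B-GENE.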
## Training Details

- Training Data: GNormPlus dataset
- Optimizer: AdamW
- Learning Rate: 2e-05
- Batch Size: 32
- Epochs: 3
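The batch size and epoch count determine the total number of optimization steps. A quick sanity check; the example count below is a placeholder, not the actual size of the GNormPlus training split:

```python
import math

num_examples = 10_000   # placeholder; substitute the real training-set size
batch_size = 32
epochs = 3

steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 313 939
```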