jimnoneill/CarD-T

jimnoneill committed on May 31, 2025

Commit 1d44aa7 · verified · 1 Parent(s): 0adb5e2

Update README.md

Files changed (1)

README.md +2 -15

@@ -43,7 +43,7 @@ model-index:
 CarD-T (Carcinogen Detection via Transformers) is a novel text analytics approach that combines transformer-based machine learning with probabilistic statistical analysis to efficiently nominate carcinogens from scientific texts. This model is designed to address the challenges faced by current systems in managing the burgeoning biomedical literature related to carcinogen identification and classification.
 ## Model Details
-* **Architecture**: Based on Bio-ELECTRA, a 335 million parameter language model
+* **Architecture**: Based on Bio-ELECTRA, a 335 million parameter language model (sultan/BioM-ELECTRA-Large-SQuAD2)
 * **Training Data**: [CarD-T-NER dataset](https://huggingface.co/datasets/jimnoneill/CarD-T-NER) containing 19,975 annotated examples from PubMed abstracts (2000-2024)
   * Training set: 11,985 examples
   * Test set: 7,990 examples
@@ -255,7 +255,7 @@ training_args = TrainingArguments(
     learning_rate=2e-5,
     per_device_train_batch_size=16,
     per_device_eval_batch_size=16,
-    num_train_epochs=3,
+    num_train_epochs=5,
     weight_decay=0.01,
     evaluation_strategy="epoch",
     save_strategy="epoch",
@@ -265,19 +265,6 @@ training_args = TrainingArguments(
 )
 ```
-## Evaluation Metrics
-Detailed performance metrics on the test set (7,990 examples):
-| Entity Type | Precision | Recall | F1-Score | Support |
-|-------------|-----------|---------|----------|---------|
-| carcinogen | 0.912 | 0.878 | 0.895 | 2,341 |
-| negative | 0.867 | 0.823 | 0.844 | 987 |
-| cancertype | 0.889 | 0.856 | 0.872 | 3,124 |
-| antineoplastic | 0.908 | 0.871 | 0.889 | 1,456 |
-| **Overall** | **0.894** | **0.857** | **0.875** | **7,908** |
-## Citation
 If you use this model in your research, please cite:
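
Review note on the `num_train_epochs` change: the sketch below estimates how many optimizer steps the documented configuration implies before and after this commit. It uses only the figures in the diff (11,985 training examples, `per_device_train_batch_size=16`). It assumes a single device, no gradient accumulation, and that the last partial batch is kept; none of these are stated in the README, and the helper name `optimizer_steps` is hypothetical.

```python
import math

TRAIN_EXAMPLES = 11_985  # training set size stated in the model card
BATCH_SIZE = 16          # per_device_train_batch_size in the diff


def optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Estimate total optimizer steps.

    Assumes one device, no gradient accumulation, and that the final
    partial batch of each epoch is kept rather than dropped.
    """
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


steps_before = optimizer_steps(TRAIN_EXAMPLES, BATCH_SIZE, epochs=3)
steps_after = optimizer_steps(TRAIN_EXAMPLES, BATCH_SIZE, epochs=5)
print(f"3 epochs: {steps_before} steps, 5 epochs: {steps_after} steps")
```

Under those assumptions the documented training run grows from 2,250 to 3,750 steps, which also means five per-epoch evaluations and checkpoints instead of three.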
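
Review note on the removed "Evaluation Metrics" table: before deleting or restoring these figures, it may help to check that they are at least internally consistent. The sketch below recomputes each F1-score as the harmonic mean of the reported precision and recall, and sums the per-entity support. The values are copied from the deleted lines; the `ROWS` layout and the `f1_score` helper are illustrative, and the check says nothing about whether the numbers reflect an actual evaluation run.

```python
# (precision, recall, reported F1, support) copied from the removed table
ROWS = {
    "carcinogen": (0.912, 0.878, 0.895, 2341),
    "negative": (0.867, 0.823, 0.844, 987),
    "cancertype": (0.889, 0.856, 0.872, 3124),
    "antineoplastic": (0.908, 0.871, 0.889, 1456),
}
OVERALL = (0.894, 0.857, 0.875, 7908)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Largest gap between a recomputed F1 and the value printed in the table
max_gap = max(
    abs(f1_score(p, r) - reported)
    for p, r, reported, _ in list(ROWS.values()) + [OVERALL]
)
total_support = sum(support for *_, support in ROWS.values())

print(f"largest F1 gap: {max_gap:.4f}")
print(f"summed support: {total_support} (table reports {OVERALL[3]})")
```

Every recomputed F1 agrees with the table to within rounding, and the per-entity support sums to the reported 7,908. That total counts entities, so it is not expected to equal the 7,990 test examples.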