---
tags:
- text-classification
- roberta
- scientific-abstracts
- multi-class
- research-field-classification
datasets:
- ScientificArticleAbstract_Classification
license: apache-2.0
model-index:
- name: ScientificTextClassification_ResearchField
  results:
  - task:
      name: Text Classification
      type: text-classification
    metrics:
    - type: accuracy
      value: 0.941
      name: Accuracy (Top-1)
    - type: macro_f1
      value: 0.935
      name: Macro F1 Score
---

# ScientificTextClassification_ResearchField

## 📚 Overview

This is a **RoBERTa-base** model fine-tuned for multi-class classification of scientific article abstracts. It predicts the **primary research field** (e.g., Physics, Biology, Computer Science) from the abstract text alone, making it useful for automated journal indexing and literature-review organization.

## 🧠 Model Architecture

The choice of RoBERTa provides robustness and strong handling of the long-range dependencies common in technical and scientific prose.

* **Base Model:** `roberta-base` (an optimized BERT variant trained without the next-sentence prediction objective).
* **Classification Head:** Outputs 8 distinct categories (`num_labels: 8`).
* **Input Data:** Detailed scientific abstracts from diverse journals.
* **Output:** A probability distribution over the 8 classes: Physics, Chemistry, Medicine, Computer Science, Biology, Geoscience, Materials Science, and Engineering.
* **Training Dataset:** **ScientificArticleAbstract_Classification**, which links abstracts to their high-level research disciplines.

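As a sketch of the output stage described above, the classification head's 8 logits are turned into a probability distribution with a softmax. The logit values and the label ordering below are purely illustrative; the real index-to-field mapping lives in the model's `id2label` config.

```python
import math

# Label order assumed for illustration only; check the model's id2label.
FIELDS = ["Physics", "Chemistry", "Medicine", "Computer Science",
          "Biology", "Geoscience", "Materials Science", "Engineering"]

def softmax(logits):
    """Convert raw classification-head logits to probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one abstract (8 values, one per field).
logits = [1.2, -0.3, 0.1, 4.5, 0.0, -1.0, 0.4, 0.2]
probs = softmax(logits)
predicted = FIELDS[probs.index(max(probs))]
print(predicted)  # -> Computer Science
```

The same softmax is what a `transformers` pipeline applies internally before reporting per-class scores.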
## ๐ฏ Intended Use |
|
|
|
|
|
The model offers utility in several scientific and information retrieval contexts: |
|
|
|
|
|
1. **Automated Library and Repository Indexing:** Rapidly and accurately tagging new publications with their correct discipline. |
|
|
2. **Literature Review Automation:** Filtering large databases of articles to focus on specific fields. |
|
|
3. **Grant Proposal Routing:** Assisting research institutions in routing incoming proposals to the appropriate review panel or expert based on the summary. |
|
|
4. **Trend Analysis:** Tracking the volume and convergence of research across different fields. |
|
|
|
|
|
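One way the routing use case (item 3) might consume the model's output is to auto-route only confident predictions and queue the rest for manual triage. The `route_proposal` helper and the 0.7 threshold below are illustrative assumptions, not part of the model.

```python
# Illustrative routing rule: auto-route confident predictions,
# send ambiguous ones to a human triage queue.
def route_proposal(field_probs, threshold=0.7):
    """field_probs: dict mapping field name -> predicted probability."""
    best_field = max(field_probs, key=field_probs.get)
    if field_probs[best_field] >= threshold:
        return ("auto", best_field)
    return ("manual-review", best_field)

print(route_proposal({"Physics": 0.91, "Engineering": 0.09}))   # auto-routed
print(route_proposal({"Chemistry": 0.48, "Biology": 0.52}))     # sent to triage
```

The threshold should be tuned against the cost of a mis-routed proposal versus reviewer workload.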
## ⚠️ Limitations

1. **Interdisciplinary Papers:** The model performs single-label classification, so it may struggle with highly interdisciplinary abstracts that bridge two or more fields (e.g., computational chemistry or bio-engineering).
2. **Vocabulary Drift:** Scientific terminology evolves quickly; new sub-disciplines or novel concepts may be misclassified until the model is retrained.
3. **Class Imbalance:** If the real-world distribution of the eight fields shifts significantly from the training set, performance may vary.

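A simple mitigation for the interdisciplinary case (limitation 1) is to flag abstracts whose top two class probabilities are close, rather than trusting the single-label argmax. The 0.15 margin below is an illustrative cutoff, not a property of the model.

```python
def is_interdisciplinary(probs, margin=0.15):
    """Flag a prediction as ambiguous when the top two class
    probabilities are within `margin` of each other."""
    top_two = sorted(probs, reverse=True)[:2]
    return (top_two[0] - top_two[1]) < margin

# e.g. a computational-chemistry abstract split between two fields:
print(is_interdisciplinary([0.05, 0.41, 0.02, 0.38, 0.04, 0.03, 0.04, 0.03]))  # True
# versus a clear-cut single-field abstract:
print(is_interdisciplinary([0.01, 0.02, 0.85, 0.05, 0.02, 0.02, 0.02, 0.01]))  # False
```

Flagged abstracts can then be double-tagged or escalated to a human indexer.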
### MODEL 3: **EcommerceAspectSentiment_BART**

This model is a BART-large sequence-to-sequence model fine-tuned for abstractive multi-aspect sentiment summarization on Dataset 3 (`EcommerceCustomerReview_MultiAspectRating`).

#### config.json

```json
{
  "_name_or_path": "facebook/bart-large",
  "architectures": [
    "BartForConditionalGeneration"
  ],
  "model_type": "bart",
  "vocab_size": 50265,
  "d_model": 1024,
  "encoder_layers": 12,
  "decoder_layers": 12,
  "encoder_attention_heads": 16,
  "decoder_attention_heads": 16,
  "encoder_ffn_dim": 4096,
  "decoder_ffn_dim": 4096,
  "dropout": 0.1,
  "activation_function": "gelu",
  "init_std": 0.02,
  "num_labels": 3,
  "max_position_embeddings": 1024,
  "eos_token_id": 2,
  "bos_token_id": 0,
  "pad_token_id": 1,
  "is_encoder_decoder": true,
  "scale_embedding": false,
  "forced_eos_token_id": 2,
  "transformers_version": "4.35.2"
}
```