---
tags:
- text-classification
- roberta
- scientific-abstracts
- multi-class
- research-field-classification
datasets:
- ScientificArticleAbstract_Classification
license: apache-2.0
model-index:
- name: ScientificTextClassification_ResearchField
  results:
  - task:
      name: Text Classification
      type: text-classification
    metrics:
    - type: accuracy
      value: 0.941
      name: Accuracy (Top-1)
    - type: macro_f1
      value: 0.935
      name: Macro F1 Score
---

# ScientificTextClassification_ResearchField

## 📚 Overview

This is a **RoBERTa-base** model fine-tuned for multi-class classification of scientific article abstracts. The model predicts the **primary research field** (e.g., Physics, Biology, Computer Science) from the abstract text alone, making it useful for automated journal indexing and literature-review organization.

## 🧠 Model Architecture

RoBERTa was chosen for its robust pretraining and its handling of the long-range dependencies common in technical and scientific prose.

* **Base Model:** `roberta-base` (a robustly optimized BERT variant trained without the next-sentence-prediction objective).
* **Classification Head:** Outputs 8 distinct categories (`num_labels: 8`).
* **Input Data:** Scientific abstracts drawn from a diverse range of journals.
* **Output:** A probability distribution over the 8 classes: Physics, Chemistry, Medicine, Computer Science, Biology, Geoscience, Materials Science, and Engineering.
* **Training Dataset:** **ScientificArticleAbstract_Classification**, which pairs abstracts with their high-level research disciplines.
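The post-processing behind that probability distribution can be sketched without the model itself: a softmax over the 8 raw logits from the classification head, followed by an argmax. The logit values and the exact label order below are illustrative assumptions (the real id-to-label mapping lives in the model's config):

```python
import math

# The 8 research fields. The order shown here is an assumption for
# illustration; the authoritative mapping is the model config's id2label.
LABELS = ["Physics", "Chemistry", "Medicine", "Computer Science",
          "Biology", "Geoscience", "Materials Science", "Engineering"]

def softmax(logits):
    """Convert raw classification-head logits to probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

# Fabricated logits in which index 3 (Computer Science) dominates.
label, p = predict([0.2, -1.1, 0.5, 4.8, 0.1, -0.3, 0.9, 1.2])
```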

## 🎯 Intended Use

The model is useful in several scientific and information-retrieval contexts:

1. **Automated Library and Repository Indexing:** Rapidly tagging new publications with the correct discipline.
2. **Literature Review Automation:** Filtering large article databases down to specific fields.
3. **Grant Proposal Routing:** Routing incoming proposals to the appropriate review panel or expert based on the summary.
4. **Trend Analysis:** Tracking the volume and convergence of research across fields.
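For the proposal-routing use case, the model's top prediction can feed a simple lookup table. The panel names below are purely hypothetical, not part of the model or dataset:

```python
# Hypothetical routing table: map each predicted research field to a
# review panel. Panel names are illustrative assumptions.
PANEL_FOR_FIELD = {
    "Physics": "Physical Sciences Panel",
    "Chemistry": "Physical Sciences Panel",
    "Materials Science": "Physical Sciences Panel",
    "Medicine": "Life Sciences Panel",
    "Biology": "Life Sciences Panel",
    "Computer Science": "Computing & Engineering Panel",
    "Engineering": "Computing & Engineering Panel",
    "Geoscience": "Earth Sciences Panel",
}

def route_proposal(predicted_field: str) -> str:
    """Route a proposal by predicted field; unknown fields go to triage."""
    return PANEL_FOR_FIELD.get(predicted_field, "Manual Triage")
```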

## ⚠️ Limitations

1. **Interdisciplinary Papers:** The model performs single-label classification, so it may struggle with highly interdisciplinary abstracts that bridge two or more fields (e.g., computational chemistry or bio-engineering).
2. **Vocabulary Drift:** Scientific terminology evolves quickly; new sub-disciplines or very novel concepts may be misclassified until the model is retrained.
3. **Class Imbalance:** If the real-world distribution of the eight fields shifts significantly from the training set, performance may degrade.
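The interdisciplinary limitation can be partially mitigated downstream: when the gap between the top two class probabilities is small, the abstract likely straddles two fields and can be flagged for manual review. A minimal sketch (the 0.2 margin is an arbitrary assumption, not a value tuned on this dataset):

```python
def flag_interdisciplinary(probs, margin=0.2):
    """Return True when the gap between the top-2 probabilities is below
    `margin`, suggesting the abstract bridges two research fields."""
    top2 = sorted(probs, reverse=True)[:2]
    return (top2[0] - top2[1]) < margin

# A confident single-field prediction vs. a borderline one.
confident  = [0.85, 0.05, 0.03, 0.02, 0.02, 0.01, 0.01, 0.01]
borderline = [0.42, 0.38, 0.08, 0.04, 0.03, 0.02, 0.02, 0.01]
```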

### MODEL 3: EcommerceAspectSentiment_BART

This model is a **BART-large** sequence-to-sequence model fine-tuned for abstractive multi-aspect sentiment summarization on Dataset 3 (`EcommerceCustomerReview_MultiAspectRating`).

#### config.json

```json
{
  "_name_or_path": "facebook/bart-large",
  "architectures": [
    "BartForConditionalGeneration"
  ],
  "model_type": "bart",
  "vocab_size": 50265,
  "d_model": 1024,
  "encoder_layers": 12,
  "decoder_layers": 12,
  "encoder_attention_heads": 16,
  "decoder_attention_heads": 16,
  "encoder_ffn_dim": 4096,
  "decoder_ffn_dim": 4096,
  "dropout": 0.1,
  "activation_function": "gelu",
  "init_std": 0.02,
  "num_labels": 3,
  "max_position_embeddings": 1024,
  "eos_token_id": 2,
  "bos_token_id": 0,
  "pad_token_id": 1,
  "is_encoder_decoder": true,
  "scale_embedding": false,
  "forced_eos_token_id": 2,
  "transformers_version": "4.35.2"
}
```
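A config like this can be sanity-checked with the standard library alone, for example by verifying that `d_model` divides evenly by the attention-head counts (a basic consistency requirement for multi-head attention). The snippet below works on an excerpt of the fields shown above:

```python
import json

# Excerpt of the config.json fields relevant to the consistency check.
config_json = """
{
  "d_model": 1024,
  "encoder_layers": 12,
  "decoder_layers": 12,
  "encoder_attention_heads": 16,
  "decoder_attention_heads": 16,
  "max_position_embeddings": 1024
}
"""

cfg = json.loads(config_json)

# Each attention head operates on d_model / num_heads dimensions,
# so the division must be exact: 1024 / 16 = 64 here.
assert cfg["d_model"] % cfg["encoder_attention_heads"] == 0
assert cfg["d_model"] % cfg["decoder_attention_heads"] == 0
head_dim = cfg["d_model"] // cfg["encoder_attention_heads"]
```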