---
tags:
- bert
- scientific-abstract
- multi-label-classification
- natural-language-processing
datasets:
- custom-scientific-abstracts
license: apache-2.0
---

# SciAbstract-MultiLabel-BERT-Base

## 📝 Overview

**SciAbstract-MultiLabel-BERT-Base** is a specialized multi-label text classification model fine-tuned on scientific paper abstracts. It simultaneously classifies an abstract into its **Primary Topic** and determines the underlying **Sentiment/Impact** of the research findings (e.g., highly positive breakthrough, negative result/concern).

The model is based on the `bert-base-uncased` architecture and is well suited to automating the categorization and high-level assessment of large volumes of academic literature.

## 🧠 Model Architecture

The model uses the `BertForSequenceClassification` head, configured for a multi-label setup (a configuration sketch follows the list below).

* **Base Model:** `bert-base-uncased`
* **Input:** Scientific abstract text.
* **Output:** A 17-dimensional vector of logits, where each dimension corresponds to one of the 17 labels (12 Topics + 5 Sentiments). At inference, a sigmoid is applied to each logit independently, so the model can predict multiple positive labels (e.g., one Topic and one Sentiment) for a single input.
* **Loss Function:** Binary Cross-Entropy with Logits (`BCEWithLogitsLoss`).
* **Labels:**
    * **Topics:** Materials Science, Neuroscience, Computer Science, Ecology, Astrophysics, Medicine, etc.
    * **Sentiments:** Highly Positive, Positive, Moderately Negative, Negative, Highly Negative.
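
A minimal sketch of how this multi-label configuration is typically expressed in 🤗 Transformers (the label names below are illustrative placeholders, not the model's actual config):

```python
from transformers import AutoModelForSequenceClassification

# Illustrative labels only; the released model defines all 17 in its config.
labels = ["Topic: Materials Science", "Topic: Neuroscience", "Sentiment: Highly Positive"]

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(labels),                     # 17 in the released model
    problem_type="multi_label_classification",  # selects BCEWithLogitsLoss internally
    id2label=dict(enumerate(labels)),
    label2id={l: i for i, l in enumerate(labels)},
)

# During fine-tuning, passing float multi-hot targets as `labels` makes the
# model compute BCEWithLogitsLoss over the per-label logits:
# outputs = model(**batch, labels=multi_hot_targets.float())
```

Setting `problem_type="multi_label_classification"` is what switches the head's loss from softmax cross-entropy to per-label binary cross-entropy.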

## 🚀 Intended Use

* **Automated Document Triage:** Rapidly categorize new research papers for subject-matter experts.
* **Literature Review:** Filter and prioritize papers based on topic and the detected impact (sentiment) of the findings, as sketched after this list.
* **Trend Analysis:** Track the volume of positive vs. negative research outcomes within specific scientific fields over time.
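
As a rough sketch of the filtering use case, here is a batch-scoring pass over a hypothetical list `abstracts` (the `classify` helper and the exact label strings are assumptions for illustration; `model` and `tokenizer` are loaded as in the Example Code section below):

```python
import torch

def classify(texts, model, tokenizer, threshold=0.5):
    """Return the set of predicted labels for each text in `texts`."""
    inputs = tokenizer(texts, return_tensors="pt", truncation=True,
                       max_length=512, padding=True)
    with torch.no_grad():
        probs = torch.sigmoid(model(**inputs).logits)
    return [{model.config.id2label[i] for i, p in enumerate(row) if p > threshold}
            for row in probs]

abstracts = ["...", "..."]  # your corpus of abstracts
# Keep papers flagged as promising neuroscience results (label names hypothetical).
keep = [text for text, labels in zip(abstracts, classify(abstracts, model, tokenizer))
        if "Topic: Neuroscience" in labels and "Sentiment: Highly Positive" in labels]
```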

## ⚠️ Limitations

* **Multi-Label Complexity:** The model may struggle with abstracts that span highly ambiguous or highly interdisciplinary topics not well represented in the training data.
* **Sentiment Scope:** The sentiment classification is specifically tailored to the tone of scientific findings (e.g., success of an experiment, critical failure, potential concern) and may not generalize well to general public sentiment.
* **Maximum Length:** Input text is truncated to 512 tokens (the BERT standard), so extremely long abstracts may lose critical information; the snippet after this list shows how to check for truncation.
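
To check whether an abstract exceeds that limit before classification, a small sketch (reusing `tokenizer`, `model`, and `abstract` from the Example Code section below):

```python
# Count tokens without truncation to see how much would be cut off.
n_tokens = len(tokenizer(abstract, truncation=False)["input_ids"])
if n_tokens > model.config.max_position_embeddings:  # 512 for bert-base-uncased
    print(f"{n_tokens} tokens: everything past position 512 will be discarded.")
```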

## 💻 Example Code

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "Your-HF-Username/SciAbstract-MultiLabel-BERT-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Sample abstract
abstract = "Development of a quantum entanglement system achieving coherence for over 10 seconds at room temperature, a significant breakthrough for quantum computing."

# Tokenize input (truncated to BERT's 512-token limit)
inputs = tokenizer(abstract, return_tensors="pt", truncation=True, max_length=512, padding=True)

# Make prediction
with torch.no_grad():
    logits = model(**inputs).logits

# Apply sigmoid to get an independent probability for each label
probabilities = torch.sigmoid(logits).squeeze()

# Keep every label whose probability clears the 0.5 threshold
id2label = model.config.id2label
predicted_labels = [id2label[i] for i, prob in enumerate(probabilities) if prob > 0.5]

print(f"Abstract: {abstract}")
print("-" * 30)
print(f"Predicted Labels: {predicted_labels}")
# Expected Output Example: ['Topic: Physics', 'Sentiment: Highly Positive']
```