---
tags:
- bert
- scientific-abstract
- multi-label-classification
- natural-language-processing
datasets:
- custom-scientific-abstracts
license: apache-2.0
---

# SciAbstract-MultiLabel-BERT-Base

## 📝 Overview

**SciAbstract-MultiLabel-BERT-Base** is a multi-label text classification model fine-tuned on scientific paper abstracts. It simultaneously classifies an abstract into its **Primary Topic** and determines the underlying **Sentiment/Impact** of the research findings (e.g., a highly positive breakthrough or a negative result/concern).

The model is built on the `bert-base-uncased` architecture and is intended for automating the categorization and high-level assessment of large volumes of academic literature.

## 🧠 Model Architecture

The model uses a `BertForSequenceClassification` head configured for a multi-label setup.

* **Base Model:** `bert-base-uncased`
* **Input:** Scientific abstract text.
* **Output:** A 17-dimensional vector of logits, one per label (12 Topics + 5 Sentiments). A sigmoid is applied to each logit independently, so the model can predict multiple positive labels (e.g., one Topic and one Sentiment) for a single input.
* **Loss Function:** Binary Cross-Entropy with Logits (`BCEWithLogitsLoss`).
* **Labels:**
  * **Topics:** Materials Science, Neuroscience, Computer Science, Ecology, Astrophysics, Medicine, etc.
  * **Sentiments:** Highly Positive, Positive, Moderately Negative, Negative, Highly Negative.
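As a concrete illustration of the multi-label setup above, the sketch below reproduces the sigmoid-plus-`BCEWithLogitsLoss` arithmetic in plain Python for a 17-dimensional output. The logit values and the label indices used here (4 for a topic, 12 for a sentiment) are invented for illustration; they do not reflect the model's actual label order.

```python
import math

NUM_LABELS = 17  # 12 topics + 5 sentiments, per the architecture above

def sigmoid(x: float) -> float:
    """Map a raw logit to an independent per-label probability."""
    return 1.0 / (1.0 + math.exp(-x))

def bce_with_logits(logits, targets):
    """Plain-Python version of BCEWithLogitsLoss (mean reduction)."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)

# Toy logits for one abstract: index 4 (a topic) and index 12 (a sentiment)
# are strongly activated; everything else is suppressed.
logits = [-3.0] * NUM_LABELS
logits[4] = 2.5
logits[12] = 3.1

# Multi-hot target vector: both a topic and a sentiment are "on".
targets = [0.0] * NUM_LABELS
targets[4] = 1.0
targets[12] = 1.0

# Thresholding each sigmoid probability at 0.5 yields multiple labels.
predicted = [i for i, z in enumerate(logits) if sigmoid(z) > 0.5]
print(predicted)  # → [4, 12]
print(round(bce_with_logits(logits, targets), 4))  # → 0.0501
```

Because each label's probability is computed independently, the loss and the threshold treat the topic and sentiment dimensions symmetrically; no softmax competition forces exactly one label.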

## 🚀 Intended Use

* **Automated Document Triage:** Rapidly categorize new research papers for subject-matter experts.
* **Literature Review:** Filter and prioritize papers based on topic and the detected impact (sentiment) of the findings.
* **Trend Analysis:** Track the volume of positive vs. negative research outcomes within specific scientific fields over time.

## ⚠️ Limitations

* **Multi-Label Complexity:** The model may struggle with abstracts spanning ambiguous or highly interdisciplinary topics that are under-represented in the training data.
* **Sentiment Scope:** Sentiment classification is tailored to the tone of scientific findings (e.g., experimental success, critical failure, potential concern) and may not generalize to general public sentiment.
* **Maximum Length:** Input text is truncated to 512 tokens (the BERT standard); very long abstracts may lose critical information.
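A cheap pre-check can flag abstracts likely to hit the 512-token cap before inference. The 1.3 wordpieces-per-word ratio below is a rough heuristic, not an exact figure; for a precise count, tokenize the text with the model's own tokenizer.

```python
MAX_TOKENS = 512          # BERT's standard sequence limit
WORDPIECES_PER_WORD = 1.3  # rough heuristic, not an exact conversion

def likely_truncated(text: str) -> bool:
    """Estimate whether an abstract may exceed the 512-token limit."""
    estimated = int(len(text.split()) * WORDPIECES_PER_WORD)
    return estimated > MAX_TOKENS

print(likely_truncated("A short abstract."))  # → False
```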

## 💻 Example Code

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "Your-HF-Username/SciAbstract-MultiLabel-BERT-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Sample abstract
abstract = "Development of a quantum entanglement system achieving coherence for over 10 seconds at room temperature, a significant breakthrough for quantum computing."

# Tokenize input
inputs = tokenizer(abstract, return_tensors="pt", truncation=True, padding=True)

# Make prediction
with torch.no_grad():
    logits = model(**inputs).logits

# Apply sigmoid to get an independent probability for each label
probabilities = torch.sigmoid(logits).squeeze()

# Map label IDs above the 0.5 threshold to their names
id2label = model.config.id2label
predicted_labels = [id2label[i] for i, prob in enumerate(probabilities) if prob > 0.5]

print(f"Abstract: {abstract}")
print("-" * 30)
print(f"Predicted Labels: {predicted_labels}")
# Expected output example: ['Topic: Physics', 'Sentiment: Highly Positive']
```
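
When exactly one topic and one sentiment are expected per abstract, an alternative to a flat 0.5 threshold is an argmax within each label group. The sketch below assumes a hypothetical layout in which indices 0–11 are topics and 12–16 are sentiments; the real ordering must be read from `model.config.id2label`.

```python
# Hypothetical label layout: first 12 indices are topics, last 5 are
# sentiments (check model.config.id2label for the actual order).
NUM_TOPICS, NUM_SENTIMENTS = 12, 5

def split_argmax(probs):
    """Pick the single best topic and best sentiment from 17 probabilities."""
    best_topic = max(range(NUM_TOPICS), key=lambda i: probs[i])
    best_sentiment = NUM_TOPICS + max(
        range(NUM_SENTIMENTS), key=lambda i: probs[NUM_TOPICS + i]
    )
    return best_topic, best_sentiment

probs = [0.1] * 17
probs[4] = 0.62   # e.g. the strongest topic
probs[14] = 0.48  # strongest sentiment — below 0.5, so thresholding would miss it
print(split_argmax(probs))  # → (4, 14)
```

This guarantees one topic and one sentiment per abstract even when all sentiment probabilities fall below the threshold, at the cost of never abstaining.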