---
library_name: transformers
language:
- en
metrics:
- rouge
- meteor
base_model:
- facebook/bart-large
---

# Model Card for BART-Large Scientific Highlight Generation

<!-- Provide a quick summary of what the model is/does. -->

A BART-large model fine-tuned to generate concise highlights from English scientific abstracts, primarily in the biomedical and life-science domains.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Developed by:** Gowni Bhavishya, Dr. Shib Sankar Sahu
- **Model type:** Sequence-to-sequence model (BART) fine-tuned for scientific highlight generation
- **Language(s) (NLP):** English
- **Finetuned from model [optional]:** facebook/bart-large

- **Repository:** [More Information Needed]


## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

Foreseeable users include:

1. Researchers in biomedical and scientific fields
2. Academic publishers and editors
3. Developers building scientific summarization tools
4. NLP practitioners working on domain-specific summarization

### Direct Use

Generate highlights or concise summaries of scientific abstracts (especially in biomedical, life-science, or clinical research).
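A minimal usage sketch with the 🤗 transformers summarization pipeline. The Hub repository ID is not listed above ([More Information Needed]), so `your-username/bart-scientific-highlights` below is a placeholder; substitute the actual checkpoint name once published (the base model is facebook/bart-large).

```python
# Sketch: generating highlights from an abstract with the fine-tuned checkpoint.
# NOTE: "your-username/bart-scientific-highlights" is a hypothetical repo ID.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="your-username/bart-scientific-highlights",  # placeholder repo ID
)

abstract = (
    "We propose a deep learning approach for early detection of cardiac "
    "arrhythmia from ECG signals, achieving competitive accuracy on two "
    "public benchmarks."
)

highlights = summarizer(
    abstract,
    max_length=128,   # cap highlight length; tune per dataset
    min_length=20,
    num_beams=4,      # beam search is typical for BART summarization
)[0]["summary_text"]
print(highlights)
```

Generation parameters such as `max_length` and `num_beams` are illustrative defaults, not values used in training.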


### Out-of-Scope Use

1. Not suitable for general news summarization, social media content, or informal language.
2. Should not be used for critical medical decision-making or clinical diagnostics.
3. Not designed for creative writing, dialogue generation, or question answering.
4. Avoid using this model for non-English abstracts or multilingual input; it was trained on English biomedical text only.


## Bias, Risks, and Limitations

While BART performs well on biomedical abstracts, it inherits limitations from both:

1. Pretrained BART model biases (from general corpora such as Wikipedia and books)
2. Training dataset distribution biases (e.g., abstracts drawn from PubMed or a niche field)

Known limitations:

1. May generate generic summaries when abstracts are vague or long.
2. Struggles with mathematical, chemical, or symbolic notation.
3. Output may appear plausible but be factually incorrect.
4. Does not provide citations or references for claims.

### Recommendations

1. Always validate generated summaries against the full abstract or ground-truth highlights.
2. Preferably use in human-in-the-loop systems where an expert reviews the output.
3. Fine-tune further or filter input for domain-specific tasks (e.g., cardiology vs. oncology).

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.


## Training Details

### Training Data

1. Fine-tuned on a dataset of scientific abstracts and their corresponding highlights.
2. The dataset was split into train (10k), validation (2k), and test (1.8k) sets.
3. Input: `Abstract` column; target: `Highlights` column (available in train/val only).

#### Training Hyperparameters

- Model architecture: facebook/bart-large
- Batch size: 4 (per device)
- Epochs: 5
- Learning rate: 2e-5
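As a sanity check, the hyperparameters above imply the following optimizer-step counts (a sketch assuming a single device and no gradient accumulation; the actual values depend on the training setup):

```python
# Derive step counts from the hyperparameters and data split stated above.
config = {
    "model": "facebook/bart-large",
    "per_device_batch_size": 4,
    "epochs": 5,
    "learning_rate": 2e-5,
    "train_examples": 10_000,  # from the Training Data section
}

steps_per_epoch = config["train_examples"] // config["per_device_batch_size"]
total_steps = steps_per_epoch * config["epochs"]

print(steps_per_epoch, total_steps)  # 2500 steps/epoch, 12500 total
```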


## Evaluation

The model was evaluated with ROUGE-1, ROUGE-2, ROUGE-L, and METEOR.

### Testing Data, Factors & Metrics

#### Testing Data

The test set consists of 1,840 scientific abstracts without ground-truth highlights.


#### Metrics

- ROUGE-1: measures unigram overlap (precision & recall)
- ROUGE-2: measures bigram overlap
- ROUGE-L: measures longest common subsequence
- METEOR: incorporates synonymy, stemming, and word order
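To make the metrics concrete, here is a from-scratch sketch of ROUGE-1 and the ROUGE-L subsequence length on a toy candidate/reference pair. In practice one would use a library such as `evaluate` or `rouge-score`; this is for illustration only.

```python
from collections import Counter

def rouge_n(candidate: str, reference: str, n: int = 1):
    """N-gram overlap precision, recall, and F1 (clipped multiset counts)."""
    def ngrams(text: str) -> Counter:
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    p = overlap / max(sum(cand.values()), 1)
    r = overlap / max(sum(ref.values()), 1)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def rouge_l_lcs(candidate: str, reference: str) -> int:
    """Longest-common-subsequence length between token sequences (ROUGE-L core)."""
    a, b = candidate.split(), reference.split()
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

cand = "the model improves accuracy"
ref = "the model improves test accuracy"
print(rouge_n(cand, ref))      # unigram precision 1.0, recall 0.8
print(rouge_l_lcs(cand, ref))  # LCS length 4
```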

### Results

#### Summary






## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]


## More Information [optional]

Department of Computer Science and Engineering, SVNIT

## Model Card Authors [optional]

Gowni Bhavishya, Dr. Shib Sankar Sahu

## Model Card Contact

[More Information Needed]