---
library_name: transformers
language:
- en
metrics:
- rouge
- meteor
base_model:
- google-t5/t5-base
---
# Model Card: T5 for Scientific Abstract Summarization
<!-- Provide a quick summary of what the model is/does. -->
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
- **Developed by:** Gowni Bhavishya, Dr. Shib Shankar Sahu
- **Model type:** T5 (Text-To-Text Transfer Transformer) fine-tuned for scientific summarization, with SciBERT-based abstract representations.
- **Language(s) (NLP):** English (Scientific domain)
- **Finetuned from model [optional]:** t5-base
## Uses
Intended users include:
- Researchers in biomedical and scientific fields
- Academic publishers and editors
- Developers building scientific summarization tools
- NLP practitioners working on domain-specific summarization
### Direct Use
Generate highlights or concise summaries of scientific abstracts (especially from biomedical, life-science, or clinical research).
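As a minimal input-preparation sketch (assuming this checkpoint keeps the standard `t5-base` task-prefix convention; `build_input` is an illustrative helper, not part of the released code):

```python
def build_input(abstract: str, max_chars: int = 2000) -> str:
    """Prepend T5's 'summarize:' task prefix and normalize whitespace."""
    text = " ".join(abstract.split())  # collapse newlines and repeated spaces
    return "summarize: " + text[:max_chars]

print(build_input("  We study transformer models\nfor biomedical summarization.  "))
# summarize: We study transformer models for biomedical summarization.
```

The resulting string can then be passed to a `transformers` summarization pipeline loaded from this checkpoint.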
### Out-of-Scope Use
1. Not suitable for general news summarization, social media content, or informal language.
2. Should not be used for critical medical decision-making or clinical diagnostics.
3. Not designed for creative writing, dialogue generation, or question answering.
4. Avoid using this model for non-English abstracts or multilingual input—it was trained on English biomedical text only.
## Bias, Risks, and Limitations
While the model performs well on biomedical abstracts, it inherits limitations from both:
1. Pretrained T5 model biases (from its general-purpose pretraining corpus, C4)
2. Training dataset distribution biases (e.g., if the abstracts are drawn from PubMed or another niche field)

Known limitations:
1. May generate generic summaries if abstracts are vague or very long.
2. Struggles with mathematical, chemical, or symbolic notation.
3. Output may appear plausible but be factually incorrect.
4. Does not provide citations or references for its claims.
### Recommendations
1. Always validate generated summaries against the full abstract or ground-truth highlights.
2. Preferably use the model in human-in-the-loop systems where an expert reviews the output.
3. Fine-tune further or filter input for domain-specific tasks (e.g., cardiology vs. oncology).

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
## Training Details
### Training Data
Fine-tuned on a dataset of scientific abstracts and their corresponding highlights. The dataset was split into train (10k), validation (2k), and test (1.8k) sets.
- Input: the Abstract column
- Target: the Highlights column (available only in the train/validation splits)
#### Training Hyperparameters
- Base model: t5-base
- Batch size: 4 (per device)
- Epochs: 5
- Learning rate: 2e-5
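As a hedged sketch, the hyperparameters above correspond roughly to the following `transformers` configuration (argument names follow the standard `Seq2SeqTrainingArguments` API; the output directory is a placeholder, not the authors' actual path):

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the reported hyperparameters; values mirror the list above.
training_args = Seq2SeqTrainingArguments(
    output_dir="./t5-scientific-summarization",  # placeholder path
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=5,
    learning_rate=2e-5,
    predict_with_generate=True,  # generate full summaries during evaluation
)
```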
## Evaluation
The model was evaluated with ROUGE-1, ROUGE-2, ROUGE-L, and METEOR.
### Testing Data, Factors & Metrics
#### Testing Data
The test set consists of 1,840 scientific abstracts without ground-truth highlights.
#### Metrics
- ROUGE-1: measures unigram overlap (precision and recall)
- ROUGE-2: measures bigram overlap
- ROUGE-L: measures longest common subsequence
- METEOR: incorporates synonymy, stemming, and word order
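To make the metrics concrete, here is a small self-contained illustration of ROUGE-1 F1 (unigram overlap); production evaluation typically uses the `evaluate`/`rouge_score` packages rather than this toy function:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """F1 over unigram overlap between a candidate and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("the model summarizes abstracts",
                      "the model generates abstract summaries"), 3))
# 0.444
```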
### Results
[More Information Needed]
#### Summary
[More Information Needed]
## Citation [optional]
**BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed]
## More Information [optional]
SVNIT CSE
## Model Card Authors [optional]
Gowni Bhavishya, Dr. Shib Sankar Sahu
## Model Card Contact
[More Information Needed]