---
library_name: transformers
language:
- en
metrics:
- rouge
- meteor
base_model:
- google-t5/t5-base
---
# Model Card: T5 for Scientific Abstract Summarization
<!-- Provide a quick summary of what the model is/does. -->
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
- **Developed by:** Gowni Bhavishya, Dr. Shib Shankar Sahu
- **Model type:** T5 (Text-To-Text Transfer Transformer) fine-tuned for scientific summarization, with SciBERT-based abstract representations.
- **Language(s) (NLP):** English (Scientific domain)
- **Finetuned from model [optional]:** t5-base
## Uses
Intended users include:
- Researchers in biomedical and scientific fields
- Academic publishers and editors
- Developers building scientific summarization tools
- NLP practitioners working on domain-specific summarization
### Direct Use
Generate highlights or concise summaries of scientific abstracts (especially from biomedical, life-science, or clinical research).
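As a minimal input-preparation sketch (assuming this checkpoint keeps the standard `t5-base` task-prefix convention; `build_input` is an illustrative helper, not part of the released code):

```python
def build_input(abstract: str, max_chars: int = 2000) -> str:
    """Prepend T5's 'summarize:' task prefix and normalize whitespace."""
    text = " ".join(abstract.split())  # collapse newlines and repeated spaces
    return "summarize: " + text[:max_chars]

print(build_input("  We study transformer models\nfor biomedical summarization.  "))
# summarize: We study transformer models for biomedical summarization.
```

The resulting string can then be passed to a `transformers` summarization pipeline loaded from this checkpoint.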
### Out-of-Scope Use
1. Not suitable for general news summarization, social media content, or informal language.
2. Should not be used for critical medical decision-making or clinical diagnostics.
3. Not designed for creative writing, dialogue generation, or question answering.
4. Avoid using this model for non-English abstracts or multilingual input—it was trained on English biomedical text only.
## Bias, Risks, and Limitations
While the model performs well on biomedical abstracts, it inherits limitations from both:
1. Pretrained T5 model biases (from its general-purpose pretraining corpus, C4)
2. Training dataset distribution biases (e.g., if the abstracts are drawn from PubMed or another niche field)

Known limitations:
1. May generate generic summaries if abstracts are vague or very long.
2. Struggles with mathematical, chemical, or symbolic notation.
3. Output may appear plausible but be factually incorrect.
4. Does not provide citations or references for its claims.
### Recommendations
1. Always validate generated summaries against the full abstract or ground-truth highlights.
2. Preferably use the model in human-in-the-loop systems where an expert reviews the output.
3. Fine-tune further or filter input for domain-specific tasks (e.g., cardiology vs. oncology).

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
## Training Details
### Training Data
Fine-tuned on a dataset of scientific abstracts and their corresponding highlights. The dataset was split into train (10k), validation (2k), and test (1.8k) sets.
- Input: the Abstract column
- Target: the Highlights column (available only in the train/validation splits)
#### Training Hyperparameters
- Base model: t5-base
- Batch size: 4 (per device)
- Epochs: 5
- Learning rate: 2e-5
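As a hedged sketch, the hyperparameters above correspond roughly to the following `transformers` configuration (argument names follow the standard `Seq2SeqTrainingArguments` API; the output directory is a placeholder, not the authors' actual path):

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the reported hyperparameters; values mirror the list above.
training_args = Seq2SeqTrainingArguments(
    output_dir="./t5-scientific-summarization",  # placeholder path
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=5,
    learning_rate=2e-5,
    predict_with_generate=True,  # generate full summaries during evaluation
)
```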
## Evaluation
The model was evaluated with ROUGE-1, ROUGE-2, ROUGE-L, and METEOR.
### Testing Data, Factors & Metrics
#### Testing Data
The test set consists of 1,840 scientific abstracts without ground-truth highlights.
#### Metrics
- ROUGE-1: measures unigram overlap (precision and recall)
- ROUGE-2: measures bigram overlap
- ROUGE-L: measures longest common subsequence
- METEOR: incorporates synonymy, stemming, and word order
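To make the metrics concrete, here is a small self-contained illustration of ROUGE-1 F1 (unigram overlap); production evaluation typically uses the `evaluate`/`rouge_score` packages rather than this toy function:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """F1 over unigram overlap between a candidate and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("the model summarizes abstracts",
                      "the model generates abstract summaries"), 3))
# 0.444
```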
### Results
[More Information Needed]
#### Summary
[More Information Needed]
## Citation [optional]
**BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed]
## More Information [optional]
SVNIT CSE
## Model Card Authors [optional]
Gowni Bhavishya, Dr. Shib Sankar Sahu
## Model Card Contact
[More Information Needed]