---
language:
- en
license: mit
tags:
- summarization
- dialogue-summarization
- bart
- lora
- merged
- finetuned
- seq2seq
- transformers
- highlightsum
pipeline_tag: summarization
library_name: transformers
metrics:
- rouge1
- rouge2
- rougeL
- bertscore
- bleu
model_name: bart-highlightsum-merged
base_model: facebook/bart-large-cnn
new_version: "1.0.0"
datasets:
- knkarthick/highlightsum
---
# BART-HighlightSum (Merged Model)
Fine-tuned BART-Large on the HighlightSum dialogue summarization dataset (Merged LoRA → Full Model)
**Model type:** Seq2Seq Summarization
**Base model:** facebook/bart-large-cnn
**Dataset:** HighlightSum (dialogue summarization)
**Finetuning method:** LoRA → merged into full FP16 BART
## Model Summary
This model is a merged BART-Large fine-tuned on 2,000 training and 200 validation samples from the HighlightSum dataset.
It produces concise, accurate summaries of multi-turn dialogues.
✔ LoRA fine-tuning
✔ LoRA weights merged into base BART
✔ No PEFT required for inference
✔ Lightweight, fast, and deployment-ready
This version is recommended for production: it achieves the highest ROUGE and BERTScore of all variants (Baseline, LoRA, Merged).
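Why no PEFT is required at inference follows from the LoRA algebra: the low-rank update `B @ A` can be folded directly into the frozen base weight, after which the model is a plain `transformers` checkpoint. A toy NumPy sketch (illustrative dimensions, not the actual BART weights; `alpha` is an assumed scaling factor):

```python
import numpy as np

# Toy illustration: LoRA adds a low-rank update (alpha/r) * B @ A on top of a
# frozen weight W. Merging folds that update into W, so inference needs only
# a standard matmul -- no adapter code path, no PEFT dependency.
rng = np.random.default_rng(0)
d, r, alpha = 16, 8, 16           # hidden size, LoRA rank 8, assumed scaling alpha
W = rng.normal(size=(d, d))       # frozen base weight
A = rng.normal(size=(r, d))       # LoRA down-projection (trained)
B = rng.normal(size=(d, r))       # LoRA up-projection (trained)
x = rng.normal(size=(d,))

adapter_out = W @ x + (alpha / r) * (B @ (A @ x))   # base + adapter path
W_merged = W + (alpha / r) * (B @ A)                # weights after merging
merged_out = W_merged @ x                           # single plain matmul

assert np.allclose(adapter_out, merged_out)         # identical outputs
```

The merged weight has the same shape as the original, so the checkpoint size and inference cost match vanilla BART.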
## Performance/Evaluation Results
Evaluation on HighlightSum (Validation 200 samples)
The following results were obtained using 200 validation samples from the HighlightSum dataset.
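As a rough intuition for the scores in the tables below, ROUGE-1 F1 measures unigram overlap between the generated and reference summaries. A simplified sketch (whitespace tokenization, no stemming; the reported numbers use the full standard ROUGE implementation):

```python
from collections import Counter

def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between prediction and reference."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    # Clipped overlap: each reference token can be matched at most once
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge1_f1("the cat sat", "the cat ran")` shares two of three unigrams on each side, giving an F1 of about 0.67.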
### Merged Model Performance
| Metric | Score |
|--------|--------|
| ROUGE-1 | 0.383 |
| ROUGE-2 | 0.179 |
| ROUGE-L | 0.301 |
| BERTScore (F1) | 0.335 |
| BLEU | 0.0014 |
### Comparison with Baseline and LoRA Models
| Metric | Baseline BART | LoRA Model | Merged Model |
|--------|---------------|------------|--------------|
| ROUGE-1 | 0.275 | 0.337 | 0.383 |
| ROUGE-2 | 0.090 | 0.152 | 0.179 |
| ROUGE-L | 0.204 | 0.252 | 0.301 |
| BERTScore (F1) | 0.163 | 0.298 | 0.335 |
| BLEU | 0.0052 | 0.0111 | 0.0014 |
### Conclusion
The merged model performs best, achieving the highest ROUGE-1, ROUGE-2, ROUGE-L, and BERTScore among all variants. (All variants score near zero on BLEU, which rewards exact n-gram matches that abstractive summaries rarely reproduce, so BLEU is not a deciding metric here.)
It is therefore the recommended model for deployment, inference, and user-facing applications.
## 🧪 Example Input / Output
Using Example #1 from the HighlightSum dataset:
### Dialogue
```
A: What are you getting him?
B: Something cool.
A: What about a Lego?
B: He is too old for that now.
A: What about a book?
B: He hates reading.
A: Then I give up. I have no idea what to get him.
```
### Human Gold Summary
They discuss gift ideas for someone's son.
### Merged Model Summary
They talk about what to get a boy as a gift but can't decide.
→ The model captures the intent, context, and key meaning with improved fluency and coherence.
## Intended Use
### Suitable for
- Dialogue summarization
- Customer service chat compression
- Meeting note extraction
- Educational tools
### Not suitable for
- Factual QA
- Domain-specific technical summaries without fine-tuning
- Safety-critical use
## How to Use
### Python Inference
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

model_name = "dlaima/bart-highlightsum-merged"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model.eval()

text = """A: Hi Tom, are you busy tomorrow afternoon?
B: I think I am. Why?
A: I want to go to the animal shelter.
B: For what?
A: I'm getting a puppy for my son."""

# Truncate to the 768-token training limit; generate up to 192 summary tokens
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=768)
with torch.no_grad():
    summary_ids = model.generate(**inputs, max_new_tokens=192)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```
## Training Details
- **Method:** LoRA (rank 8)
- **Model:** BART-Large
- **Batch size:** 8 (micro-batch 4 × grad-accumulation 2)
- **Epochs:** ~2.4 (capped by 2000 examples)
- **Max input length:** 768 tokens
- **Max summary length:** 192 tokens
- **Precision:** FP16
- **Optimizer:** AdamW
- **Learning rate:** 3e-4
- **Hardware:** NVIDIA T4
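The hyperparameters above map onto a standard PEFT setup. A configuration sketch (not the exact training script; `lora_alpha`, `lora_dropout`, and the `target_modules` choice of attention projections are assumptions, as they are not listed above):

```python
# Configuration sketch only -- requires the `peft` and `transformers` packages
# and downloads the base checkpoint when run.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSeq2SeqLM

base = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
lora_cfg = LoraConfig(
    r=8,                                  # rank 8, as listed above
    lora_alpha=16,                        # assumed scaling factor
    target_modules=["q_proj", "v_proj"],  # assumed: common choice for BART attention
    lora_dropout=0.05,                    # assumed
    task_type="SEQ_2_SEQ_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()        # only the LoRA matrices are trained
```

After training, `model.merge_and_unload()` produces the standalone merged checkpoint distributed in this repo.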
## 📚 Dataset: HighlightSum
A dataset of dialogue → summary pairs from multiple conversational sources.
- Multi-turn dialogues
- Short, medium, and long conversations
- Realistic conversational structure
- Human-written summaries
### Subset used here
- 2,000 samples for training
- 200 samples for validation
## Files Included in This Repo
| File | Description |
|------|-------------|
| pytorch_model.bin | Final merged FP16 BART model |
| config.json | Standard HuggingFace config |
| generation_config.json | Beam search config |
| tokenizer.json / tokenizer.model | Tokenizer files |
| README.md | This model card |
## Limitations & Recommendations
### Limitations
- May over-compress very long dialogues
- Not designed for domain-specific jargon
- Occasionally omits rare names or details
- Not a factual QA model
- Can hallucinate minor details in complex dialogues
### Recommendations
- Use merged model for production
- Apply additional fine-tuning for domain-specific tasks
- For reproducibility, fix random seeds and pin the HF transformers version
- Consider quantization (INT8 or GGUF) for mobile deployment
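GGUF conversion is a separate workflow (via the llama.cpp ecosystem), but INT8 can be sketched with PyTorch's post-training dynamic quantization, shown here on a toy module standing in for the summarizer:

```python
import torch
import torch.nn as nn

# Toy stand-in for the model: dynamic quantization stores nn.Linear weights
# as INT8 and dequantizes on the fly, shrinking the model for CPU inference.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8))
model.eval()

qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
out = qmodel(x)   # INT8 weights, FP32 activations; same output shape as before
```

Expect a small accuracy drop; re-running the ROUGE evaluation on the quantized model before deployment is advisable.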
## Maintenance
This model will be updated as:
- Additional training data becomes available
- Larger LoRA variants are tested
- Better merging & evaluation pipelines are developed
## Contact
For questions, improvements, or collaboration, feel free to reach out via GitHub or HuggingFace (@dlaima).