---
language: en
license: apache-2.0
tags:
- summarization
- conversational-ai
- text2text-generation
- t5
datasets:
- knkarthick/samsum
metrics:
- rouge
base_model:
- google-t5/t5-small
---
# 470 Final Project Model: Dialogue Summarization
## Model Overview
This repository contains a fine-tuned **T5-small** model for **abstractive conversational text summarization**.
Given a multi-speaker dialogue, the model generates a concise natural-language summary that captures the main points of the conversation.
- **Base model:** google-t5/t5-small
- **Task:** Abstractive text summarization
- **Model type:** Encoder–decoder transformer (T5)
---
## Dataset
The model was fine-tuned on the **SAMSum** dataset, which consists of chat-style conversations paired with human-written summaries.
- **Dataset name:** knkarthick/samsum
- **Fields:**
- `dialogue`: conversation text (input)
- `summary`: reference summary (target)
- **Splits:** train / validation / test
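During fine-tuning, each record's `dialogue` becomes the model input and its `summary` becomes the target. A minimal sketch of this pairing (the `summarize:` prefix follows the T5 task-prefix convention; the example record below is illustrative, not an actual dataset entry):

```python
# Illustrative SAMSum-style record (not taken from the dataset).
record = {
    "dialogue": "Amanda: I baked cookies. Jerry: Sounds great!",
    "summary": "Amanda baked cookies and will share them.",
}

def to_model_pair(record):
    """Build the (input, target) text pair used for T5 fine-tuning."""
    source = "summarize: " + record["dialogue"]  # T5 task prefix
    target = record["summary"]
    return source, target

source, target = to_model_pair(record)
```

The actual notebook tokenizes these strings with the T5 tokenizer before training.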
---
## Training Details
- **Epochs:** 3
- **Learning rate:** 3e-4
- **Batch size:** 8
- **Max input length:** 512 tokens
- **Max target length:** 128 tokens
- **Training framework:** Hugging Face Transformers (`Seq2SeqTrainer`)
- **Hardware:** GPU (Google Colab)
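The hyperparameters above correspond roughly to a `Seq2SeqTrainingArguments` configuration like the following (a sketch only; argument names follow the Hugging Face Transformers API, `output_dir` is a placeholder, and the exact setup in the notebook may differ):

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the training configuration described above.
training_args = Seq2SeqTrainingArguments(
    output_dir="t5-small-samsum",       # placeholder path
    num_train_epochs=3,
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    predict_with_generate=True,          # generate summaries during evaluation
)
```

The max input (512) and target (128) lengths are applied in the tokenization step, not here.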
---
## Evaluation
The model was evaluated on the test split of the SAMSum dataset using ROUGE metrics.
- **ROUGE-1:** 0.4538
- **ROUGE-2:** 0.2123
- **ROUGE-L:** 0.3762
(Placeholder values: replace with the scores obtained in the evaluation notebook.)
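ROUGE-N measures n-gram overlap between a generated summary and its reference. As a rough illustration, a simplified unigram ROUGE-1 F1 can be computed in plain Python (this sketch omits stemming and other normalization that real ROUGE implementations apply; the reported scores were produced with a proper ROUGE library):

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1: F1 over clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("amanda baked cookies", "amanda baked cookies for jerry")
# precision = 3/3, recall = 3/5, F1 = 0.75
```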
---
## Intended Uses
This model can be used for:
- Summarizing chat conversations or dialogues
- Demonstrations of abstractive summarization
- Educational purposes in NLP and machine learning
---
## Limitations
- The model may omit important details in long or complex conversations.
- Generated summaries may occasionally be imprecise or incomplete.
- The model is trained on informal dialogue and may not generalize well to other domains.
---
## How to Use
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

repo_id = "marvingoenner/470finalprojectmodel"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSeq2SeqLM.from_pretrained(repo_id)

dialogue = "Amanda: I baked cookies. Jerry: Sounds great! Amanda: I will bring some tomorrow."

# T5 expects the "summarize:" task prefix; truncation guards against
# dialogues longer than the 512-token input limit used in training.
inputs = tokenizer("summarize: " + dialogue, return_tensors="pt", truncation=True)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```