eungyu kim

feat: Add T5 Text Summarizer code and README documentation.

	# T5 Text Summarizer

	This repository contains a simple text summarization script using a pre-trained T5 model from the Hugging Face Transformers library. The script demonstrates how to use prompt-based summarization to generate a concise summary of an input text.

	## Overview

	The main script (`model.py`) defines a function `summarize_text` that:
	- Loads the T5 tokenizer and T5 model.
	- Adds a summarization prompt (`"summarize: "`) to the input text.
	- Tokenizes the prompted text, truncating it to at most 512 tokens.
	- Generates a summary using beam search.
	- Decodes the generated token sequence back into human-readable text while skipping special tokens.
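
The steps listed above can be sketched roughly as follows. This is a minimal illustration, not necessarily the exact contents of `model.py`: the checkpoint name `t5-small`, the helper `load_t5`, the optional `tokenizer`/`model` parameters, and the generation settings (`max_summary_length=150`, `num_beams=4`, `early_stopping=True`) are all assumptions chosen for the example. Only the `"summarize: "` prompt, the 512-token truncation, beam search, and skipping special tokens come from the description above.

```python
PROMPT_PREFIX = "summarize: "


def load_t5(model_name="t5-small"):
    """Load a T5 tokenizer and model (checkpoint name is an assumption)."""
    # Imported lazily so this module can be imported without the heavy dependency.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)
    return tokenizer, model


def summarize_text(text, tokenizer=None, model=None,
                   max_input_length=512, max_summary_length=150, num_beams=4):
    """Return a summary of `text` using prompt-based T5 summarization."""
    if tokenizer is None or model is None:
        tokenizer, model = load_t5()

    # Add the summarization prompt so T5 knows which task to perform.
    input_text = PROMPT_PREFIX + text

    # Tokenize and truncate the prompted text to the maximum input length.
    input_ids = tokenizer.encode(
        input_text,
        return_tensors="pt",
        max_length=max_input_length,
        truncation=True,
    )

    # Generate the summary with beam search (settings here are illustrative).
    summary_ids = model.generate(
        input_ids,
        max_length=max_summary_length,
        num_beams=num_beams,
        early_stopping=True,
    )

    # Decode the best sequence back to text, dropping special tokens.
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

With `transformers`, `torch`, and `sentencepiece` installed, calling `summarize_text("some long article text")` would download the assumed checkpoint on first use and return the summary as a string. Passing a preloaded `tokenizer` and `model` avoids reloading them on every call.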

	## Code Explanation

	### Tokenization and Decoding

	- Tokenization:
	The input text is first prefixed with the summarization prompt and then tokenized using:
	```python
	input_ids = tokenizer.encode(input_text, return_tensors="pt", max_length=512, truncation=True)
	```

	---
	license: apache-2.0
	---