eungyu kim

T5 Text Summarizer

This repository contains a simple text summarization script built on a pre-trained T5 model from the Hugging Face Transformers library. The script shows how to prefix the input with the "summarize: " task prompt so that the model generates a concise summary of the text.

Overview

The main script (model.py) defines a function summarize_text that:

  • Loads the T5 tokenizer and T5 model.
  • Adds a summarization prompt ("summarize: ") to the input text.
  • Tokenizes the prefixed text and truncates it to at most 512 tokens.
  • Generates a summary using beam search.
  • Decodes the generated token sequence back into human-readable text while skipping special tokens.
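The steps above can be sketched roughly as follows. This is a minimal illustration rather than the exact contents of model.py: the checkpoint name ("t5-small"), the beam count, the summary length limit, and the helper build_input are assumptions chosen for the example. Only the "summarize: " prefix, the 512-token input limit, beam search, and skipping special tokens come from this README.

```python
PREFIX = "summarize: "  # task prompt named in this README


def build_input(text: str) -> str:
    """Prepend the summarization prompt to the raw text (hypothetical helper)."""
    return PREFIX + text


def summarize_text(
    text: str,
    model_name: str = "t5-small",    # assumed checkpoint, not stated in the README
    max_input_length: int = 512,     # input limit used in this README
    max_summary_length: int = 150,   # assumed cap on the generated summary
    num_beams: int = 4,              # assumed beam width
) -> str:
    # Imported lazily so build_input can be used without transformers installed.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    # 1. Load the tokenizer and the model.
    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    # 2-3. Add the prompt, then tokenize and truncate.
    input_ids = tokenizer.encode(
        build_input(text),
        return_tensors="pt",
        max_length=max_input_length,
        truncation=True,
    )

    # 4. Generate the summary with beam search.
    summary_ids = model.generate(
        input_ids,
        num_beams=num_beams,
        max_length=max_summary_length,
        early_stopping=True,
    )

    # 5. Decode the best sequence, dropping tokens such as <pad> and </s>.
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

Calling summarize_text("some long article text") would download the checkpoint on first use and return the summary as a plain string. Loading the model inside the function keeps the sketch short; a script that summarizes many texts would likely load it once and reuse it.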

Code Explanation

Tokenization and Decoding

  • Tokenization:
    The input text is first prefixed with the summarization prompt and then tokenized using:
    input_ids = tokenizer.encode(input_text, return_tensors="pt", max_length=512, truncation=True)
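To make the effect of max_length=512 with truncation=True more concrete, here is a toy sketch of the truncation step on a plain list of token ids. It is not the library's implementation. The function truncate_ids is hypothetical, and it assumes that the length limit includes the end-of-sequence token and that this token has id 1, as in the standard T5 vocabulary.

```python
def truncate_ids(token_ids, max_length, eos_id=1):
    """Toy stand-in for truncation: cut the ids so that, with the
    end-of-sequence id appended, the result has at most max_length items."""
    if max_length < 1:
        raise ValueError("max_length must leave room for the end-of-sequence id")
    body = token_ids[: max_length - 1]  # reserve one slot for the EOS id
    return body + [eos_id]


# A five-token input cut to a limit of four: three tokens survive, plus EOS.
short = truncate_ids([10, 11, 12, 13, 14], max_length=4)
```

Anything beyond the limit is simply dropped, so the model never sees the tail of a long document. That is worth keeping in mind when summarizing texts much longer than 512 tokens.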
    

license: apache-2.0