# 🧠 Text Summarization for Product Descriptions

A **T5-small-based** abstractive summarization model fine-tuned on synthetic product description data. This model generates concise summaries of detailed product descriptions, ideal for catalog optimization, e-commerce listings, and content generation.

---

## ✨ Model Highlights

- 📌 Based on [`t5-small`](https://huggingface.co/t5-small)
- 🧪 Fine-tuned on a synthetic dataset of 50+ product descriptions and their summaries
- ⚡ Supports **abstractive summarization** of English product texts
- 🔧 Built using **Hugging Face Transformers** and **PyTorch**

---

## 🧠 Intended Uses

- ✅ Auto-generating product summaries for catalogs or online listings
- ✅ Shortening verbose product descriptions for UI-friendly displays
- ✅ Content creation support for e-commerce and marketing

---

## 🚫 Limitations

- ❌ English-only (not trained for multilingual input)
- 🧠 Cannot fact-check or verify real-world product details
- 🧪 Trained on synthetic data; real-world generalization may be limited
- ⚠️ May generate generic or repetitive summaries for complex inputs
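
The repetition risk in the last limitation can be screened for after generation. The snippet below is a minimal sketch, not part of this repository: the helper name `has_repeated_ngram` and the choice of word trigrams are assumptions made for illustration. It flags a summary when any word n-gram occurs more than once, so the summary can be reviewed or regenerated. If repetition shows up often, the `no_repeat_ngram_size` argument of `generate` in Transformers is one option to experiment with.

```python
from collections import Counter


def has_repeated_ngram(summary: str, n: int = 3) -> bool:
    """Return True if any word n-gram appears more than once in the summary.

    Hypothetical helper for illustration; whitespace tokenization only.
    """
    words = summary.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(ngrams)
    return any(count > 1 for count in counts.values())


# Made-up summaries for illustration
print(has_repeated_ngram("fast kettle with fast kettle with auto shut-off"))  # True
print(has_repeated_ngram("1.7-liter electric kettle with auto shut-off"))     # False
```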

---

## 🏋️‍♂️ Training Details

| Attribute        | Value                                     |
|------------------|-------------------------------------------|
| Base Model       | `t5-small`                                |
| Dataset          | Custom synthetic CSV of product summaries |
| Input Field      | `product_description`                     |
| Target Field     | `summary`                                 |
| Max Token Length | 512 input / 64 summary                    |
| Epochs           | 3                                         |
| Batch Size       | 4                                         |
| Optimizer        | AdamW                                     |
| Loss Function    | CrossEntropyLoss (via `Trainer`)          |
| Framework        | PyTorch + Transformers                    |
| Hardware         | CUDA-enabled GPU                          |
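
To make the input and target fields concrete, here is a minimal sketch of how CSV rows could be turned into input/target pairs before tokenization. The column names come from the table above and the `summarize: ` prefix from the usage code in this README. The `build_examples` helper and the sample row are made up for illustration, and the actual `training_script.py` may do this differently.

```python
import csv
import io

# Hypothetical sample row standing in for product_descriptions.csv
SAMPLE_CSV = """product_description,summary
"This sleek electric kettle features a 1.7-liter capacity, fast-boil tech, and auto shut-off.",1.7-liter fast-boil electric kettle with auto shut-off.
"""


def build_examples(csv_file, prefix="summarize: "):
    """Yield (model_input, target) pairs from a CSV with the two documented columns."""
    for row in csv.DictReader(csv_file):
        description = row["product_description"].strip()
        summary = row["summary"].strip()
        if description and summary:  # skip incomplete rows
            # Truncation to 512 / 64 tokens would happen later, at tokenization time
            yield prefix + description, summary


examples = list(build_examples(io.StringIO(SAMPLE_CSV)))
model_input, target = examples[0]
print(model_input)
print(target)
```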

---

## 📊 Evaluation Metrics

| Metric     | Score (Synthetic Eval) |
|------------|------------------------|
| ROUGE-1    | 24.49                  |
| ROUGE-2    | 22.10                  |
| ROUGE-L    | 24.47                  |
| ROUGE-Lsum | 24.46                  |
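
For readers unfamiliar with ROUGE-N, the sketch below shows the idea behind the ROUGE-1 and ROUGE-2 rows: an F1 score over n-grams shared by a reference summary and a generated one. It is a simplified illustration that assumes whitespace tokenization and no stemming. It is not the scorer used to produce the table, which this card does not specify, and the function name `rouge_n_f1` and the example strings are made up. The table reports scores on a 0-100 scale, while this function returns a value between 0 and 1.

```python
from collections import Counter


def rouge_n_f1(reference: str, candidate: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1: clipped n-gram overlap between reference and candidate."""

    def ngrams(text: str) -> Counter:
        words = text.lower().split()
        return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    overlap = sum((ref & cand).values())  # each n-gram counted at most min(ref, cand) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Made-up reference and candidate summaries
reference = "electric kettle with auto shut-off"
candidate = "electric kettle with swivel base"
print(round(rouge_n_f1(reference, candidate, n=1), 2))  # 0.6
print(round(rouge_n_f1(reference, candidate, n=2), 2))  # 0.5
```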

---

## 🚀 Usage

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

model_name = "your-username/Text-Summarization-for-Product-Descriptions"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout


def summarize(text, model, tokenizer, max_input_length=512, max_output_length=64):
    model.eval()
    device = next(model.parameters()).device  # device the model lives on (CPU or CUDA)
    # T5 selects the task from a text prefix
    input_text = "summarize: " + text.strip()
    inputs = tokenizer(
        input_text,
        return_tensors="pt",
        truncation=True,
        padding="max_length",
        max_length=max_input_length,
    ).to(device)  # move input tensors to the model's device

    with torch.no_grad():
        summary_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=max_output_length,
            num_beams=4,
            early_stopping=True,
        )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


# Example
text = "This sleek electric kettle features a 1.7-liter capacity, fast-boil tech, auto shut-off, and a 360-degree swivel base."
print("Summary:", summarize(text, model, tokenizer))
```

## 📁 Repository Structure

```
.
├── model/                     # Fine-tuned model files (pytorch_model.bin, config.json)
├── tokenizer/                 # Tokenizer config and vocab
├── training_script.py         # Training code
├── product_descriptions.csv   # Source dataset
├── utils.py                   # Preprocessing & summarization utilities
└── README.md                  # Model card
```

## 🤝 Contributing

Feel free to raise issues or suggest improvements via pull requests. Further training on real-world data and multilingual support are planned for future updates.