	# 🧠 Text Summarization for Product Descriptions

	A T5-small-based abstractive summarization model fine-tuned on synthetic product description data. This model generates concise summaries of detailed product descriptions, ideal for catalog optimization, e-commerce listings, and content generation.

	---

	## ✨ Model Highlights

	- 📌 Based on [`t5-small`](https://huggingface.co/t5-small)
	- 🧪 Fine-tuned on a synthetic dataset of 50+ product descriptions and their summaries
	- ⚡ Supports abstractive summarization of English product texts
	- 🧠 Built using Hugging Face Transformers and PyTorch

	---

	## 🧠 Intended Uses

	- ✅ Auto-generating product summaries for catalogs or online listings
	- ✅ Shortening verbose product descriptions for UI-friendly displays
	- ✅ Content creation support for e-commerce and marketing

	---

	## 🚫 Limitations

	- ❌ English-only (not trained for multilingual input)
	- 🧠 Cannot fact-check or verify real-world product details
	- 🧪 Trained on a small synthetic dataset (50+ examples), so real-world generalization may be limited
	- ⚠️ May generate generic or repetitive summaries for complex inputs
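
One way to catch the repetitive outputs mentioned above is a simple post-generation check. The following is a minimal sketch, not part of this repository: the function names and the 0.3 threshold are illustrative assumptions, and the heuristic only looks at repeated word n-grams.

```python
from collections import Counter


def repeated_ngram_ratio(summary: str, n: int = 2) -> float:
    """Fraction of word n-grams in `summary` that repeat an earlier n-gram."""
    tokens = summary.lower().split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeats = sum(count - 1 for count in counts.values())
    return repeats / len(ngrams)


def looks_repetitive(summary: str, n: int = 2, threshold: float = 0.3) -> bool:
    """Flag a summary for review; the threshold is an assumed starting point."""
    return repeated_ngram_ratio(summary, n) >= threshold
```

Flagged summaries could be regenerated or sent for manual review. Separately, `generate` in Transformers accepts a `no_repeat_ngram_size` argument that blocks repeated n-grams during decoding; whether it helps for this model is untested here.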

	---

	## 🏋️‍♂️ Training Details

	| Attribute        | Value                                     |
	|------------------|-------------------------------------------|
	| Base Model       | `t5-small`                                |
	| Dataset          | Custom synthetic CSV of product summaries |
	| Input Field      | `product_description`                     |
	| Target Field     | `summary`                                 |
	| Max Token Length | 512 input / 64 summary                    |
	| Epochs           | 3                                         |
	| Batch Size       | 4                                         |
	| Optimizer        | AdamW                                     |
	| Loss Function    | CrossEntropyLoss (via `Trainer`)          |
	| Framework        | PyTorch + Transformers                    |
	| Hardware         | CUDA-enabled GPU                          |
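
To make the table concrete, here is a rough sketch of how the CSV columns could be turned into T5 input/target pairs, and how many optimizer steps the listed settings imply. It is not the repository's training script: `load_pairs`, `total_steps`, and the sample row are hypothetical, and the step count assumes a single device with no gradient accumulation.

```python
import csv
import io
import math

PREFIX = "summarize: "  # same task prefix the usage example prepends


def load_pairs(csv_file):
    """Build (model_input, target) pairs from the two columns named in the table."""
    pairs = []
    for row in csv.DictReader(csv_file):
        description = (row.get("product_description") or "").strip()
        summary = (row.get("summary") or "").strip()
        if not description or not summary:
            continue  # skip incomplete rows
        pairs.append((PREFIX + description, summary))
    return pairs


def total_steps(num_examples, batch_size=4, epochs=3):
    """Optimizer steps, assuming one device and no gradient accumulation."""
    return math.ceil(num_examples / batch_size) * epochs


# Made-up in-memory rows standing in for the real CSV file.
sample_csv = io.StringIO(
    "product_description,summary\n"
    '"Electric kettle with 1.7-liter capacity, fast-boil tech, and auto shut-off.",'
    '"1.7 L fast-boil kettle."\n'
    '"","Orphan summary with no description"\n'
)
pairs = load_pairs(sample_csv)
print(pairs[0][0])
print(total_steps(50))  # 50 examples is an assumed size; the card only says "50+"
```

Tokenization to the 512/64 token limits would follow this step, typically with `truncation=True` on both inputs and targets.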

	---

	## 📊 Evaluation Metrics

	| Metric     | Score (Synthetic Eval) |
	|------------|------------------------|
	| ROUGE-1    | 24.49                  |
	| ROUGE-2    | 22.10                  |
	| ROUGE-L    | 24.47                  |
	| ROUGE-Lsum | 24.46                  |
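
For readers unfamiliar with these metrics, the sketch below shows what ROUGE-N and ROUGE-L measure. It is a simplified illustration (whitespace tokenization, no stemming), not the tool used to produce the table, which this card does not name. It returns F1 in the range 0 to 1, whereas the scores above appear to be on a 0 to 100 scale.

```python
from collections import Counter


def _f1(overlap: int, ref_len: int, cand_len: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0:
        return 0.0
    precision = overlap / cand_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_n(reference: str, candidate: str, n: int = 1) -> float:
    """F1 over clipped n-gram overlap between reference and candidate."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    overlap = sum((ref_ngrams & cand_ngrams).values())
    return _f1(overlap, sum(ref_ngrams.values()), sum(cand_ngrams.values()))


def rouge_l(reference: str, candidate: str) -> float:
    """F1 based on the longest common subsequence of words."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    prev = [0] * (len(cand) + 1)
    for ref_word in ref:
        curr = [0]
        for j, cand_word in enumerate(cand, start=1):
            if ref_word == cand_word:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return _f1(prev[-1], len(ref), len(cand))
```

For reported numbers, an established implementation such as the `rouge_score` package is the safer choice, since tokenization and stemming choices change the scores.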

	---

	## 🚀 Usage

	```python
	from transformers import T5Tokenizer, T5ForConditionalGeneration
	import torch

	# Repo ID as listed on the Hub; point this at a local path if you have a local copy
	model_name = "AventIQ-AI/Text-Summarization-for-Product-Descriptions"
	tokenizer = T5Tokenizer.from_pretrained(model_name)
	model = T5ForConditionalGeneration.from_pretrained(model_name)
	model.eval()

	def summarize(text, model, tokenizer, max_input_length=512, max_output_length=64):
	    """Return an abstractive summary of `text` using beam search."""
	    model.eval()
	    device = next(model.parameters()).device  # cpu or cuda, wherever the model lives
	    input_text = "summarize: " + text.strip()  # T5 task prefix
	    inputs = tokenizer(
	        input_text,
	        return_tensors="pt",
	        truncation=True,
	        padding="max_length",
	        max_length=max_input_length,
	    ).to(device)  # move inputs to the model's device

	    with torch.no_grad():
	        summary_ids = model.generate(
	            input_ids=inputs["input_ids"],
	            attention_mask=inputs["attention_mask"],
	            max_length=max_output_length,
	            num_beams=4,
	            early_stopping=True,
	        )

	    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


	# Example
	text = "This sleek electric kettle features a 1.7-liter capacity, fast-boil tech, auto shut-off, and a 360-degree swivel base."
	print("Summary:", summarize(text, model, tokenizer))
	```
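
For catalog-style use, the single-text function can be wrapped to process many products at once. This is a minimal sketch under assumptions: `summarize_catalog`, the `min_words` cutoff, and the dictionary layout are hypothetical, and the summarizer is passed in as a plain callable so the wrapper itself does not depend on the model.

```python
from typing import Callable, Dict, Iterable, List


def summarize_catalog(
    products: Iterable[Dict[str, str]],
    summarize_fn: Callable[[str], str],
    min_words: int = 20,
) -> List[Dict[str, str]]:
    """Attach a `summary` field to each product record.

    Descriptions shorter than `min_words` are kept as-is, since they
    are already short enough to display (assumed cutoff).
    """
    results = []
    for product in products:
        description = product.get("product_description", "").strip()
        if len(description.split()) < min_words:
            summary = description
        else:
            summary = summarize_fn(description)
        results.append({**product, "summary": summary})
    return results
```

With the model loaded as above, `summarize_fn` could be `functools.partial(summarize, model=model, tokenizer=tokenizer)`.
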

	---

	## 📁 Repository Structure

	```
	.
	├── model/                    # Fine-tuned model files (pytorch_model.bin, config.json)
	├── tokenizer/                # Tokenizer config and vocab
	├── training_script.py        # Training code
	├── product_descriptions.csv  # Source dataset
	├── utils.py                  # Preprocessing & summarization utilities
	└── README.md                 # Model card
	```

	---

	## 🤝 Contributing

	Feel free to raise issues or suggest improvements via pull requests. Further training on real-world data and multilingual support are planned for future updates.