Spaces: Adarsh921/hindi_summarizer

Adarsh921 committed on Jul 5, 2025

Commit ee06b72 (verified) · 1 Parent(s): 704d874

Upload 3 files

gradio app uploading

Files changed (3)

README.md +101 -10
app.py +39 -0
requirements.txt +3 -0

README.md CHANGED

@@ -1,13 +1,104 @@
 ---
-title: Hindi Summarizer
-emoji: 🔥
-colorFrom: yellow
-colorTo: indigo
-sdk: gradio
-sdk_version: 5.35.0
-app_file: app.py
-pinned: false
-short_description: hindi article summary having technical words in english
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# 📰 Hindi News Summarizer using IndicBART
+This is a fine-tuned version of [`ai4bharat/IndicBART`](https://huggingface.co/ai4bharat/IndicBART) trained on the **Hindi ILSUM 2024 dataset** for abstractive summarization of Hindi news articles.
+---
+## ✨ Model Details
+- **Model**: `ai4bharat/IndicBART` (multilingual BART)
+- **Fine-tuned on**: Hindi subset of ILSUM 2024
+- **Task**: Abstractive summarization
+- **Language**: Hindi (`hi`)
+- **Max input length**: 512 tokens
+- **Max summary length**: 128 tokens
+---
+## 🧾 Dataset
+- **Name**: ILSUM 2024 (Indic Language Summarization Dataset)
+- **Source**: Hindi news articles with corresponding abstractive summaries
+- **Size**:
+  - Training samples: ~11K
+  - Validation samples: ~1.6K
+---
+## 🚀 How to Use
+### 🐍 In Python (Transformers)
+```python
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+model = AutoModelForSeq2SeqLM.from_pretrained("Adarsh921/indicbart-hindi-summarizer")
+tokenizer = AutoTokenizer.from_pretrained("Adarsh921/indicbart-hindi-summarizer")
+text = "हिंदुस्तान में मानसून ने दस्तक दे दी है और कई इलाकों में भारी बारिश हो रही है..."
+inputs = tokenizer(text, return_tensors="pt", max_length=512, truncation=True)
+summary_ids = model.generate(
+    inputs["input_ids"],
+    max_length=128,
+    num_beams=4,
+    no_repeat_ngram_size=3,
+    early_stopping=True
+)
+summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
+print(summary)
+```
 ---
+## 💡 Example
+**Input Article**:
+```
+भारतीय क्रिकेट टीम ने इंग्लैंड के खिलाफ रोमांचक मुकाबले में 5 रन से जीत दर्ज की है...
+```
+**Generated Summary**:
+```
+भारत ने इंग्लैंड को 5 रन से हराया।
+```
+---
+## 📊 Evaluation
+| Metric   | Score (approx.) |
+|----------|-----------------|
+| ROUGE-1  | 0.50            |
+| ROUGE-2  | 0.21            |
+| ROUGE-L  | 0.50            |
+Model trained with:
+- Batch size: 8
+- Epochs: 6
+- Optimizer: AdamW
+- Learning rate: 3e-5
+---
+## 🌐 Live Demo
+Try the model in a live Gradio interface:
+👉 [Hindi Summarizer Space](https://huggingface.co/spaces/Adarsh921/indicbart-hindi-summarizer)
 ---
+## 🧠 Author
+Developed by **Adarsh Bhardwaj**
+[Hugging Face Profile](https://huggingface.co/Adarsh921)
+---
+## 📌 License
+MIT License
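
The Evaluation table in the README hunk above reports approximate ROUGE scores but does not show how they were computed. As a minimal sketch of what those metrics measure, assuming whitespace tokenization and F1 scoring (this is not the author's evaluation script, and the helper names `rouge_n` and `rouge_l` are hypothetical), ROUGE-N counts overlapping n-grams and ROUGE-L uses the longest common subsequence:

```python
from collections import Counter


def _f1(overlap, ref_len, cand_len):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0 or ref_len == 0 or cand_len == 0:
        return 0.0
    precision = overlap / cand_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_n(reference, candidate, n=1):
    """ROUGE-N F1 over whitespace tokens (assumption: no stemming, no normalization)."""
    ref, cand = reference.split(), candidate.split()
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref_ngrams & cand_ngrams).values())
    return _f1(overlap, sum(ref_ngrams.values()), sum(cand_ngrams.values()))


def rouge_l(reference, candidate):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    ref, cand = reference.split(), candidate.split()
    prev = [0] * (len(cand) + 1)
    for r in ref:
        cur = [0]
        for j, c in enumerate(cand, 1):
            cur.append(prev[j - 1] + 1 if r == c else max(prev[j], cur[j - 1]))
        prev = cur
    return _f1(prev[-1], len(ref), len(cand))


if __name__ == "__main__":
    reference = "भारत ने इंग्लैंड को 5 रन से हराया।"
    candidate = "भारत ने 5 रन से जीत दर्ज की"
    print(rouge_n(reference, candidate, 1))
    print(rouge_n(reference, candidate, 2))
    print(rouge_l(reference, candidate))
```

Established ROUGE packages apply their own tokenization and normalization rules, so scores from this sketch will generally not match the numbers in the table.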

app.py ADDED

@@ -0,0 +1,39 @@

+import gradio as gr
+import torch
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+# Load from Hugging Face Hub
+tokenizer = AutoTokenizer.from_pretrained("Adarsh921/indicbart-hindi-summarizer")
+model = AutoModelForSeq2SeqLM.from_pretrained("Adarsh921/indicbart-hindi-summarizer")
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+model = model.to(device)
+# Inference function
+def generate_summary(text):
+    inputs = tokenizer(
+        text,
+        return_tensors="pt",
+        max_length=512,
+        truncation=True,
+        padding="max_length"
+    )
+    inputs = {k: v.to(device) for k, v in inputs.items()}
+    summary_ids = model.generate(
+        inputs["input_ids"],
+        num_beams=4,
+        max_length=128,
+        min_length=30,
+        no_repeat_ngram_size=3,
+        early_stopping=True
+    )
+    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
+# Gradio UI
+gr.Interface(
+    fn=generate_summary,
+    inputs=gr.Textbox(lines=10, label="Paste Hindi Article"),
+    outputs=gr.Textbox(label="Generated Summary"),
+    title="Hindi Article Summarizer",
+    description="Summarizer fine-tuned on ILSUM 2024 using IndicBART"
+).launch(share=True)
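
Because `generate_summary` in the hunk above sets `min_length=30`, the model is forced to emit at least 30 tokens even when the textbox is submitted empty. One possible guard, sketched here with a hypothetical wrapper name (`guard_empty_input`) and a hypothetical placeholder message, neither of which is part of this commit, short-circuits blank input before it reaches the model:

```python
from typing import Callable, Optional


def guard_empty_input(
    summarize: Callable[[str], str],
    message: str = "Please paste a Hindi article first.",  # hypothetical wording
) -> Callable[[Optional[str]], str]:
    """Wrap a summarizer so blank or missing input never reaches the model."""

    def wrapped(text: Optional[str]) -> str:
        # Treat None, empty strings, and whitespace-only strings as "no input".
        if text is None or not text.strip():
            return message
        # Strip surrounding whitespace before handing the text to the summarizer.
        return summarize(text.strip())

    return wrapped
```

Under these assumptions, the interface would be built with `fn=guard_empty_input(generate_summary)` instead of `fn=generate_summary`; the rest of the `gr.Interface` call stays unchanged.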

requirements.txt ADDED

@@ -0,0 +1,3 @@

+transformers
+torch
+gradio