Adarsh921 committed on
Commit ee06b72 · verified · 1 Parent(s): 704d874

Upload 3 files

gradio app uploading

Files changed (3)
  1. README.md +101 -10
  2. app.py +39 -0
  3. requirements.txt +3 -0
README.md CHANGED
@@ -1,13 +1,104 @@
  ---
- title: Hindi Summarizer
- emoji: 🔥
- colorFrom: yellow
- colorTo: indigo
- sdk: gradio
- sdk_version: 5.35.0
- app_file: app.py
- pinned: false
- short_description: hindi article summary having technical words in english
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ # 📰 Hindi News Summarizer using IndicBART
+
+ This is a fine-tuned version of [`ai4bharat/IndicBART`](https://huggingface.co/ai4bharat/IndicBART) trained on the **Hindi ILSUM 2024 dataset** for abstractive summarization of Hindi news articles.
+
+ ---
+
+ ## ✨ Model Details
+
+ - **Model**: `ai4bharat/IndicBART` (multilingual BART)
+ - **Fine-tuned on**: Hindi subset of ILSUM 2024
+ - **Task**: Abstractive summarization
+ - **Language**: Hindi (`hi`)
+ - **Max input length**: 512 tokens
+ - **Max summary length**: 128 tokens
+
+ ---
+
+ ## 🧾 Dataset
+
+ - **Name**: ILSUM 2024 (Indic Language Summarization Dataset)
+ - **Source**: Hindi news articles with corresponding abstractive summaries
+ - **Size**:
+   - Training samples: ~11K
+   - Validation samples: ~1.6K
+
+ ---
+
+ ## 🚀 How to Use
+
+ ### 🐍 In Python (Transformers)
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+ model = AutoModelForSeq2SeqLM.from_pretrained("Adarsh921/indicbart-hindi-summarizer")
+ tokenizer = AutoTokenizer.from_pretrained("Adarsh921/indicbart-hindi-summarizer")
+
+ text = "हिंदुस्तान में मानसून ने दस्तक दे दी है और कई इलाकों में भारी बारिश हो रही है..."
+ inputs = tokenizer(text, return_tensors="pt", max_length=512, truncation=True)
+
+ summary_ids = model.generate(
+     inputs["input_ids"],
+     max_length=128,
+     num_beams=4,
+     no_repeat_ngram_size=3,
+     early_stopping=True
+ )
+
+ summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
+ print(summary)
+ ```
+
  ---
+
+ ## 💡 Example
+
+ **Input Article**:
+
+ ```
+ भारतीय क्रिकेट टीम ने इंग्लैंड के खिलाफ रोमांचक मुकाबले में 5 रन से जीत दर्ज की है...
+ ```
+
+ **Generated Summary**:
+
+ ```
+ भारत ने इंग्लैंड को 5 रन से हराया।
+ ```
+
+ ---
+
+ ## 📊 Evaluation
+
+ | Metric   | Score (approx.) |
+ |----------|-----------------|
+ | ROUGE-1  | 0.50            |
+ | ROUGE-2  | 0.21            |
+ | ROUGE-L  | 0.50            |
+
+ Model trained with:
+ - Batch size: 8
+ - Epochs: 6
+ - Optimizer: AdamW
+ - Learning rate: 3e-5
+
+ ---
+
+ ## 🌐 Live Demo
+
+ Try the model in a live Gradio interface:
+ 👉 [Hindi Summarizer Space](https://huggingface.co/spaces/Adarsh921/indicbart-hindi-summarizer)
+
  ---

+ ## 🧠 Author
+
+ Developed by **Adarsh Bhardwaj**
+ [Hugging Face Profile](https://huggingface.co/Adarsh921)
+
+ ---
+
+ ## 📌 License
+
+ MIT License
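
The ROUGE-1 and ROUGE-2 rows in the README's Evaluation table measure n-gram overlap between a generated summary and a reference summary. The sketch below illustrates the idea as an F1 score over whitespace tokens. The function name `rouge_n_f1` and the whitespace tokenization are assumptions made for illustration. The README does not say which ROUGE implementation or tokenizer produced its scores, so this sketch is not expected to reproduce them.

```python
from collections import Counter


def rouge_n_f1(reference: str, candidate: str, n: int = 1) -> float:
    """F1 over shared n-grams, using whitespace tokens (an illustrative assumption)."""

    def ngrams(text: str) -> Counter:
        tokens = text.split()
        # Count every run of n consecutive tokens.
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    # Clipped overlap: each n-gram counts at most as often as it appears in both texts.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# The reference is the README's example summary; the candidate is a made-up variant.
reference = "भारत ने इंग्लैंड को 5 रन से हराया।"
candidate = "भारत ने इंग्लैंड को हराया।"
print(rouge_n_f1(reference, candidate, n=1))
print(rouge_n_f1(reference, candidate, n=2))
```

ROUGE-L, the third row of the table, is based on the longest common subsequence rather than fixed-size n-grams, so it is not covered by this sketch.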
app.py ADDED
@@ -0,0 +1,39 @@
+ import gradio as gr
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+ # Load from Hugging Face Hub
+ tokenizer = AutoTokenizer.from_pretrained("Adarsh921/indicbart-hindi-summarizer")
+ model = AutoModelForSeq2SeqLM.from_pretrained("Adarsh921/indicbart-hindi-summarizer")
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ model = model.to(device)
+
+ # Inference function
+ def generate_summary(text):
+     inputs = tokenizer(
+         text,
+         return_tensors="pt",
+         max_length=512,
+         truncation=True,
+         padding="max_length"
+     )
+     inputs = {k: v.to(device) for k, v in inputs.items()}
+
+     summary_ids = model.generate(
+         inputs["input_ids"],
+         num_beams=4,
+         max_length=128,
+         min_length=30,
+         no_repeat_ngram_size=3,
+         early_stopping=True
+     )
+     return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
+
+ # Gradio UI
+ gr.Interface(
+     fn=generate_summary,
+     inputs=gr.Textbox(lines=10, label="Paste Hindi Article"),
+     outputs=gr.Textbox(label="Generated Summary"),
+     title="Hindi Article Summarizer",
+     description="Summarizer fine-tuned on ILSUM 2024 using IndicBART"
+ ).launch(share=True)
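
Because `generate_summary` sets `min_length=30`, the model is asked for at least 30 tokens even when the textbox is submitted empty, so it would likely return filler text. One possible guard is sketched below. The wrapper name `guard_empty_input` and its default message are hypothetical and not part of this commit. The sketch uses only the standard library, so it can be tried without loading the model.

```python
from typing import Callable, Optional


def guard_empty_input(
    summarize: Callable[[str], str],
    message: str = "Please paste a Hindi article first.",
) -> Callable[[Optional[str]], str]:
    """Wrap a summarizer so blank input returns a message instead of reaching the model."""

    def wrapped(text: Optional[str]) -> str:
        # None or whitespace-only input never reaches the model.
        if text is None or not text.strip():
            return message
        return summarize(text.strip())

    return wrapped


# Stand-in for generate_summary so the sketch runs without the model.
def fake_summary(text: str) -> str:
    return f"summary of {len(text)} characters"


safe_summary = guard_empty_input(fake_summary)
print(safe_summary("   "))
print(safe_summary("भारतीय क्रिकेट टीम ने जीत दर्ज की है"))
```

In `app.py` this would be wired in as `fn=guard_empty_input(generate_summary)` when constructing `gr.Interface`.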
requirements.txt ADDED
@@ -0,0 +1,3 @@
+ transformers
+ torch
+ gradio
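
The three requirements are unpinned, so the Space installs whatever versions are current at build time. A minimal sketch for producing pinned lines from a working local environment follows. The helper name `pinned_requirements` is hypothetical, and it only reports what happens to be installed where it runs. Whether this model's tokenizer also needs `sentencepiece` is an assumption worth checking against the model repository.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return 'name==version' for installed packages and the bare name otherwise."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Not installed locally: leave the requirement unpinned.
            lines.append(name)
    return lines


print("\n".join(pinned_requirements(["transformers", "torch", "gradio"])))
```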