# 🇮🇳 IndiSum-AI: Indian News Summarizer
IndiSum-AI is a fine-tuned PRIMERA (LED-based) model optimized for abstractive summarization of the Indian news ecosystem. It is specifically trained to handle long-form articles related to Indian finance, technology, space missions (ISRO), and government policy.
## 📝 Model Description
- Developed by: Mohd Musheer (Takshashila Mahavidyalaya, Amravati)
- Model type: LED/PRIMERA (Long-form Encoder-Decoder)
- Finetuned from: allenai/PRIMERA
- Language: English
- Context Window: 1024 tokens
## 📊 Evaluation Results
Evaluated on a test set of Indian news articles (2025-2026 contexts):
| Metric | Score |
|---|---|
| ROUGE-1 | 71.43 |
| ROUGE-2 | 46.15 |
| ROUGE-L | 68.25 |
| BERTScore (F1) | 0.93 |
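The scores above were presumably produced with a standard library such as `rouge_score` or `evaluate`. As a point of reference for readers, ROUGE-1 F1 is the harmonic mean of unigram precision and recall between a candidate summary and its reference; a minimal, illustrative reimplementation (not the evaluation code used for this card):

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("the cat sat", "the cat sat on the mat"), 4))  # -> 0.6667
```

Production libraries additionally apply stemming and more careful tokenization, so their numbers can differ slightly from this sketch.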
## 🛠️ How to Use
You can use this model directly with the Hugging Face `pipeline` API or with `AutoModelForSeq2SeqLM`.
**Simple Pipeline Usage:**

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="mohd-musheer/News-Summarizer-AI")

text = "PASTE_YOUR_LONG_NEWS_ARTICLE_HERE"
print(summarizer(text, max_length=128, min_length=30, do_sample=False))
```
**Manual Usage (Best for Performance):**

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("mohd-musheer/News-Summarizer-AI")
model = AutoModelForSeq2SeqLM.from_pretrained("mohd-musheer/News-Summarizer-AI")

article = "..."
inputs = tokenizer(article, truncation=True, max_length=1024, return_tensors="pt")

# Global attention on the first token is recommended for LED/PRIMERA
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # mask out padding tokens
    global_attention_mask=global_attention_mask,
    max_length=128,
    num_beams=4,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```
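Because the context window is 1024 tokens, the snippet above silently truncates longer articles. A common workaround (not part of this model card, just a sketch) is to split the token IDs into overlapping windows, summarize each window, and then merge or re-summarize the partial summaries. The windowing logic alone, independent of the model:

```python
def chunk_ids(ids, window=1024, stride=768):
    """Split a list of token ids into overlapping windows of `window` tokens.

    Choosing `stride` < `window` gives overlap, so sentences cut at a
    chunk boundary still appear intact in the next chunk. Each chunk can
    then be fed to the summarizer and the partial summaries concatenated.
    """
    chunks = []
    for start in range(0, max(len(ids), 1), stride):
        chunks.append(ids[start:start + window])
        if start + window >= len(ids):
            break
    return chunks

# 10 ids, window of 4, stride of 3 -> 3 overlapping chunks
print(chunk_ids(list(range(10)), window=4, stride=3))
# -> [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Note that chunked summarization loses cross-chunk context, so quality on very long articles may trail the single-window case.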
## Model Tree

- Base model: allenai/PRIMERA
- Training dataset: cleaned-news-summ-no-outliers