---
library_name: transformers
tags: [text-classification, llm, huggingface, nlp, news, fine-tuning, gradio]
---

# 📰 NewsSense AI: LLM News Classifier with Web Scraping & Fine-Tuning

A fine-tuned transformer-based model that classifies news articles into five categories: Politics, Business, Health, Science, and Climate. The dataset was scraped from NPR using Decodo and processed with BeautifulSoup.

---

## Model Details

### Model Description

This model is fine-tuned using Hugging Face Transformers on a custom dataset of 5,000 news articles scraped directly from [NPR](https://www.npr.org/). The goal is to classify real-world news into practical categories for filtering, organizing, and summarizing large-scale news streams.

- **Developed by:** Manan Gulati
- **Model type:** Transformer (text classification)
- **Language(s):** English
- **License:** MIT
- **Fine-tuned from model:** distilbert-base-uncased

### Model Sources

- **Repository:** https://github.com/mgulati3/Fine-Tune
- **Demo:** https://huggingface.co/spaces/mgulati3/news-classifier-ui
- **Model Hub:** https://huggingface.co/mgulati3/news-classifier-model

---

## Uses

### Direct Use

The model classifies any English-language news article or paragraph into one of five categories. It is useful for content filtering, feed curation, and auto-tagging of articles.

### Out-of-Scope Use

- Not suitable for multi-label classification.
- Not recommended for non-news or informal text.
- May not perform well on non-English content.

---

## Bias, Risks, and Limitations

- The model is trained only on NPR articles, which may carry source-specific bias.
- Categories are limited to five; nuanced topics may not be accurately captured.
- Misclassifications may occur for ambiguous or mixed-topic content.

### Recommendations

Use prediction confidence scores to interpret results, and consider human review for sensitive applications.
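The confidence-based review recommendation above can be sketched as a simple routing rule. This is a minimal illustration, not part of the released model: it assumes the standard `transformers` pipeline output format (a dict with `"label"` and `"score"` keys), and the threshold value is purely illustrative and should be tuned on held-out data.

```python
# Route predictions by confidence: auto-tag confident ones,
# flag low-confidence ones for human review.
# Assumes the usual pipeline output format: {"label": ..., "score": ...}.

CONFIDENCE_THRESHOLD = 0.80  # illustrative; tune on a validation set

def route_prediction(prediction: dict) -> dict:
    """Attach a review decision to a single classifier prediction."""
    confident = prediction["score"] >= CONFIDENCE_THRESHOLD
    return {
        "label": prediction["label"],
        "score": prediction["score"],
        "action": "auto-tag" if confident else "human-review",
    }

# Example with mock predictions (real ones come from the pipeline):
predictions = [
    {"label": "Science", "score": 0.97},
    {"label": "Politics", "score": 0.52},
]
routed = [route_prediction(p) for p in predictions]
print(routed[0]["action"])  # auto-tag
print(routed[1]["action"])  # human-review
```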
---

## How to Get Started

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="mgulati3/news-classifier-model")
classifier("NASA's new moon mission will use AI to optimize fuel consumption.")
```

---

## Training Details

### Training Data

5,000 articles were scraped from NPR using Decodo (with proxy rotation and JS rendering), then cleaned and labeled across five categories using Python and pandas.

### Training Procedure

- Tokenizer: distilbert-base-uncased tokenizer (WordPiece, uncased)
- Preprocessing: lowercasing, truncation, padding
- Epochs: 4
- Optimizer: AdamW
- Batch size: 16

---

## Evaluation

### Testing Data

20% of the dataset was reserved for testing, using a random stratified split.

### Metrics

- Accuracy (train): 85%
- Accuracy (test): 60%
- Metric: accuracy (single-label, top-1)

### Results

The model performs best on domain-specific news content with clearly distinguishable category patterns. The gap between training (85%) and test (60%) accuracy indicates overfitting, so the test-set figure should be treated as the realistic performance estimate.

---

## Environmental Impact

- **Hardware Type:** Google Colab GPU (T4)
- **Hours used:** ~2.5
- **Cloud Provider:** Google
- **Compute Region:** US
- **Carbon Emitted:** ~0.2 kgCO2eq (estimated)

---

## Technical Specifications

### Model Architecture

DistilBERT fine-tuned for single-label text classification with a softmax output layer over the 5 categories.

### Compute Infrastructure

- Google Colab Pro
- Python 3.10
- Hugging Face Transformers 4.x
- PyTorch backend

---

## Citation

**APA:**

Gulati, M. (2025). *NewsSense AI: Fine-tuned LLM for News Classification*. https://huggingface.co/mgulati3/news-classifier-model

**BibTeX:**

```bibtex
@misc{gulati2025newssense,
  author = {Gulati, Manan},
  title  = {NewsSense AI: Fine-tuned LLM for News Classification},
  year   = {2025},
  url    = {https://huggingface.co/mgulati3/news-classifier-model}
}
```

---

## Model Card Contact

For questions or collaborations: mgulati3@asu.edu