---
library_name: transformers
tags:
- topic
- multi-sentiment
license: mit
datasets:
- valurank/Topic_Classification
language:
- en
metrics:
- accuracy
- f1
- precision
- recall
base_model:
- distilbert/distilbert-base-uncased
---

# Model Card for Topic Classification Model

A fine-tuned DistilBERT model for multi-class topic classification. This model predicts the most relevant topic label from a predefined set based on input text. It was trained using 🤗 Transformers and PyTorch on a custom dataset derived from academic and news-style corpora.

## Model Details

### Model Description

This model was developed by Daniel (@AfroLogicInsect) to classify text into one of several predefined topics. It builds on the `distilbert-base-uncased` architecture and was fine-tuned for multi-class classification using a softmax output layer.

- **Developed by:** Daniel 🇳🇬 (@AfroLogicInsect)
- **Model type:** DistilBERT-based multi-class sequence classifier
- **Language(s):** English
- **License:** MIT
- **Finetuned from:** distilbert-base-uncased

### Model Sources

- **Repository:** [AfroLogicInsect/topic-model-analysis-model](https://huggingface.co/AfroLogicInsect/topic-model-analysis-model)
- **Paper:** arXiv:1910.01108 (DistilBERT)
- **Demo:** [Coming soon]

## Uses

### Direct Use

- Classify academic or news-style text into topics such as AI, finance, sports, and climate
- Embed in dashboards or content moderation tools for automatic tagging

### Downstream Use

- Can be extended to hierarchical topic classification
- Useful for building recommendation engines or content filters

### Out-of-Scope Use

- Not suitable for sentiment or emotion classification
- May not generalize well to informal or slang-heavy text

## Bias, Risks, and Limitations

- Trained on curated corpora, so predictions may reflect biases in the source material
- Topics are predefined and static, so text about emerging topics may be misclassified
- Scores are softmax probabilities over the predefined labels; a high score is not a guarantee of correctness
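
To illustrate the last point, here is a minimal sketch of how a softmax output layer turns raw logits into scores. The label names and logit values are made up for illustration and are not taken from this model.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three illustrative labels (not real model output)
labels = ["ai", "finance", "sports"]
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

# Print labels from most to least probable
for label, p in sorted(zip(labels, probs), key=lambda t: t[1], reverse=True):
    print(f"{label}: {p:.3f}")
```

Because the scores always sum to 1 across the predefined labels, even off-topic input receives a "top" label, which is why a score should be read as a relative ranking rather than a definitive answer.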

### Recommendations

- Pass `top_k=5` when calling the pipeline to retrieve the five highest-scoring topics; `return_all_scores` is deprecated in recent 🤗 Transformers releases in favor of `top_k`
- Consider fine-tuning on domain-specific data for improved accuracy

## How to Get Started

```python
from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
classifier = pipeline(
    "text-classification",
    model="AfroLogicInsect/topic-model-analysis-model",
    tokenizer="AfroLogicInsect/topic-model-analysis-model",
)

text = "New AI breakthrough in natural language processing"
# top_k=5 returns the five highest-scoring labels, already sorted by score
# (this replaces the deprecated return_all_scores=True)
top_5 = classifier(text, top_k=5)
for rank, res in enumerate(top_5, start=1):
    print(f"Top {rank}: {res['label']} ({res['score']:.3f})")
```
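
For the automatic-tagging use case, one option is to keep only the labels whose score clears a cutoff. The sketch below works on the list-of-dicts format that the pipeline returns. The helper name `select_tags`, the 0.30 threshold, the three-tag limit, and the sample predictions are all hypothetical choices for illustration.

```python
def select_tags(predictions, min_score=0.30, max_tags=3):
    """Return up to max_tags labels whose score is at least min_score."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [p["label"] for p in ranked if p["score"] >= min_score][:max_tags]

# Hypothetical predictions in the pipeline's output format (not real model output)
sample = [
    {"label": "finance", "score": 0.08},
    {"label": "ai", "score": 0.55},
    {"label": "science", "score": 0.33},
    {"label": "sports", "score": 0.04},
]

tags = select_tags(sample)
print(tags)
```

A suitable threshold depends on the application and is best tuned on held-out data rather than copied from this sketch.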

## Training Details

### Dataset

- Custom multi-class topic dataset based on arXiv abstracts and news articles
- Labels include domains such as AI, finance, sports, and climate

### Hyperparameters

- Epochs: 3
- Batch size: 16
- Learning rate: 2e-5
- Evaluation every 200 steps
- Metric: F1 score

### Trainer Setup

Training used the Hugging Face `Trainer` API, with `TrainingArguments` configured for best-model selection and with early stopping enabled.
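
The exact training script is not included here, so the following is only a sketch of how the listed hyperparameters might map onto `TrainingArguments` and `EarlyStoppingCallback`. The save settings, the `"f1"` metric key, the patience value, the output directory, and the helper name `build_training_setup` are assumptions, not documented settings.

```python
# Values from the Hyperparameters list, expressed with TrainingArguments
# parameter names. Note: `eval_strategy` was named `evaluation_strategy`
# in older transformers releases.
training_kwargs = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,  # assumes a single GPU
    "learning_rate": 2e-5,
    "eval_strategy": "steps",
    "eval_steps": 200,
    "save_strategy": "steps",           # assumption: must match eval_strategy
    "save_steps": 200,                  # assumption: checkpoint at each evaluation
    "load_best_model_at_end": True,     # best-model selection
    "metric_for_best_model": "f1",      # assumes compute_metrics returns an "f1" key
}

EARLY_STOPPING_PATIENCE = 2  # hypothetical; the actual value is not documented


def build_training_setup(output_dir="topic-model-out"):
    """Create TrainingArguments and an early-stopping callback (needs transformers)."""
    from transformers import EarlyStoppingCallback, TrainingArguments

    args = TrainingArguments(output_dir=output_dir, **training_kwargs)
    callback = EarlyStoppingCallback(early_stopping_patience=EARLY_STOPPING_PATIENCE)
    return args, callback
```

The returned objects would then be passed to `Trainer` as `args=args` and `callbacks=[callback]`, together with the model, datasets, and a `compute_metrics` function.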

## Evaluation

The approximate evaluation metrics are:

- **Accuracy:** ~90.8%
- **F1 Score:** ~0.91
- **Precision:** ~0.89
- **Recall:** ~0.93
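
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the figures above can be compared against each other. This sketch assumes the reported precision and recall are single aggregate values; with per-class or macro-averaged metrics the relation need not hold exactly.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Reported precision and recall from the list above
f1 = f1_from(0.89, 0.93)
print(f"F1 = {f1:.3f}")
```

The result is about 0.91, which agrees with the reported F1 score.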

## Environmental Impact

- **Hardware:** Google Colab (NVIDIA T4 GPU)
- **Training Time:** ~2.5 hours
- **Carbon Emitted:** ~0.3 kg CO₂eq (estimated via [ML Impact Calculator](https://mlco2.github.io/impact#compute))

## Citation

```bibtex
@misc{afrologicinsect2025topicmodel,
  title = {AfroLogicInsect Topic Classification Model},
  author = {Akan Daniel},
  year = {2025},
  howpublished = {\url{https://huggingface.co/AfroLogicInsect/topic-model-analysis-model}},
}
```

## Contact

- Name: Daniel (@AfroLogicInsect)
- Location: Lagos, Nigeria
- Contact: GitHub / Hugging Face / email (danielamahtoday@gmail.com)