---
license: mit
language:
- en
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
tags:
- multilabel
- framing
- entman
- media-bias
- news
- zero-shot
- bart-large-mnli
metrics:
- accuracy
- f1
- precision
- recall
---

# BERT Framing Classifier (Entman Multilabel) v1

This model is a multilabel classifier based on `bert-base-uncased`, fine-tuned to identify **framing functions** in news articles, inspired by Robert Entman's framing theory. The labels correspond to the four framing functions:

- **Define problems**
- **Diagnose causes**
- **Make moral judgments**
- **Suggest remedies**

## 🔍 Use Case

This model is designed for media studies researchers, journalists, or analysts studying **media framing**, **bias**, and **narrative patterns** in English-language news coverage.

It is especially useful for:
- News framing analysis in media studies.
- Detecting narrative intent in political discourse.
- Multilabel classification of complex textual claims.

Each label is treated as an independent binary classification task (multi-label classification).

## 🧠 Model Details

- Base model: `bert-base-uncased`
- Framework: 🤗 Transformers with PyTorch
- Loss Function: `BCEWithLogitsLoss` with class weights
- Label imbalance handled using positive weights and stratified multi-label split
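
How positive weights enter a binary cross-entropy loss on logits can be sketched in plain Python. This is an illustration only, not the author's training code: the helper names `compute_pos_weights` and `weighted_bce_with_logits` and the toy label matrix are hypothetical, and the negatives-to-positives ratio is the convention suggested in the PyTorch documentation for the `pos_weight` argument of `BCEWithLogitsLoss`, which may differ from the weights actually used for this model.

```python
import math


def _softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def compute_pos_weights(label_rows):
    """Return one weight per label: (#negatives / #positives)."""
    n_rows = len(label_rows)
    weights = []
    for j in range(len(label_rows[0])):
        positives = sum(row[j] for row in label_rows)
        negatives = n_rows - positives
        # Fall back to 1.0 if a label has no positive examples.
        weights.append(negatives / positives if positives else 1.0)
    return weights


def weighted_bce_with_logits(logits, targets, pos_weights):
    """Mean over labels of -[w*y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]."""
    total = 0.0
    for x, y, w in zip(logits, targets, pos_weights):
        log_sig = -_softplus(-x)           # log(sigmoid(x))
        log_one_minus_sig = -_softplus(x)  # log(1 - sigmoid(x))
        total += -(w * y * log_sig + (1 - y) * log_one_minus_sig)
    return total / len(logits)


# Hypothetical toy labels, columns ordered as:
# define_problem, diagnose_cause, moral_judgment, suggest_remedy
toy_labels = [
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
]
pos_weights = compute_pos_weights(toy_labels)
print(pos_weights)
print(weighted_bce_with_logits([0.3, -1.2, -2.0, 0.1], toy_labels[3], pos_weights))
```

Rarer labels receive a larger weight, so a missed positive on a rare label costs more than a missed positive on a common one.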

## 📊 Metrics

Evaluated on a stratified test set using:

- Accuracy
- F1-score (macro)
- Precision (macro)
- Recall (macro)
- ROC-AUC per class

Prediction thresholds were tuned separately for each label to maximize its F1-score.
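
Per-label threshold tuning can be sketched as a small grid search. The following is a minimal illustration in plain Python, assuming validation probabilities and gold labels are available as lists of rows; the function names `f1_binary` and `tune_thresholds`, the candidate grid, and the toy data are hypothetical and not taken from the author's code.

```python
def f1_binary(y_true, y_pred):
    """F1 for one label, given parallel lists of 0/1 values."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_thresholds(probs, labels, grid=None):
    """Pick, for each label column, the grid threshold with the highest F1."""
    grid = grid or [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    best_thresholds = []
    for j in range(len(labels[0])):
        y_true = [row[j] for row in labels]
        scores = [row[j] for row in probs]
        best_t, best_f1 = 0.5, -1.0
        for t in grid:
            y_pred = [1 if s > t else 0 for s in scores]
            f1 = f1_binary(y_true, y_pred)
            if f1 > best_f1:  # strict ">" keeps the lowest threshold on ties
                best_t, best_f1 = t, f1
        best_thresholds.append(best_t)
    return best_thresholds


# Hypothetical validation data for two labels.
val_probs = [[0.91, 0.37], [0.83, 0.31], [0.42, 0.12], [0.22, 0.04]]
val_labels = [[1, 1], [1, 1], [0, 0], [0, 0]]
print(tune_thresholds(val_probs, val_labels))
```

In this toy data the second label never reaches a probability of 0.5, so a fixed 0.5 cutoff would miss every positive while a tuned cutoff recovers them. A tuned list with one value per label can be passed as the `thresholds` argument of `predict_framing`.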

### 📊 Objective

This experiment used the Optuna hyperparameter tuning framework to optimize a BERT-based sequence classification model for framing analysis. The objective was to maximize the macro F1-score, which weights every label equally and is therefore well suited to multi-label classification with class imbalance.

### ⚙️ Hyperparameters Tuned

- `learning_rate`: float, explored between ~1e-5 and ~5e-5
- `weight_decay`: float, explored between ~0.02 and ~0.25
- `num_train_epochs`: integer, explored between 2 and 4
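
The search space above could be declared with Optuna roughly as follows. This is a sketch under assumptions, not the author's tuning script: the bounds are the approximate ranges listed above, log-scale sampling of the learning rate is an assumption, and `suggest_hyperparameters`, `run_study`, and the `train_and_evaluate` callback (expected to fine-tune the model and return the validation macro F1) are hypothetical names.

```python
def suggest_hyperparameters(trial):
    """Sample one configuration from the approximate ranges listed above."""
    return {
        # Log-scale sampling is an assumption; the model card only gives the range.
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0.02, 0.25),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 2, 4),
    }


def run_study(train_and_evaluate, n_trials=10):
    """Maximize the score returned by the hypothetical train_and_evaluate callback."""
    import optuna  # imported lazily so the sampling helper has no hard dependency

    def objective(trial):
        params = suggest_hyperparameters(trial)
        return train_and_evaluate(**params)

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=n_trials)
    return study.best_params, study.best_value
```

Calling `run_study(my_training_function)` with a function that accepts `learning_rate`, `weight_decay`, and `num_train_epochs` as keyword arguments would run the search and return the best configuration together with its score.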

## 🏆 Best Trial Summary

- **F1 Macro**: **0.8546**
- **Accuracy**: 0.5846
- **Precision Macro**: 0.8634
- **Recall Macro**: 0.8486
- **Best Hyperparameters**:
  - `learning_rate`: **4.62e-5**
  - `weight_decay`: **0.2275**
  - `num_train_epochs`: **4**
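
The gap between accuracy (0.5846) and macro F1 (0.8546) is what one would expect if accuracy is exact-match (subset) accuracy, where a prediction counts as correct only when all four labels match. That is how scikit-learn's `accuracy_score` behaves on multilabel indicator inputs, although this model card does not state which definition was used. The plain-Python sketch below, with hypothetical helper names and toy data, shows how the two metrics can diverge.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of examples whose full label vector is predicted exactly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


# Hypothetical toy data: two of four rows contain a single wrong label.
y_true = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 1, 1]]
y_pred = [[1, 1, 0, 0], [1, 0, 1, 1], [0, 1, 0, 1], [1, 1, 0, 1]]

print(subset_accuracy(y_true, y_pred))     # 0.5
print(round(macro_f1(y_true, y_pred), 3))  # 0.867
```

A single wrong label makes the whole example count as an error for subset accuracy, while macro F1 still credits the three labels that were predicted correctly.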

## 📈 Best Trial Training Metrics

| Epoch | Training Loss | Validation Loss | Accuracy | F1 Macro | Precision Macro | Recall Macro |
|-------|----------------|------------------|----------|----------|------------------|----------------|
| 1     | 0.4155         | 0.4499           | 0.3466   | 0.6998   | 0.8443           | 0.6265         |
| 2     | 0.3613         | 0.3414           | 0.4764   | 0.7862   | 0.8725           | 0.7266         |
| 3     | 0.2011         | 0.3179           | 0.5649   | 0.8495   | 0.8489           | 0.8506         |
| 4     | 0.1416         | 0.3508           | 0.5846   | **0.8546** | 0.8634         | 0.8486         |

![ROC Curve](https://cdn-uploads.huggingface.co/production/uploads/67aa9afb0b5d8d7d5e4e207a/jEq26g_ksRWyKLK2Yxsur.png)

## 📝 Notes

- All models started from the `bert-base-uncased` checkpoint.
- Classification head weights were randomly initialized (`classifier.weight`, `classifier.bias`).
- Full training was conducted for each trial; early stopping was **not** used.

## 🧪 How to Use

```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# The tokenizer comes from the base checkpoint; the fine-tuned weights come from the Hub repo.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("nurdyansa/bert-framing-entman-multilabel-v1")
model.eval()  # inference mode: disables dropout

label_cols = ["define_problem", "diagnose_cause", "moral_judgment", "suggest_remedy"]

def predict_framing(text, thresholds=None):
    """Return a {label: bool} dict for one text.

    `thresholds` is an optional list of four per-label cutoffs (default 0.5 each).
    """
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding="max_length", max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    # Sigmoid rather than softmax: each label is an independent binary decision.
    probs = torch.sigmoid(outputs.logits).squeeze(0)
    cutoffs = torch.tensor(thresholds if thresholds is not None else [0.5] * len(label_cols))
    preds = (probs > cutoffs).tolist()
    return dict(zip(label_cols, preds))

# Example
text = "The government failed to address the root cause of the crisis."
print(predict_framing(text))
```

## 🔧 Configuration

```python
repo_name = "nurdyansa/bert-framing-entman-multilabel-v1"
```

## 📁 Dataset

Balanced dataset of English-language news articles annotated with 4 Entman-style framing labels:
- Define Problem
- Diagnose Cause
- Moral Judgment
- Suggest Remedy

## 🚀 Training Details

- Dataset size: 4,000+ English news articles
- Optimized using Optuna (10 trials)
- Training framework: Hugging Face Transformers (PyTorch)
- Evaluation strategy: Per epoch
- Final model selected based on best macro F1-score

---

Model by [nurdyansa](https://huggingface.co/nurdyansa)

## 📚 Citation

If you use this model in your research or application, please cite it as:

```bibtex
@misc{nurdyansa_2025,
	author       = { Nurdyansa },
	title        = { bert-framing-entman-multilabel-v1 (Revision 057747b) },
	year         = 2025,
	url          = { https://huggingface.co/nurdyansa/bert-framing-entman-multilabel-v1 },
	doi          = { 10.57967/hf/5392 },
	publisher    = { Hugging Face }
}
```

## 🤝 Contributing

Researchers and practitioners are welcome to collaborate on improving this model's precision. You can contribute by:

- Providing more annotated data.
- Improving label consistency or adding nuance.
- Suggesting improvements to model architecture or training methods.

If you are interested in collaborating, sharing insights, or further developing this model, feel free to reach out:

📧 Email: nurdyansa@gmail.com