Model Card for Disinformation-Fake-News-Detection

Fake News Detection Dashboard. The dashboard allows users to detect disinformation using six standard benchmark datasets. The front end, developed with Streamlit, provides an interactive interface for uploading files in CSV, PDF, or DOCX formats, and visualizes results using bar charts and word clouds. Its modular architecture separates data ingestion, preprocessing, model inference, and visualization, which supports scalability and maintainability. The dashboard has been applied to multiple datasets, including EUvsDisinfo, EUvsISOT, EUvsIGF, FA-KES, George McIntire, and ISOT, enabling large-scale predictions, cross-dataset generalizability assessment, propagation analysis, and exploration of textual patterns contributing to disinformation.

Model Details

Model Description

Inference uses cached pretrained models for efficiency. For ML pipelines, text is vectorised using stored TF-IDF unigram and bigram features before Passive-Aggressive (PA) prediction. For DL pipelines, text is tokenised, padded, and passed through BiLSTM networks to generate probabilistic outputs, which are normalised to consistent binary labels (Fake/Disinformation or True). Results are displayed in real time with dynamic visualisations, including keyword-based explanations via TF-IDF for ML models and word clouds for DL models, so that both pipelines remain transparent and interpretable within a unified modular framework.
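
The label normalisation step can be illustrated with a minimal sketch. It assumes the BiLSTM emits a sigmoid probability for the fake class, that a 0.5 threshold is applied, and that the PA classifier encodes fake as 1 and true as 0. The card states none of these details, and the function names are hypothetical.

```python
FAKE_LABEL = "Fake/Disinformation"
TRUE_LABEL = "True"


def label_from_probability(p_fake: float, threshold: float = 0.5) -> str:
    """Map a BiLSTM sigmoid output to a binary label.

    Assumption: the output is the probability of the fake class and
    0.5 is the decision threshold; the card does not state either.
    """
    if not 0.0 <= p_fake <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return FAKE_LABEL if p_fake >= threshold else TRUE_LABEL


def label_from_class_id(class_id: int) -> str:
    """Map a PA class id to the same labels (assumed encoding: 1 = fake, 0 = true)."""
    if class_id not in (0, 1):
        raise ValueError("class id must be 0 or 1")
    return FAKE_LABEL if class_id == 1 else TRUE_LABEL
```

Routing both pipelines through the same two label strings keeps the downstream charts independent of which model produced the prediction.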

  • Developed by: Sadam Hussain
  • Funded by [optional]: [More Information Needed]
  • Shared by [optional]: Sadam Hussain
  • Model type: Passive Aggressive and BiLSTM
  • Language(s) (NLP): English (en)
  • License: MIT
  • Finetuned from model [optional]: [More Information Needed]

Model Sources [optional]

Uses

Direct Use

The dashboard allows users to detect disinformation and fake news in English-language news articles. Users can enter raw text or upload CSV, PDF, or DOCX files. Predictions are generated in real time using ML/DL pipelines and visualised through keyword explanations (TF-IDF) and word clouds.

Downstream Use [optional]

  1. Can be integrated into larger news monitoring systems or fact-checking pipelines.

  2. Pretrained ML/DL/Transformer models can be fine-tuned for other datasets or domains.

Out-of-Scope Use

  1. Not suitable for languages other than English without retraining.

  2. Should not be used to make legal, financial, or medical decisions without human oversight.

  3. May produce incorrect predictions on highly domain-specific or adversarial content.

Bias, Risks, and Limitations

  1. Models may reflect biases present in the datasets (EUvsDisinfo, ISOT, FA-KES, etc.).
  2. The models struggle to produce accurate predictions for short texts, particularly those containing 5–10 sentences or fewer than 100 words.

Recommendations

  1. Users should review model outputs critically.

  2. Combine ML/DL predictions with human fact-checking for high-stakes decisions.

  3. Consider retraining or finetuning if applying to new domains or languages.

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.

How to Get Started with the Model

Run each training script with python3 followed by the file name: python3 DLtrain_models.py and python3 MLtrain_models.py.

Training Details

Both training scripts are available in the GitHub repository: DLtrain_models.py (https://github.com/afridisadam1-alt/Dashboard-Fake-News-Detection/blob/main/DLtrain_models.py) and MLtrain_models.py (https://github.com/afridisadam1-alt/Dashboard-Fake-News-Detection/blob/main/MLtrain_models.py). Running them trains the ML and DL models and saves the pretrained models together with their vectorizers.

Training Data

  1. Datasets: EUvsISOT, EUvsIGF, ISOT, FA-KES, George McIntire

  2. Content: English-language news articles labeled as Fake/Disinformation or True

  3. Preprocessing: Lowercasing, punctuation removal, tokenization; TF-IDF features extracted for ML models; sequences tokenized and padded for BiLSTM/Transformer models
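
The preprocessing steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's code: the reserved ids, the right-padding choice, and all function names are hypothetical.

```python
import string
from typing import Dict, List

PAD_ID = 0  # reserved index for padding (assumed)
OOV_ID = 1  # reserved index for out-of-vocabulary tokens (assumed)


def clean_and_tokenize(text: str) -> List[str]:
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return no_punct.split()


def build_vocab(token_lists: List[List[str]]) -> Dict[str, int]:
    """Assign an integer id to each distinct token, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def encode_and_pad(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert tokens to ids, truncate to max_len, and right-pad with PAD_ID."""
    ids = [vocab.get(tok, OOV_ID) for tok in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


tokens = clean_and_tokenize("Breaking: Aliens land in Paris!")
vocab = build_vocab([tokens])
padded = encode_and_pad(["aliens", "visit", "paris"], vocab, max_len=5)
```

Here "visit" is not in the vocabulary, so it maps to the out-of-vocabulary id, and the sequence is padded to a fixed length of five so that inputs can be batched.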

Training Procedure

Preprocessing [optional]

  1. ML: TF-IDF unigram + bigram vectorization

  2. DL: Tokenization, padding, batching
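
The ML preprocessing step can be sketched with scikit-learn as follows. The toy corpus, max_iter, and random_state are assumptions made for illustration; the card only states that TF-IDF unigram and bigram features feed a Passive-Aggressive classifier.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# Toy corpus, invented for illustration only (1 = fake, 0 = true).
texts = [
    "shocking secret cure doctors hide from you",
    "miracle pill melts fat overnight shocking truth",
    "parliament passes annual budget after debate",
    "central bank holds interest rates steady",
]
labels = [1, 1, 0, 0]

# ngram_range=(1, 2) extracts both unigrams and bigrams.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), lowercase=True)
X = vectorizer.fit_transform(texts)

# max_iter and random_state are illustrative choices, not the card's settings.
clf = PassiveAggressiveClassifier(max_iter=1000, random_state=0)
clf.fit(X, labels)

# At inference time the stored vectorizer transforms new text before prediction.
new_doc = ["shocking miracle cure"]
prediction = clf.predict(vectorizer.transform(new_doc))
```

Reusing the fitted vectorizer at inference time matters: refitting it on new text would change the feature indices and make the classifier weights meaningless.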

Training Hyperparameters

  1. ML: Standard scikit-learn implementations (Passive-Aggressive)

  2. DL: BiLSTM, LSTM; batch size 32–64, learning rate 1e-3, early stopping

  • Training regime: [More Information Needed]
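
The DL hyperparameters above could be wired together as in the following sketch. It assumes a Keras (TensorFlow) implementation, which the card does not confirm. Only the learning rate of 1e-3, the batch size range, and the use of early stopping come from the card; the embedding size, LSTM units, patience, and the build_bilstm helper are hypothetical.

```python
import tensorflow as tf


def build_bilstm(vocab_size: int, max_len: int) -> tf.keras.Model:
    """Build a small BiLSTM binary classifier.

    Embedding size (128) and LSTM units (64) are illustrative assumptions;
    only the learning rate of 1e-3 comes from the card.
    """
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(max_len,)),
        tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=128),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Early stopping on validation loss; patience=3 is an assumed value.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True
)

# Typical call (X_train, y_train are padded id sequences and 0/1 labels):
# model = build_bilstm(vocab_size=20000, max_len=300)
# model.fit(X_train, y_train, validation_split=0.1, epochs=20,
#           batch_size=32, callbacks=[early_stop])
```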

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

Held-out splits from all six benchmark datasets

Factors

[More Information Needed]

Metrics

  1. Accuracy, Precision, Recall, F1-score

  2. Confusion matrices for interpretability
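
For reference, the listed metrics follow directly from the confusion matrix counts. The sketch below treats fake (1) as the positive class, which is an assumption, and uses invented labels rather than results from the model.

```python
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tn, fp, fn, tp) for binary labels where 1 is the positive class."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return tn, fp, fn, tp


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented example labels (1 = fake, 0 = true).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
```

With these example labels there are three true positives, three true negatives, one false positive, and one false negative, so all four metrics equal 0.75.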

Results

Passive-Aggressive and BiLSTM models achieved F1-scores above 0.90 on most datasets.

Summary

The system reliably detects fake news across multiple datasets, providing both interpretable visual outputs and real-time predictions.

Model Examination [optional]

  1. Keyword-based explanations using TF-IDF (ML models)

  2. Word clouds for DL models

  3. Confusion matrices available for multi-class evaluations

  4. LDA-based topics and themes

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: NVIDIA GPUs (V100/3090) for DL/Transformer models; CPU for ML pipelines

  • Hours used: [More Information Needed]
  • Cloud Provider: [More Information Needed]
  • Compute Region: [More Information Needed]
  • Carbon Emitted: [More Information Needed]

Technical Specifications [optional]

Model Architecture and Objective

  1. ML: Passive-Aggressive
  2. DL: BiLSTM

Compute Infrastructure

[More Information Needed]

Hardware

  1. NVIDIA GPUs (V100/3090) for DL/Transformer models

  2. CPU for ML pipelines

Software

  1. Python 3.10

  2. PyTorch, TensorFlow, scikit-learn, Transformers, Streamlit, NumPy, Pandas

Citation [optional]

https://zenodo.org/records/18158666

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

  1. Dashboard supports CSV, PDF, DOCX file input

  2. Visual analytics: label distributions, word clouds, keyword explanations

Model Card Authors [optional]

Sadam Hussain

Model Card Contact

shuss007@gold.ac.uk
