Model Card for Fake News Detection Dashboard
The Fake News Detection Dashboard allows users to detect disinformation using six standard benchmark datasets. The front end, developed with Streamlit, provides an interactive interface for uploading files in CSV, PDF, or DOCX formats and visualizes results using bar charts and word clouds. Its modular architecture separates data ingestion, preprocessing, model inference, and visualization, which supports scalability and maintainability. The dashboard has been applied to multiple datasets, including EUvsDisinfo, EUvsISOT, EUvsIGF, FA-KES, George McIntire, and ISOT, enabling large-scale predictions, cross-dataset generalizability assessment, propagation analysis, and exploration of textual patterns contributing to disinformation.
Model Details
Model Description
Inference uses cached pretrained models for efficiency. For ML pipelines, text is vectorised using stored TF-IDF unigram and bigram features before Passive-Aggressive (PA) prediction. For DL pipelines, text is tokenised, padded, and passed through BiLSTM networks to generate probabilistic outputs, which are normalised to consistent binary labels (Fake/Disinformation or True). Results are displayed in real time with dynamic visualisations, including keyword-based explanations via TF-IDF for ML models and word clouds for DL models, ensuring transparency and interpretability within a unified modular framework.
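The normalisation step described above can be sketched as follows. This is a minimal illustration, not the dashboard's code: the function name, the 0.5 threshold, and the assumption that each score is the probability of the fake class are all hypothetical.

```python
from typing import List

# Label strings taken from the card's wording; the dashboard's own
# constants may differ (assumption).
FAKE_LABEL = "Fake/Disinformation"
TRUE_LABEL = "True"


def normalise_predictions(probs: List[float], threshold: float = 0.5) -> List[str]:
    """Map P(fake) scores to the two display labels.

    Assumes each score is the model's probability that the text is fake.
    The 0.5 threshold is an illustrative default, not a documented setting.
    """
    labels = []
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        labels.append(FAKE_LABEL if p >= threshold else TRUE_LABEL)
    return labels


print(normalise_predictions([0.92, 0.08, 0.5]))
# ['Fake/Disinformation', 'True', 'Fake/Disinformation']
```

Mapping every pipeline's output through one function like this is one way to keep ML and DL results comparable in the same charts.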
- Developed by: Sadam Hussain
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: Sadam Hussain
- Model type: Passive Aggressive and BiLSTM
- Language(s) (NLP): English (en)
- License: MIT
- Finetuned from model [optional]: [More Information Needed]
Model Sources [optional]
- Repository: https://github.com/afridisadam1-alt/Dashboard-Fake-News-Detection
- Paper [optional]: https://doi.org/10.5281/zenodo.18158666
- Demo [optional]: https://dashboard-fake-news-detection-xw2wv9dmgfsywawzpvn9wi.streamlit.app/
Uses
Direct Use
The dashboard allows users to detect disinformation and fake news in English-language news articles. Users can input raw text, CSV, PDF, or DOCX files. Predictions are generated in real time using ML/DL pipelines and visualised through keyword explanations (TF-IDF) and word clouds.
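As a rough illustration of the CSV input path, the sketch below pulls article texts out of CSV content using only the standard library. The column name "text" and the function name are assumptions for illustration; the dashboard's actual upload handling is not documented here.

```python
import csv
import io
from typing import List


def texts_from_csv(raw: str, column: str = "text") -> List[str]:
    """Return the non-empty values of one column from CSV content.

    `column` is a hypothetical default; use whatever header the file has.
    """
    reader = csv.DictReader(io.StringIO(raw))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found")
    # Skip rows where the cell is missing or blank.
    return [row[column].strip() for row in reader if row[column] and row[column].strip()]


sample = "id,text\n1,Officials confirm the new policy.\n2,\n3,Secret cure hidden from public!\n"
print(texts_from_csv(sample))
# ['Officials confirm the new policy.', 'Secret cure hidden from public!']
```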
Downstream Use [optional]
Can be integrated into larger news monitoring systems or fact-checking pipelines.
Pretrained ML/DL/Transformer models can be fine-tuned for other datasets or domains.
Out-of-Scope Use
Not suitable for languages other than English without retraining.
Should not be used to make legal, financial, or medical decisions without human oversight.
May produce incorrect predictions on highly domain-specific or adversarial content.
Bias, Risks, and Limitations
- Models may reflect biases present in the datasets (EUvsDisinfo, ISOT, FA-KES, etc.).
- The models are less accurate on short texts, particularly those of 5–10 sentences or fewer than 100 words.
Recommendations
Users should review model outputs critically.
Combine ML/DL predictions with human fact-checking for high-stakes decisions.
Consider retraining or finetuning if applying to new domains or languages.
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
How to Get Started with the Model
Run each training script, DLtrain_models.py and MLtrain_models.py, with python3 followed by the file name, for example: python3 MLtrain_models.py
Training Details
Both training scripts are available in the GitHub repository: DLtrain_models.py (https://github.com/afridisadam1-alt/Dashboard-Fake-News-Detection/blob/main/DLtrain_models.py) and MLtrain_models.py (https://github.com/afridisadam1-alt/Dashboard-Fake-News-Detection/blob/main/MLtrain_models.py). Running them trains the ML and DL models and saves the resulting pretrained models together with their vectorizers.
Training Data
Datasets: EUvsISOT, EUvsIGF, ISOT, FA-KES, George McIntire
Content: English-language news articles labeled as Fake/Disinformation or True
Preprocessing: Lowercasing, punctuation removal, tokenization; TF-IDF features extracted for ML models; sequences tokenized and padded for BiLSTM/Transformer models
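The text-cleaning steps listed above could look roughly like this. It is a minimal sketch that assumes ASCII punctuation removal and whitespace tokenization; the training scripts may use a different tokenizer, and the function name is hypothetical.

```python
import string
from typing import List

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> List[str]:
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    return text.lower().translate(_PUNCT_TABLE).split()


print(preprocess("BREAKING: Scientists 'shocked' by results!"))
# ['breaking', 'scientists', 'shocked', 'by', 'results']
```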
Training Procedure
Preprocessing [optional]
ML: TF-IDF unigram + bigram vectorization
DL: Tokenization, padding, batching
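To make the two preprocessing paths concrete, here is a small standard-library sketch: unigram and bigram generation for the ML side, and index encoding, padding, and batching for the DL side. The reserved ids, post-padding, and all function names are assumptions for illustration; the real pipelines rely on library tokenizers and vectorizers whose defaults may differ (for example, padding at the front).

```python
from typing import Dict, Iterator, List

PAD_ID = 0  # assumed padding index
OOV_ID = 1  # assumed index for out-of-vocabulary words


def unigrams_and_bigrams(tokens: List[str]) -> List[str]:
    """Return the tokens followed by all adjacent-pair bigrams."""
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def build_vocab(token_lists: List[List[str]]) -> Dict[str, int]:
    """Assign ids in first-seen order, starting after the reserved ids."""
    vocab: Dict[str, int] = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2
    return vocab


def encode_and_pad(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Encode tokens, truncate to max_len, then pad at the end."""
    ids = [vocab.get(tok, OOV_ID) for tok in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def batches(rows: List[List[int]], batch_size: int) -> Iterator[List[List[int]]]:
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


docs = [["fake", "news", "spreads", "fast"], ["news", "report"], ["unseen", "claim"]]
print(unigrams_and_bigrams(docs[1]))  # ['news', 'report', 'news report']

vocab = build_vocab(docs[:2])  # third document is held out
padded = [encode_and_pad(d, vocab, max_len=3) for d in docs]
print(padded)  # [[2, 3, 4], [3, 6, 0], [1, 1, 0]]
print(list(batches(padded, batch_size=2)))
```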
Training Hyperparameters
ML: Standard scikit-learn implementations (Passive-Aggressive)
DL: BiLSTM, LSTM; batch size 32–64, learning rate 1e-3, early stopping
- Training regime: [More Information Needed]
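The DL settings above mention early stopping. A minimal sketch of that logic is shown below, assuming validation loss is the monitored quantity; the patience value and the loss sequence are made-up illustrations, not settings or results from the training scripts.

```python
from typing import List, Optional


class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best: Optional[float] = None
        self.bad_epochs = 0

    def should_stop(self, val_loss: float) -> bool:
        # An improvement resets the counter; anything else increments it.
        if self.best is None or val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_losses: List[float], patience: int) -> int:
    """Return how many epochs run before the stopper fires."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.should_stop(loss):
            return epoch
    return len(val_losses)


# Illustrative losses only: the best value comes at epoch 2, and three
# epochs without improvement end training at epoch 5.
print(epochs_run([0.60, 0.45, 0.47, 0.46, 0.48, 0.40], patience=3))  # 5
```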
Speeds, Sizes, Times [optional]
[More Information Needed]
Evaluation
Testing Data, Factors & Metrics
Testing Data
Held-out splits from all six benchmark datasets
Factors
[More Information Needed]
Metrics
Accuracy, Precision, Recall, F1-score
Confusion matrices for interpretability
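For reference, the listed metrics can be derived from the four confusion-matrix counts as sketched below. The sketch assumes labels are encoded as 1 (fake) and 0 (true) with fake as the positive class; the example labels are invented for illustration and are not evaluation results.

```python
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating label 1 (fake) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1, guarding zero divisions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 3)
print(binary_metrics(y_true, y_pred))
```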
Results
Passive-Aggressive and BiLSTM models achieved F1-scores above 0.90 on most datasets.
Summary
The system reliably detects fake news across multiple datasets, providing both interpretable visual outputs and real-time predictions.
Model Examination [optional]
Keyword-based explanations using TF-IDF (ML models)
Word clouds for DL models
Confusion matrices available for multi-class evaluations
LDA-based topics and themes
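One simple way to produce keyword-based explanations is to rank a document's terms by TF-IDF weight, sketched below with the standard library. This is an assumption about the general approach, not the dashboard's implementation: it uses raw term counts with the smoothed IDF formula ln((1 + n) / (1 + df)) + 1, skips length normalisation, and the toy corpus is invented.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def idf_weights(docs: List[List[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def top_keywords(doc: List[str], idf: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Rank a document's terms by term count times IDF, highest first."""
    tf = Counter(doc)
    scores = {term: count * idf.get(term, 0.0) for term, count in tf.items()}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Toy corpus for illustration only.
docs = [
    ["miracle", "cure", "shocks", "doctors"],
    ["doctors", "publish", "trial", "results"],
    ["miracle", "cure", "miracle", "claims"],
]
idf = idf_weights(docs)
print([term for term, _ in top_keywords(docs[2], idf, k=2)])  # ['miracle', 'claims']
```

Here "miracle" ranks first because it occurs twice in the document, while "claims" outranks "cure" because it appears in fewer documents.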
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: NVIDIA GPUs (V100/3090) for DL/Transformer models; CPU for ML pipelines
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
Technical Specifications [optional]
Model Architecture and Objective
- ML: Passive-Aggressive
- DL: BiLSTM
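For readers unfamiliar with the Passive-Aggressive objective, the sketch below shows one online update in the PA-I form: the step size is the hinge loss divided by the squared feature norm, capped at an aggressiveness parameter C. It is a plain-Python illustration over sparse feature dictionaries with hypothetical names, not the library code the training script calls.

```python
from typing import Dict

Features = Dict[str, float]


def dot(weights: Dict[str, float], x: Features) -> float:
    """Sparse dot product between the weight vector and one example."""
    return sum(weights.get(name, 0.0) * value for name, value in x.items())


def pa_update(weights: Dict[str, float], x: Features, y: int, C: float = 1.0) -> float:
    """Apply one PA-I step in place and return the pre-update hinge loss.

    y must be +1 or -1. The model stays passive when the margin is at
    least 1 and otherwise moves just far enough to fix it, limited by C.
    """
    loss = max(0.0, 1.0 - y * dot(weights, x))
    sq_norm = sum(value * value for value in x.values())
    if loss > 0.0 and sq_norm > 0.0:
        tau = min(C, loss / sq_norm)
        for name, value in x.items():
            weights[name] = weights.get(name, 0.0) + tau * y * value
    return loss


weights: Dict[str, float] = {}
example = {"miracle": 1.0, "cure": 1.0}   # toy TF-IDF-like features
print(pa_update(weights, example, y=+1))  # 1.0 (margin violated, so update)
print(weights)                            # {'miracle': 0.5, 'cure': 0.5}
print(pa_update(weights, example, y=+1))  # 0.0 (margin satisfied, no change)
```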
Compute Infrastructure
[More Information Needed]
Hardware
NVIDIA GPUs (V100/3090) for DL/Transformer models
CPU for ML pipelines
Software
Python 3.10
PyTorch, TensorFlow, scikit-learn, Transformers, Streamlit, NumPy, Pandas
Citation [optional]
https://zenodo.org/records/18158666
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary [optional]
[More Information Needed]
More Information [optional]
Dashboard supports CSV, PDF, DOCX file input
Visual analytics: label distributions, word clouds, keyword explanations
Model Card Authors [optional]
Sadam Hussain
Model Card Contact
shuss007@gold.ac.uk