Instructions to use ARISCOT/Digital_Literacy_Fact_Checker with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ARISCOT/Digital_Literacy_Fact_Checker with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="ARISCOT/Digital_Literacy_Fact_Checker")

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("ARISCOT/Digital_Literacy_Fact_Checker", dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
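As a rough usage sketch for the pipeline snippet above: a text-classification pipeline returns dictionaries with `label` and `score` keys, so a small helper can pick the top entry. The helper names `best_prediction` and `classify_claim` are hypothetical, and the label strings this checkpoint emits are not documented here, so treat any label name as an assumption.

```python
from typing import Any, Dict, List


def best_prediction(scores: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the highest-scoring entry from one input's predictions."""
    if not scores:
        raise ValueError("no predictions to choose from")
    return max(scores, key=lambda item: item["score"])


def classify_claim(claim: str) -> Dict[str, Any]:
    """Classify one claim with the hosted checkpoint.

    Requires the transformers package and downloads the model on first use,
    so the import is kept inside the function.
    """
    from transformers import pipeline

    pipe = pipeline(
        "text-classification",
        model="ARISCOT/Digital_Literacy_Fact_Checker",
    )
    # For a single string the pipeline returns a list of {label, score} dicts.
    return best_prediction(pipe(claim))
```

Calling `classify_claim("Scientists have discovered a planet made entirely of diamond.")` would then return a single `{label, score}` dictionary, with the label string depending on the model's own configuration.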
Update README.md
README.md
CHANGED
```diff
@@ -30,9 +30,42 @@ widget:
 - text: "Scientists have discovered a planet made entirely of diamond."
   example_title: "Science Claim"
 ---
+
+# 1. Load the different "Subject Experts"
+# We take a sample of 5,000 from each to keep the model balanced
+global_news = load_dataset("jason1966/algozee_fake-news", split='train[:5000]')
+politics = load_dataset("ucsbnlp/liar", split='train[:5000]')
+science_health = load_dataset("Intel/misinformation-guard", split='train[:5000]')
+
+# 2. Label Harmonization
+# Different datasets use different numbers for "False".
+# We force them all to use: 0 for False, 1 for True.
+def clean_labels(example):
+    # Example logic: if the label is 'fake' or 0, it stays 0
+    if str(example['label']).lower() in ['fake', 'false', '0']:
+        example['label'] = 0
+    else:
+        example['label'] = 1
+    return example
+
+# Apply the cleaning to all datasets
+global_news = global_news.map(clean_labels)
+politics = politics.map(clean_labels)
+science_health = science_health.map(clean_labels)
+
+# 3. Create the "Super Dataset"
+universal_data = concatenate_datasets([global_news, politics, science_health])
+
+# 4. Shuffle so the model learns all subjects at the same time
+universal_data = universal_data.shuffle(seed=42)
+
+print(f"Universal model is ready to train on {len(universal_data)} claims across all categories!")
+---
+
 # Digital Literacy & Fact-Checker AI 🌍
 
 This AI helps verify news claims globally, with a specialized focus on digital literacy and misinformation trends in West Africa."
 
 ## How it Works
-This model uses the RoBERTa architecture to classify news claims into four categories: reliable, misleading, false, or unverified.
+This model uses the RoBERTa architecture to classify news claims into four categories: reliable, misleading, false, or unverified.
+from datasets import load_dataset, concatenate_datasets, DatasetDict
```
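Two details of the added script may be worth a closer look. `clean_labels` maps every label outside `'fake'`, `'false'`, and `'0'` to 1, so any intermediate or unknown label would be counted as true. Merging sources also generally needs them to share the same columns first. The sketch below illustrates steps 2 to 4 in plain Python under those assumptions. It is not the author's pipeline: the `SOURCES` table, its field names, and its label vocabularies are hypothetical and are not taken from the real datasets. It keeps the script's two-label scheme (0 for false, 1 for true) rather than the four categories named in the model description.

```python
import random
from typing import Any, Dict, Iterable, List

# Hypothetical per-source settings: which field holds the claim text and how
# each raw label maps onto the shared scheme (0 = false, 1 = true).
# These field names and label values are illustrative assumptions only.
SOURCES: Dict[str, Dict[str, Any]] = {
    "global_news": {"text_field": "text", "label_map": {"fake": 0, "real": 1}},
    "politics": {"text_field": "statement", "label_map": {"false": 0, "true": 1}},
    "science_health": {"text_field": "claim", "label_map": {"0": 0, "1": 1}},
}


def to_common_schema(record: Dict[str, Any], source: str) -> Dict[str, Any]:
    """Project one raw record onto the shared keys: text, label, source."""
    config = SOURCES[source]
    raw_label = str(record["label"]).strip().lower()
    if raw_label not in config["label_map"]:
        # Fail loudly instead of silently treating unknown labels as true.
        raise ValueError(f"Unmapped label {record['label']!r} in source {source!r}")
    return {
        "text": record[config["text_field"]],
        "label": config["label_map"][raw_label],
        "source": source,
    }


def build_universal_data(
    raw: Dict[str, Iterable[Dict[str, Any]]], seed: int = 42
) -> List[Dict[str, Any]]:
    """Harmonize every source, merge the results, and shuffle reproducibly."""
    merged = [
        to_common_schema(record, name)
        for name, records in raw.items()
        for record in records
    ]
    random.Random(seed).shuffle(merged)
    return merged
```

With the `datasets` library, a per-record function like `to_common_schema` could be passed to `Dataset.map` together with `remove_columns`, so that each source has the same columns before `concatenate_datasets` is called. In a runnable script the `from datasets import ...` line would also need to come before the first `load_dataset` call.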