ARISCOT/Digital_Literacy_Fact_Checker

Text Classification


ARISCOT committed 1 day ago

Commit 512697c · verified · 1 Parent(s): ae90040

Update README.md

Files changed (1)

README.md +34 -1

README.md CHANGED

@@ -30,9 +30,42 @@ widget:
 - text: "Scientists have discovered a planet made entirely of diamond."
   example_title: "Science Claim"
 ---
 # Digital Literacy & Fact-Checker AI 🌍
 This AI helps verify news claims globally, with a specialized focus on digital literacy and misinformation trends in West Africa.
 ## How it Works
-This model uses the RoBERTa architecture to classify news claims into four categories: reliable, misleading, false, or unverified.

 - text: "Scientists have discovered a planet made entirely of diamond."
   example_title: "Science Claim"
 ---
+# 1. Load the different "Subject Experts"
+# We take a sample of 5,000 from each to keep the model balanced
+global_news = load_dataset("jason1966/algozee_fake-news", split='train[:5000]')
+politics = load_dataset("ucsbnlp/liar", split='train[:5000]')
+science_health = load_dataset("Intel/misinformation-guard", split='train[:5000]')
+# 2. Label Harmonization
+# Different datasets use different numbers for "False".
+# We force them all to use: 0 for False, 1 for True.
+def clean_labels(example):
+    # Example logic: if the label is 'fake' or 0, it stays 0
+    if str(example['label']).lower() in ['fake', 'false', '0']:
+        example['label'] = 0
+    else:
+        example['label'] = 1
+    return example
+# Apply the cleaning to all datasets
+global_news = global_news.map(clean_labels)
+politics = politics.map(clean_labels)
+science_health = science_health.map(clean_labels)
+# 3. Create the "Super Dataset"
+universal_data = concatenate_datasets([global_news, politics, science_health])
+# 4. Shuffle so the model learns all subjects at the same time
+universal_data = universal_data.shuffle(seed=42)
+print(f"Universal model is ready to train on {len(universal_data)} claims across all categories!")
+---
 # Digital Literacy & Fact-Checker AI 🌍
 This AI helps verify news claims globally, with a specialized focus on digital literacy and misinformation trends in West Africa.
 ## How it Works
+This model uses the RoBERTa architecture to classify news claims into four categories: reliable, misleading, false, or unverified.
+from datasets import load_dataset, concatenate_datasets, DatasetDict
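
Note on the label harmonization step in the added lines: `clean_labels` sends every label other than 'fake', 'false', or 0 to 1, so a dataset with more than two classes, or one that numbers its classes differently, would be relabeled without any warning. The following is a minimal sketch of one possible alternative, assuming each dataset gets its own explicit mapping. The helper `make_label_cleaner`, the sentinel value `-1`, and the sample label values are hypothetical illustrations, not taken from the commit or from the actual schemas of the three datasets.

```python
def make_label_cleaner(false_values, true_values):
    """Build a per-dataset cleaner (hypothetical helper, not part of the commit).

    false_values / true_values list the raw labels that this particular
    dataset uses for "false" and "true". Any other label becomes -1 so it
    can be inspected or dropped instead of being counted as true.
    """
    false_keys = {str(v).strip().lower() for v in false_values}
    true_keys = {str(v).strip().lower() for v in true_values}

    def clean_labels(example):
        key = str(example["label"]).strip().lower()
        cleaned = dict(example)  # copy so the input row is not mutated
        if key in false_keys:
            cleaned["label"] = 0
        elif key in true_keys:
            cleaned["label"] = 1
        else:
            cleaned["label"] = -1  # unknown label: flag it, do not guess
        return cleaned

    return clean_labels


# Assumed label vocabulary for one source; real datasets may differ.
clean_binary = make_label_cleaner(false_values=["fake", 0], true_values=["real", 1])

rows = [
    {"text": "claim A", "label": "FAKE"},
    {"text": "claim B", "label": 1},
    {"text": "claim C", "label": "half-true"},
]
cleaned_rows = [clean_binary(row) for row in rows]

# Keep only rows whose label was recognized.
usable_rows = [row for row in cleaned_rows if row["label"] != -1]
print(usable_rows)
```

Because each cleaner takes and returns a plain dictionary, it has the same shape as the `clean_labels` function in the diff. Whether it can be passed to `.map(...)` unchanged depends on how each dataset declares its `label` column, which the diff does not show.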