Commit 26e42cf (verified), committed by ARISCOT
1 Parent(s): f9aab16

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +9 -76
README.md CHANGED
@@ -1,81 +1,14 @@
  ---
- license: apache-2.0
- base_model:
- - facebook/roberta-base
- - meta-llama/Llama-3.1-8B-Instruct
- library_name: transformers.js
- datasets:
- - Intel/misinformation-guard
- - ucsbnlp/liar
- - Isotonic/human_assistant_conversation
- - fever/fever
- - Holmeister/Climate-Fever-TR
- - 34data/polyglotfake-real
- - talab-ai/pi5-agricultural-iot-32day
- - brl-xfact/Eye4AllMulti
- - BeIR/scifact
- language:
- - en
- - fr
- - es
- - ar
- - ha
- - tw
- metrics:
- - accuracy
- - recall
- - precision
- pipeline_tag: text-classification
+ language: en
+ license: mit
  tags:
  - fact-checking
- - misinformation
- - digital-literacy
- - fake-news-detection
- - nlp
- - news
- widget:
- - text: The government has announced a new tax on all social media users.
-   example_title: Policy News
- - text: Scientists have discovered a planet made entirely of diamond.
-   example_title: Science Claim
- new_version: deepseek-ai/DeepSeek-V4-Pro
+ - social-media
+ - politics
+ - health
+ - science
  ---

- # 1. Load the different "Subject Experts"
- # We take a sample of 5,000 from each to keep the model balanced
- global_news = load_dataset("jason1966/algozee_fake-news", split='train[:5000]')
- politics = load_dataset("ucsbnlp/liar", split='train[:5000]')
- science_health = load_dataset("Intel/misinformation-guard", split='train[:5000]')
-
- # 2. Label Harmonization
- # Different datasets use different numbers for "False".
- # We force them all to use: 0 for False, 1 for True.
- def clean_labels(example):
-     # Example logic: if the label is 'fake' or 0, it stays 0
-     if str(example['label']).lower() in ['fake', 'false', '0']:
-         example['label'] = 0
-     else:
-         example['label'] = 1
-     return example
-
- # Apply the cleaning to all datasets
- global_news = global_news.map(clean_labels)
- politics = politics.map(clean_labels)
- science_health = science_health.map(clean_labels)
-
- # 3. Create the "Super Dataset"
- universal_data = concatenate_datasets([global_news, politics, science_health])
-
- # 4. Shuffle so the model learns all subjects at the same time
- universal_data = universal_data.shuffle(seed=42)
-
- print(f"Universal model is ready to train on {len(universal_data)} claims across all categories!")
- ---
-
- # Digital Literacy & Fact-Checker AI 🌍
-
- This AI helps verify news claims globally, with a specialized focus on digital literacy and misinformation trends in West Africa."
-
- ## How it Works
- This model uses the RoBERTa architecture to classify news claims into four categories: reliable, misleading, false, or unverified.
- from datasets import load_dataset, concatenate_datasets, DatasetDict
+ # Digital Literacy Fact Checker
+ This model is designed to classify misinformation across social media,
+ politics, health, science, religion, and agriculture.
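
Note on the removed label-harmonization step: the `clean_labels` helper deleted in this commit can be tried without downloading any dataset. The sketch below is a minimal illustration that assumes plain Python dicts in place of `datasets` rows. The `harmonize_label` helper, the `FALSE_VALUES` constant, and the sample rows are hypothetical and are not part of the commit.

```python
# Minimal sketch of the removed label-harmonization logic.
# Assumption: rows are plain dicts; the sample data below is invented.

FALSE_VALUES = ("fake", "false", "0")  # hypothetical constant; values taken from the removed code


def harmonize_label(raw_label):
    """Return 0 if the raw label reads as 'false', otherwise 1."""
    return 0 if str(raw_label).strip().lower() in FALSE_VALUES else 1


def clean_labels(example):
    """Return a copy of the row with its 'label' mapped to 0 or 1."""
    cleaned = dict(example)
    cleaned["label"] = harmonize_label(example["label"])
    return cleaned


# Invented rows that mimic the mixed label styles the removed comments describe.
rows = [
    {"text": "claim A", "label": "FAKE"},
    {"text": "claim B", "label": 0},
    {"text": "claim C", "label": "real"},
    {"text": "claim D", "label": 3},
]

harmonized = [clean_labels(row) for row in rows]
print([row["label"] for row in harmonized])
```

As in the removed code, every value outside the "false" list maps to 1. For a source with more than two integer classes, that rule would mark every class except `0` as true, so a per-dataset mapping is probably needed before concatenating.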