theonegareth committed · Commit 45fd788 · 1 Parent(s): af7b60b

Clarify that model detects hoax-style writing, not factual accuracy
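
The distinction this commit draws can be illustrated with a minimal sketch. It scores a text from surface style cues only, so the same claim written two ways gets two different scores. The features, weights, bias, and word list below are hypothetical and chosen for illustration; they are not the features or coefficients of IndoHoaxDetector, whose training code is not part of this diff.

```python
import math
import re

# Hypothetical stylistic features and weights, for illustration only.
# They are NOT the features or coefficients of IndoHoaxDetector.
WEIGHTS = {"exclamations": 0.9, "caps_ratio": 4.0, "clickbait": 1.5}
BIAS = -2.0
CLICKBAIT_WORDS = {"ampuh", "viral", "sebarkan", "terbongkar"}  # assumed word list


def style_features(text: str) -> dict:
    """Extract surface style cues; nothing here looks at whether a claim is true."""
    words = re.findall(r"[^\W\d_]+", text)  # runs of letters
    caps = sum(1 for w in words if len(w) > 1 and w.isupper())
    return {
        "exclamations": text.count("!"),
        "caps_ratio": caps / len(words) if words else 0.0,
        "clickbait": sum(1 for w in words if w.lower() in CLICKBAIT_WORDS),
    }


def hoax_style_score(text: str) -> float:
    """Logistic function over the weighted style features (a score in 0..1)."""
    feats = style_features(text)
    z = BIAS + sum(WEIGHTS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


# Same claim, two writing styles: the score moves although the facts do not.
plain = "Minum air putih dua liter sehari membantu mengurangi stres."
sensational = (
    "VIRAL!!! Cara AMPUH hilangkan stres, minum air putih dua liter sehari! SEBARKAN!"
)

plain_score = hoax_style_score(plain)
sensational_score = hoax_style_score(sensational)
print(f"plain: {plain_score:.2f}, sensational: {sensational_score:.2f}")
```

Because the score depends only on punctuation, capitalization, and word choice, a true statement written sensationally lands on the hoax-like side, while a false one written soberly would not. That is the limitation the updated README and model card describe.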

Files changed (3)
  1. README.md +4 -3
  2. app.py +1 -1
  3. modelcard.md +3 -2
README.md CHANGED
@@ -14,7 +14,7 @@ tags: ["nlp", "text-classification", "indonesian", "hoax-detection", "machine-le

  # IndoHoaxDetector

- A machine learning model for detecting hoax news articles in Indonesian language. This project uses a logistic regression classifier trained on a dataset of Indonesian news to identify potentially misleading or false information.
+ A machine learning model for detecting hoax-style news articles in Indonesian language. This project uses a logistic regression classifier trained on linguistic features of Indonesian news to identify articles written in a style typical of hoaxes or fake news, **not to verify factual accuracy**. It analyzes writing patterns, sensationalism, and other stylistic indicators rather than checking the truthfulness of the content.

  ## Features

@@ -65,10 +65,11 @@ print(prediction) # 0 for legitimate, 1 for hoax

  ## Limitations

- - Trained specifically on Indonesian news
+ - **Stylistic Analysis Only**: This model detects hoax-like writing style, not factual accuracy. A legitimate article could be flagged as hoax if written sensationally, and vice versa.
+ - Trained specifically on Indonesian news linguistic patterns
  - May not perform well on other languages or domains
  - Accuracy depends on the quality and representativeness of training data
- - False positives/negatives possible
+ - False positives/negatives possible due to stylistic variations

  ## Contributing

app.py CHANGED
@@ -60,7 +60,7 @@ demo = gr.Interface(
      ),
      outputs=gr.Markdown(label="Detection Result"),
      title="IndoHoaxDetector",
-     description="Detect hoax news in Indonesian language using machine learning. Enter news text to check if it's likely legitimate or a hoax.",
+     description="**Stylistic Analysis Tool**: Detects if Indonesian news text is written in a hoax-like style using machine learning. This analyzes writing patterns and sensationalism, **not factual accuracy**. Results indicate writing style similarity to known hoaxes, not truth verification.",
      examples=[
          ["Presiden mengumumkan program bantuan sosial untuk masyarakat miskin di seluruh Indonesia."],
          ["Ditemukan cara ampuh menghilangkan stres hanya dengan minum air putih 2 liter sehari."],
modelcard.md CHANGED
@@ -3,7 +3,7 @@
  ## Model Details

  ### Model Description
- IndoHoaxDetector is a binary classification model designed to detect hoax news articles in the Indonesian language. It uses logistic regression trained on a dataset of Indonesian news to classify text as either legitimate or hoax.
+ IndoHoaxDetector is a binary classification model designed to detect hoax-style news articles in the Indonesian language. It uses logistic regression trained on linguistic features of Indonesian news to classify text as either legitimate or hoax-like writing. **This model analyzes writing style and patterns, not factual accuracy or truthfulness of the content.**

  - **Developed by**: Gareth Aurelius Harrison
  - **Model type**: Logistic Regression (scikit-learn)
@@ -18,7 +18,7 @@ IndoHoaxDetector is a binary classification model designed to detect hoax news a
  ## Uses

  ### Direct Use
- This model can be used to analyze Indonesian news articles and determine if they are likely to be hoaxes. It is intended for educational, research, and journalistic purposes to help identify potentially misleading information.
+ This model can be used to analyze Indonesian news articles and determine if they are written in a hoax-like style. It identifies linguistic patterns typical of fake news but does **not verify factual accuracy**. It is intended for educational, research, and journalistic purposes to help identify potentially sensational or misleading writing styles.

  ### Downstream Use
  - News verification tools
@@ -42,6 +42,7 @@ Users should be aware that this model:
  - Requires human verification for critical applications

  ### Known Limitations
+ - **Stylistic vs Factual Analysis**: This model detects writing style typical of hoaxes, not factual inaccuracies. Legitimate news written sensationally may be flagged as hoax, and factual hoaxes written professionally may be missed.
  - **Data Bias**: The model is trained on a limited dataset; performance may vary with different topics or writing styles
  - **Language Specificity**: Only works for Indonesian text
  - **Temporal Limitations**: News patterns change over time; the model may become less accurate with newer data