Commit 45fd788
Parent(s): af7b60b
Clarify that model detects hoax-style writing, not factual accuracy
Files changed:
- README.md +4 -3
- app.py +1 -1
- modelcard.md +3 -2
README.md CHANGED
@@ -14,7 +14,7 @@ tags: ["nlp", "text-classification", "indonesian", "hoax-detection", "machine-le
 
 # IndoHoaxDetector
 
-A machine learning model for detecting hoax news articles in Indonesian language. This project uses a logistic regression classifier trained on
+A machine learning model for detecting hoax-style news articles in Indonesian language. This project uses a logistic regression classifier trained on linguistic features of Indonesian news to identify articles written in a style typical of hoaxes or fake news, **not to verify factual accuracy**. It analyzes writing patterns, sensationalism, and other stylistic indicators rather than checking the truthfulness of the content.
 
 ## Features
 
@@ -65,10 +65,11 @@ print(prediction) # 0 for legitimate, 1 for hoax
 
 ## Limitations
 
--
+- **Stylistic Analysis Only**: This model detects hoax-like writing style, not factual accuracy. A legitimate article could be flagged as hoax if written sensationally, and vice versa.
+- Trained specifically on Indonesian news linguistic patterns
 - May not perform well on other languages or domains
 - Accuracy depends on the quality and representativeness of training data
-- False positives/negatives possible
+- False positives/negatives possible due to stylistic variations
 
 ## Contributing
 
app.py CHANGED
@@ -60,7 +60,7 @@ demo = gr.Interface(
 ),
 outputs=gr.Markdown(label="Detection Result"),
 title="IndoHoaxDetector",
-description="
+description="**Stylistic Analysis Tool**: Detects if Indonesian news text is written in a hoax-like style using machine learning. This analyzes writing patterns and sensationalism, **not factual accuracy**. Results indicate writing style similarity to known hoaxes, not truth verification.",
 examples=[
 ["Presiden mengumumkan program bantuan sosial untuk masyarakat miskin di seluruh Indonesia."],
 ["Ditemukan cara ampuh menghilangkan stres hanya dengan minum air putih 2 liter sehari."],
modelcard.md CHANGED
@@ -3,7 +3,7 @@
 ## Model Details
 
 ### Model Description
-IndoHoaxDetector is a binary classification model designed to detect hoax news articles in the Indonesian language. It uses logistic regression trained on
+IndoHoaxDetector is a binary classification model designed to detect hoax-style news articles in the Indonesian language. It uses logistic regression trained on linguistic features of Indonesian news to classify text as either legitimate or hoax-like writing. **This model analyzes writing style and patterns, not factual accuracy or truthfulness of the content.**
 
 - **Developed by**: Gareth Aurelius Harrison
 - **Model type**: Logistic Regression (scikit-learn)
@@ -18,7 +18,7 @@ IndoHoaxDetector is a binary classification model designed to detect hoax news a
 ## Uses
 
 ### Direct Use
-This model can be used to analyze Indonesian news articles and determine if they are
+This model can be used to analyze Indonesian news articles and determine if they are written in a hoax-like style. It identifies linguistic patterns typical of fake news but does **not verify factual accuracy**. It is intended for educational, research, and journalistic purposes to help identify potentially sensational or misleading writing styles.
 
 ### Downstream Use
 - News verification tools
@@ -42,6 +42,7 @@ Users should be aware that this model:
 - Requires human verification for critical applications
 
 ### Known Limitations
+- **Stylistic vs Factual Analysis**: This model detects writing style typical of hoaxes, not factual inaccuracies. Legitimate news written sensationally may be flagged as hoax, and factual hoaxes written professionally may be missed.
 - **Data Bias**: The model is trained on a limited dataset; performance may vary with different topics or writing styles
 - **Language Specificity**: Only works for Indonesian text
 - **Temporal Limitations**: News patterns change over time; the model may become less accurate with newer data
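
Reviewer note: the style-versus-facts distinction this commit documents can be illustrated with a toy scorer. The sketch below is not the project's model or feature set. The feature names (`exclamation_ratio`, `uppercase_ratio`, `sensational_ratio`), the cue-word list, and the weights are all hypothetical, chosen only to show how a logistic function over surface features can flag a true statement written sensationally while passing a false statement written calmly.

```python
import math
import re

# Hypothetical sensational cue words (illustrative only, not from the project).
SENSATIONAL_WORDS = {"ampuh", "viral", "heboh", "terbongkar", "sebarkan"}

# Hypothetical weights and bias; a trained logistic regression learns its own.
WEIGHTS = {
    "exclamation_ratio": 40.0,
    "uppercase_ratio": 6.0,
    "sensational_ratio": 12.0,
}
BIAS = -2.0


def stylistic_features(text: str) -> dict:
    """Compute surface-level style features; none of them inspect factual content."""
    tokens = re.findall(r"\w+", text)
    letters = [c for c in text if c.isalpha()]
    n_tokens = max(len(tokens), 1)
    n_chars = max(len(text), 1)
    n_letters = max(len(letters), 1)
    return {
        "exclamation_ratio": text.count("!") / n_chars,
        "uppercase_ratio": sum(c.isupper() for c in letters) / n_letters,
        "sensational_ratio": sum(t.lower() in SENSATIONAL_WORDS for t in tokens) / n_tokens,
    }


def hoax_style_score(text: str) -> float:
    """Map the style features to a value in (0, 1) with a logistic function."""
    feats = stylistic_features(text)
    z = BIAS + sum(WEIGHTS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


# A false claim in a calm register scores below 0.5.
calm_but_false = "Bumi berbentuk datar menurut laporan terbaru."
# A true claim in a sensational register scores above 0.5.
sensational_but_true = "HEBOH!!! AIR MENDIDIH PADA SUHU 100 DERAJAT CELSIUS!!! SEBARKAN!!!"

print(hoax_style_score(calm_but_false) < 0.5)
print(hoax_style_score(sensational_but_true) > 0.5)
```

Because the score depends only on punctuation, casing, and word choice, it behaves as the new "Stylistic vs Factual Analysis" limitation describes: human verification is still needed to judge whether the content is true.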