# Model Card for IndoHoaxDetector
## Model Details
### Model Description
IndoHoaxDetector is a binary classification model designed to detect hoax-style news articles in the Indonesian language. It uses logistic regression trained on linguistic features of Indonesian news to classify text as either legitimate or hoax-like writing. **This model analyzes writing style and patterns, not factual accuracy or truthfulness of the content.**
- **Developed by**: Gareth Aurelius Harrison
- **Model type**: Logistic Regression (scikit-learn)
- **Language(s)**: Indonesian
- **License**: MIT
- **Finetuned from model**: N/A (trained from scratch)
### Model Sources
- **Repository**: https://huggingface.co/theonegareth/IndoHoaxDetector
- **Paper or resources**: N/A
## Uses
### Direct Use
This model can be used to analyze Indonesian news articles and determine if they are written in a hoax-like style. It identifies linguistic patterns typical of fake news but does **not verify factual accuracy**. It is intended for educational, research, and journalistic purposes to help identify potentially sensational or misleading writing styles.
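A minimal sketch of direct use, assuming the repository ships a pickled scikit-learn pipeline that accepts raw text and exposes `predict_proba`. The file name `model.pkl` and the helpers `load_model` and `score_articles` are hypothetical; check the repository for the actual artifact names and for whether a separate vectorizer has to be loaded as well.

```python
import pickle
from pathlib import Path

# Label mapping as stated under Technical Specifications.
LABELS = {0: "legitimate", 1: "hoax"}


def load_model(path):
    """Load a pickled model. Only unpickle files from sources you trust."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def score_articles(model, texts):
    """Return a (label, hoax_probability) tuple for each text.

    Assumes `model` follows the scikit-learn classifier interface and
    that column 1 of predict_proba corresponds to class 1 (hoax).
    """
    results = []
    for row in model.predict_proba(list(texts)):
        p_hoax = float(row[1])
        label = LABELS[1] if p_hoax >= 0.5 else LABELS[0]
        results.append((label, p_hoax))
    return results


# Hypothetical usage; "model.pkl" is a placeholder file name.
# model = load_model("model.pkl")
# print(score_articles(model, ["Pemerintah mengumumkan jadwal libur nasional."]))
```

The returned label describes writing style only, so a "hoax" result is a prompt for human fact-checking rather than a verdict on the content.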
### Downstream Use
- News verification tools, as a stylistic pre-screening signal
- Fact-checking applications, to prioritize articles for human review
- Educational resources on misinformation
- Research on Indonesian media landscape
### Out-of-Scope Use
- Automated content moderation without human oversight
- Legal or judicial decisions
- Real-time censorship
- Detection in other languages
## Bias, Risks, and Limitations
### Recommendations
Users should be aware that this model:
- Is trained on specific datasets and may not generalize to all Indonesian news
- Can produce false positives/negatives
- Should not be used as the sole basis for important decisions
- Requires human verification for critical applications
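One way to keep a human in the loop is to route predictions through a review band instead of acting on the raw label. The sketch below is illustrative only: the function name `triage` and the thresholds 0.3 and 0.7 are assumptions, not values shipped with or tuned for this model.

```python
def triage(p_hoax, low=0.3, high=0.7):
    """Map a hoax-style probability to a review action.

    Thresholds are illustrative; tune them on labelled data before use.
    """
    if not 0.0 <= p_hoax <= 1.0:
        raise ValueError("p_hoax must be a probability in [0, 1]")
    if p_hoax >= high:
        return "flag"    # strong hoax-style signal, still verified by a person
    if p_hoax <= low:
        return "pass"    # reads like legitimate reporting
    return "review"      # model is unsure, defer to a person


for p in (0.05, 0.5, 0.92):
    print(p, triage(p))
```

Widening the band between `low` and `high` sends more articles to reviewers and reduces the number of decisions made on the model's output alone.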
### Known Limitations
- **Stylistic vs Factual Analysis**: This model detects writing style typical of hoaxes, not factual inaccuracies. Legitimate news written sensationally may be flagged as hoax, and hoaxes written in a professional style may be missed.
- **Data Bias**: The model is trained on a limited dataset; performance may vary with different topics or writing styles
- **Language Specificity**: Only works for Indonesian text
- **Temporal Limitations**: News patterns change over time; the model may become less accurate with newer data
- **Binary Classification**: Does not provide nuanced assessments of credibility
### Ethical Considerations
- **Misinformation Detection**: While helpful for flagging hoax-style writing, this technology could be misused to suppress legitimate dissenting views
- **Privacy**: Text analysis may involve sensitive content
- **Accessibility**: Should be used to empower users, not to restrict information access
- **Transparency**: Model decisions should be explainable and verifiable
## Training Details
### Training Data
- **Dataset**: Indonesian news articles dataset (details not publicly available)
- **Preprocessing**: Text cleaning, tokenization, feature extraction (likely TF-IDF)
- **Size**: Not specified
- **Distribution**: Balanced between hoax and legitimate classes (assumed)
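The exact preprocessing is not documented, so the following is only a sketch of what the text cleaning and tokenization steps might look like. The function names and the specific rules (lowercasing, URL removal, dropping non-letter characters) are assumptions, not the author's confirmed pipeline.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_text(text):
    """Lowercase, drop URLs, and replace non-letter characters with spaces."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    # Collapse the runs of whitespace left behind by the substitutions.
    return " ".join(text.split())


def tokenize(text):
    """Clean the text, then split it on whitespace."""
    return clean_text(text).split()


print(tokenize("VIRAL!!! Sebarkan sebelum dihapus: https://contoh.example/berita"))
```

Stripping punctuation and case discards cues such as repeated exclamation marks or all-caps headlines, which may themselves be stylistic signals; whether the original pipeline kept them is not stated.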
### Training Procedure
- **Training Date**: October 29, 2024
- **Hardware**: Not specified
- **Software**: scikit-learn
- **Hyperparameters**: Default logistic regression parameters
- **Carbon Footprint**: Not calculated
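A comparable setup can be sketched with scikit-learn, assuming TF-IDF features (which the card lists only as likely) and logistic regression with default hyperparameters. The four sentences below are invented placeholders, not samples from the real training set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy placeholder data, not the real training set. 0 = legitimate, 1 = hoax.
texts = [
    "pemerintah mengumumkan anggaran pendidikan tahun depan",
    "bank indonesia mempertahankan suku bunga acuan",
    "viral sebarkan segera sebelum dihapus obat ajaib ini",
    "awas bahaya besar bagikan ke semua grup sekarang juga",
]
labels = [0, 0, 1, 1]

# TF-IDF features followed by logistic regression with default hyperparameters.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

query = ["sebarkan segera sebelum dihapus"]
pred = model.predict(query)
proba = model.predict_proba(query)
print(pred, proba)
```

Bundling the vectorizer and classifier in one pipeline means the fitted object accepts raw text directly, which matches the input format described under Technical Specifications.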
## Evaluation
### Testing Data
- **Dataset**: Held-out test split of the same dataset used for training
- **Size**: Not specified
- **Distribution**: Balanced (assumed)
### Metrics
- **Accuracy**: 97.83%
- **Other metrics**: Not provided (precision, recall, F1-score unknown)
### Results
The model reaches 97.83% accuracy on the held-out test set. Per-class precision, recall, and F1-score are not reported, so the balance between false positives and false negatives is unknown.
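To show why per-class metrics matter alongside accuracy, here is a small self-contained sketch that computes precision, recall, and F1 for the hoax class. The labels are invented for illustration and are not results from this model; the helper name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented labels: 8 legitimate articles, 2 hoaxes, one hoax missed.
y_true = [0] * 8 + [1, 1]
y_pred = [0] * 8 + [1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case accuracy is 0.9 while recall on the hoax class is only 0.5, which is the kind of gap a single accuracy figure cannot reveal. With scikit-learn installed, `sklearn.metrics.classification_report` produces the same per-class figures.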
## Technical Specifications
- **Input Format**: Raw Indonesian text
- **Output Format**: Binary classification (0: legitimate, 1: hoax) with probability scores
- **Model Size**: Small (pickle file of a few MB)
- **Inference Time**: Fast (< 1 second per prediction)
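For a binary logistic regression, the probability score is the logistic (sigmoid) function applied to a weighted sum of the input features, and the label is 1 when that probability exceeds 0.5. The sketch below illustrates the arithmetic; the weights, feature values, and the helper name `hoax_probability` are invented for illustration and are not taken from the trained model.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hoax_probability(features, weights, bias=0.0):
    """P(class 1) for one document: sigmoid of the weighted feature sum."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return sigmoid(z)


# Invented weights and feature values, for illustration only.
weights = {"sebarkan": 1.2, "viral": 0.9, "menteri": -0.7}
features = {"sebarkan": 0.6, "viral": 0.5}

p = hoax_probability(features, weights)
label = 1 if p > 0.5 else 0
print(label, round(p, 3))
```

Because the score is a smooth function of the features, probabilities near 0.5 indicate that the model sees mixed stylistic evidence, which is a natural point to defer to a human reader.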
## Model Card Authors
Gareth Aurelius Harrison
## Model Card Contact
For questions or issues, please open an issue on the Hugging Face repository.