# Model Card for IndoHoaxDetector

## Model Details

### Model Description

IndoHoaxDetector is a binary classification model designed to detect hoax-style news articles in the Indonesian language. It uses logistic regression trained on linguistic features of Indonesian news to classify text as either legitimate or hoax-like writing. **This model analyzes writing style and patterns, not factual accuracy or truthfulness of the content.**

- **Developed by**: Gareth Aurelius Harrison
- **Model type**: Logistic Regression (scikit-learn)
- **Language(s)**: Indonesian
- **License**: MIT
- **Finetuned from model**: N/A (trained from scratch)

### Model Sources

- **Repository**: https://huggingface.co/theonegareth/IndoHoaxDetector
- **Paper or resources**: N/A

## Uses

### Direct Use

This model can be used to analyze Indonesian news articles and determine whether they are written in a hoax-like style. It identifies linguistic patterns typical of fake news but does **not verify factual accuracy**. It is intended for educational, research, and journalistic purposes, to help identify potentially sensational or misleading writing styles.

### Downstream Use

- News verification tools
- Fact-checking applications
- Educational resources on misinformation
- Research on the Indonesian media landscape

### Out-of-Scope Use

- Automated content moderation without human oversight
- Legal or judicial decisions
- Real-time censorship
- Detection in other languages

## Bias, Risks, and Limitations

### Recommendations

Users should be aware that this model:

- Is trained on specific datasets and may not generalize to all Indonesian news
- Can produce false positives and false negatives
- Should not be used as the sole basis for important decisions
- Requires human verification for critical applications

### Known Limitations

- **Stylistic vs. Factual Analysis**: This model detects writing style typical of hoaxes, not factual inaccuracies. Legitimate news written sensationally may be flagged as a hoax, and factual hoaxes written professionally may be missed.
- **Data Bias**: The model is trained on a limited dataset; performance may vary with different topics or writing styles
- **Language Specificity**: Works only on Indonesian text
- **Temporal Limitations**: News patterns change over time; the model may become less accurate with newer data
- **Binary Classification**: Does not provide nuanced assessments of credibility

### Ethical Considerations

- **Misinformation Detection**: While helpful for identifying hoaxes, this technology could be misused to suppress legitimate dissenting views
- **Privacy**: Text analysis may involve sensitive content
- **Accessibility**: Should be used to empower users, not to restrict information access
- **Transparency**: Model decisions should be explainable and verifiable

## Training Details

### Training Data

- **Dataset**: Indonesian news articles dataset (details not publicly available)
- **Preprocessing**: Text cleaning, tokenization, feature extraction (likely TF-IDF)
- **Size**: Not specified
- **Distribution**: Balanced between hoax and legitimate classes (assumed)

### Training Procedure

- **Training Date**: October 29, 2024
- **Hardware**: Not specified
- **Software**: scikit-learn
- **Hyperparameters**: Default logistic regression parameters
- **Carbon Footprint**: Not calculated

## Evaluation

### Testing Data

- **Dataset**: Held-out test split of the training dataset
- **Size**: Not specified
- **Distribution**: Balanced (assumed)

### Metrics

- **Accuracy**: 97.83%
- **Other metrics**: Not provided (precision, recall, F1-score unknown)

### Results

The model reaches 97.83% accuracy on the held-out test set, but per-class metrics (precision, recall, F1-score) are not available.
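Since only accuracy is reported, the sketch below shows one way the missing per-class metrics could be computed from true and predicted labels. The function name `per_class_metrics` and the label lists are hypothetical and made up for illustration; they are not outputs of IndoHoaxDetector or part of its evaluation.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, positive=1):
    """Return accuracy plus precision, recall and F1 for the `positive` class."""
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["tp" if t == positive else "fp"] += 1
        else:
            counts["fn" if t == positive else "tn"] += 1
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for illustration only (0: legitimate, 1: hoax).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

hoax = per_class_metrics(y_true, y_pred, positive=1)
legit = per_class_metrics(y_true, y_pred, positive=0)
print("hoax:", hoax)
print("legitimate:", legit)
```

In this toy example both classes share the same accuracy while precision and recall differ between them, which is why per-class figures are worth reporting alongside a single accuracy number.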
## Technical Specifications

- **Input Format**: Raw Indonesian text
- **Output Format**: Binary classification (0: legitimate, 1: hoax) with probability scores
- **Model Size**: Small (pickle file, roughly a few MB)
- **Inference Time**: Fast (< 1 second per prediction)

## Model Card Authors

Gareth Aurelius Harrison

## Model Card Contact

For questions or issues, please open an issue on the Hugging Face repository.
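As an illustration of the input and output formats listed under Technical Specifications, the following is a minimal sketch of a TF-IDF plus logistic regression pipeline in scikit-learn. It is not the released model: the four-sentence corpus and its labels are invented for the example, TF-IDF is assumed because the card only says it is the likely feature extractor, and the variable names are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus for illustration only; NOT the model's training data.
texts = [
    "Pemerintah mengumumkan anggaran pendidikan tahun depan",
    "Bank Indonesia mempertahankan suku bunga acuan",
    "VIRAL!!! Sebarkan sebelum dihapus, obat ajaib sembuhkan semua penyakit",
    "AWAS!!! Bagikan segera, rahasia ini disembunyikan dari publik",
]
labels = [0, 0, 1, 1]  # 0: legitimate, 1: hoax

# Assumed setup: TF-IDF features feeding a default-parameter logistic regression.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(texts, labels)

# Input is raw Indonesian text; output is a 0/1 label plus class probabilities.
sample = ["Sebarkan!!! Obat ajaib ini disembunyikan dari publik"]
predicted = pipeline.predict(sample)
probabilities = pipeline.predict_proba(sample)

print("label:", int(predicted[0]))
print("P(legitimate), P(hoax):", probabilities[0])
```

With so little data the predicted probabilities carry no real meaning; the sketch only shows the shape of the interface, where the second probability column corresponds to the hoax class.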