# Model Card for IndoHoaxDetector

## Model Details

### Model Description
IndoHoaxDetector is a binary classification model that detects hoax-style writing in Indonesian-language news articles. It is a logistic regression classifier trained on linguistic features of Indonesian news text, and it labels each input as either legitimate or hoax-like writing. The model analyzes writing style and patterns. It does not assess the factual accuracy or truthfulness of the content.

- Developed by: Gareth Aurelius Harrison
- Model type: Logistic Regression (scikit-learn)
- Language(s): Indonesian
- License: MIT
- Fine-tuned from model: N/A (trained from scratch)
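For readers unfamiliar with logistic regression, here is a minimal sketch of how a classifier of this type turns text features into a hoax probability: a weighted sum of the features plus a bias, passed through the logistic (sigmoid) function. It assumes the text has already been converted to a sparse feature vector (for example TF-IDF weights, which this card only describes as likely). The feature names, weights, and bias are hypothetical toy values, not the trained model's parameters.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def hoax_probability(features: dict[str, float], weights: dict[str, float], bias: float) -> float:
    """Return P(hoax | x) = sigmoid(w . x + b) for a sparse feature vector x."""
    # Features absent from the weight table contribute nothing to the score.
    z = bias + sum(value * weights.get(name, 0.0) for name, value in features.items())
    return sigmoid(z)


def predict_label(probability: float, threshold: float = 0.5) -> int:
    """Map a probability to the card's labels: 0 = legitimate, 1 = hoax."""
    return int(probability > threshold)


# Hypothetical toy weights, for illustration only (not the real model).
toy_weights = {"viral": 1.8, "sebarkan": 2.1, "menurut": -1.2}
toy_bias = -0.5

sensational = {"viral": 0.6, "sebarkan": 0.7}  # hypothetical feature values
neutral = {"menurut": 0.8}

for doc in (sensational, neutral):
    p = hoax_probability(doc, toy_weights, toy_bias)
    print(predict_label(p), round(p, 3))
```

Because the score is linear in the features, words that commonly appear in sensational writing can push an article toward the hoax label regardless of whether its claims are true, which is why the model measures style rather than accuracy.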

### Model Sources
- Repository: https://huggingface.co/theonegareth/IndoHoaxDetector
- Paper or resources: N/A

## Uses

### Direct Use
This model can be used to analyze Indonesian news articles and estimate whether they are written in a hoax-like style. It identifies linguistic patterns typical of fake news but does not verify factual accuracy. It is intended for educational, research, and journalistic purposes, to help flag potentially sensational or misleading writing.

### Downstream Use
- News verification tools
- Fact-checking applications
- Educational resources on misinformation
- Research on the Indonesian media landscape

### Out-of-Scope Use
- Automated content moderation without human oversight
- Legal or judicial decisions
- Real-time censorship
- Detection in languages other than Indonesian

## Bias, Risks, and Limitations

### Recommendations
Users should be aware that this model:
- Was trained on a specific dataset and may not generalize to all Indonesian news
- Can produce false positives and false negatives
- Should not be used as the sole basis for important decisions
- Requires human verification in critical applications
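One way to keep a human in the loop is to use the model's probability only to prioritize articles for reviewers, never to label them automatically. The sketch below assumes the hoax probability for an article is already available; the 0.2 and 0.8 cut-offs and the queue names are hypothetical and would need to be tuned on labeled data.

```python
from dataclasses import dataclass


@dataclass
class TriageDecision:
    probability: float
    queue: str  # which human review queue the article is sent to


def triage(probability: float, low: float = 0.2, high: float = 0.8) -> TriageDecision:
    """Assign an article to a human review queue from its hoax-style probability.

    No article is labeled automatically; the queue only sets review priority.
    The default thresholds are hypothetical.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= low < high <= 1")
    if probability >= high:
        queue = "priority_review"  # strong hoax-style signal: review first
    elif probability <= low:
        queue = "spot_check"  # weak signal: sample occasionally
    else:
        queue = "standard_review"  # the model is uncertain
    return TriageDecision(probability, queue)


for p in (0.95, 0.5, 0.05):
    print(p, triage(p).queue)
```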

### Known Limitations
- Stylistic vs. Factual Analysis: The model detects writing style typical of hoaxes, not factual inaccuracies. Legitimate news written sensationally may be flagged as a hoax, and hoaxes written in a professional, neutral style may be missed.
- Data Bias: The model was trained on a limited dataset, so performance may vary across topics and writing styles.
- Language Specificity: The model works only on Indonesian text.
- Temporal Limitations: News-writing patterns change over time, so the model may become less accurate on newer articles.
- Binary Classification: The model outputs a hoax/legitimate label with a probability score, not a nuanced assessment of credibility.

### Ethical Considerations
- Misinformation Detection: While helpful for identifying hoaxes, this technology could be misused to suppress legitimate dissenting views.
- Privacy: The analyzed text may contain sensitive content.
- Accessibility: The model should be used to empower users, not to restrict access to information.
- Transparency: Model decisions should be explainable and verifiable.
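Because a linear model's score is a sum of per-feature terms, a simple way to make a single decision inspectable is to list the features that contributed most to it. This sketch assumes access to the fitted weights (for a scikit-learn `LogisticRegression`, the `coef_` attribute) and to the article's feature vector; the feature names and values are hypothetical.

```python
def top_contributions(
    features: dict[str, float], weights: dict[str, float], k: int = 3
) -> list[tuple[str, float]]:
    """Return the k features with the largest absolute contribution w_i * x_i.

    Positive contributions push the score toward "hoax";
    negative ones push it toward "legitimate".
    """
    contributions = {
        name: value * weights[name]
        for name, value in features.items()
        if name in weights
    }
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Hypothetical weights and feature values, for illustration only.
toy_weights = {"viral": 1.8, "sebarkan": 2.1, "menurut": -1.2, "hari": 0.05}
article = {"viral": 0.6, "sebarkan": 0.7, "menurut": 0.3, "hari": 0.9}

for name, contribution in top_contributions(article, toy_weights):
    print(f"{name}: {contribution:+.3f}")
```

Reporting signed contributions alongside the label lets a reviewer see which wording drove a hoax-style score and judge whether that wording actually indicates a problem.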

## Training Details

### Training Data
- Dataset: Indonesian news articles dataset (details not publicly available)
- Preprocessing: Text cleaning, tokenization, and feature extraction (likely TF-IDF)
- Size: Not specified
- Distribution: Assumed to be balanced between hoax and legitimate classes
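The card says only that features were likely TF-IDF. As an illustration of what that preprocessing step computes, here is a minimal pure-Python sketch of TF-IDF with the smoothed inverse document frequency idf(t) = ln((1 + n) / (1 + df(t))) + 1 and L2 row normalization, which are the defaults of scikit-learn's `TfidfVectorizer`. The lowercase word tokenizer and the example sentences are assumptions and do not reproduce the model's actual preprocessing.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and extract word tokens (a simplified, assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Compute L2-normalized TF-IDF vectors with smoothed IDF for a small corpus."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # raw term frequency within this document
        vec = {term: c * idf[term] for term, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()} if norm else vec)
    return vectors


corpus = [
    "Sebarkan! Berita viral ini wajib dibaca",
    "Menurut data resmi, inflasi turun bulan ini",
]
vectors = tfidf(corpus)
print(sorted(vectors[0].items(), key=lambda item: -item[1])[:3])
```

Terms that appear in every document (here "ini") receive the lowest IDF, so distinctive words such as "viral" carry more weight in the resulting vectors.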

### Training Procedure
- Training Date: October 29, 2024
- Hardware: Not specified
- Software: scikit-learn
- Hyperparameters: Default scikit-learn logistic regression parameters
- Carbon Footprint: Not calculated

## Evaluation

### Testing Data
- Dataset: Held-out split of the same dataset used for training
- Size: Not specified
- Distribution: Assumed to be balanced

### Metrics
- Accuracy: 97.83%
- Other metrics: Not provided (precision, recall, and F1-score are unknown)

### Results
The model achieves 97.83% accuracy on the held-out test set, but per-class metrics are not available. Because the test set's class balance is only assumed, accuracy alone may not show how well each class (hoax and legitimate) is detected.
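Per-class metrics can be computed from the four confusion-matrix counts on any labeled test set. The sketch below treats hoax (label 1) as the positive class. The example data is hypothetical and is not a result for this model: it shows how a degenerate classifier can reach high accuracy on an imbalanced set while detecting no hoaxes at all.

```python
def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 with hoax (1) as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Precision, recall and F1 are defined as 0.0 when their denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced test set: 95 legitimate articles and 5 hoaxes,
# scored by a classifier that always predicts "legitimate".
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(binary_metrics(y_true, y_pred))
```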

## Technical Specifications
- Input Format: Raw Indonesian text
- Output Format: Binary label (0: legitimate, 1: hoax) with probability scores
- Model Size: Small (a pickle file of a few MB)
- Inference Time: Fast (under 1 second per prediction; hardware not specified)
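A minimal inference sketch, assuming the repository's pickle file holds a fitted scikit-learn-style classifier (for example a `Pipeline` that includes the vectorizer) that accepts raw strings and exposes `classes_` and `predict_proba`. The file name `model.pkl` and the helper names are hypothetical; check the repository for the actual file names. Unpickling can execute arbitrary code, so load only files from a source you trust.

```python
import pickle
from typing import Any


def load_model(path: str) -> Any:
    """Load a pickled classifier. Only unpickle files from a trusted source."""
    with open(path, "rb") as f:
        return pickle.load(f)


def classify(model: Any, texts: list[str]) -> list[tuple[int, float]]:
    """Return (label, hoax_probability) pairs; label 1 = hoax-style, 0 = legitimate.

    Assumes a scikit-learn-style classifier that accepts raw text and exposes
    `classes_` and `predict_proba`.
    """
    # predict_proba columns follow the order of classes_, so look up the hoax column.
    hoax_col = list(model.classes_).index(1)
    results = []
    for row in model.predict_proba(texts):
        p_hoax = float(row[hoax_col])
        results.append((int(p_hoax > 0.5), p_hoax))
    return results


# Hypothetical usage (the file name is an assumption):
# model = load_model("model.pkl")
# for label, p in classify(model, ["Contoh teks berita"]):
#     print(label, round(p, 3))
```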

## Model Card Authors
Gareth Aurelius Harrison

## Model Card Contact
For questions or issues, please open a discussion in the Community tab of the Hugging Face repository.