	---
	license: mit
	datasets:
	- HausaNLP/NaijaSenti-Twitter
	language:
	- ha
	metrics:
	- accuracy
	- f1
	- precision
	- recall
	base_model: google-bert/bert-base-cased
	pipeline_tag: text-classification
	library_name: transformers
	tags:
	- NLP
	- sentiment-analysis
	- hausa
	---

	- Model Name: Hausa Sentiment Analysis
	- Model ID: `Kumshe/Hausa-sentiment-analysis`
	- Language: Hausa

	---

	### Model Description
	This is a BERT-based model fine-tuned for sentiment analysis of Hausa text. It classifies social media posts into one of three sentiment categories: positive, negative, or neutral.

	### Intended Use
	- Primary Use Case: Sentiment analysis for Hausa social media content, such as tweets or Facebook posts.
	- Target Users: NLP researchers, businesses analyzing social media, and developers building sentiment analysis tools for Hausa language content.
	- Example Usage:
	```python
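	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	# Load the model and tokenizer
	tokenizer = AutoTokenizer.from_pretrained("Kumshe/Hausa-sentiment-analysis")
	model = AutoModelForSequenceClassification.from_pretrained("Kumshe/Hausa-sentiment-analysis")

	# Encode the input text, truncating anything longer than the model's maximum length
	inputs = tokenizer("Your Hausa text here", return_tensors="pt", truncation=True)

	# Get model predictions without tracking gradients
	with torch.no_grad():
	    outputs = model(**inputs)

	# Pick the highest-scoring class and look up its label in the model config
	predicted_id = outputs.logits.argmax(dim=-1).item()
	print(model.config.id2label[predicted_id])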
	```
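
The scores in `outputs.logits` are raw values, not probabilities. A minimal sketch of turning them into a label with a confidence value follows. The label order in `ID2LABEL` and the sample logits are assumptions made for illustration; the real mapping should be read from `model.config.id2label`.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical label order; read the real mapping from model.config.id2label.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}

def describe(logits, id2label=ID2LABEL):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# Made-up logits standing in for outputs.logits[0].tolist()
label, confidence = describe([-1.2, 0.3, 2.1])
print(label, round(confidence, 3))
```

A low top probability can be a useful signal to route a post to manual review instead of trusting the predicted label.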

	### Model Architecture
	- Base Model: BERT (Bidirectional Encoder Representations from Transformers)
	- Pre-trained Model: `bert-base-cased` (`google-bert/bert-base-cased` on the Hugging Face Hub).
	- Fine-Tuned Model: Fine-tuned for 40 epochs on a Hausa sentiment dataset.

	### Training Data
	- Data Source: The model was trained on a dataset containing 35,000 examples from social media platforms such as Twitter and Facebook.
	- Data Split:
	  - Training Set: 80% of the data
	  - Validation Set: 20% of the data
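
For readers who want to reproduce a comparable split, the 80/20 proportions above work out to 28,000 training and 7,000 validation examples. The sketch below uses a plain shuffled split. The card does not say how the split was made, so the shuffling, the seed, and the `split_indices` helper are assumptions.

```python
import random

def split_indices(n_examples, train_fraction=0.8, seed=42):
    """Shuffle example indices and cut them into train and validation parts."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # the seed is an arbitrary choice for reproducibility
    cut = round(n_examples * train_fraction)
    return indices[:cut], indices[cut:]

train_idx, val_idx = split_indices(35_000)
print(len(train_idx), len(val_idx))  # 28000 7000
```

If the three sentiment classes are imbalanced, a stratified split may be preferable so that both parts keep the same class proportions.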

	### Training Details
	- Number of Epochs: 40
	- Batch Size:
	  - Per-device training batch size: 32
	  - Per-device evaluation batch size: 64
	- Warm-up Steps: 10
	- Weight Decay: 0.01
	- Optimizer: AdamW
	- Training Hardware: Trained on Kaggle using 2 NVIDIA T4 GPUs.
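
To put these settings in context, the sketch below collects them under the matching `transformers.TrainingArguments` parameter names and estimates the number of optimizer steps. It assumes both GPUs were used in data-parallel mode, no gradient accumulation, and a training set of 28,000 examples (80% of 35,000). None of these assumptions is confirmed by the card.

```python
import math

# Values reported in the list above; the keys are TrainingArguments parameter names.
training_kwargs = {
    "num_train_epochs": 40,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 64,
    "warmup_steps": 10,
    "weight_decay": 0.01,
}

# Assumptions, not stated in the source: both GPUs are used data-parallel,
# there is no gradient accumulation, and the training set is 80% of 35,000.
n_gpus = 2
n_train = 28_000

effective_batch = training_kwargs["per_device_train_batch_size"] * n_gpus
steps_per_epoch = math.ceil(n_train / effective_batch)
total_steps = steps_per_epoch * training_kwargs["num_train_epochs"]
warmup_share = training_kwargs["warmup_steps"] / total_steps

print(effective_batch, steps_per_epoch, total_steps)  # 64 438 17520
print(f"{warmup_share:.4%}")
```

Under these assumptions the 10 warm-up steps cover well under 0.1% of training, so the learning rate reaches its peak almost immediately.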

	### Evaluation Metrics
	- Evaluation Loss: 0.6265
	- Accuracy: 73.47%
	- F1 Score: 73.47%
	- Precision: 73.54%
	- Recall: 73.47%
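
The card does not state how the per-class scores were averaged. The reported recall equals the accuracy, which is what support-weighted (or micro) averaging always produces, so the sketch below assumes weighted averaging. The `weighted_metrics` helper and the toy labels are illustrative only.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / total
    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / actual if actual else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = actual / total  # each class counts in proportion to its true frequency
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels, not taken from the evaluation data
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

Weighted recall reduces to correct predictions divided by total examples, which is why it matches accuracy here regardless of the data.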

	### Model Performance
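	On the evaluation data, the model reaches about 73.5% accuracy, and its precision, recall, and F1 score all fall within 0.1 percentage points of each other. This balance makes it a reasonable choice for general sentiment analysis of Hausa social media text, although roughly one in four predictions is still incorrect.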

	### Limitations
	- The model may not generalize well to other types of Hausa text outside of social media (e.g., formal writing or literature).
	- Performance may degrade on text containing slang or regional dialects not well-represented in the training data.
	- The model is biased towards the examples in the training dataset; biases in the data may affect predictions.

	### Ethical Considerations
	- Sentiment analysis models can potentially amplify biases present in the training data.
	- Use cautiously in sensitive applications to avoid unintended consequences.
	- Consider the impact on privacy and data protection laws, especially when analyzing social media content.

	### License
	- MIT

	### Citation
	If you use this model in your work, please cite it as follows:
	```
	@misc{Kumshe2024HausaSentimentAnalysis,
	author = {Umar Muhammad Mustapha Kumshe},
	title = {Hausa Sentiment Analysis},
	year = {2024},
	publisher = {Hugging Face},
	howpublished = {\url{https://huggingface.co/Kumshe/Hausa-sentiment-analysis}},
	}
	```

	### Contributions
	This model was fine-tuned by Umar Muhammad Mustapha Kumshe. Feel free to contribute, provide feedback, or raise issues on the [model repository](https://huggingface.co/Kumshe/Hausa-sentiment-analysis).