	---
	license: mit
	datasets:
	- arbml/arabic_100k_reviews
	language:
	- ar
	- en
	base_model:
	- google-bert/bert-base-uncased
	pipeline_tag: text-classification
	tags:
	- fine-tuning-bert-arabic
	- fine-tuning-bert-sentiment-analysis
	- sentiment-analysis
	- text-classification
	- ktrain-library
	---


	# Fine-Tuned Arabic Sentiment Analysis with BERT 🚀

	This repository contains a fine-tuned BERT model for sentiment analysis of Arabic reviews. The model is trained on the [Arabic 100k Reviews](https://www.kaggle.com/datasets/abedkhooli/arabic-100k-reviews) dataset and can classify reviews into three sentiment categories: Positive, Negative, and Mixed.

	## Author 🧑‍💻

	- Khaled Soudy
	- GitHub: [khaledsoudy-1](https://github.com/khaledsoudy-1)

	---

	## Source Code 💻

	You can find the source code and full implementation of this project on my [GitHub repository](https://github.com/khaledsoudy-1/FineTuning-BERT-Arabic-Sentiment/tree/main).

	The repository contains the Google Colab notebook, dataset, and scripts used to fine-tune the model for Arabic sentiment analysis.

	---

	## How to Use the Model

	### 1. Install Required Libraries

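	Make sure you have the `transformers` and `tensorflow` libraries installed:

	```bash
	pip install transformers tensorflow
	```

	In a Jupyter or Google Colab notebook, prefix the command with `!` (for example, `!pip install transformers tensorflow`). TensorFlow 2.16 and later ship with Keras 3. With those versions, the TensorFlow models in Transformers also need the backwards-compatible `tf-keras` package (`pip install tf-keras`).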
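Before loading the model, you can confirm which package versions are installed. The sketch below uses only the standard library's `importlib.metadata`. The helper name `installed_versions` is hypothetical, and you can adjust the package list as needed. `tf-keras` only matters when TensorFlow ships with Keras 3.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(["transformers", "tensorflow", "tf-keras"]).items():
        print(f"{name}: {ver or 'not installed'}")
```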


	### 2. Load the Fine-Tuned Model

	You can load the fine-tuned model and tokenizer directly from Hugging Face using the following code:

	```python
	from transformers import TFBertForSequenceClassification, BertTokenizer

	# Load model and tokenizer from Hugging Face
	model_name = "khaledsoudy/arabic-sentiment-bert-model"

	# Load model
	model = TFBertForSequenceClassification.from_pretrained(model_name)

	# Load tokenizer
	tokenizer = BertTokenizer.from_pretrained(model_name)
	```

	### 3. Use the Model for Prediction

	To run sentiment analysis on Arabic text, use the following code:

	```python
	import tensorflow as tf

	# Sample Arabic text for sentiment prediction ("The hotel is wonderful and the service is excellent")
	text = "الفندق رائع و الخدمة ممتازة"

	# Tokenize the input text, truncating to BERT's 512-token limit
	inputs = tokenizer(text, return_tensors="tf", truncation=True, max_length=512)

	# Get the model's raw output logits
	outputs = model(**inputs)

	# Pick the class with the highest logit (3 classes: Positive, Negative, Mixed)
	predicted_class = tf.argmax(outputs.logits, axis=-1).numpy()

	# Map the predicted class index to a sentiment label.
	# This order must match the label encoding used during training.
	sentiment_labels = ['Mixed', 'Negative', 'Positive']
	print(f"Predicted sentiment: {sentiment_labels[predicted_class[0]]}")
	```
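`tf.argmax` returns only the winning class. To also get a confidence score, you can apply a softmax to the logits. The following minimal sketch does this in plain Python so that it runs without TensorFlow. The helper names `softmax` and `decode_prediction` are hypothetical. The label order `['Mixed', 'Negative', 'Positive']` comes from the prediction example and is an assumption about how the labels were encoded during training.

```python
import math

# Assumed label order; it must match the order used during training.
SENTIMENT_LABELS = ["Mixed", "Negative", "Positive"]


def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits, labels=SENTIMENT_LABELS):
    """Return the most likely label and a {label: probability} mapping."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], dict(zip(labels, probs))


label, scores = decode_prediction([0.3, -1.2, 2.5])
print(label)  # Positive
print({k: round(v, 3) for k, v in scores.items()})
```

With the loaded model, you could pass `outputs.logits[0].numpy().tolist()` as `logits`. Inside TensorFlow, `tf.nn.softmax(outputs.logits, axis=-1)` computes the same probabilities.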

	### 4. Input Format

	The model expects Arabic text as input. For better results, preprocess the text first by removing diacritics and other unnecessary characters.
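This card does not document the exact preprocessing used during training, so the following is only a minimal sketch of one common normalization. It strips Arabic diacritics (U+064B to U+0652, plus the superscript alef U+0670), removes tatweel (U+0640), and collapses repeated whitespace. The function name `normalize_arabic` is hypothetical.

```python
import re

# Arabic diacritics (harakat and tanween, U+064B..U+0652) plus the superscript alef (U+0670)
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
# Tatweel (kashida), used to stretch words visually
TATWEEL = "\u0640"


def normalize_arabic(text: str) -> str:
    """Strip diacritics and tatweel, then collapse runs of whitespace into single spaces."""
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    return " ".join(text.split())


print(normalize_arabic("الفندق رائع   و الخدمة مُمتازة"))
```

You would apply it to `text` before calling `tokenizer(...)` in the prediction example.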

	### 5. Sentiment Labels

	The model classifies the sentiment into three categories:

	- Positive 🌟
	- Negative 😠
	- Mixed 🤔

	## Model Details

	- Model Name: `khaledsoudy/arabic-sentiment-bert-model`
	- Model Type: `TFBertForSequenceClassification`
	- Language: Arabic
	- Sentiment Classes: Positive, Negative, Mixed

	## How to Fine-Tune This Model

	You can fine-tune this model further using your own dataset. Check out the source code and related notebooks on my GitHub for detailed steps and guidance.
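When you prepare your own data for further fine-tuning, each sentiment label must be converted to the integer id that the classification head expects. The sketch below makes several assumptions that you should check against your data and the training notebook on GitHub:

- The input is a tab-separated file with hypothetical `label` and `text` columns.
- The label order is the one used in the prediction example.
- `load_reviews` is an illustrative helper, not part of this repository.

```python
import csv
import io

# Assumed label order; it must match the order used when this model was trained.
LABELS = ["Mixed", "Negative", "Positive"]
LABEL2ID = {label: idx for idx, label in enumerate(LABELS)}


def load_reviews(fileobj):
    """Read TSV rows with (hypothetical) 'label' and 'text' columns.

    Returns two parallel lists: the review texts and their integer label ids.
    Raises KeyError if a row contains a label outside LABELS.
    """
    reader = csv.DictReader(fileobj, delimiter="\t")
    texts, label_ids = [], []
    for row in reader:
        texts.append(row["text"])
        label_ids.append(LABEL2ID[row["label"]])
    return texts, label_ids


# In-memory sample instead of a real file ("excellent service", "bad experience")
sample = io.StringIO("label\ttext\nPositive\tخدمة ممتازة\nNegative\tتجربة سيئة\n")
texts, label_ids = load_reviews(sample)
print(label_ids)  # [2, 1]
```

You can then pass the resulting `texts` to the tokenizer and use `label_ids` as the training targets.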

	## License 📜

	This model is licensed under the MIT License.

	## Acknowledgments 🙏

	- Hugging Face for providing the platform to host models.
	- Google BERT for the pre-trained model.
	- Kaggle for the Arabic 100k Reviews dataset.
