---
license: mit
datasets:
- arbml/arabic_100k_reviews
language:
- ar
- en
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
tags:
- fine-tuning-bert-arabic
- fine-tuning-bert-sentiment-analysis
- sentiment-analysis
- text-classification
- ktrain-library
---
# Fine-Tuned Arabic Sentiment Analysis with BERT 🚀
This repository contains a fine-tuned **BERT** model for sentiment analysis of Arabic reviews. The model is trained on the **[Arabic 100k Reviews](https://www.kaggle.com/datasets/abedkhooli/arabic-100k-reviews)** dataset and can classify reviews into three sentiment categories: **Positive**, **Negative**, and **Mixed**.
## Author 🧑‍💻
Khaled Soudy
GitHub: [khaledsoudy-1](https://github.com/khaledsoudy-1)
---
## Source Code 💻
You can find the source code and full implementation of this project on my [GitHub repository](https://github.com/khaledsoudy-1/FineTuning-BERT-Arabic-Sentiment/tree/main).
The repository contains the Google Colab notebook, dataset, and scripts used to fine-tune the model for Arabic sentiment analysis.
---
## How to Use the Model
### 1. Install Required Libraries
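Make sure you have the **transformers**, **tensorflow**, and **tf-keras** libraries installed. Recent TensorFlow releases ship with Keras 3, which the TensorFlow model classes in `transformers` do not support; `tf-keras` provides the Keras 2 API they rely on. In a Colab or Jupyter notebook, prefix the command with `!`.
```bash
pip install transformers tensorflow tf-keras
```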
### 2. Load the Fine-Tuned Model
You can load the fine-tuned model and tokenizer directly from Hugging Face using the following code:
```python
from transformers import TFBertForSequenceClassification, BertTokenizer
# Load model and tokenizer from Hugging Face
model_name = "khaledsoudy/arabic-sentiment-bert-model"
# Load model
model = TFBertForSequenceClassification.from_pretrained(model_name)
# Load tokenizer
tokenizer = BertTokenizer.from_pretrained(model_name)
```
### 3. Use the Model for Prediction
To run sentiment analysis on Arabic text, follow these steps:
```python
import tensorflow as tf
# Sample Arabic text for sentiment prediction
# ("The hotel is wonderful and the service is excellent")
text = "الفندق رائع و الخدمة ممتازة"
# Tokenize the input text
inputs = tokenizer(text, return_tensors="tf")
# Get the model's prediction
outputs = model(**inputs)
# Pick the index of the highest-scoring class (the model has 3 output classes)
predicted_class = tf.argmax(outputs.logits, axis=-1).numpy()
# Map the class index to its label; this order must match the label order used during training
sentiment_labels = ['Mixed', 'Negative', 'Positive']
print(f"Predicted sentiment: {sentiment_labels[predicted_class[0]]}")
```
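The prediction example returns only the top label. If you also want a confidence score per class, the logits can be turned into probabilities with a softmax. The following is a minimal NumPy sketch, not part of the original project: `logits_to_sentiment` is a hypothetical helper name, and it assumes the same `['Mixed', 'Negative', 'Positive']` label order as the prediction example.

```python
import numpy as np

# Assumed label order, matching `sentiment_labels` in the prediction example:
# index 0 = Mixed, 1 = Negative, 2 = Positive.
SENTIMENT_LABELS = ("Mixed", "Negative", "Positive")


def logits_to_sentiment(logits, labels=SENTIMENT_LABELS):
    """Turn raw logits of shape (batch, num_classes) into (label, probabilities) pairs."""
    logits = np.atleast_2d(np.asarray(logits, dtype=np.float64))
    # Softmax, shifting by the row-wise max first to avoid overflow in exp().
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)

    results = []
    for row in probs:
        best = int(row.argmax())
        scores = {label: float(p) for label, p in zip(labels, row)}
        results.append((labels[best], scores))
    return results


# Example with made-up logits for a single review
print(logits_to_sentiment([[0.2, -1.3, 2.1]]))
```

With the model and tokenizer loaded as in step 2, a batch of reviews could be scored with `inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="tf")` followed by `logits_to_sentiment(model(**inputs).logits.numpy())`.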
### 4. Input Format
The model expects raw Arabic text as input. For better results, preprocess the text first, for example by removing diacritics (tashkeel) and other unnecessary characters.
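A minimal preprocessing sketch follows. The project's exact cleaning steps are not documented in this README, so this is an assumption: `clean_arabic` is a hypothetical helper that strips the common tashkeel marks (U+064B to U+0652), the superscript alef (U+0670), and tatweel (U+0640), then collapses repeated whitespace.

```python
import re

# Arabic diacritics (tashkeel): U+064B-U+0652 covers tanween, fatha, damma,
# kasra, shadda and sukun; U+0670 is the superscript (dagger) alef.
ARABIC_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
# Tatweel (kashida), a purely visual character used to stretch words.
TATWEEL = "\u0640"


def clean_arabic(text: str) -> str:
    """Remove diacritics and tatweel, then collapse runs of whitespace."""
    text = ARABIC_DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    return " ".join(text.split())


print(clean_arabic("الفُنْدُقُ   رائـــع"))
```

Apply the same function to every input before tokenization so that inference sees text in a consistent form.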
### 5. Sentiment Labels
The model classifies the sentiment into three categories:
- **Positive** 🌟
- **Negative** 😠
- **Mixed** 🤔
## Model Details
- **Model Name:** `khaledsoudy/arabic-sentiment-bert-model`
- **Model Type:** `TFBertForSequenceClassification`
- **Language:** Arabic
- **Sentiment Classes:** Positive, Negative, Mixed
## How to Fine-Tune This Model
You can fine-tune this model further using your own dataset. Check out the source code and related notebooks on my GitHub for detailed steps and guidance.
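If you fine-tune further, your labels need to map to the same indices the prediction example uses; otherwise the `sentiment_labels` list there will no longer line up with the model's outputs. The helper below is a hypothetical sketch, not the author's training code: it assumes your data is a list of `(text, label)` pairs using the label strings Positive, Negative, and Mixed, encodes them in the `['Mixed', 'Negative', 'Positive']` order, and makes a reproducible train/validation split.

```python
import random

# Assumed label order, matching the prediction example in this README:
# index 0 = Mixed, 1 = Negative, 2 = Positive.
LABELS = ("Mixed", "Negative", "Positive")
LABEL2ID = {label: idx for idx, label in enumerate(LABELS)}


def encode_and_split(rows, val_fraction=0.1, seed=42):
    """Map (text, label) pairs to (text, label_id) and split into train/validation.

    A label outside LABELS raises KeyError, so a mismatch surfaces before
    training instead of silently producing wrong targets.
    """
    encoded = [(text, LABEL2ID[label]) for text, label in rows]
    rng = random.Random(seed)  # fixed seed -> reproducible split
    rng.shuffle(encoded)
    n_val = int(len(encoded) * val_fraction)
    return encoded[n_val:], encoded[:n_val]


# Tiny illustrative dataset (12 rows)
rows = [("الخدمة ممتازة", "Positive"), ("سيء جدا", "Negative"), ("مقبول", "Mixed")] * 4
train, val = encode_and_split(rows, val_fraction=0.25)
print(len(train), len(val))  # 9 3
```

The resulting label ids can then be paired with tokenized texts in whatever training loop you use; the author's notebook on GitHub documents the actual fine-tuning procedure.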
## License 📜
This model is licensed under the MIT License.
## Acknowledgments 🙏
- **Hugging Face** for providing the platform to host models.
- **Google BERT** for the pre-trained model.
- **Kaggle** for the **Arabic 100k Reviews** dataset.