---
license: mit
datasets:
- arbml/arabic_100k_reviews
language:
- ar
- en
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
tags:
- fine-tuning-bert-arbic
- fine-tuning-bert-sentiment-analysis
- sentiment-analysis
- text-classification
- ktrain-library
---

# Fine-Tuned Arabic Sentiment Analysis with BERT

This repository contains a **BERT** model fine-tuned for sentiment analysis of Arabic reviews. It was trained on the **[Arabic 100k Reviews](https://www.kaggle.com/datasets/abedkhooli/arabic-100k-reviews)** dataset and classifies each review as **Positive**, **Negative**, or **Mixed**.

## Author

Khaled Soudy

GitHub: [khaledsoudy-1](https://github.com/khaledsoudy-1)

---

## Source Code

The source code and full implementation of this project are available in my [GitHub repository](https://github.com/khaledsoudy-1/FineTuning-BERT-Arabic-Sentiment/tree/main).

The repository contains the Google Colab notebook, the dataset, and the scripts used to fine-tune the model for Arabic sentiment analysis.

---

## How to Use the Model

### 1. Install Required Libraries

Make sure the **transformers** and **tensorflow** libraries are installed:

```bash
# In a notebook such as Google Colab, prefix the command with "!"
pip install transformers tensorflow
```

### 2. Load the Fine-Tuned Model

Load the fine-tuned model and tokenizer directly from Hugging Face:

```python
from transformers import TFBertForSequenceClassification, BertTokenizer

# Model repository on the Hugging Face Hub
model_name = "khaledsoudy/arabic-sentiment-bert-model"

# Load the fine-tuned TensorFlow model
model = TFBertForSequenceClassification.from_pretrained(model_name)

# Load the matching tokenizer
tokenizer = BertTokenizer.from_pretrained(model_name)
```

### 3. Use the Model for Prediction

To run sentiment analysis on an Arabic text:

```python
import tensorflow as tf

# Sample Arabic text ("The hotel is wonderful and the service is excellent")
text = "الفندق رائع و الخدمة ممتازة"

# Tokenize the input text
inputs = tokenizer(text, return_tensors="tf")

# Run the model to get the raw logits
outputs = model(**inputs)

# Pick the class with the highest logit (3 classes: Mixed, Negative, Positive)
predicted_class = tf.argmax(outputs.logits, axis=-1).numpy()

# Map the predicted class index to its sentiment label
sentiment_labels = ['Mixed', 'Negative', 'Positive']
print(f"Predicted sentiment: {sentiment_labels[predicted_class[0]]}")
```

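The predicted class index alone does not show how confident the model is. As a minimal sketch, the logits can be turned into probabilities with a softmax. The helper names `softmax` and `describe_prediction` are hypothetical, and the label order is copied from the prediction snippet in this section, so it is an assumption about the checkpoint rather than a verified fact.

```python
import math

# Label order copied from the prediction snippet (assumed to match the checkpoint)
SENTIMENT_LABELS = ["Mixed", "Negative", "Positive"]


def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def describe_prediction(logits, labels=SENTIMENT_LABELS):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Example with made-up logits for a single review
label, confidence = describe_prediction([0.1, -1.2, 3.4])
print(f"{label} ({confidence:.2f})")
```

With the objects from this section, the call would look like `describe_prediction(outputs.logits[0].numpy().tolist())`.
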
### 4. Input Format

The model expects Arabic text as input. For better results, preprocess the text to remove diacritics and other unnecessary characters.

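One possible cleaning step is sketched here. It strips Arabic diacritics (tashkeel) and the tatweel elongation character, then collapses whitespace. The function name `clean_arabic_text` is hypothetical, and this card does not state which preprocessing was applied during training, so treat the exact character set as an assumption.

```python
import re

# Arabic diacritics (tashkeel): U+064B..U+0652, plus superscript alef U+0670
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
# Tatweel (kashida), used only to stretch words visually
_TATWEEL = re.compile(r"\u0640")
_WHITESPACE = re.compile(r"\s+")


def clean_arabic_text(text: str) -> str:
    """Remove diacritics and tatweel, then normalize whitespace."""
    text = _DIACRITICS.sub("", text)
    text = _TATWEEL.sub("", text)
    return _WHITESPACE.sub(" ", text).strip()


# Example: a greeting written with full diacritics
print(clean_arabic_text("مَرْحَبًا بِكُمْ"))
```

The cleaned string can then be passed to the tokenizer in place of the raw text.
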
### 5. Sentiment Labels

The model classifies sentiment into three categories:

- **Positive**
- **Negative**
- **Mixed**

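The mapping from class index to label matters when decoding predictions. The sketch below prefers label names stored in a model configuration and falls back to a hard-coded list when only generic `LABEL_n` names are present. `resolve_labels` is a hypothetical helper, and whether this checkpoint's configuration carries meaningful label names is an assumption that should be checked.

```python
def resolve_labels(id2label, fallback):
    """Return labels ordered by class index, or the fallback list.

    id2label maps class indices (int or str keys) to label names, as found
    in a Transformers model configuration.
    """
    ordered = [id2label[key] for key in sorted(id2label, key=int)]
    # Untouched configs only hold placeholder names such as "LABEL_0"
    generic = all(str(name).upper().startswith("LABEL_") for name in ordered)
    return list(fallback) if generic else ordered


# Example with a placeholder mapping, so the fallback order is used
fallback = ["Mixed", "Negative", "Positive"]
print(resolve_labels({0: "LABEL_0", 1: "LABEL_1", 2: "LABEL_2"}, fallback))
```

With a loaded model, the first argument would be `model.config.id2label`.
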
## Model Details

- **Model Name:** `khaledsoudy/arabic-sentiment-bert-model`
- **Model Type:** `TFBertForSequenceClassification`
- **Language:** Arabic
- **Sentiment Classes:** Positive, Negative, Mixed

## How to Fine-Tune This Model

You can fine-tune this model further on your own dataset. See the source code and notebooks in my GitHub repository for detailed steps and guidance.

## License

This model is licensed under the MIT License.

## Acknowledgments

- **Hugging Face** for providing the platform to host models.
- **Google BERT** for the pre-trained model.
- **Kaggle** for the **Arabic 100k Reviews** dataset.