# Arabic End-of-Utterance (EOU) Classifier

## Overview
This repository contains a custom PyTorch model for **End-of-Utterance (EOU) detection** in Arabic conversational text.
The model predicts whether a given text segment represents the end of a speaker’s turn.

This is a **custom architecture** (not a Hugging Face `AutoModel`) and is intended for research and development use.

---

## Task
Given an input text segment, the model outputs a binary prediction:

- `0` → The speaker is expected to continue speaking
- `1` → The speaker has finished their turn

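
In application code, it can help to name the two labels so that downstream logic does not depend on bare integers. The following is a minimal sketch only: `EOULabel` and `should_respond` are hypothetical names introduced for illustration and are not part of this repository.

```python
from enum import IntEnum


class EOULabel(IntEnum):
    """Hypothetical names for the two classes listed above."""

    CONTINUE = 0      # the speaker is expected to continue speaking
    END_OF_TURN = 1   # the speaker has finished their turn


def should_respond(label: int) -> bool:
    """Return True when the prediction says the speaker's turn is over."""
    # EOULabel(label) raises ValueError for anything other than 0 or 1.
    return EOULabel(label) is EOULabel.END_OF_TURN


print(should_respond(0), should_respond(1))
```

A dialogue system could call `should_respond` on each prediction and wait for more input whenever it returns `False`.
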
---

## Model Details
- Framework: PyTorch
- Architecture: Custom `EOUClassifier`
- Task: Binary classification (EOU detection)
- Language: Arabic

---

## Tokenizer
This model uses the tokenizer from:

`Omartificial-Intelligence-Space/SA-BERT-V1`

The tokenizer is **not included** in this repository and must be loaded separately.

---

## Files
- `model.py`: Model architecture (`EOUClassifier`)
- `model.pt`: Trained model weights
- `config.json`: Model configuration
- `README.md`: This file

---

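## Loading the Model
```python
import torch
from transformers import AutoTokenizer
from model import EOUClassifier  # model.py from this repository

# Use a GPU when one is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The tokenizer is not bundled with this repository, so load it from its source.
tokenizer = AutoTokenizer.from_pretrained(
    "Omartificial-Intelligence-Space/SA-BERT-V1"
)

# Build the architecture, load the trained weights, and switch to eval mode.
model = EOUClassifier()
model.load_state_dict(
    torch.load("model.pt", map_location="cpu")
)
model.to(device)
model.eval()

# The first example trails off mid-sentence (roughly "My point about this is that");
# the second reads as a complete request (roughly "I hope you can help me").
examples = ["مقصدي من الموضوع انه", "اتمنى تقدر تساعدني"]

# Tokenize as one padded batch and move the tensors to the model's device.
batch = tokenizer(examples, padding=True, truncation=True, return_tensors="pt")
batch = batch.to(device)

# Gradients are not needed for inference.
with torch.no_grad():
    out = model(batch["input_ids"], batch["attention_mask"])
```
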
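
The loading snippet stops at `out`, and this README does not state the shape of that output. The sketch below shows one way to turn raw scores into `0`/`1` predictions, under the loudly stated assumption that `out` holds logits with either one score per example or two scores ordered `[continue, end-of-turn]`. `decode_eou_output` and its `threshold` parameter are hypothetical and not part of this repository; check `model.py` before relying on them.

```python
import math


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_eou_output(rows, threshold: float = 0.5):
    """Turn raw scores (e.g. `out.tolist()`) into (label, p_eou) pairs.

    ASSUMPTION: each row holds either one logit for class 1, or two logits
    ordered [continue, end-of-turn]. Verify this against model.py.
    """
    decoded = []
    for row in rows:
        scores = list(row) if isinstance(row, (list, tuple)) else [row]
        if len(scores) == 1:
            p_eou = _sigmoid(scores[0])
        elif len(scores) == 2:
            # A two-class softmax reduces to a sigmoid of the logit difference.
            p_eou = _sigmoid(scores[1] - scores[0])
        else:
            raise ValueError(
                f"expected 1 or 2 scores per example, got {len(scores)}"
            )
        decoded.append((int(p_eou >= threshold), p_eou))
    return decoded


# Illustrative scores only, not real model output.
print(decode_eou_output([[2.0, -1.0], [-0.5, 1.5]]))
```

With the snippet above, a call such as `decode_eou_output(out.tolist())` would yield one `(label, probability)` pair per input text, provided the shape assumption holds. Raising `threshold` makes the classifier more reluctant to declare the end of a turn.
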
## License

MIT