Commit 52fe3e5 by shaqaqio · verified · 1 Parent(s): 91a459e

Update README.md

Files changed (1): README.md (+71 −3)
# Arabic End-of-Utterance (EOU) Classifier

## Overview
This repository contains a custom PyTorch model for **End-of-Utterance (EOU) detection** in Arabic conversational text.
The model predicts whether a given text segment represents the end of a speaker’s turn.

This is a **custom architecture** (not a Hugging Face `AutoModel`) and is intended for research and development use.

---

## Task
Given an input text segment, the model outputs a binary prediction:

- `0` → The speaker is expected to continue speaking
- `1` → The speaker has finished their turn

---

## Model Details
- Framework: PyTorch
- Architecture: Custom `EOUClassifier`
- Task: Binary classification (EOU detection)
- Language: Arabic

---

## Tokenizer
This model uses the tokenizer from:

`Omartificial-Intelligence-Space/SA-BERT-V1`

The tokenizer is **not included** in this repository and must be loaded separately.

---

## Files
- `model.py` — Model architecture (`EOUClassifier`)
- `model.pt` — Trained model weights
- `config.json` — Model configuration
- `README.md` — This file

---

## Loading the Model
```python
import torch
from transformers import AutoTokenizer
from model import EOUClassifier

tokenizer = AutoTokenizer.from_pretrained(
    "Omartificial-Intelligence-Space/SA-BERT-V1"
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = EOUClassifier()
model.load_state_dict(
    torch.load("model.pt", map_location=device)
)
model.to(device)
model.eval()

# "What I mean by this is that ..." (unfinished) / "I hope you can help me" (finished)
examples = ["مقصدي من الموضوع انه", "اتمنى تقدر تساعدني"]

batch = tokenizer(examples, padding=True, truncation=True, return_tensors="pt")
batch = batch.to(device)

with torch.no_grad():
    out = model(batch["input_ids"], batch["attention_mask"])
```
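The raw classifier output still has to be mapped to the two labels above. A minimal sketch of that step, assuming the model returns per-class logits of shape `[batch, 2]` (the actual output format is defined in `model.py`, so check there first):

```python
import torch

# Hypothetical logits for two segments; in practice these would come
# from the model call shown above.
logits = torch.tensor([[1.2, -0.3],   # first segment
                       [-0.8, 2.1]])  # second segment

# argmax over the class dimension: 0 = continue speaking, 1 = end of turn
preds = logits.argmax(dim=-1)
print(preds.tolist())  # [0, 1]
```

If the model instead returns a single logit per segment, the equivalent decision rule is `(logits.sigmoid() > 0.5).long()`.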

## License

MIT