Update README.md
README.md
CHANGED
|
@@ -13,6 +13,10 @@ This model converts diacritics in Palestinian colloquial Arabic to their estimat
|
|
| 13 |
The model is fine-tuned from Google's [CANINE-s](https://huggingface.co/google/canine-s) character-level LM with a token classification head.
|
| 14 |
Each token (letter) of the input is classified into one of 7 classes: 'O' if not a diacritic, or one of 6 Hebrew vowels (see `model.config.id2label`).
|
| 15 |
|
| 16 |
# Example Usage
|
| 17 |
|
| 18 |
```python
|
|
@@ -83,7 +87,7 @@ to_taatik(heb_vowels)
|
|
| 83 |
```
|
| 84 |
|
| 85 |
```
|
| 86 |
-
Out[2]: "לַאזֵם נִעְטִי רַשַّאת וִקַאאִיֵّה לִלשַّג'ַר "
|
| 87 |
```
|
| 88 |
|
| 89 |
```python
|
|
@@ -130,4 +134,8 @@ to_translit(heb_vowels)
|
|
| 130 |
```
|
| 131 |
```
|
| 132 |
Out[3]: 'laazem niatiy raSHaat wiqaaaiYeh lilSHajar '
|
| 133 |
-
```
|
| 13 |
The model is fine-tuned from Google's [CANINE-s](https://huggingface.co/google/canine-s) character-level LM with a token classification head.
|
| 14 |
Each token (letter) of the input is classified into one of 7 classes: 'O' if not a diacritic, or one of 6 Hebrew vowels (see `model.config.id2label`).
|
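The per-character classification described above can be illustrated with a small, self-contained sketch. It does not load the model: the label table `ID2LABEL`, the helper `apply_predictions`, and the choice of six niqqud marks are all hypothetical stand-ins (the real mapping is whatever `model.config.id2label` contains), and the sketch assumes that a character labelled with a vowel is replaced by that vowel while a character labelled 'O' is kept.

```python
# Hypothetical label table; the real one is model.config.id2label.
ID2LABEL = {
    0: "O",       # not a diacritic: keep the character as-is
    1: "\u05B7",  # patah   (illustrative choice)
    2: "\u05B5",  # tsere   (illustrative choice)
    3: "\u05B4",  # hiriq   (illustrative choice)
    4: "\u05B9",  # holam   (illustrative choice)
    5: "\u05BB",  # qubuts  (illustrative choice)
    6: "\u05B0",  # shva    (illustrative choice)
}


def apply_predictions(text, label_ids):
    """Replace each character whose predicted label is a vowel with that vowel.

    `label_ids` holds one predicted class id per character of `text`,
    mirroring a character-level token classification output.
    """
    if len(text) != len(label_ids):
        raise ValueError("expected exactly one label per character")
    out = []
    for ch, label_id in zip(text, label_ids):
        label = ID2LABEL[label_id]
        out.append(ch if label == "O" else label)
    return "".join(out)


# Arabic lam + fatha + alif, with made-up predictions: only the fatha
# (a diacritic) is assigned a vowel class.
text = "\u0644\u064E\u0627"
label_ids = [0, 1, 0]
result = apply_predictions(text, label_ids)
```

In this sketch the Arabic letters pass through unchanged; converting them to Hebrew script or Latin transliteration is a separate step, handled in the README by `to_taatik` and `to_translit`.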
| 15 |
|
| 16 |
+
# Diacritizer
|
| 17 |
+
This model can be used in conjunction with [Levanti Diacritizer](https://huggingface.co/guymorlan/levanti_arabic2diacritics), which adds diacritics to raw Palestinian Arabic text.
|
| 18 |
+
|
| 19 |
+
|
| 20 |
# Example Usage
|
| 21 |
|
| 22 |
```python
|
|
|
|
| 87 |
```
|
| 88 |
|
| 89 |
```
|
| 90 |
+
Out[2]: "לַאזֵם נִעְטִי רַשַّאת וִקַאאִיֵّה לִלשַّג'ַר "
|
| 91 |
```
|
| 92 |
|
| 93 |
```python
|
|
|
|
| 134 |
```
|
| 135 |
```
|
| 136 |
Out[3]: 'laazem niatiy raSHaat wiqaaaiYeh lilSHajar '
|
| 137 |
+
```
|
| 138 |
+
|
| 139 |
+
# Attribution
|
| 140 |
+
Created by Guy Mor-Lan.<br>
|
| 141 |
+
Contact: guy.mor AT mail.huji.ac.il
|