Update README.md
README.md
CHANGED
@@ -29,8 +29,6 @@ This is the **SARF** (Sarf-Aware Representation Framework) tokenizer designed fo
 - Prefixes, suffixes, infixes
 - Tense, gender, number, and derivation
 
-> **Ṣarf is the exact linguistic layer that makes Arabic hard for naive tokenizers.**
-
 SARF combines morphological analysis with BPE tokenization to achieve better compression, especially for morphologically rich languages like Arabic.
 
 Most tokenizers treat Arabic as **bytes or characters**. **SARF treats Arabic as a *language*.**
@@ -47,7 +45,7 @@ Most tokenizers treat Arabic as **bytes or characters**. **SARF treats Arabic as
 ## Installation
 
 ```bash
-pip install deeplatent-nlp
+uv pip install deeplatent-nlp
 ```
 
 ## Quick Start