Refactor README.md to remove licensing section and unsupported tasks, streamlining content for clarity
README.md CHANGED
@@ -1,11 +1,8 @@
 ---
-license: cc-by-nc-4.0
 language:
 - tr
 ---
 
-# TR Tokenizer: Turkish Word Segmentation Tool Based on Semantic Integrity
-
 ## Tokenizer Summary
 TR Tokenizer is an innovative FastTokenizer that splits Turkish words according to their semantic integrity, using both current natural language processing methods and Turkish grammar rules. This fast and efficient tokenizer provides accurate and detailed results by analyzing words morphologically and semantically. For example, the sentence "akademisyenler ve aileleri ile birlikte aktif çalışıyorlar" (academics and their families are actively working together) is split into the following parts:
 
@@ -13,13 +10,6 @@ TR Tokenizer is an innovative FastTokenizer that splits Turkish words according
 ['akademisyen', 'ler', 've', 'aile', 'leri', 'ile', 'birlikte', 'aktif', 'çalış', 'ı', 'yor', 'lar']
 ```
 
-## Supported Tasks and Applications
-TR Tokenizer can be used for the following NLP tasks:
-- **Morphological Analysis**: Analyzes the root and suffix structures of words.
-- **Language Model Training and Fine-tuning**: Processes words according to their semantic integrity during the preprocessing phase of Turkish language model training.
-- **Frequency Analysis**: Assists in determining word frequencies in texts.
-- **Natural Language Processing (NLP) Research**: Used in research studying the morphological structure and word formations of the Turkish language.
-
 ## Languages
 This tokenizer focuses on the **Turkish** language and is designed to support Turkish's rich morphological structure.
 