İsmail Kağan Acar committed on
Revise README with new title and tokenizer info
Updated project title and added details about tokenizer.
README.md
CHANGED
@@ -1,6 +1,7 @@
-# ismail
+# ismail - DeepSeek-V3 Inspired Turkish LLM Implementation
+<br>
 
-**ismail** is a from-scratch Turkish language model implementation designed for low-end hardware, built and trained on a single RTX 5070 (12GB). This is my first LLM project, heavily inspired by [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) and built with guidance from [LLMs-from-scratch](https://github.com/rasbt/LLMs-from-scratch).
+**ismail** is a from-scratch Turkish language model implementation designed for low-end hardware, built and trained on a single RTX 5070 (12GB). This is my first LLM project, heavily inspired by [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) and built with guidance from [LLMs-from-scratch](https://github.com/rasbt/LLMs-from-scratch). Ismail utilizes Ali Bayram's [Turkish Tiktokenizer](https://huggingface.co/spaces/alibayram/turkish_tiktokenizer), a morphology-based tokenizer that achieves significantly better compression for agglutinative languages than standard BPE.
 
 **Language Focus**: ismail is trained exclusively on Turkish datasets using a custom morphology-aware tokenizer optimized for Turkish's agglutinative structure.
 
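The compression claim behind the tokenizer change can be illustrated with a toy sketch. This is not the actual Turkish Tiktokenizer; the suffix list and greedy segmentation below are invented purely to show why morpheme-aware splits suit agglutinative Turkish, where a single word like "evlerimizden" ("from our houses") stacks several suffixes that a generic byte-level BPE would often shatter into many meaningless fragments.

```python
# Toy morpheme segmentation (illustrative only, NOT the real tokenizer).
# A tiny hand-picked suffix sample; real Turkish morphology is far richer.
SUFFIXES = ["den", "imiz", "ler"]


def segment(word: str) -> list[str]:
    """Greedily peel known suffixes off the end; the remainder is the stem."""
    parts = []
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) > len(suffix):
                parts.append(suffix)
                word = word[: -len(suffix)]
                stripped = True
                break
    parts.append(word)
    return parts[::-1]  # stem first, then suffixes in surface order


print(segment("evlerimizden"))  # → ['ev', 'ler', 'imiz', 'den']
```

Four meaningful units (stem + three suffixes) instead of the longer stream of opaque sub-word pieces a tokenizer trained mostly on English would typically produce for the same word.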