İsmail Kağan Acar committed on
Commit
a0ee795
·
unverified ·
1 Parent(s): af1f8bc

Revise README with new title and tokenizer info

Updated project title and added details about tokenizer.

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -1,6 +1,7 @@
- # ismail - "Is My AI Lame?"
+ # ismail - DeepSeek-V3 Inspired Turkish LLM Implementation
+ ![Status](https://img.shields.io/badge/Status-Untrained_Architecture-yellow)<br>
 
- **ismail** is a from-scratch Turkish language model implementation designed for low-end hardware, built and trained on a single RTX 5070 (12GB). This is my first LLM project, heavily inspired by [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) and built with guidance from [LLMs-from-scratch](https://github.com/rasbt/LLMs-from-scratch).
+ **ismail** is a from-scratch Turkish language model implementation designed for low-end hardware, built and trained on a single RTX 5070 (12GB). This is my first LLM project, heavily inspired by [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) and built with guidance from [LLMs-from-scratch](https://github.com/rasbt/LLMs-from-scratch). Ismail utilizes Ali Bayram's [Turkish Tiktokenizer](https://huggingface.co/spaces/alibayram/turkish_tiktokenizer), a morphology-based tokenizer that achieves significantly better compression for agglutinative languages than standard BPE.
 
  **Language Focus**: ismail is trained exclusively on Turkish datasets using a custom morphology-aware tokenizer optimized for Turkish's agglutinative structure.
 