---
language:
- tr
- otk
tags:
- gokturk
- text-generation
license: mit
---

# Bitig-Nano

Bitig-Nano is a small language model that generates text in the Göktürk (Old Turkic) script. It was trained on the Turkish Wikipedia dataset, transliterated into Göktürk letters.

> [!IMPORTANT]
> **Disclaimer:** This project is for **fun and hobby purposes only**. It is not a professional tool. The model might make mistakes or write things that are not historically accurate. It is a "Nano" sized model created for educational experiments.

## How to Use

You can use this model with the Python `transformers` library.

```python
from transformers import GPT2LMHeadModel, PreTrainedTokenizerFast

model_name = "eokayakca/Bitig-Nano"

tokenizer = PreTrainedTokenizerFast.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

prompt = "𐱅𐰇𐰼" # Start with "Tür"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output = model.generate(input_ids, max_length=50)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

# The output is in Logical Order (Left-to-Right).
# For correct display, you might need to reverse it to Right-to-Left.
print(f"Logical (LTR): {generated_text}")
print(f"Visual (RTL):  {generated_text[::-1]}")
```
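`generate` defaults to greedy decoding, which can get repetitive for a model this small. Enabling sampling (standard `transformers` options; the values below are only a starting point) usually gives more varied output:

```python
# Optional: sample instead of greedy decoding for more varied text.
output = model.generate(
    input_ids,
    max_length=50,
    do_sample=True,   # draw tokens from the distribution instead of argmax
    top_k=50,         # restrict sampling to the 50 most likely tokens
    temperature=0.8,  # <1.0 sharpens the distribution slightly
)
```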

## About the Data

The model was trained on Turkish Wikipedia articles. The Latin-script source text was transliterated into Göktürk letters with a custom converter script.
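As a rough illustration of the idea (a minimal, hypothetical sketch, not the actual converter; the mapping covers only the three letters from the prompt example above, and a real converter must also handle Old Turkic's vowel-harmony-dependent consonant variants):

```python
# Hypothetical minimal transliteration sketch; NOT the converter used for
# the dataset. Old Turkic distinguishes front/back consonant forms based
# on vowel harmony, which this toy mapping ignores.
LATIN_TO_GOKTURK = {
    "t": "𐱅",
    "ü": "𐰇",
    "r": "𐰼",
}

def to_gokturk(text: str) -> str:
    """Replace known Latin letters with Göktürk letters; keep everything else."""
    return "".join(LATIN_TO_GOKTURK.get(ch, ch) for ch in text.lower())

print(to_gokturk("Tür"))  # -> 𐱅𐰇𐰼 (the prompt from the usage example)
```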

**Technical Note:** The text is stored in **Logical Order (Left-to-Right)** for Unicode compatibility. However, Göktürk script is historically written and read from **Right-to-Left**. When you view the output, you may need to reverse it visually.
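For instance, a small helper (an illustrative sketch, building on the `[::-1]` trick in the usage example) that reverses each line of output for visual display:

```python
def to_visual_rtl(text: str) -> str:
    """Reverse each line of logically-ordered (LTR) text for RTL display.

    Illustrative only: simple per-line reversal, no full bidi handling.
    """
    return "\n".join(line[::-1] for line in text.splitlines())

print(to_visual_rtl("𐱅𐰇𐰼"))
```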

## Training Details

- **Hardware:** Apple M1 Mac (16 GB RAM)
- **Training Time:** ~20 hours
- **Epochs:** 3
- **Dataset:** [eokayakca/turkish-wikipedia-gokturk](https://huggingface.co/datasets/eokayakca/turkish-wikipedia-gokturk)

## Limitations

- The model is very small (Nano size).
- It may generate nonsense words or grammatically incorrect sentences.
- It is designed for testing and learning, not for serious translation or historical research.