---
language:
- tr
- otk
tags:
- gokturk
- text-generation
license: mit
---

# Bitig-Nano

This is a small AI model that can write text in the Göktürk (Old Turkic) script. It was trained on the Turkish Wikipedia dataset, which was converted into Göktürk letters.

> [!IMPORTANT]
> **Disclaimer:** This project is for **fun and hobby purposes only**. It is not a professional tool. The model might make mistakes or write things that are not historically accurate. It is a "Nano"-sized model created for educational experiments.

## How to Use

You can use this model with the Python `transformers` library.

```python
from transformers import GPT2LMHeadModel, PreTrainedTokenizerFast

model_name = "eokayakca/Bitig-Nano"
tokenizer = PreTrainedTokenizerFast.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

prompt = "𐱅𐰇𐰼"  # Start with "Tür"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output = model.generate(input_ids, max_length=50)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

# The output is in logical order (left-to-right).
# For correct display, you may need to reverse it to right-to-left.
print(f"Logical (LTR): {generated_text}")
print(f"Visual (RTL): {generated_text[::-1]}")
```

## About the Data

The model learned from Turkish Wikipedia articles. The Latin-script text was converted to Göktürk letters using a custom converter script.

**Technical Note:** The text is stored in **Logical Order (Left-to-Right)** for Unicode compatibility. However, the Göktürk script is historically written and read from **Right-to-Left**. When you view the output, you may need to reverse it visually.

## Training Details

- **Hardware:** Apple M1 Mac (16 GB RAM)
- **Training Time:** ~20 hours
- **Epochs:** 3
- **Dataset:** [eokayakca/turkish-wikipedia-gokturk](https://huggingface.co/datasets/eokayakca/turkish-wikipedia-gokturk)

## Limitations

- The model is very small (Nano size).
- It may generate nonsense words or grammatically incorrect sentences.
- It is designed for testing and learning, not for serious translation or historical research.
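
The Latin-to-Göktürk conversion mentioned in "About the Data" can be illustrated with a minimal sketch. The actual converter script is not reproduced in this card, so the mapping below is a hypothetical fragment: the three entries are taken only from this card's own prompt example ("Tür" → 𐱅𐰇𐰼), and a real converter would need the full alphabet, including the script's vowel-harmony-dependent consonant variants.

```python
# Hypothetical fragment of a Latin -> Göktürk mapping; only these three
# entries are attested by this card's own example ("Tür" -> 𐱅𐰇𐰼).
LATIN_TO_GOKTURK = {
    "t": "𐱅",
    "ü": "𐰇",
    "r": "𐰼",
}

def to_gokturk(text: str) -> str:
    """Convert Latin text to Göktürk, passing unknown characters through."""
    return "".join(LATIN_TO_GOKTURK.get(ch, ch) for ch in text.lower())

logical = to_gokturk("Tür")  # logical order (LTR), as stored in the dataset
visual = logical[::-1]       # reversed for right-to-left display
print(f"Logical (LTR): {logical}")
print(f"Visual (RTL): {visual}")
```

Because each Göktürk letter is a single Unicode code point, the same `[::-1]` slice used in the generation example above is enough to flip logical order into visual right-to-left order.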