burak
Update README.md
856676b verified
---
language:
- tr
license: apache-2.0
tags:
- gpt2
- turkish
- instruct
- thinking
model_name: SykoLLM-V2.4-Thinking-Beta
base_model: syko818121/SykoLLM-V2.3-Turkish-Instruct
model_type: causal-lm
parameters: ~96.1M params
datasets:
- Quardo/wikipedia-turkish-qa-chattemplate
---
# SykoLLM-V2.4-Thinking-Beta
This is the latest and most experimental version of the **SykoLLM** series. Developed and trained entirely by **Burak (15 years old)**, this model is designed to explore "Chain of Thought" (CoT) capabilities in small-scale Turkish Language Models.
## Important Technical Distinction
**This model is **NOT a LoRA adapter** or a **simple copy of GPT-2.** It is a standalone, full-parameter fine-tuned model where the actual weights have been modified through training. The positional embeddings were manually expanded from 512 to 1024 tokens via a custom "weight surgery" process to support longer context natively.
## ⚠️ Important: Beta Status
This model is currently in a **strict Beta phase**.
- The training for the "thinking" mechanism is still ongoing/experimental.
- **Note:** The model has been introduced to new special tokens, but it has not yet fully mastered the logic of "thinking" before answering.
- Users should expect inconsistent results regarding the use of `<thinking>` tags during this stage.
## Model Specifications
- **Model Name:** SykoLLM-V2.4-Thinking-Beta
- **Parameter Count:** ~96.1M params (Lightweight and fast)
- **Vocabulary Size:** 50,000 (Custom tokenizer optimized for Turkish)
- **Context Window:** 1024 Tokens (Expanded from the original 512 via positional embedding surgery)
- **Architecture:** GPT-2 based with modified positional embeddings to support longer context.
## Experimental "Thinking" Tokens
Special tokens have been added to the tokenizer to prepare the model for reasoning tasks:
- `<thinking>`: Intended for the model's internal reasoning process.
- `</thinking>`: End of the reasoning process.
- `<bos>` / `<eos>`: Beginning and end of string tokens.
## 📊 Training Insights
The model was pre-trained on Turkish Wikipedia and fine-tuned on instruction datasets.
- **Learning Rate:** 2e-5
- **Optimizer:** AdamW with a Cosine Scheduler.
- **Batch Size:** 32 (Effective batch size via Gradient Accumulation)
- **Loss Trend:** Started at ~8.0 and successfully converged to ~3.5 during current training runs.
## About the Developer
SykoLLM-V2.4 is part of an ongoing project by Burak a AI enthusiast. The goal of this project is to demonstrate that small-scale models (under 100M parameters) can be fine-tuned to handle complex Turkish language structures and reasoning patterns.
## License
Apache 2.0