| --- |
| language: |
| - tr |
| license: apache-2.0 |
| tags: |
| - gpt2 |
| - turkish |
| - instruct |
| - thinking |
| model_name: SykoLLM-V2.4-Thinking-Beta |
| base_model: syko818121/SykoLLM-V2.3-Turkish-Instruct |
| model_type: causal-lm |
| parameters: ~96.1M params |
| datasets: |
| - Quardo/wikipedia-turkish-qa-chattemplate |
| --- |
| |
| # SykoLLM-V2.4-Thinking-Beta |
|
|
| This is the latest and most experimental version of the **SykoLLM** series. Developed and trained entirely by **Burak (15 years old)**, this model is designed to explore "Chain of Thought" (CoT) capabilities in small-scale Turkish Language Models. |
|
|
| ## Important Technical Distinction |
| **This model is **NOT a LoRA adapter** or a **simple copy of GPT-2.** It is a standalone, full-parameter fine-tuned model where the actual weights have been modified through training. The positional embeddings were manually expanded from 512 to 1024 tokens via a custom "weight surgery" process to support longer context natively. |
|
|
| ## ⚠️ Important: Beta Status |
| This model is currently in a **strict Beta phase**. |
| - The training for the "thinking" mechanism is still ongoing/experimental. |
| - **Note:** The model has been introduced to new special tokens, but it has not yet fully mastered the logic of "thinking" before answering. |
| - Users should expect inconsistent results regarding the use of `<thinking>` tags during this stage. |
|
|
| ## Model Specifications |
| - **Model Name:** SykoLLM-V2.4-Thinking-Beta |
| - **Parameter Count:** ~96.1M params (Lightweight and fast) |
| - **Vocabulary Size:** 50,000 (Custom tokenizer optimized for Turkish) |
| - **Context Window:** 1024 Tokens (Expanded from the original 512 via positional embedding surgery) |
| - **Architecture:** GPT-2 based with modified positional embeddings to support longer context. |
|
|
| ## Experimental "Thinking" Tokens |
| Special tokens have been added to the tokenizer to prepare the model for reasoning tasks: |
| - `<thinking>`: Intended for the model's internal reasoning process. |
| - `</thinking>`: End of the reasoning process. |
| - `<bos>` / `<eos>`: Beginning and end of string tokens. |
|
|
| ## 📊 Training Insights |
| The model was pre-trained on Turkish Wikipedia and fine-tuned on instruction datasets. |
| - **Learning Rate:** 2e-5 |
| - **Optimizer:** AdamW with a Cosine Scheduler. |
| - **Batch Size:** 32 (Effective batch size via Gradient Accumulation) |
| - **Loss Trend:** Started at ~8.0 and successfully converged to ~3.5 during current training runs. |
|
|
| ## About the Developer |
| SykoLLM-V2.4 is part of an ongoing project by Burak a AI enthusiast. The goal of this project is to demonstrate that small-scale models (under 100M parameters) can be fine-tuned to handle complex Turkish language structures and reasoning patterns. |
|
|
| ## License |
| Apache 2.0 |