---
language:
- tr
license: apache-2.0
tags:
- gpt2
- turkish
- instruct
- thinking
model_name: SykoLLM-V2.4-Thinking-Beta
base_model: syko818121/SykoLLM-V2.3-Turkish-Instruct
model_type: causal-lm
parameters: ~96.1M params
datasets:
- Quardo/wikipedia-turkish-qa-chattemplate
---

# SykoLLM-V2.4-Thinking-Beta

This is the latest and most experimental version of the **SykoLLM** series. Developed and trained entirely by **Burak (15 years old)**, this model is designed to explore Chain-of-Thought (CoT) capabilities in small-scale Turkish language models.

## Important Technical Distinction

This model is **not a LoRA adapter** or a **simple copy of GPT-2**. It is a standalone, full-parameter fine-tuned model whose actual weights have been modified through training. The positional embeddings were manually expanded from 512 to 1024 tokens via a custom "weight surgery" process to support a longer context natively.

## ⚠️ Important: Beta Status

This model is currently in a **strict beta phase**.

- Training of the "thinking" mechanism is still ongoing and experimental.
- **Note:** New special tokens have been added to the model, but it has not yet fully mastered the logic of "thinking" before answering.
- Users should expect inconsistent results regarding the use of `<think>` tags at this stage.

## Model Specifications

- **Model Name:** SykoLLM-V2.4-Thinking-Beta
- **Parameter Count:** ~96.1M (lightweight and fast)
- **Vocabulary Size:** 50,000 (custom tokenizer optimized for Turkish)
- **Context Window:** 1024 tokens (expanded from the original 512 via positional embedding surgery)
- **Architecture:** GPT-2 based, with modified positional embeddings to support a longer context

## Experimental "Thinking" Tokens

Special tokens have been added to the tokenizer to prepare the model for reasoning tasks:

- `<think>`: Intended for the model's internal reasoning process.
- `</think>`: End of the reasoning process.
- Beginning- and end-of-string (BOS/EOS) tokens.
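The card does not document the exact "weight surgery" procedure. Below is a minimal sketch of one plausible approach, linearly interpolating the 512 learned positional rows up to 1024; the function name, the use of NumPy as a stand-in for the GPT-2 `wpe` matrix, and the interpolation choice are all assumptions for illustration:

```python
import numpy as np

def expand_positional_embeddings(wpe: np.ndarray, new_len: int) -> np.ndarray:
    """Expand a learned positional-embedding matrix of shape (old_len, d)
    to (new_len, d) by linear interpolation along the position axis.
    This is one plausible "weight surgery"; the actual method used for
    SykoLLM-V2.4 is not specified in this card."""
    old_len, d = wpe.shape
    old_grid = np.linspace(0.0, 1.0, old_len)
    new_grid = np.linspace(0.0, 1.0, new_len)
    # Interpolate each embedding dimension independently over the new grid.
    return np.stack(
        [np.interp(new_grid, old_grid, wpe[:, j]) for j in range(d)], axis=1
    )

# GPT-2 small uses 768-dim embeddings; 512 positions stand in for the old limit.
old_wpe = np.random.randn(512, 768)
new_wpe = expand_positional_embeddings(old_wpe, 1024)
print(new_wpe.shape)  # (1024, 768)
```

An alternative surgery simply copies the first 512 rows into positions 512-1023; interpolation has the advantage of preserving a smooth position-to-embedding mapping at both ends of the range.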
## 📊 Training Insights

The model was pre-trained on Turkish Wikipedia and fine-tuned on instruction datasets.

- **Learning Rate:** 2e-5
- **Optimizer:** AdamW with a cosine scheduler
- **Batch Size:** 32 (effective batch size via gradient accumulation)
- **Loss Trend:** Started at ~8.0 and converged to ~3.5 during the current training runs.

## About the Developer

SykoLLM-V2.4 is part of an ongoing project by Burak, an AI enthusiast. The goal of the project is to demonstrate that small-scale models (under 100M parameters) can be fine-tuned to handle complex Turkish language structures and reasoning patterns.

## License

Apache 2.0
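As a rough illustration of the schedule described under Training Insights, the cosine decay from the 2e-5 peak can be written in a few lines of Python. The total step count and the absence of a warmup phase are assumptions; the card does not report them:

```python
import math

def cosine_lr(step: int, total_steps: int, base_lr: float = 2e-5) -> float:
    """Cosine-decay learning rate: starts at base_lr and decays toward 0.
    total_steps is an assumed value; the card does not state it."""
    progress = min(step, total_steps) / total_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(0, 10_000))       # peak: 2e-05
print(cosine_lr(5_000, 10_000))   # roughly half the peak, ~1e-05
print(cosine_lr(10_000, 10_000))  # ~0.0 at the end of training
```

The "effective batch size of 32" would then come from accumulating gradients over several smaller micro-batches before each optimizer step, a common trick when training ~100M-parameter models on limited GPU memory.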