SykoLLM-V2.4-Thinking-Beta
This is the latest and most experimental version of the SykoLLM series. Developed and trained entirely by Burak (15 years old), this model is designed to explore "Chain of Thought" (CoT) capabilities in small-scale Turkish Language Models.
Important Technical Distinction
**This model is NOT a LoRA adapter or a simple copy of GPT-2. It is a standalone, full-parameter fine-tuned model where the actual weights have been modified through training. The positional embeddings were manually expanded from 512 to 1024 tokens via a custom "weight surgery" process to support longer context natively.
β οΈ Important: Beta Status
This model is currently in a strict Beta phase.
- The training for the "thinking" mechanism is still ongoing/experimental.
- Note: The model has been introduced to new special tokens, but it has not yet fully mastered the logic of "thinking" before answering.
- Users should expect inconsistent results regarding the use of
<thinking>tags during this stage.
Model Specifications
- Model Name: SykoLLM-V2.4-Thinking-Beta
- Parameter Count: ~96.1M params (Lightweight and fast)
- Vocabulary Size: 50,000 (Custom tokenizer optimized for Turkish)
- Context Window: 1024 Tokens (Expanded from the original 512 via positional embedding surgery)
- Architecture: GPT-2 based with modified positional embeddings to support longer context.
Experimental "Thinking" Tokens
Special tokens have been added to the tokenizer to prepare the model for reasoning tasks:
<thinking>: Intended for the model's internal reasoning process.</thinking>: End of the reasoning process.<bos>/<eos>: Beginning and end of string tokens.
π Training Insights
The model was pre-trained on Turkish Wikipedia and fine-tuned on instruction datasets.
- Learning Rate: 2e-5
- Optimizer: AdamW with a Cosine Scheduler.
- Batch Size: 32 (Effective batch size via Gradient Accumulation)
- Loss Trend: Started at ~8.0 and successfully converged to ~3.5 during current training runs.
About the Developer
SykoLLM-V2.4 is part of an ongoing project by Burak a AI enthusiast. The goal of this project is to demonstrate that small-scale models (under 100M parameters) can be fine-tuned to handle complex Turkish language structures and reasoning patterns.
License
Apache 2.0
- Downloads last month
- 56
Model tree for syko818121/SykoLLM-V2.4-Thinking-Beta
Base model
syko818121/SykoLLM-V2.3-Turkish-Instruct