SykoLLM-V2.4-Thinking-Beta

This is the latest and most experimental version of the SykoLLM series. Developed and trained entirely by Burak (15 years old), this model is designed to explore "Chain of Thought" (CoT) capabilities in small-scale Turkish Language Models.

Important Technical Distinction

**This model is NOT a LoRA adapter or a simple copy of GPT-2. It is a standalone, full-parameter fine-tuned model where the actual weights have been modified through training. The positional embeddings were manually expanded from 512 to 1024 tokens via a custom "weight surgery" process to support longer context natively.

⚠️ Important: Beta Status

This model is currently in a strict Beta phase.

  • The training for the "thinking" mechanism is still ongoing/experimental.
  • Note: The model has been introduced to new special tokens, but it has not yet fully mastered the logic of "thinking" before answering.
  • Users should expect inconsistent results regarding the use of <thinking> tags during this stage.

Model Specifications

  • Model Name: SykoLLM-V2.4-Thinking-Beta
  • Parameter Count: ~96.1M params (Lightweight and fast)
  • Vocabulary Size: 50,000 (Custom tokenizer optimized for Turkish)
  • Context Window: 1024 Tokens (Expanded from the original 512 via positional embedding surgery)
  • Architecture: GPT-2 based with modified positional embeddings to support longer context.

Experimental "Thinking" Tokens

Special tokens have been added to the tokenizer to prepare the model for reasoning tasks:

  • <thinking>: Intended for the model's internal reasoning process.
  • </thinking>: End of the reasoning process.
  • <bos> / <eos>: Beginning and end of string tokens.

πŸ“Š Training Insights

The model was pre-trained on Turkish Wikipedia and fine-tuned on instruction datasets.

  • Learning Rate: 2e-5
  • Optimizer: AdamW with a Cosine Scheduler.
  • Batch Size: 32 (Effective batch size via Gradient Accumulation)
  • Loss Trend: Started at ~8.0 and successfully converged to ~3.5 during current training runs.

About the Developer

SykoLLM-V2.4 is part of an ongoing project by Burak a AI enthusiast. The goal of this project is to demonstrate that small-scale models (under 100M parameters) can be fine-tuned to handle complex Turkish language structures and reasoning patterns.

License

Apache 2.0

Downloads last month
56
Safetensors
Model size
96.1M params
Tensor type
F32
Β·
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for syko818121/SykoLLM-V2.4-Thinking-Beta

Finetuned
(1)
this model

Dataset used to train syko818121/SykoLLM-V2.4-Thinking-Beta