SykoSLM
/

SykoLLM-V2.4-Thinking-Beta

Model card Files Files and versions

SykoLLM-V2.4-Thinking-Beta / README.md

burak

Update README.md

856676b verified 3 months ago

|

history blame contribute delete

2.68 kB

	---
	language:
	- tr
	license: apache-2.0
	tags:
	- gpt2
	- turkish
	- instruct
	- thinking
	model_name: SykoLLM-V2.4-Thinking-Beta
	base_model: syko818121/SykoLLM-V2.3-Turkish-Instruct
	model_type: causal-lm
	parameters: ~96.1M params
	datasets:
	- Quardo/wikipedia-turkish-qa-chattemplate
	---

	# SykoLLM-V2.4-Thinking-Beta

	This is the latest and most experimental version of the SykoLLM series. Developed and trained entirely by Burak (15 years old), this model is designed to explore "Chain of Thought" (CoT) capabilities in small-scale Turkish Language Models.

	## Important Technical Distinction
	This model is NOT a LoRA adapter or a simple copy of GPT-2.** It is a standalone, full-parameter fine-tuned model where the actual weights have been modified through training. The positional embeddings were manually expanded from 512 to 1024 tokens via a custom "weight surgery" process to support longer context natively.

	## ⚠️ Important: Beta Status
	This model is currently in a strict Beta phase.
	- The training for the "thinking" mechanism is still ongoing/experimental.
	- Note: The model has been introduced to new special tokens, but it has not yet fully mastered the logic of "thinking" before answering.
	- Users should expect inconsistent results regarding the use of `<thinking>` tags during this stage.

	## Model Specifications
	- Model Name: SykoLLM-V2.4-Thinking-Beta
	- Parameter Count: ~96.1M params (Lightweight and fast)
	- Vocabulary Size: 50,000 (Custom tokenizer optimized for Turkish)
	- Context Window: 1024 Tokens (Expanded from the original 512 via positional embedding surgery)
	- Architecture: GPT-2 based with modified positional embeddings to support longer context.

	## Experimental "Thinking" Tokens
	Special tokens have been added to the tokenizer to prepare the model for reasoning tasks:
	- `<thinking>`: Intended for the model's internal reasoning process.
	- `</thinking>`: End of the reasoning process.
	- `<bos>` / `<eos>`: Beginning and end of string tokens.

	## 📊 Training Insights
	The model was pre-trained on Turkish Wikipedia and fine-tuned on instruction datasets.
	- Learning Rate: 2e-5
	- Optimizer: AdamW with a Cosine Scheduler.
	- Batch Size: 32 (Effective batch size via Gradient Accumulation)
	- Loss Trend: Started at ~8.0 and successfully converged to ~3.5 during current training runs.

	## About the Developer
	SykoLLM-V2.4 is part of an ongoing project by Burak a AI enthusiast. The goal of this project is to demonstrate that small-scale models (under 100M parameters) can be fine-tuned to handle complex Turkish language structures and reasoning patterns.

	## License
	Apache 2.0