Saken Tukenov PRO
stukenov
AI & ML interests
None yet
Recent Activity
updated a model about 5 hours ago: stukenov/sozkz-core-omniaudio-600m-kk-ctc-v1
published a model about 5 hours ago: stukenov/sozkz-core-omniaudio-600m-kk-ctc-v1
updated a model about 8 hours ago: stukenov/sozkz-core-omniaudio-1b-kk-ctc-v1
SozKZ Corpora: Kazakh Training Datasets
Training corpora for Kazakh LLMs — raw, cleaned, deduplicated, tokenized, synthetic, and parallel datasets
SozKZ GEC: Kazakh Grammar Error Correction
Grammar error correction models and datasets for Kazakh — Llama GEC (300M, 600M), mT5 GEC, morphology models
- stukenov/sozkz-core-llama-600m-kk-gec-v1
  Text Generation • 0.6B • Updated • 65
- stukenov/sozkz-core-llama-300m-kk-gec-v1
  Text Generation • 0.3B • Updated • 78
- stukenov/sozkz-core-llama-300m-kk-gec-v2a
  Text Generation • 0.3B • Updated • 28
- stukenov/sozkz-core-llama-300m-kk-gec-v2b
  Text Generation • 0.3B • Updated • 27
SozKZ Vocab: Kazakh Tokenizers
BPE and SentencePiece tokenizers trained on Kazakh text — 32K vocabularies
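For readers unfamiliar with BPE, the following is a minimal sketch of how byte-pair-encoding merges can be learned from a word list. It is a toy illustration only, not the procedure used to train the SozKZ tokenizers: the function names (`learn_bpe`, `merge_pair`, `encode`), the tiny Kazakh word list, and the tie-breaking rule are all assumptions made for the example. A real 32K vocabulary would be trained on a large corpus with a dedicated library.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Each word starts as a tuple of single characters, weighted by its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole vocabulary.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself
        # (an arbitrary but deterministic choice made for this sketch).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Apply learned merges, in order, to split `word` into subword symbols."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus of Kazakh words sharing the stem "қала".
toy_words = ["қала", "қалада", "қалалар", "бала"]
toy_merges = learn_bpe(toy_words, num_merges=3)
print(toy_merges)
print(encode("қалада", toy_merges))
```

Each merge rule adds one symbol to the vocabulary, so the vocabulary size is controlled by the number of merges. Joining the encoded symbols always reproduces the original word.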
SozKZ MoE: Mixture of Experts
Mixture-of-Experts models for Kazakh — upcycled and domain-pretrained MoE architectures
SozKZ Core: Kazakh Language Models
Base, instruct, and balanced Kazakh language models trained from scratch — Llama (50M–600M), GPT2, Pythia architectures
- stukenov/sozkz-core-llama-600m-kk-base-v1
  Text Generation • 0.6B • Updated • 73
- stukenov/sozkz-core-llama-600m-kk-instruct-v1
  0.6B • Updated • 31
- stukenov/sozkz-core-llama-300m-kk-base-v1
  Text Generation • 0.3B • Updated • 101
- stukenov/sozkz-core-llama-300m-kk-instruct-v1
  Text Generation • 0.3B • Updated • 64
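The model IDs listed on this page appear to follow a common pattern (owner, `sozkz-core`, architecture, size, language code, variant, version). The sketch below splits an ID into those parts. The pattern is inferred only from the names shown here and is an assumption, not a documented naming scheme; the regular expression and the `parse_model_id` helper are hypothetical.

```python
import re

# Hypothetical pattern inferred from the IDs on this page; not an official scheme.
MODEL_ID = re.compile(
    r"^(?P<owner>[^/]+)/sozkz-core-"
    r"(?P<arch>[a-z0-9]+)-"
    r"(?P<size>\d+[mb])-"
    r"(?P<lang>[a-z]{2})-"
    r"(?P<variant>[a-z]+)-"
    r"(?P<version>v\d+[a-z]?)$"
)


def parse_model_id(model_id):
    """Return the ID's components as a dict, or None if it does not fit the assumed pattern."""
    match = MODEL_ID.match(model_id)
    return match.groupdict() if match else None


for model_id in [
    "stukenov/sozkz-core-llama-600m-kk-base-v1",
    "stukenov/sozkz-core-llama-300m-kk-gec-v2a",
    "stukenov/sozkz-core-omniaudio-1b-kk-ctc-v1",
]:
    print(parse_model_id(model_id))
```

A helper like this can be handy for grouping the listed models by size or variant, though IDs that deviate from the assumed pattern simply return `None`.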
SozKZ Misc: TTS, Sentiment & Other
Miscellaneous Kazakh AI models and datasets — TTS, sentiment analysis, speech, benchmarks