-
AhiskaAI/AhiskaAI-25m-Base-v0.1
Text Generation • 50.5M • Updated • 42 • 1 -
AhiskaAI/AhiskaAI-125m-Base-v0.1
Text Generation • 0.1B • Updated • 20 • 1 -
AhiskaAI/AhiskaAI-25m-Chat-v0.1-Experimental
Text Generation • 51.2M • Updated • 80 • 1 -
AhiskaAI/AhiskaAI-65m-base-v0.1
Text Generation • 70.7M • Updated • 1
AhıskaAI
AI & ML interests
LLMs & SLMs (Large & Small Language Models) NLP (Natural Language Processing) OCR (Optical Character Recognition) Pre-training & Fine-tuning Data Engineering / Data Preprocessing
Recent Activity
AhıskaAI
An independent, open-source AI research lab specializing in Small Language Models (SLMs), custom tokenizers, robust OCR architectures, and highly curated niche datasets. Built from the ground up with deep technical curiosity.
🧠 Our Approach: "Fail Forward" & Open Code
We believe true machine learning engineering happens through transparency. Instead of only showing perfected weights, AhıskaAI documents the entire lifecycle of model development.
Our workspace is organized into explicit tracks:
- Base Models: Architectures trained from scratch using our custom-built BPE tokenizers.
- Fine-Tuned Models: Production-ready SLMs optimized for specific context-driven tasks (translation, historical synthesis, and niche NLP).
- Curated Datasets: Cleaned, structured data pipelines (including synthetic optimization and conversational formatting).
- Failed Models: Our explicit log of failed training runs, gradient explosions, and alignment experiments. We publish our mistakes so the community can learn from them.
🛠️ Tech Stack & Focus
- Architectures: Custom Transformer SLMs (24M to 125M+ parameters)
- NLP Pipelines: Custom BPE Tokenization, Synthetically Enhanced Datasets (ShareGPT/Alpaca formats)
- Computer Vision: High-frequency OCR models trained for localized data extraction and captcha bypasses
- Methodologies: From-scratch Pre-training, Supervised Fine-Tuning (SFT)
🌍 Identity & Mission
Named in honor of the Ahıska Turks, our long-term roadmap focuses on bridging state-of-the-art deep learning with cultural and historical preservation—bringing heritage and accurate documentation into the open-source digital landscape.
Driven by passion. Powered by local compute.
-
AhiskaAI/AhiskaAI-25m-Base-v0.1
Text Generation • 50.5M • Updated • 42 • 1 -
AhiskaAI/AhiskaAI-125m-Base-v0.1
Text Generation • 0.1B • Updated • 20 • 1 -
AhiskaAI/AhiskaAI-25m-Chat-v0.1-Experimental
Text Generation • 51.2M • Updated • 80 • 1 -
AhiskaAI/AhiskaAI-65m-base-v0.1
Text Generation • 70.7M • Updated • 1