Spaces:

AhiskaAI
/

README

Running

App Files Files Community

README / README.md

YunusEmreSaidoglu

Update README.md

016caa7 verified 9 days ago

preview code

Raw

History Blame Contribute Delete

2.18 kB

	---
	title: README
	emoji: 💻
	colorFrom: gray
	colorTo: green
	sdk: static
	pinned: false
	pinned_items:
	- collection: [AhiskaAI/datasets]
	---
	<style>
	body, html, iframe, .cards {
	background-color: #0b0f19 !important;
	}
	h1, h2, h3, strong {
	color: #ffffff !important;
	}
	p, li, span, div {
	color: #f3f4f6 !important;
	}
	hr {
	border-top: 1px solid #1f2937 !important;
	}
	</style>
	# AhıskaAI

	An independent, open-source AI research lab specializing in Small Language Models (SLMs), custom tokenizers, robust OCR architectures, and highly curated niche datasets. Built from the ground up with deep technical curiosity.

	---

	## 🧠 Our Approach: "Fail Forward" & Open Code

	We believe true machine learning engineering happens through transparency. Instead of only showing perfected weights, AhıskaAI documents the entire lifecycle of model development.

	Our workspace is organized into explicit tracks:
	* Base Models: Architectures trained from scratch using our custom-built BPE tokenizers.
	* Fine-Tuned Models: Production-ready SLMs optimized for specific context-driven tasks (translation, historical synthesis, and niche NLP).
	* Curated Datasets: Cleaned, structured data pipelines (including synthetic optimization and conversational formatting).
	* Failed Models: Our explicit log of failed training runs, gradient explosions, and alignment experiments. We publish our mistakes so the community can learn from them.

	---

	## 🛠️ Tech Stack & Focus
	* Architectures: Custom Transformer SLMs (24M to 125M+ parameters)
	* NLP Pipelines: Custom BPE Tokenization, Synthetically Enhanced Datasets (ShareGPT/Alpaca formats)
	* Computer Vision: High-frequency OCR models trained for localized data extraction and captcha bypasses
	* Methodologies: From-scratch Pre-training, Supervised Fine-Tuning (SFT)

	---

	## 🌍 Identity & Mission
	Named in honor of the Ahıska Turks, our long-term roadmap focuses on bridging state-of-the-art deep learning with cultural and historical preservation—bringing heritage and accurate documentation into the open-source digital landscape.

	Driven by passion. Powered by local compute.