Spaces: magibu/README

alibayram committed on Aug 8, 2025

Commit 0651bdf (verified)

1 Parent(s): 8c74eb2

Update README.md

Files changed (1)

README.md +0 -81

README.md CHANGED

@@ -19,87 +19,6 @@ Our mission is to combine **linguistics, machine learning, and software engineer
 We actively contribute to the **global AI community** through publications, open datasets, benchmarking platforms, and collaborative projects.
----
-## 🎯 Mission & Vision
-Our goal is to **advance the state of the art** in NLP and AI for low-resource languages by:
-1. **Developing state-of-the-art models** and tools tailored to Turkish and similar languages.
-2. **Creating and maintaining high-quality datasets** and benchmarks to improve transparency and evaluation.
-3. **Fostering collaboration** between academia, industry, and the open-source community.
-4. **Educating the next generation** of NLP researchers in Türkiye and beyond.
-5. **Promoting open science** to accelerate innovation and inclusivity in AI.
----
-## 🧠 Core Research Areas
-- **🔤 Tokenization Research** – Linguistically-informed hybrid tokenizers for agglutinative languages.
-- **🧠 Morphological Tokenizer** – Rule-based, phonetic-aware tokenization with ENCODE/DECODE logic.
-- **📊 Benchmarking & Evaluation** – Turkish MMLU with 6,200+ questions across 62 domains.
-- **🤖 AI Chat Platforms** – Interactive chat environments for LLM deployment in Turkish.
-- **📈 Machine Learning** – Novel algorithms, including data quality-based adaptive learning rates.
-- **📂 Data Science** – Large-scale dataset creation, preprocessing, and analysis for NLP tasks.
----
-## 🚀 Featured Projects
-### 📚 **Turkish MMLU Benchmark**
-- **6200+ questions**, **62 categories** from Turkish academic and professional exams.
-- Original content — *not translated from other languages*.
-- Available on [Hugging Face](https://huggingface.co/datasets/alibayram/turkish_mmlu) & [Zenodo](https://doi.org/10.5281/zenodo.13375018).
-### 🧩 **Hybrid Tokenizer Framework**
-- Morphological + semantic analysis for agglutinative languages.
-- Handles **phonetic transformations** and **shared token IDs** for similar morphemes.
-- Supports ENCODE/DECODE operations with linguistic accuracy.
-### 🏥 **Medical LLM Fine-Tuning**
-- Fine-tuned large language models using **167,000+ Turkish doctor–patient dialogues**.
-- Adaptive learning rate techniques based on **data quality scoring**.
-- Specialized for medical documentation, diagnosis support, and patient interaction.
-### 🐦 **Turkish BERT**
-- Pre-trained transformer for Turkish NLP.
-- Extensive dataset coverage, open-source release, strong downstream task performance.
-### 📊 **Turkish NLP Dataset**
-- High-quality multi-task annotated dataset.
-- Covers **NER**, **sentiment analysis**, **QA**, and **topic classification**.
----
-## 📑 Selected Publications
-- **Tokenization Standards for Linguistic Integrity: Turkish as a Benchmark** — arXiv 2025.
-- **Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation** — arXiv 2025.
-- **Tokens with Meaning: A Hybrid Tokenization Approach for NLP** — Submitted to *Language Resources and Evaluation* (Springer Nature).
-- **Healthcare-Focused Turkish Medical LLM** — Under review at *ACM TALLIP*.
-- **Morphological Tokenization for Agglutinative Languages** — SIU 2025 Conference.
----
-## 🧑‍🤝‍🧑 Team
-Our interdisciplinary team includes:
-- **Ali Bayram** — PhD Candidate, Morphological Tokenizer & NLP Research.
-- **Ali Arda Fincan** — Undergraduate LLM/NLP Researcher.
-- **Ahmet Semih Gümüş** — NLP & AI Applications.
-- **Sercan Karakaş** — AI Reliability & Interpretability.
-- **Demircan Çelik** — NLP Model Deployment.
-- **Yusuf Özdil** — Data Science & Evaluation.
-- **Umut Ertuğrul Daşgın** — Tokenization Research.
-We collaborate with researchers from **Yıldız Technical University**, **Yeditepe University**, **University of Chicago**, **Istanbul Bilgi University**, and others.
----
-## 🌐 Community & Collaboration
-We believe in **open science** and **community-driven research**:
-- Public issue tracking & Kanban boards.
-- Wiki documentation for tools & datasets.
-- Pull request contributions and open peer review.
-- Hugging Face models, datasets, and Spaces.
----
 ## 📬 Contact
 🌐 **Website:** [https://magibu.web.app](https://magibu.web.app)
 🤗 **Hugging Face:** [https://huggingface.co/magibu](https://huggingface.co/magibu)