<!-- KlusAI • Hugging Face Org Card -->
<p align="center">
<strong>KlusAI</strong><br>
<em>Where AI research meets real-world impact</em>
</p>
<p align="center">
<a href="https://www.klusai.com">
<img src="https://img.shields.io/badge/Website-klusai.com-blue?logo=google-chrome&logoColor=white" alt="Website">
</a>
<a href="https://github.com/klusai">
<img src="https://img.shields.io/badge/GitHub-@klusai-black?logo=github" alt="GitHub">
</a>
<a href="https://x.com/klusai">
<img src="https://img.shields.io/badge/X-@klusai-black?logo=x&logoColor=white" alt="X">
</a>
<a href="https://www.klusai.com/research/">
<img src="https://img.shields.io/badge/Research-klusai.com-brightgreen?logo=beaker&logoColor=white" alt="Research">
</a>
</p>

---
## 🔍 What We're About

KlusAI bridges the gap between cutting-edge AI research and production systems. We publish our datasets and models openly to advance the field: **9M+ synthetic training examples** and counting.

**Research Themes:**

- 🧬 **Synthetic Data Generation**: Large-scale training data without privacy concerns
- ⚡ **Efficient AI Systems**: Models that run on consumer hardware
- 🌍 **Multilingual NLP**: With deep Romanian language expertise
---
## 📄 Featured Publication

### Synthetic Data Generation Using Large Language Models

*Advances in Text and Code* · **IEEE Access, 2025**

Our survey of techniques for generating training data with LLMs. It covers how enterprises can generate training data at scale: reducing annotation costs, addressing data scarcity, and enabling fine-tuning without exposing sensitive data.

📖 [Read on IEEE Xplore](https://ieeexplore.ieee.org/abstract/document/11080380) · 📝 [arXiv Preprint](https://arxiv.org/abs/2503.14023)

---
## 🔬 Flagship Project: TinyFabulist

**TinyFabulist** is our open research programme on large-scale synthetic narrative generation. We demonstrate that small, efficient models can produce high-quality training data at scale.

| Release | Description | Size |
|---------|-------------|------|
| **TinyFabulist v1** | Synthetic English Fables | ~3M examples |
| *Upcoming* | Multilingual extensions, evaluation benchmarks | - |

**Key principles:**

- 📊 **Scale**: 9M+ synthetic training examples generated
- 🔧 **Efficiency**: All content produced with ≤8B-parameter models
- 🔓 **Openness**: Generation scripts, pipelines, and methodology shared publicly

📄 [Paper (arXiv)](https://arxiv.org/abs/2504.20605) · 💻 [Code (GitHub)](https://github.com/klusai/tinyfabulist)

---
## 📦 What You'll Find Here

- **Datasets**: Large-scale synthetic training corpora for fine-tuning and research
- **Models**: Efficient, instruction-tuned models optimized for specific tasks
- **Evaluation**: Benchmarks and tooling for synthetic data quality assessment
---
## 🀝 Work With Us
Beyond open research, we offer enterprise AI services:
| Service | Description |
|---------|-------------|
| **AI Strategy** | Define your AI roadmap and implementation plan |
| **Custom Development** | Bespoke AI solutions tailored to your domain |
| **Model Training** | Fine-tuning and deploying models for your use case |
| **MLOps & Infrastructure** | Scalable pipelines and production deployment |
**Need custom synthetic data or domain-specific models?** We partner with organizations on applied research challenges.
---
## 📫 Get in Touch
| Purpose | Contact |
|---------|---------|
| Research collaboration | [research@klusai.com](mailto:research@klusai.com) |
| Enterprise services | [services@klusai.com](mailto:services@klusai.com) |
| General inquiries | [hello@klusai.com](mailto:hello@klusai.com) |
> **Technical questions?** Open an issue on the relevant dataset or model repository.
---
<p align="center">
<strong>Applied Research · AI Services · Ventures</strong><br>
<a href="https://klusai.com">klusai.com</a> · <a href="https://github.com/klusai">GitHub</a> · <a href="https://x.com/klusai">X</a>
</p>