<!-- KlusAI - Hugging Face Org Card -->

<p align="center">
  <strong>KlusAI</strong><br>
  <em>Where AI research meets real-world impact</em>
</p>

<p align="center">
  <a href="https://www.klusai.com">
    <img src="https://img.shields.io/badge/Website-klusai.com-blue?logo=google-chrome&logoColor=white" alt="Website">
  </a>
  <a href="https://github.com/klusai">
    <img src="https://img.shields.io/badge/GitHub-@klusai-black?logo=github" alt="GitHub">
  </a>
  <a href="https://x.com/klusai">
    <img src="https://img.shields.io/badge/X-@klusai-black?logo=x&logoColor=white" alt="X">
  </a>
  <a href="https://www.klusai.com/research/">
    <img src="https://img.shields.io/badge/Research-klusai.com-brightgreen?logo=beaker&logoColor=white" alt="Research">
  </a>
</p>

---

## 🔍 What We're About

KlusAI bridges the gap between cutting-edge AI research and production systems. We publish our datasets and models openly to advance the field: **9M+ synthetic training examples** and counting.

**Research Themes:**
- 🧬 **Synthetic Data Generation** β€” Large-scale training data without privacy concerns
- ⚑ **Efficient AI Systems** β€” Models that run on consumer hardware
- 🌍 **Multilingual NLP** β€” With deep Romanian language expertise

---

## 📄 Featured Publication

### Synthetic Data Generation Using Large Language Models
*Advances in Text and Code* · **IEEE Access, 2025**

Our survey of LLM-based training data generation covers how enterprises can generate training data at scale: reducing annotation costs, addressing data scarcity, and enabling fine-tuning without exposing sensitive data.

📖 [Read on IEEE Xplore](https://ieeexplore.ieee.org/abstract/document/11080380) · 📝 [arXiv Preprint](https://arxiv.org/abs/2503.14023)

---

## 🔬 Flagship Project: TinyFabulist

**TinyFabulist** is our open research programme on large-scale synthetic narrative generation. We demonstrate that small, efficient models can produce high-quality training data at scale.

| Release | Description | Size |
|---------|-------------|------|
| **TinyFabulist v1** | Synthetic English Fables | ~3M examples |
| *Upcoming* | Multilingual extensions, evaluation benchmarks | - |

**Key principles:**
- 📊 **Scale**: 9M+ synthetic training examples generated
- 🔧 **Efficiency**: All content produced with ≤8B parameter models
- 🔓 **Openness**: Generation scripts, pipelines, and methodology shared publicly

📄 [Paper (arXiv)](https://arxiv.org/abs/2504.20605) · 💻 [Code (GitHub)](https://github.com/klusai/tinyfabulist)

---

## 📦 What You'll Find Here

- **Datasets**: Large-scale synthetic training corpora for fine-tuning and research
- **Models**: Efficient, instruction-tuned models optimized for specific tasks
- **Evaluation**: Benchmarks and tooling for synthetic data quality assessment

---

## 🀝 Work With Us

Beyond open research, we offer enterprise AI services:

| Service | Description |
|---------|-------------|
| **AI Strategy** | Define your AI roadmap and implementation plan |
| **Custom Development** | Bespoke AI solutions tailored to your domain |
| **Model Training** | Fine-tuning and deploying models for your use case |
| **MLOps & Infrastructure** | Scalable pipelines and production deployment |

**Need custom synthetic data or domain-specific models?** We partner with organizations on applied research challenges.

---

## 📫 Get in Touch

| Purpose | Contact |
|---------|---------|
| Research collaboration | [research@klusai.com](mailto:research@klusai.com) |
| Enterprise services | [services@klusai.com](mailto:services@klusai.com) |
| General inquiries | [hello@klusai.com](mailto:hello@klusai.com) |

> **Technical questions?** Open an issue on the relevant dataset or model repository.

---

<p align="center">
  <strong>Applied Research · AI Services · Ventures</strong><br>
  <a href="https://klusai.com">klusai.com</a> · <a href="https://github.com/klusai">GitHub</a> · <a href="https://x.com/klusai">X</a>
</p>