---
language:
- en
license: other
library_name: pytorch
pipeline_tag: text-to-speech
tags:
- text-to-speech
- streaming-tts
- sparse-attention
- low-rank
- cpu-first
- mesko-tts
- mesklintech
datasets:
- keithito/lj_speech
---

# Mesko TTS

Mesko TTS is MesklinTech's dedicated text-to-speech research project.

We are actively training Mesko TTS as a fast, streaming-oriented speech model. This repository currently shares the architecture and training code while the full voice system continues to improve.

MesklinTech is open to collaboration with researchers, engineers, product teams, and supporters who are interested in efficient real-time speech AI. To connect with us or support the work, visit:

**https://mesklintech.com**

## Mission

MesklinTech is building practical AI systems from first principles: compact, efficient, understandable models that can run outside large-lab infrastructure. Mesko TTS is our speech effort: a fast, streaming-oriented TTS stack designed around sparse routing, explicit acoustic control, and low-latency inference.

Our goal is to build a world-class fast streaming TTS system for real-time assistants, accessibility products, education tools, creator workflows, and business voice interfaces.

## Current Status

Status: **untrained architecture release / training in progress**

What is available now:

- TTS architecture source code
- sparse semantic encoder
- speaker encoder
- duration, pitch, and energy predictors
- sparse acoustic decoder
- sparse neural vocoder code
- LJSpeech training scripts and config structure

What is not ready yet:

- trained model weights (none are attached to this repository yet)
- production-quality speech checkpoint
- production-grade trained neural vocoder release
- standardized MOS / WER / speaker-similarity benchmark
- long-form streaming quality validation

## Architecture Direction

Mesko TTS is built around:

- low-rank Q/K/V projections
- causal sparse candidate attention
- local, memory, landmark, and content candidate routing
- laminar excitatory/inhibitory refinement
- explicit speaker conditioning
- explicit duration, pitch, and energy modeling
- compact acoustic decoding
- streaming-oriented state/cache structure
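
The candidate-routing items above can be illustrated with a small, dependency-free sketch. Every name and number in it is a hypothetical choice for illustration and is not taken from the Mesko TTS source: `low_rank_project`, `causal_candidates`, the window size, the treatment of memory slots as the first few positions, the landmark stride, and the top-k content scoring. It shows how a query at position `t` might attend to a small causal subset of earlier positions (local, memory, landmark, and content candidates) instead of all of them, with content scores computed in a low-rank space.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def low_rank_project(x: Sequence[float], down: Sequence[Sequence[float]]) -> List[float]:
    """Project a d-dim vector to rank r with a d x r matrix.

    Hypothetical low-rank Q/K path: d * r weights instead of d * d.
    """
    r = len(down[0])
    return [sum(x[i] * down[i][j] for i in range(len(x))) for j in range(r)]


def causal_candidates(
    t: int,
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    window: int = 4,
    n_memory: int = 2,
    landmark_stride: int = 8,
    top_k: int = 2,
) -> List[int]:
    """Return the sorted key positions that query position t may attend to.

    Only positions <= t are ever returned, so the routing stays causal.
    All defaults are illustrative assumptions.
    """
    # Local candidates: the most recent `window` positions, including t itself.
    local = set(range(max(0, t - window + 1), t + 1))
    # Memory candidates: assumed here to be the first few positions (global slots).
    memory = set(range(min(n_memory, t + 1)))
    # Landmark candidates: every `landmark_stride`-th position up to t.
    landmarks = set(range(0, t + 1, landmark_stride))
    chosen = local | memory | landmarks
    # Content candidates: the top_k remaining past positions by low-rank dot product.
    remaining = [j for j in range(t + 1) if j not in chosen]
    remaining.sort(key=lambda j: dot(query, keys[j]), reverse=True)
    chosen.update(remaining[:top_k])
    return sorted(chosen)
```

With `t = 20` and the defaults, the local, memory, and landmark sources contribute positions 17 to 20, positions 0 and 1, and positions 0, 8, and 16; the content source then adds the two best-scoring remaining positions. The attention step itself touches a bounded number of positions, although this naive sketch still scores every past position to pick content candidates, so a practical implementation would likely need a cheaper selection step.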

The intended model path is:

1. Reference mel -> speaker encoder
2. Text tokens -> sparse semantic encoder
3. Semantic states + speaker latent -> FiLM conditioning
4. Duration predictor -> length regulation
5. Pitch and energy predictors -> frame-level controls
6. Frame states + speaker + pitch + energy -> sparse acoustic decoder
7. Acoustic energy/gating head -> mel spectrogram
8. Trained neural vocoder -> waveform
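
Steps 3 and 4 of this path can be sketched in plain Python. This is an illustration under stated assumptions, not the repository's implementation: the function names `film` and `length_regulate` are hypothetical, `gamma` and `beta` stand in for values that a speaker-conditioned layer would produce from the speaker latent, and durations are assumed to already be non-negative integer frame counts (rounding the duration predictor's output is not shown). FiLM applies a channel-wise scale and shift, and length regulation repeats each token state once per predicted frame, in the style of the FastSpeech length regulator.

```python
from typing import List, Sequence

Vector = List[float]


def film(
    states: Sequence[Sequence[float]],
    gamma: Sequence[float],
    beta: Sequence[float],
) -> List[Vector]:
    """FiLM conditioning: gamma * h + beta, applied channel-wise to every state.

    gamma and beta are assumed to come from the speaker latent upstream.
    """
    return [[g * h + b for h, g, b in zip(state, gamma, beta)] for state in states]


def length_regulate(
    states: Sequence[Sequence[float]], durations: Sequence[int]
) -> List[Vector]:
    """Repeat each token state durations[i] times to get frame-level states."""
    if len(states) != len(durations):
        raise ValueError("need one duration per token state")
    frames: List[Vector] = []
    for state, d in zip(states, durations):
        if d < 0:
            raise ValueError("durations must be non-negative")
        # A duration of 0 drops the token; otherwise emit d copies of its state.
        frames.extend([list(state) for _ in range(d)])
    return frames
```

For two tokens with durations `[2, 1]`, `length_regulate` returns three frame states, so the frame count always equals `sum(durations)`. Pitch and energy predictions for step 5 would then be aligned to these frames.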

## Weights

No trained weights are attached to this repository yet.

Until a full text-to-mel and vocoder training run is complete, this repository should be treated as source code and architecture documentation, not as a finished voice model.

## Responsible Use

Do not use this project to impersonate people, clone voices without consent, commit fraud, or create misleading audio. Voice technology should be built and used with permission, transparency, and care.