Gemma Turkish Speech Head (E2B + Mimi)

A Turkish text-to-speech (TTS) adapter for google/gemma-4-E2B-it that generates audio through the kyutai/mimi neural codec.

Trained on Synthetic_Turkish_TTS_Data (CC BY 4.0).

See: https://github.com/g-hano/gemma-voice

Architecture

  • Frozen backbone: Gemma 4 E2B-it (text conditioning)
  • Frozen codec: Kyutai Mimi (8 codebooks at a 12.5 Hz frame rate, 24 kHz audio)
  • Trainable: a learned layer mix over the last 6 Gemma layers, plus an autoregressive cross-attention speech decoder
  • Training steps: 12000
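
The codec figures above fix how much the speech decoder has to generate per second of audio. The sketch below only does that arithmetic from the numbers quoted in the list (8 codebooks, 12.5 Hz, 24 kHz). The function names are illustrative and not part of the gemma_turkish package, and nothing here assumes how the decoder orders or emits the codebook tokens.

```python
import math

# Figures quoted in the Architecture list above.
NUM_CODEBOOKS = 8
FRAME_RATE_HZ = 12.5
SAMPLE_RATE_HZ = 24_000


def samples_per_frame() -> int:
    """Audio samples covered by one codec frame."""
    return round(SAMPLE_RATE_HZ / FRAME_RATE_HZ)


def tokens_per_second() -> float:
    """Codec tokens per second of audio, counted across all codebooks."""
    return NUM_CODEBOOKS * FRAME_RATE_HZ


def frames_for(duration_s: float) -> int:
    """Codec frames needed to cover duration_s seconds, rounded up."""
    return math.ceil(duration_s * FRAME_RATE_HZ)


if __name__ == "__main__":
    print(samples_per_frame())            # 1920
    print(tokens_per_second())            # 100.0
    print(frames_for(3.0))                # 38
    print(frames_for(3.0) * NUM_CODEBOOKS)  # 304 tokens for 3 s of audio
```

So each frame spans 1920 samples (80 ms), and one second of speech corresponds to 100 codec tokens.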

Quick start

Install the dependencies, log in to Hugging Face, and install the code from GitHub:

pip install torch transformers accelerate soundfile huggingface_hub
huggingface-cli login   # Gemma 4 is gated; accept the license on HF first

git clone https://github.com/g-hano/gemma-voice.git
cd gemma-voice
pip install -e .
cd src
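
Then synthesize speech from Python:

import json
import sys
from pathlib import Path

import soundfile as sf
import torch
from huggingface_hub import snapshot_download

# Download the model repo (weights, config, and the gemma_turkish source).
repo = Path(snapshot_download("Chan-Y/gemma4-turkish-speech-e2b-mimi"))

# Make the bundled gemma_turkish package importable. Alternatively, clone the
# GitHub repo or copy the gemma_turkish package next to your script.
sys.path.insert(0, str(repo / "src"))

from gemma_turkish.speech.config import SpeechTrainConfig
from gemma_turkish.speech.model import GemmaSpeechModel

# Build the model from the saved hyperparameters, then load the trained head.
cfg = SpeechTrainConfig.from_dict(
    json.loads((repo / "config.json").read_text(encoding="utf-8"))
)
model = GemmaSpeechModel(cfg)
GemmaSpeechModel.load_trainable_checkpoint(model, repo / "speech_head.pt")
model = model.cuda().eval()

text = "Merhaba, bu bir Türkçe ses sentezi denemesidir."
# Inference only, so gradients are not needed.
with torch.no_grad():
    wave = model.synthesize(text)

# Move the waveform to the CPU as float32 before converting it to NumPy;
# calling .numpy() directly on a CUDA tensor raises an error.
audio = wave.squeeze().detach().float().cpu().numpy()
sf.write("out.wav", audio, cfg.mimi_sample_rate)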

Or use the bundled script after downloading the repo:

python inference.py -t "Merhaba dünya."

Files

File                Description
speech_head.pt      Merged trainable weights (layer_mix + speech_head) plus embedded config
config.json         Full training/inference hyperparameters
src/gemma_turkish/  Model loading and synthesis code
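
Because config.json is plain JSON (the quick start parses it with json.loads), the hyperparameters can be inspected without loading Gemma or the codec. The following is a minimal sketch using only the standard library. The helper names load_hyperparameters and describe are hypothetical, and no particular key names are assumed.

```python
import json
from pathlib import Path


def load_hyperparameters(config_path):
    """Read config.json into a plain dict without importing torch or the model."""
    return json.loads(Path(config_path).read_text(encoding="utf-8"))


def describe(config):
    """Return one 'key = value' line per top-level entry, sorted by key."""
    return [f"{key} = {config[key]!r}" for key in sorted(config)]


if __name__ == "__main__":
    # Assumes config.json is in the current directory, e.g. the downloaded repo.
    for line in describe(load_hyperparameters("config.json")):
        print(line)
```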

License

  • Speech-head weights: same terms as the base Gemma model (see the Google Gemma license).
  • Training data: CC BY 4.0 (Synthetic Turkish TTS dataset).
  • Mimi codec: Kyutai license.