# maya1-mlx-8bit

MLX 8-bit quantized conversion of maya-research/maya1 for text-to-speech on Apple Silicon. Converted using mlx-lm.

Recommended variant. Best balance of quality and speed — near real-time on M4 Max. See also: bf16 (highest quality) and 4-bit (fastest).

## Benchmarks (M4 Max, 36 GB)

| Variant | Size   | Tokens/s   | Real-time factor |
|---------|--------|------------|------------------|
| bf16    | 6.2 GB | ~51 tok/s  | 0.50x            |
| 8-bit   | 3.3 GB | ~91 tok/s  | 0.82x            |
| 4-bit   | 1.8 GB | ~108 tok/s | 1.58x            |
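Real-time factor here is read as seconds of audio produced per second of wall-clock time, so values above 1.0 are faster than real-time. A minimal sketch of how you might measure it yourself (the helper names are illustrative, not part of `tts.py`):

```python
import wave

def audio_duration_seconds(wav_path):
    """Duration of a WAV file in seconds, from frame count and sample rate."""
    with wave.open(wav_path, "rb") as w:
        return w.getnframes() / w.getframerate()

def real_time_factor(audio_seconds, generation_seconds):
    """Audio duration divided by generation wall-clock time.
    Assumes the table's convention: > 1.0 means faster than real-time."""
    return audio_seconds / generation_seconds
```

For example, 10 seconds of audio generated in 12.2 seconds of wall-clock time gives a real-time factor of about 0.82x, matching the 8-bit row.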

## Quick Start

Requires macOS with Apple Silicon and uv.

```shell
# From a text file
uv run tts.py input.txt -o output.wav

# From stdin
echo "Hello world" | uv run tts.py - -o hello.wav

# With a custom voice description
uv run tts.py input.txt -o output.wav -d "Deep male voice, British accent, slow pacing."
```

## CLI Options

| Option | Default | Description |
|--------|---------|-------------|
| `input` | (required) | Input text file path, or `-` for stdin |
| `-o, --output` | `output.wav` | Output WAV file path |
| `-d, --description` | "Calm/clear female voice" | Voice description prompt |
| `-m, --model` | `.` | Path to MLX model directory |
| `--max-chars` | 200 | Maximum characters per chunk |
| `--max-tokens` | 2048 | Maximum tokens generated per chunk |
| `--temperature` | 0.4 | Sampling temperature |
| `--top-p` | 0.9 | Top-p (nucleus) sampling threshold |
| `--repetition-penalty` | 1.1 | Repetition penalty |
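`--max-chars` controls how input text is split before generation: long inputs are broken into chunks of at most that many characters, each synthesized separately. A minimal sketch of sentence-aware chunking under that constraint (hypothetical helper; the real `tts.py` may split differently):

```python
import re

def chunk_text(text, max_chars=200):
    """Greedily pack whole sentences into chunks of at most max_chars.
    A single sentence longer than max_chars still becomes its own
    (over-long) chunk; real chunkers may split such sentences further."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + 1 + len(s) > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}" if current else s
    if current:
        chunks.append(current)
    return chunks
```

Chunking at sentence boundaries keeps prosody natural at chunk joins, which matters when the per-chunk WAV segments are concatenated into one output file.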

## Requirements

- macOS with Apple Silicon (M1 or later)
- uv (dependencies are declared inline via PEP 723)
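A PEP 723 inline-metadata header at the top of `tts.py` looks roughly like this (the exact dependency list shown is an assumption; check the script itself):

```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "mlx-lm",   # assumed: MLX generation backend
#     "numpy",    # assumed: audio buffer handling
# ]
# ///
```

When you run `uv run tts.py`, uv reads this block and provisions an ephemeral environment with the listed packages, so no manual `pip install` or virtualenv setup is needed.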