MLX-Audio: Text-to-Speech for Apple Silicon

MLX-Audio is a speech processing library built on Apple’s MLX framework. It offers text-to-speech (TTS), speech-to-text (STT), and speech-to-speech (STS) capabilities. Optimized for Apple Silicon (M-series chips), it runs audio synthesis and recognition directly on-device with low latency. It is designed so that developers can add speech features with minimal code.

How to Use

Official Site: Visit the GitHub repository at https://github.com/Blaizzy/mlx-audio for source code and documentation.

Install: Install the package from PyPI with `pip install mlx-audio`.

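CLI Usage: Generate speech from the command line with `mlx_audio.tts.generate --text "Hello, world"`. See the repository README for the full list of options.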

Python API:
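A minimal sketch of calling the library from Python, assuming the package exposes a `generate_audio` function in `mlx_audio.tts.generate` that accepts the keyword arguments shown. The helper names `build_tts_kwargs` and `synthesize`, the model ID, and the voice name are illustrative assumptions; check the repository README for the signature in your installed version.

```python
def build_tts_kwargs(text, speed=1.0, voice="af_heart", file_prefix="output"):
    """Validate inputs and assemble keyword arguments for generate_audio.

    The keyword names below are assumptions about the library's API.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    # The documented speed range is 0.5x to 2.0x.
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    return {
        "text": text,
        "model_path": "prince-canuma/Kokoro-82M",  # assumed Kokoro model ID
        "voice": voice,                            # assumed voice name
        "speed": speed,
        "file_prefix": file_prefix,
        "audio_format": "wav",
    }


def synthesize(text, **options):
    """Generate speech for `text` and write it to an audio file."""
    # Imported lazily so build_tts_kwargs works without mlx-audio installed.
    from mlx_audio.tts.generate import generate_audio  # assumed import path

    generate_audio(**build_tts_kwargs(text, **options))
```

On an Apple Silicon machine with the package installed, `synthesize("Hello from MLX-Audio.", speed=1.2)` would be the typical call. Keeping validation separate from the library call means bad input fails before any model is loaded.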

Web & API Server:
```
mlx_audio.server --host 0.0.0.0 --port 8000 --verbose
```
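Once the server is running, open http://127.0.0.1:8000 in a browser to access the interactive 3D visualization interface. Binding to 0.0.0.0 also makes the server reachable from other machines on the network.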

Key Features

Unified Speech Toolkit: Combines TTS, STT, and STS in one package for seamless multi-modal applications.
Multi-Language & Voices: Includes Kokoro models supporting American English, British English, Japanese, and Mandarin, plus custom voice cloning.
Adjustable Speed: Speech rate is adjustable from 0.5× to 2.0×.
3D Audio Visualization: Real-time frequency-reactive 3D orb powered by Three.js for engaging demos.
RESTful API: Endpoints for /tts, /audio/{filename}, /play, /stop, and /open_output_folder for quick backend integration.
Model Quantization: Built-in tools for 8-bit quantization, which reduces memory footprint and can speed up inference.
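The RESTful endpoints listed above can be called from any HTTP client. The sketch below uses only the Python standard library and assumes the server started earlier is listening on port 8000, that /tts accepts a form-encoded POST with `text`, `voice`, and `speed` fields, and that it replies with JSON. The field names, the voice name, and the helpers `build_tts_request` and `request_speech` are assumptions for illustration, not confirmed parts of the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:8000"  # address of the local MLX-Audio server


def build_tts_request(text, voice="af_heart", speed=1.0, base_url=BASE_URL):
    """Build a POST request for the /tts endpoint without sending it."""
    # Form field names are assumed; adjust them to match the server.
    body = urlencode({"text": text, "voice": voice, "speed": speed}).encode("utf-8")
    return Request(
        f"{base_url}/tts",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def request_speech(text, **options):
    """Send the request and return the decoded JSON reply."""
    with urlopen(build_tts_request(text, **options), timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending makes it easy to inspect the payload before it is sent and to reuse the builder with a different HTTP library.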

Use Cases

Podcast & Audiobook Production: Generate narration for episodes and chapters with high-fidelity TTS.
Intelligent Assistants: Combine STT and STS to build natural-sounding conversational agents.
Accessibility Solutions: Provide on-device speech-to-text and text-to-speech for users with hearing or visual impairments.
Language Learning Platforms: Leverage multiple voices and speed control for listening and pronunciation practice.
Educational Demos & Presentations: Use real-time 3D audio visualization to illustrate audio processing concepts.