MLX-Audio: Text-to-Speech for Apple Silicon

MLX-Audio is an all-in-one speech processing library built on Apple’s MLX framework, offering text-to-speech (TTS), speech-to-text (STT), and speech-to-speech (STS) capabilities. Optimized for Apple Silicon (M-series chips), it runs audio synthesis and recognition directly on-device with low latency. Designed for simplicity and performance, MLX-Audio lets developers add speech features with only a few lines of code.


How to Use

  1. Official Site: Visit the GitHub repository at https://github.com/Blaizzy/mlx-audio for source code and documentation.

  2. Install:

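    pip install mlx-audio

    The web interface and API server need extra dependencies. From a clone of the repository, install them with:

    pip install -r requirements.txt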
    
  3. CLI Usage:

    mlx_audio.tts.generate --text "Hello, world" --speed 1.4 --file_prefix hello

  4. Python API:

    from mlx_audio.tts.generate import generate_audio
    generate_audio(
        text="Welcome to MLX-Audio",
        model_path="prince-canuma/Kokoro-82M",
        voice="af_heart",
        speed=1.0,
        file_prefix="welcome",
        audio_format="wav",
        sample_rate=24000,
        join_audio=True,
        verbose=True
    )

  5. Web & API Server:

    mlx_audio.server --host 0.0.0.0 --port 8000 --verbose

    Navigate to http://127.0.0.1:8000 to access the interactive 3D visualization interface.
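
The CLI call from step 3 can also be driven from a script. The following is a minimal sketch, not part of MLX-Audio: the helper name build_tts_command is hypothetical, it uses only the flags shown above (--text, --speed, --file_prefix), and the range check simply mirrors the 0.5× to 2.0× range quoted in the feature list. Whether the CLI itself enforces that range is an assumption.

```python
import shlex

# Speed range quoted in the article's feature list (0.5x to 2.0x).
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def build_tts_command(text, speed=1.0, file_prefix="audio"):
    """Return the argv list for one mlx_audio.tts.generate call.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return [
        "mlx_audio.tts.generate",
        "--text", text,
        "--speed", str(speed),
        "--file_prefix", file_prefix,
    ]


if __name__ == "__main__":
    cmd = build_tts_command("Hello, world", speed=1.4, file_prefix="hello")
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(cmd))
    # To synthesize audio, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string avoids shell quoting problems when the text contains quotes or punctuation.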

Key Features

  • Unified Speech Toolkit: Combines TTS, STT, and STS in one package for seamless multi-modal applications.

  • Multi-Language & Voices: Includes Kokoro models with voices for American English, British English, Japanese, and Mandarin, plus support for custom voice cloning.

  • Adjustable Speed: Speech rate can be set anywhere from 0.5× to 2.0×.

  • 3D Audio Visualization: Real-time frequency-reactive 3D orb powered by Three.js for engaging demos.

  • RESTful API: Endpoints for /tts, /audio/{filename}, /play, /stop, and /open_output_folder for quick backend integration.

  • Model Quantization: Built-in tools for 8-bit quantization to reduce memory footprint and increase inference speed.
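
As a rough illustration of backend integration with the /tts endpoint listed above, the sketch below builds a POST request using only the Python standard library. The form field names (text, voice, speed) and the helper name build_tts_request are assumptions made for this example and are not confirmed here; check the server source in the repository for the actual request schema before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_tts_request(text, voice="af_heart", speed=1.0,
                      base_url="http://127.0.0.1:8000"):
    """Build (but do not send) a POST request for the /tts endpoint.

    Hypothetical helper, not part of MLX-Audio.
    """
    # ASSUMPTION: the server reads form fields named text, voice and speed.
    body = urlencode({"text": text, "voice": voice, "speed": speed}).encode("utf-8")
    return Request(
        f"{base_url}/tts",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    req = build_tts_request("Welcome to MLX-Audio")
    print(req.get_method(), req.full_url)
    # With the server from step 5 running, urllib.request.urlopen(req)
    # would send the request.
```

Keeping request construction separate from sending makes the payload easy to inspect and test without a running server.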

Use Cases

  • Podcast & Audiobook Production: Generate narration with high-fidelity TTS.

  • Intelligent Assistants: Combine STT and STS to build natural-sounding conversational agents.

  • Accessibility Solutions: Provide on-device speech-to-text and text-to-speech for visually impaired users.

  • Language Learning Platforms: Leverage multiple voices and speed control for listening and pronunciation practice.

  • Educational Demos & Presentations: Use real-time 3D audio visualization to illustrate audio processing concepts.

Libre Depot original article. Publisher: Libre Depot. Please indicate the source when reprinting: https://www.libredepot.top/5356.html
