📖 The AI Tool Bible

Audio

Voice cloning, music generation, speech-to-text.

48 tools

Why it matters

AI audio has split into three lanes: speech synthesis (TTS and voice cloning), music generation, and speech-to-text. Each lane has one or two clear front-runners.

What's in here

Covers voice cloning and TTS (ElevenLabs, Resemble.ai, Murf), AI music generation (Suno, Udio), and speech-to-text (AssemblyAI, Whisper).

How to pick

Pick ElevenLabs for voice quality. Pick Suno or Udio for AI music. Pick AssemblyAI when you need diarization and timestamps; pick Whisper when you can self-host and want no per-minute fees (you still pay for your own compute).

LOVO AI

Audio · Proprietary (LOVO Pro V2 voices)

Text-to-speech and voice cloning platform with 500+ voices, an integrated video editor, and a developer API.

Freemium · 14-day free Pro trial, no credit card; paid subscription tiers

text-to-speech · voice-cloning

Loudly

Audio · Proprietary Loudly AI

AI music generator with royalty-free output, stem splitting, and distribution to Spotify and other streaming platforms.

Freemium · Free tier; paid plans on /music/pricing

text-to-music · royalty-free background music

MockingBird

Audio · GE2E + Tacotron + HiFi-GAN/WaveRNN/Fre-GAN

Open-source Mandarin-first voice cloning that mimics a speaker from a 5-second sample.

Free · Free, open source (MIT)

voice-cloning · text-to-speech

Mubert

Audio · Proprietary sample-based generative engine

AI music generator that spits out royalty-free background tracks for video, podcast, and app use.

Freemium · Free tier; paid plans for commercial use; API via sales demo

background-music · royalty-free-soundtracks

Murf AI

Audio · Murf Gen2 / Murf Falcon

Studio-grade text-to-speech and real-time voice agents with 200+ voices across 35+ languages.

Freemium · Free Studio (10 min/mo); paid plans + API at ~$0.01/min (Falcon)

text-to-speech · voice-cloning

Otter.ai

Audio · Proprietary speech + LLM stack

AI meeting notetaker that transcribes calls, summarizes them, and pulls out action items in real time.

Freemium · Free Basic; Business $19.99/user/mo; Enterprise custom

meeting-transcription · meeting-summaries

Read AI

Audio · Multi-model

AI meeting copilot that transcribes, summarizes, and surfaces action items across Zoom, Meet, and Teams.

Freemium · Free (5 meetings/mo); paid tiers + Enterprise

meeting-transcription · meeting-summaries

Remusic

Audio · Remusic V4 Pro (proprietary)

All-in-one AI music studio that bundles song generation, voice cloning, stem splitting, and karaoke tools.

Freemium · Free daily credits; Starter $49/yr, Basic $94/yr, Pro $249/yr

text-to-music · voice-cloning

Respeecher

Audio · Proprietary Respeecher voice models

Studio-grade AI voice cloning and TTS used by Hollywood productions for speech-to-speech and dubbing work.

Freemium · Free trial; TTS API $2/hour pay-as-you-go; custom enterprise pricing for voice cloning

voice-cloning · text-to-speech

Scribbl

Audio

Bot-free AI meeting recorder, transcriber, and summarizer for Google Meet.

Freemium · Free: 15 meetings/month; paid team plans for shared libraries and CRM integrations

meeting-transcription · meeting-summaries

Sesame

Audio · Sesame CSM (1B / 3B / 8B)

Conversational voice AI aiming to cross the uncanny valley with context-aware, emotionally aware speech.

Free · Free research preview; consumer product pricing not announced

conversational-voice · text-to-speech

Soundful

Audio · Proprietary (human-aided AI)

Template-driven AI music generator that spits out royalty-free, commercially licensable tracks in seconds.

Freemium · Free tier; Plus/Pro/Business monthly per-user; Enterprise on request

background-music · content-creator-audio

Soundraw

Audio · Proprietary in-house model

AI music generator that spits out royalty-free, customizable tracks by genre and mood.

Freemium · Free trial; Creator $5.99/mo; Artist Pro $12.59/mo; Unlimited $17.49/mo; Enterprise custom

background music · video soundtracks

Stable Audio

Audio · Stable Audio 3.0 (Large/Medium/Small/Small SFX)

Stability AI's generative audio model family for music and sound effects, with open weights for the smaller variants.

Freemium · Free web app tier; API metered; enterprise licensing for Large model

music-generation · sound-effects

Transgate

Audio · Multi-model speech-to-text

Pay-as-you-go AI transcription and translation with summaries, highlights, and chat over your audio.

Freemium · Free 20-minute trial; pay-as-you-go credit packs

transcription · translation

Veritone Voice

Audio · Proprietary (Veritone aiWARE)

Enterprise-grade voice cloning and synthesis platform built for broadcasters, studios, and large media operations.

Enterprise · Contact sales / demo only

voice-cloning · text-to-speech

Vibe

Audio · OpenAI Whisper (via whisper.cpp)

Offline desktop transcription app powered by Whisper, with diarization, batch processing, and an HTTP API.

Free · Free and open-source (MIT)

transcription · subtitles

Voicebox

Audio · Multi-model (Chatterbox, Qwen TTS, Whisper, etc.)

Open-source desktop voice studio for local cloning, dictation, and giving MCP agents a voice.

Free · Free and open source; optional $VOICEBOX token donations

voice-cloning · text-to-speech

WellSaid

Audio · Proprietary WellSaid TTS

Enterprise-grade AI text-to-speech built on licensed voice actor recordings.

Freemium · Free trial; paid plans for teams and enterprise (contact sales for API)

text-to-speech · e-learning narration

WellSaid Labs

Audio · Proprietary WellSaid TTS (closed model)

Enterprise AI text-to-speech studio built on licensed voice-actor recordings, with a director-style editor for pacing and pronunciation.

Paid · Subscription plans (Maker/Team/Enterprise); free trial available

e-learning narration · corporate training

WhisperAPI

Audio · OpenAI Whisper

Hosted OpenAI Whisper transcription with a pay-as-you-go API and drop-in web dashboard.

Paid · Pay-as-you-go credits; $5 for 20 credits, down to ~$0.10/credit in bulk

audio-transcription · video-subtitles

Wispr Flow

Audio

System-wide voice-to-text dictation that auto-edits filler words and learns your jargon.

Freemium · 14-day Pro trial; paid individual and enterprise plans

dictation · voice-to-text

ZenMic

Audio

Text-to-podcast generator with multi-speaker AI voices and RSS publishing.

Freemium · Free 10 min trial; $19/mo or $99/yr (100 min/mo)

text-to-podcast · content-repurposing

iSpeech

Audio

Veteran cloud TTS and speech recognition API with broad SDK and language coverage.

Freemium · Free mobile SDK for non-revenue apps; ~$0.0001-$0.05 per word/transaction

text-to-speech · speech-recognition