📖 The AI Tool Bible

Audio

Voice cloning, music generation, speech-to-text.

48 tools

Why it matters

AI audio has split cleanly into three lanes: speech synthesis (TTS + voice cloning), music generation, and speech-to-text — each with a clear leader.

What's in here

Covers voice cloning and TTS (ElevenLabs, Resemble.ai, Murf), AI music generation (Suno, Udio), and speech-to-text (AssemblyAI, Whisper).

How to pick

Pick ElevenLabs for voice quality. Pick Suno or Udio for AI music. Pick AssemblyAI when you need diarisation and timestamps; pick Whisper when you can self-host and want to avoid usage fees (the weights are free, but you supply the compute).

ElevenLabs

Featured
Audio · ElevenLabs Multilingual v2
9.4

The gold standard for AI voice cloning and TTS.

Freemium · Free 10k chars/mo; from $5/mo Starter; up to $1320/mo Scale · TTS · voice cloning

Suno

Featured
Audio · Suno v4
9.2

Text-to-song AI — full vocal tracks from a prompt.

Freemium · Free credits; Pro $10/mo; Premier $30/mo · songwriting · demos

Udio

Audio · Udio (proprietary)
8.8

Suno's main rival for AI-generated full songs.

Freemium · Free; Standard $10/mo; Pro $30/mo · full songs · music demos

AssemblyAI

Audio · Universal / Slam-1
8.7

Speech-to-text API with diarisation, summarisation, and topic detection.

Freemium · Free credits; pay-per-use from $0.37/hr · transcription · diarisation

Whisper

Audio · Whisper large-v3
8.6

OpenAI's open-source speech-to-text — the de-facto baseline.

Free · Free open weights; $0.006/min via OpenAI API · transcription · self-hosted
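The two hosted speech-to-text prices listed here use different units (per hour for AssemblyAI, per minute for the OpenAI-hosted Whisper API), so a quick conversion helps when comparing them. The sketch below is illustrative only: it uses the list prices quoted in these entries, the function names and the 100-hour workload are hypothetical, and it ignores free credits, volume tiers, and the compute cost of self-hosting.

```python
# Rates copied from the listings above; actual bills may differ
# (free credits, volume tiers, billing increments are not modelled).
ASSEMBLYAI_USD_PER_HOUR = 0.37   # "pay-per-use from $0.37/hr"
WHISPER_API_USD_PER_MIN = 0.006  # "$0.006/min via OpenAI API"


def hourly_rate_from_per_minute(usd_per_min: float) -> float:
    """Convert a per-minute price into a per-hour price."""
    return usd_per_min * 60


def cost_for_hours(usd_per_hour: float, audio_hours: float) -> float:
    """Cost of transcribing `audio_hours` hours at a flat hourly rate."""
    return usd_per_hour * audio_hours


if __name__ == "__main__":
    audio_hours = 100  # hypothetical monthly workload
    whisper_hourly = hourly_rate_from_per_minute(WHISPER_API_USD_PER_MIN)
    print(f"Whisper API: ${whisper_hourly:.2f}/hr, "
          f"${cost_for_hours(whisper_hourly, audio_hours):.2f} for {audio_hours} h")
    print(f"AssemblyAI:  ${ASSEMBLYAI_USD_PER_HOUR:.2f}/hr, "
          f"${cost_for_hours(ASSEMBLYAI_USD_PER_HOUR, audio_hours):.2f} for {audio_hours} h")
```

At these list prices, $0.006/min works out to $0.36/hr, so the two hosted options are within a cent per hour of each other; the choice is more likely to turn on features such as diarisation than on price.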

Resemble.ai

Audio · Resemble v2 / Localize
8.0

Enterprise voice cloning with deepfake-detection layer.

Paid · From $19/mo Creator; enterprise custom · enterprise voice cloning · compliance

Murf

Audio · Murf Gen2
7.8

TTS aimed at corporate voiceover and e-learning.

Freemium · Free preview; from $19/mo Creator; $66/mo Business · voiceover · e-learning

AI Song Maker

Audio · Multi-model (ACE-Step, MusicGen, DiffRhythm, Riffusion)

Browser-based song generator that wraps multiple open music models behind a single freemium UI.

Freemium · Free: 4 songs/day anon, 20 credits/day signed-in; paid tier for more · text-to-song · lyrics-generation

AIVA

Audio · Proprietary (AIVA)

AI music composition tool that generates royalty-friendly tracks in 250+ styles with editable MIDI output.

Freemium · Free; Standard ~€11/mo, Pro ~€33/mo (billed yearly) · music-generation · soundtrack-composition

AInterview

Audio

AI host that interviews you and turns the conversation into a finished podcast.

Freemium · Free 10 min/mo; Premium $19/mo (2 hrs); pay-as-you-go $0.13-$0.20/min · ai-podcast-host · solo-podcasting

Audify AI

Audio · OpenAI TTS (tts-1, tts-1-hd, gpt-4o-mini-tts)

Pay-as-you-go web wrapper around OpenAI's text-to-speech voices.

Freemium · BYO OpenAI key (free); or top up from $2 pay-per-use · text-to-speech · voiceover

AudioCraft

Audio · MusicGen, AudioGen, EnCodec

Meta's open-source research toolkit for generating music and sound effects from text via a single autoregressive language model.

Free · Free and open source; self-hosted · text-to-music · sound-effects

Azure AI Speech (Neural TTS)

Audio · Azure Neural TTS (plus HD and Azure OpenAI voices)

Microsoft's enterprise-grade neural text-to-speech with 100+ languages, custom brand voices, and SSML control.

Freemium · Free tier (0.5M chars/mo neural); pay-as-you-go per character thereafter · text-to-speech · voice-cloning

Beatoven.ai

Audio · Maestro (proprietary)

Text-to-music generator that spits out royalty-free background tracks with a clean licensing story.

Freemium · Pay-per-track or subscription for download minutes · background-music · sound-effects

Boomy

Audio · Proprietary (undisclosed)

Generative AI music maker that lets anyone produce a song in under a minute and push it to Spotify.

Freemium · Free tier; Creator ~$9.99/mo; Pro ~$29.99/mo · ai-music-generation · song-creation

CustomPod

Audio

Turns your chosen news sources, RSS feeds, and inboxes into a personalized daily AI podcast.

Freemium · Free tier (manual generation); Pro $4.99/mo · personal podcast · news briefing

Deepgram

Audio · Nova, Flux, Speak (proprietary)

Production-grade speech-to-text, text-to-speech, and voice-agent APIs for real-time and batch audio.

Freemium · Free credits on signup; usage-based pricing; enterprise contracts available · speech-to-text · text-to-speech

Dia

Audio · Dia-1.6B

Open-weights 1.6B text-to-dialogue model that generates ultra-realistic multi-speaker conversations in one pass.

Free · Free, open weights (Apache 2.0); hosted larger version waitlisted · dialogue-generation · voice-cloning

EKHOS AI

Audio · Proprietary local models

Offline Windows transcription app with speaker diarization, GPU acceleration, and 98-language support.

Freemium · Free tier; Premium $9/mo · transcription · speaker-diarization

Ecrett Music

Audio

AI background-music generator that spits out royalty-free instrumental tracks by scene, mood, and genre.

Freemium · Free preview tier; Individual $4.99/mo annual ($7.99 monthly); Business $14.99/mo annual ($24.99 monthly) · background-music · youtube-soundtracks

ElevenLabs Conversational AI

Audio · ElevenLabs Scribe (ASR) + pluggable LLM + ElevenLabs TTS

Production-grade voice agent platform layering ElevenLabs TTS, ASR, and LLM orchestration into a single deployable stack.

Paid · Usage-based; contact sales for enterprise · voice-agents · ivr-replacement

Fireflies.ai

Audio · Multi-model (proprietary ASR + LLM layer)

AI meeting assistant that joins calls, transcribes them, and turns the talk into searchable notes and action items.

Freemium · Free tier; paid plans roughly $10-$39/user/mo, Enterprise on request · meeting transcription · call summaries

Harmonai

Audio · Dance Diffusion / Stable Audio family

Open-source generative audio lab from Stability AI building diffusion models for music production.

Free · Free open-source models and code; no hosted product on this site · music-generation · sound-design

Hume AI

Audio · Octave, EVI, TADA

Emotionally intelligent voice AI with expressive TTS, speech-to-speech, and human-feedback evaluation APIs.

Freemium · expressive-tts · voice-cloning