Audio
Voice cloning, music generation, speech-to-text.
48 tools
AI audio has split into three lanes: speech synthesis (TTS and voice cloning), music generation, and speech-to-text, each with one or two clear front-runners.
Covers voice cloning and TTS (ElevenLabs, Resemble.ai, Murf), AI music generation (Suno, Udio), and speech-to-text (AssemblyAI, Whisper).
Pick ElevenLabs for voice quality. Pick Suno or Udio for AI music. Pick AssemblyAI when you need diarisation and timestamps; pick Whisper when you can self-host and want to avoid per-minute API fees.
ElevenLabs
Featured: The gold standard for AI voice cloning and TTS.
Suno
Featured: Text-to-song AI that produces full vocal tracks from a prompt.
Udio
Suno's main rival for AI-generated full songs.
AssemblyAI
Speech-to-text API with diarisation, summarisation, and topic detection.
Whisper
OpenAI's open-source speech-to-text model and the de facto baseline.
Resemble.ai
Enterprise voice cloning with a deepfake-detection layer.
Murf
TTS aimed at corporate voiceover and e-learning.
AI Song Maker
Browser-based song generator that wraps multiple open music models behind a single freemium UI.
AIVA
AI music composition tool that generates royalty-friendly tracks in 250+ styles with editable MIDI output.
AInterview
AI host that interviews you and turns the conversation into a finished podcast.
Audify AI
Pay-as-you-go web wrapper around OpenAI's text-to-speech voices.
AudioCraft
Meta's open-source research toolkit for generating music and sound effects from text via a single autoregressive language model.
Azure AI Speech (Neural TTS)
Microsoft's enterprise-grade neural text-to-speech with 100+ languages, custom brand voices, and SSML control.
Beatoven.ai
Text-to-music generator that spits out royalty-free background tracks with a clean licensing story.
Boomy
Generative AI music maker that lets anyone produce a song in under a minute and push it to Spotify.
CustomPod
Turns your chosen news sources, RSS feeds, and inboxes into a personalized daily AI podcast.
Deepgram
Production-grade speech-to-text, text-to-speech, and voice-agent APIs for real-time and batch audio.
Dia
Open-weights 1.6B-parameter text-to-dialogue model that generates ultra-realistic multi-speaker conversations in one pass.
EKHOS AI
Offline Windows transcription app with speaker diarization, GPU acceleration, and 98-language support.
Ecrett Music
AI background-music generator that spits out royalty-free instrumental tracks by scene, mood, and genre.
ElevenLabs Conversational AI
Production-grade voice agent platform layering ElevenLabs TTS, ASR, and LLM orchestration into a single deployable stack.
Fireflies.ai
AI meeting assistant that joins calls, transcribes them, and turns the talk into searchable notes and action items.
Harmonai
Open-source generative audio lab from Stability AI building diffusion models for music production.
Hume AI
Emotionally intelligent voice AI with expressive TTS, speech-to-speech, and human-feedback evaluation APIs.