📖 The AI Tool Bible

Best AI tools for speech to text

21 tools in the Audio category, filtered to speech to text.


AI Song Maker

Audio · Multi-model (ACE-Step, MusicGen, DiffRhythm, Riffusion)

Browser-based song generator that wraps multiple open music models behind a single freemium UI.

Freemium · Free: 4 songs/day anonymous, 20 credits/day signed in; paid tier for more · text-to-song · lyrics-generation

Audify AI

Audio · OpenAI TTS (tts-1, tts-1-hd, gpt-4o-mini-tts)

Pay-as-you-go web wrapper around OpenAI's text-to-speech voices.

Freemium · Bring your own OpenAI key (free), or top up from $2 pay-per-use · text-to-speech · voiceover

AudioCraft

Audio · MusicGen, AudioGen, EnCodec

Meta's open-source research toolkit for generating music and sound effects from text via a single autoregressive language model.

Free · Free and open source; self-hosted · text-to-music · sound-effects

Azure AI Speech (Neural TTS)

Audio · Azure Neural TTS (plus HD and Azure OpenAI voices)

Microsoft's enterprise-grade neural text-to-speech with 100+ languages, custom brand voices, and SSML control.

Freemium · Free tier (0.5M chars/mo neural); pay-as-you-go per character thereafter · text-to-speech · voice-cloning

Deepgram

Audio · Nova, Flux, Speak (proprietary)

Production-grade speech-to-text, text-to-speech, and voice-agent APIs for real-time and batch audio.

Freemium · Free credits on signup; usage-based pricing; enterprise contracts available · speech-to-text · text-to-speech

Dia

Audio · Dia-1.6B

Open-weights 1.6B text-to-dialogue model that generates ultra-realistic multi-speaker conversations in one pass.

Free · Free, open weights (Apache 2.0); larger hosted version is waitlisted · dialogue-generation · voice-cloning

Hume AI

Audio · Octave, EVI, TADA

Emotionally intelligent voice AI with expressive TTS, speech-to-speech, and human-feedback evaluation APIs.

Freemium · expressive-tts · voice-cloning

LOVO AI

Audio · Proprietary (LOVO Pro V2 voices)

Text-to-speech and voice cloning platform with 500+ voices, an integrated video editor, and a developer API.

Freemium · 14-day free Pro trial, no credit card; paid subscription tiers · text-to-speech · voice-cloning

Loudly

Audio · Proprietary Loudly AI

AI music generator with royalty-free output, stem splitting, and distribution to Spotify and friends.

Freemium · Free tier; paid plans on /music/pricing · text-to-music · royalty-free background music

MockingBird

Audio · GE2E + Tacotron + HiFi-GAN/WaveRNN/Fre-GAN

Open-source Mandarin-first voice cloning that mimics a speaker from a 5-second sample.

Free · Free, open source (MIT) · voice-cloning · text-to-speech

Murf AI

Audio · Murf Gen2 / Murf Falcon

Studio-grade text-to-speech and real-time voice agents with 200+ voices across 35+ languages.

Freemium · Free Studio (10 min/mo); paid plans + API at ~$0.01/min (Falcon) · text-to-speech · voice-cloning

Remusic

Audio · Remusic V4 Pro (proprietary)

All-in-one AI music studio that bundles song generation, voice cloning, stem splitting, and karaoke tools.

Freemium · Free daily credits; Starter $49/yr, Basic $94/yr, Pro $249/yr · text-to-music · voice-cloning

Respeecher

Audio · Proprietary Respeecher voice models

Studio-grade AI voice cloning and TTS used by Hollywood productions for speech-to-speech and dubbing work.

Freemium · Free trial; TTS API $2/hour pay-as-you-go; custom enterprise pricing for voice cloning · voice-cloning · text-to-speech

Sesame

Audio · Sesame CSM (1B / 3B / 8B)

Conversational voice AI aiming to cross the uncanny valley with context-aware, emotionally aware speech.

Free · Free research preview; consumer product pricing not announced · conversational-voice · text-to-speech

Veritone Voice

Audio · Proprietary (Veritone aiWARE)

Enterprise-grade voice cloning and synthesis platform built for broadcasters, studios, and large media operations.

Enterprise · Contact sales / demo only · voice-cloning · text-to-speech

Voicebox

Audio · Multi-model (Chatterbox, Qwen TTS, Whisper, etc.)

Open-source desktop voice studio for local cloning, dictation, and giving MCP agents a voice.

Free · Free and open source; optional $VOICEBOX token donations · voice-cloning · text-to-speech

WellSaid

Audio · Proprietary WellSaid TTS

Enterprise-grade AI text-to-speech built on licensed voice actor recordings.

Freemium · Free trial; paid plans for teams and enterprise (contact sales for API) · text-to-speech · e-learning narration

WhisperAPI

Audio · OpenAI Whisper

Hosted OpenAI Whisper transcription with a pay-as-you-go API and drop-in web dashboard.

Paid · Pay-as-you-go credits; $5 for 20 credits, down to ~$0.10/credit in bulk · audio-transcription · video-subtitles

Wispr Flow

Audio

System-wide voice-to-text dictation that auto-edits filler words and learns your jargon.

Freemium · 14-day Pro trial; paid individual and enterprise plans · dictation · voice-to-text

ZenMic

Audio

Text-to-podcast generator with multi-speaker AI voices and RSS publishing.

Freemium · Free 10 min trial; $19/mo or $99/yr (100 min/mo) · text-to-podcast · content-repurposing

iSpeech

Audio

Veteran cloud TTS and speech recognition API with broad SDK and language coverage.

Freemium · Free mobile SDK for non-revenue apps; ~$0.0001-$0.05 per word/transaction · text-to-speech · speech-recognition