Best AI tools for speech-to-text
21 tools in the Audio category, filtered to speech-to-text.
AI Song Maker
Browser-based song generator that wraps multiple open music models behind a single freemium UI.
Audify AI
Pay-as-you-go web wrapper around OpenAI's text-to-speech voices.
AudioCraft
Meta's open-source research toolkit for generating music and sound effects from text via a single autoregressive language model.
Azure AI Speech (Neural TTS)
Microsoft's enterprise-grade neural text-to-speech with 100+ languages, custom brand voices, and SSML control.
Deepgram
Production-grade speech-to-text, text-to-speech, and voice-agent APIs for real-time and batch audio.
Dia
Open-weights 1.6B text-to-dialogue model that generates ultra-realistic multi-speaker conversations in one pass.
Hume AI
Emotionally intelligent voice AI with expressive TTS, speech-to-speech, and human-feedback evaluation APIs.
LOVO AI
Text-to-speech and voice cloning platform with 500+ voices, an integrated video editor, and a developer API.
Loudly
AI music generator with royalty-free output, stem splitting, and distribution to Spotify and other streaming platforms.
MockingBird
Open-source Mandarin-first voice cloning that mimics a speaker from a 5-second sample.
Murf AI
Studio-grade text-to-speech and real-time voice agents with 200+ voices across 35+ languages.
Remusic
All-in-one AI music studio that bundles song generation, voice cloning, stem splitting, and karaoke tools.
Respeecher
Studio-grade AI voice cloning and TTS used by Hollywood productions for speech-to-speech and dubbing work.
Sesame
Conversational voice AI aiming to cross the uncanny valley with context-aware, emotionally aware speech.
Veritone Voice
Enterprise-grade voice cloning and synthesis platform built for broadcasters, studios, and large media operations.
Voicebox
Open-source desktop voice studio for local cloning, dictation, and giving MCP agents a voice.
WellSaid
Enterprise-grade AI text-to-speech built on licensed voice actor recordings.
WhisperAPI
Hosted OpenAI Whisper transcription with a pay-as-you-go API and a web dashboard.
Wispr Flow
System-wide voice-to-text dictation that auto-edits filler words and learns your jargon.
ZenMic
Text-to-podcast generator with multi-speaker AI voices and RSS publishing.
iSpeech
Veteran cloud TTS and speech recognition API with broad SDK and language coverage.