📖 The AI Tool Bible

AI tools tagged Supports Audio

48 tools matching this tag.


ElevenLabs

Featured
Audio · ElevenLabs Multilingual v2
9.4

The gold standard for AI voice cloning and TTS.

Freemium · Free 10k chars/mo; from $5/mo Starter; up to $1320/mo Scale · Tags: TTS, voice cloning

GPT-4o

Featured
Writing · GPT-4o
9.4

OpenAI's multimodal flagship behind ChatGPT.

Freemium · Free tier; Plus $20/mo; Pro $200/mo · Tags: general writing, summarization

AssemblyAI

Audio · Universal / Slam-1
8.7

Speech-to-text API with diarisation, summarisation, and topic detection.

Freemium · Free credits; pay-per-use from $0.37/hr · Tags: transcription, diarisation

Whisper

Audio · Whisper large-v3
8.6

OpenAI's open-source speech-to-text model and the de facto baseline.

Free · Free open weights; $0.006/min via OpenAI API · Tags: transcription, self-hosted
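The Whisper and AssemblyAI entries quote API prices in different units (per minute vs. per hour). The following is a minimal sketch that normalizes listed rates to dollars per audio hour, using only the figures shown on this page; the `per_hour` helper and its `"min"`/`"hr"` unit labels are hypothetical, and real bills may involve minimums, rounding, or add-on features that this does not model.

```python
def per_hour(rate: float, unit: str) -> float:
    """Convert a listed price to dollars per audio hour.

    unit is "min" for per-minute rates or "hr" for per-hour rates
    (hypothetical labels chosen for this sketch).
    """
    if unit == "min":
        return rate * 60  # 60 minutes in an hour
    if unit == "hr":
        return rate
    raise ValueError(f"unknown unit: {unit}")


# Rates as listed on this page; check each provider for current pricing.
listed = {
    "Whisper (OpenAI API)": (0.006, "min"),
    "AssemblyAI (from)": (0.37, "hr"),
}

for name, (rate, unit) in listed.items():
    print(f"{name}: ${per_hour(rate, unit):.2f}/hr")
```

At the listed rates, the two services come out within about a cent per hour of each other ($0.36/hr vs. $0.37/hr). Features such as diarisation and summarisation, or self-hosting Whisper instead, are more likely to drive the choice than the base rate.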

Replicate

Fine-tuning · Thousands of community + first-party models
8.5

Single-API platform for running and fine-tuning open-source models.

Paid · Pay per second of GPU time · Tags: model hosting, fine-tuning

HeyGen

Video · HeyGen proprietary
8.3

Avatar video + lip-sync translation at scale.

Paid · Free trial; from $24/mo Creator; $89/mo Team · Tags: localisation, avatar video

Murf

Audio · Murf Gen2
7.8

TTS aimed at corporate voiceover and e-learning.

Freemium · Free preview; from $19/mo Creator; $66/mo Business · Tags: voiceover, e-learning

AI/ML API

Coding · Multi-model (GPT, Claude, Gemini, Grok, Nemotron, Qwen, etc.)

Unified API gateway exposing 500+ AI models behind one OpenAI-compatible endpoint.

Freemium · Pay-as-you-go from $20 prepaid credit; enterprise custom · Tags: multi-model-routing, prototyping

AISaver

Video · Multi-model (Nano Banana Pro, Kling AI, Seedance 2.0)

Browser-based face swap and AI video/image generator stitched together from third-party models.

Freemium · Free credits on signup; paid credit top-ups (pricing not displayed on homepage) · Tags: face-swap, image-to-video

AIVA

Audio · Proprietary (AIVA)

AI music composition tool that generates royalty-friendly tracks in 250+ styles with editable MIDI output.

Freemium · Free; Standard ~€11/mo, Pro ~€33/mo (billed yearly) · Tags: music-generation, soundtrack-composition

Aispect

Image Generation

Turns live audio at events into AI-generated visuals on the fly, in 30+ languages.

Freemium · 5 free credits; PAYG $12.50/30 credits; Basic $34.90/mo (100 credits); Pro $149.90/mo (500 credits) · Tags: live-event-visuals, speech-to-image
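Aispect's plans bundle different numbers of credits at different prices, so the effective per-credit price is not obvious at a glance. Here is a minimal sketch that computes it from the listed figures. It assumes every credit is interchangeable and that monthly credits are fully used; the `cost_per_credit` helper is hypothetical.

```python
# Aispect plans as listed on this page: (price in USD, credits included).
plans = {
    "PAYG": (12.50, 30),
    "Basic (monthly)": (34.90, 100),
    "Pro (monthly)": (149.90, 500),
}


def cost_per_credit(price: float, credits: int) -> float:
    """Dollars per credit for a bundle (assumes all credits get used)."""
    return price / credits


for name, (price, credits) in plans.items():
    print(f"{name}: ${cost_per_credit(price, credits):.3f}/credit")

# Cheapest plan per credit under the full-usage assumption.
cheapest = min(plans, key=lambda n: cost_per_credit(*plans[n]))
print("Lowest per-credit price:", cheapest)
```

Under these assumptions PAYG works out to roughly $0.42 per credit and Pro to roughly $0.30. Unused monthly credits, rollover rules, or features that consume several credits (none of which are stated here) would change the effective rate.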

AnkiDecks AI

Writing

AI flashcard generator that turns PDFs, slides, YouTube videos and handwritten notes into Anki-ready decks.

Freemium · Free plan: 4 decks/month; Pro plan for larger files and unlimited decks (price on pricing page) · Tags: flashcard-generation, spaced-repetition

Argil

Video · Proprietary avatar model; integrates VEO3 and Hailuo for AI Fictions

AI avatar video generator that clones your face and voice from a single photo and a minute of audio.

Paid · 5-day trial; Classic $39/mo, Pro $149/mo, Scale $499/mo, Enterprise custom · Tags: ai-avatars, talking-head-video

AudioCraft

Audio · MusicGen, AudioGen, EnCodec

Meta's open-source research toolkit for generating music and sound effects from text, using autoregressive language models (MusicGen, AudioGen) over EnCodec audio tokens.

Free · Free and open source; self-hosted · Tags: text-to-music, sound-effects

Azure AI Speech (Neural TTS)

Audio · Azure Neural TTS (plus HD and Azure OpenAI voices)

Microsoft's enterprise-grade neural text-to-speech with 100+ languages, custom brand voices, and SSML control.

Freemium · Free tier (0.5M chars/mo neural); pay-as-you-go per character thereafter · Tags: text-to-speech, voice-cloning

BrainSoup

Agents · Multi-model (ChatGPT, Mistral, local LLMs)

Windows desktop app for building teams of specialized AI agents that collaborate through chat.

Freemium · From $5/month, no commitment · Tags: multi-agent workflows, desktop AI assistant

Context Data

RAG · Multi-model

Enterprise data platform for deploying private RAG pipelines without infrastructure plumbing.

Enterprise · Contact sales · Tags: enterprise-rag, document-search

CustomPod

Audio

Turns your chosen news sources, RSS feeds, and inboxes into a personalized daily AI podcast.

Freemium · Free tier (manual generation); Pro $4.99/mo · Tags: personal podcast, news briefing

DaVinci Resolve

Video · DaVinci Neural Engine (proprietary)

Hollywood-grade post-production suite with a Neural Engine that quietly automates the tedious parts of editing, color, and audio.

Freemium · Free; Studio one-time $295 (unlocks AI Neural Engine) · Tags: video-editing, color-grading

Deepgram

Audio · Nova, Flux, Speak (proprietary)

Production-grade speech-to-text, text-to-speech, and voice-agent APIs for real-time and batch audio.

Freemium · Free credits on signup; usage-based pricing; enterprise contracts available · Tags: speech-to-text, text-to-speech

Dia

Audio · Dia-1.6B

Open-weights 1.6B text-to-dialogue model that generates ultra-realistic multi-speaker conversations in one pass.

Free · Free, open weights (Apache 2.0); hosted larger version waitlisted · Tags: dialogue-generation, voice-cloning

EKHOS AI

Audio · Proprietary local models

Offline Windows transcription app with speaker diarization, GPU acceleration, and 98-language support.

Freemium · Free tier; Premium $9/mo · Tags: transcription, speaker-diarization

Edge Impulse

Fine-tuning · Multi-model (TF Lite Micro, custom DSP blocks)

End-to-end platform for training and deploying ML models on microcontrollers, sensors, and other edge hardware.

Freemium · Free developer tier; paid Professional and Enterprise plans (contact sales) · Tags: edge-ai, tinyml

Fal.ai

Image Generation · Multi-model (Flux, Stable Diffusion, video/audio models)

Serverless GPU inference platform optimized for fast diffusion and generative media APIs.

Paid · Usage-based; serverless from ~$1.89/GPU-hour, per-output pricing on model APIs · Tags: text-to-image, text-to-video

Fireflies.ai

Audio · Multi-model (proprietary ASR + LLM layer)

AI meeting assistant that joins calls, transcribes them, and turns the talk into searchable notes and action items.

Freemium · Free tier; paid plans roughly $10-$39/user/mo, Enterprise on request · Tags: meeting transcription, call summaries

Gemini

Writing · Gemini 2.x (Flash / Pro / Ultra)

Google's flagship multimodal AI assistant with deep integration into Workspace and Android.

Freemium · Free tier; Google One AI Premium $19.99/mo includes Gemini Advanced · Tags: chat-assistant, research

Geniusrise

Agents · Multi-model

Open-source framework for building, deploying, and scaling AI microservices across text, vision, and audio.

Free · Free, open source; self-hosted · Tags: inference-serving, fine-tuning

Google AI Studio

Coding · Gemini 2.5 Pro / Flash, Imagen, Veo

Browser-based playground and API console for prototyping with Google's Gemini models.

Freemium · Free tier with rate limits; paid via Gemini API usage-based pricing · Tags: prompt-prototyping, gemini-api-keys

Google Veo

Video · Veo 3.1

Google DeepMind's flagship text-to-video model with native audio generation and cinematic camera control.

Paid · Metered via Gemini API; also bundled in Google AI and Workspace plans · Tags: text-to-video, image-to-video

Harmonai

Audio · Dance Diffusion / Stable Audio family

Open-source generative audio lab from Stability AI building diffusion models for music production.

Free · Free open-source models and code; no hosted product on this site · Tags: music-generation, sound-design

Hedra

Video · Character-3 (proprietary) + multi-model canvas

AI creative agent for character-driven video, image, and audio generation built around the Character-3 model.

Freemium · Free tier; Basic $15/mo, Creator $30/mo, Professional/Teams $75/mo, Enterprise custom · Tags: talking-avatar-video, character-animation

Higgsfield

Video · Multi-model (Sora 2, Veo 3.1, Kling 3.0, Seedance 2.0, Nano Banana Pro)

AI video and image generation suite that aggregates 30+ frontier models under one workflow.

Paid · Subscription tiers (pricing not clearly published on the landing page) · Tags: text-to-video, image-generation

Interview Solver

Coding

Invisible desktop AI copilot that feeds you LeetCode answers during live coding interviews.

Freemium · Free 10-message trial; $39/mo unlimited · Tags: coding-interviews, leetcode-practice

Kaiber

Video · Proprietary (undisclosed)

AI video generator built around music-reactive animation and image-to-video transforms.

Freemium · Free trial credits; paid plans typically start around $5-$15/mo · Tags: music-video, text-to-video

Kling AI

Video · Kling 3.0 (Omni One)

Kuaishou's flagship AI video generator, currently topping the Elo leaderboard for text-to-video and image-to-video.

Freemium · 66 free credits daily; Standard $6.99/mo to Ultra $180/mo; API priced per second · Tags: text-to-video, image-to-video

LLM by Datasette

Coding · Multi-model

A CLI and Python library for running prompts against many LLM providers (via plugins) and logging everything to SQLite.

Free · Free and open source (Apache 2.0); pay underlying model providers separately · Tags: cli-prompting, prompt-logging

LOVO AI

Audio · Proprietary (LOVO Pro V2 voices)

Text-to-speech and voice cloning platform with 500+ voices, an integrated video editor, and a developer API.

Freemium · 14-day free Pro trial, no credit card; paid subscription tiers · Tags: text-to-speech, voice-cloning

LTX Studio

Video · LTX-2, LTXV-13B, Veo 3, Flux.2

Storyboard-first AI video platform from Lightricks with shot-level camera and character control.

Freemium · Free tier plus paid subscription plans · Tags: text-to-video, storyboarding

LangFast

Evaluation · Multi-model

No-signup LLM playground for testing, comparing, and versioning prompts against your own API keys.

Paid · One-time lifetime license ~$60-$120; 14-day money-back guarantee · Tags: prompt-testing, prompt-versioning

Limitless

Agents · Proprietary (multi-model)

AI wearable pendant that records and transcribes your conversations into a searchable personal memory.

Free · Free for existing customers after the Meta acquisition; no longer sold to new buyers · Tags: meeting-transcription, personal-memory

LocalAI

Writing · Multi-model (llama.cpp, diffusers, whisper, etc.)

Self-hosted OpenAI-compatible API for running LLMs, image, and audio models on your own hardware.

Free · Free and open source (MIT) · Tags: local-llm-inference, openai-api-replacement

Ludwig

Fine-tuning · Multi-model (PyTorch + HuggingFace Transformers)

Declarative, YAML-driven deep learning framework for fine-tuning LLMs and multi-modal models without writing training loops.

Free · Free, Apache 2.0 open source · Tags: llm-fine-tuning, multi-modal-training

Luthor

Agents

AI compliance reviewer that scans marketing content against FTC, FINRA, SEC and brand rules before it ships.

Enterprise · Contact sales; no public pricing · Tags: marketing compliance review, FINRA/SEC ad review

MaxKB

RAG · Multi-model

Open-source enterprise RAG and agent platform with built-in workflow engine and multi-LLM support.

Freemium · Community edition free (GPLv3); paid enterprise edition · Tags: enterprise-knowledge-base, customer-support-bots

MockingBird

Audio · GE2E + Tacotron + HiFi-GAN/WaveRNN/Fre-GAN

Open-source Mandarin-first voice cloning that mimics a speaker from a 5-second sample.

Free · Free, open source (MIT) · Tags: voice-cloning, text-to-speech

Mubert

Audio · Proprietary sample-based generative engine

AI music generator that spits out royalty-free background tracks for video, podcast, and app use.

Freemium · Free tier; paid plans for commercial use; API via sales demo · Tags: background-music, royalty-free-soundtracks

NotebookLM

RAG · Gemini 2.5

Google's source-grounded research notebook that turns your documents into chats, briefs, and AI-hosted podcasts.

Freemium · Free tier; Plus via Google One AI Premium ($19.99/mo) or Workspace add-on · Tags: document Q&A, research synthesis

Nudge AI

Writing

AI scribe and clinical documentation platform that generates audit-ready notes within 30 seconds of a session.

Freemium · Free (5 notes); Pro $99/mo; Enterprise custom · Tags: clinical-notes, ai-scribe