AI tools tagged Supports Audio
48 tools matching this tag.
ElevenLabs
Featured: The gold standard for AI voice cloning and TTS.
GPT-4o
Featured: OpenAI's multimodal flagship behind ChatGPT.
AssemblyAI
Speech-to-text API with diarisation, summarisation, and topic detection.
Whisper
OpenAI's open-source speech-to-text model, the de facto baseline.
Replicate
One-API platform for running and fine-tuning open-source models.
HeyGen
Avatar video + lip-sync translation at scale.
Murf
TTS aimed at corporate voiceover and e-learning.
AI/ML API
Unified API gateway exposing 500+ AI models behind one OpenAI-compatible endpoint.
AISaver
Browser-based face swap and AI video/image generator stitched together from third-party models.
AIVA
AI music composition tool that generates royalty-friendly tracks in 250+ styles with editable MIDI output.
Aispect
Turns live audio at events into AI-generated visuals on the fly, in 30+ languages.
AnkiDecks AI
AI flashcard generator that turns PDFs, slides, YouTube videos and handwritten notes into Anki-ready decks.
Argil
AI avatar video generator that clones your face and voice from a single photo and a minute of audio.
AudioCraft
Meta's open-source research toolkit for generating music and sound effects from text via a single autoregressive language model.
Azure AI Speech (Neural TTS)
Microsoft's enterprise-grade neural text-to-speech with 100+ languages, custom brand voices, and SSML control.
BrainSoup
Windows desktop app for building teams of specialized AI agents that collaborate through chat.
Context Data
Enterprise data platform for deploying private RAG pipelines without infrastructure plumbing.
CustomPod
Turns your chosen news sources, RSS feeds, and inboxes into a personalized daily AI podcast.
DaVinci Resolve
Hollywood-grade post-production suite with a Neural Engine that quietly automates the tedious parts of editing, color, and audio.
Deepgram
Production-grade speech-to-text, text-to-speech, and voice-agent APIs for real-time and batch audio.
Dia
Open-weights 1.6B text-to-dialogue model that generates ultra-realistic multi-speaker conversations in one pass.
EKHOS AI
Offline Windows transcription app with speaker diarization, GPU acceleration, and 98-language support.
Edge Impulse
End-to-end platform for training and deploying ML models on microcontrollers, sensors, and other edge hardware.
Fal.ai
Serverless GPU inference platform optimized for fast diffusion and generative media APIs.
Fireflies.ai
AI meeting assistant that joins calls, transcribes them, and turns the talk into searchable notes and action items.
Gemini
Google's flagship multimodal AI assistant with deep integration into Workspace and Android.
Geniusrise
Open-source framework for building, deploying, and scaling AI microservices across text, vision, and audio.
Google AI Studio
Browser-based playground and API console for prototyping with Google's Gemini models.
Google Veo
Google DeepMind's flagship text-to-video model with native audio generation and cinematic camera control.
Harmonai
Open-source generative audio lab from Stability AI building diffusion models for music production.
Hedra
AI creative agent for character-driven video, image, and audio generation built around the Character-3 model.
Higgsfield
AI video and image generation suite that aggregates 30+ frontier models under one workflow.
Interview Solver
Invisible desktop AI copilot that feeds you LeetCode answers during live coding interviews.
Kaiber
AI video generator built around music-reactive animation and image-to-video transforms.
Kling AI
Kuaishou's flagship AI video generator, currently topping the Elo leaderboard for text-to-video and image-to-video.
LLM by Datasette
A CLI and Python library for running prompts against any LLM provider and logging everything to SQLite.
LOVO AI
Text-to-speech and voice cloning platform with 500+ voices, an integrated video editor, and a developer API.
LTX Studio
Storyboard-first AI video platform from Lightricks with shot-level camera and character control.
LangFast
No-signup LLM playground for testing, comparing, and versioning prompts against your own API keys.
Limitless
AI wearable pendant that records and transcribes your conversations into a searchable personal memory.
LocalAI
Self-hosted OpenAI-compatible API for running LLMs, image, and audio models on your own hardware.
Ludwig
Declarative, YAML-driven deep learning framework for fine-tuning LLMs and multi-modal models without writing training loops.
Luthor
AI compliance reviewer that scans marketing content against FTC, FINRA, SEC and brand rules before it ships.
MaxKB
Open-source enterprise RAG and agent platform with built-in workflow engine and multi-LLM support.
MockingBird
Open-source Mandarin-first voice cloning that mimics a speaker from a 5-second sample.
Mubert
AI music generator that spits out royalty-free background tracks for video, podcast, and app use.
NotebookLM
Google's source-grounded research notebook that turns your documents into chats, briefs, and AI-hosted podcasts.
Nudge AI
AI scribe and clinical documentation platform that generates audit-ready notes within 30 seconds of a session.