AudioCraft
Meta's open-source research toolkit for generating music and sound effects from text via a single autoregressive language model.
Pick AudioCraft if you're a researcher or engineer who wants to self-host or fine-tune state-of-the-art text-to-audio models without paying per call.
Skip it if you want a polished web app for making finished songs - use Suno, Udio, or ElevenLabs Music instead.
AudioCraft is Meta AI's open-source framework for generative audio, bundling three models under one roof: MusicGen for text-to-music, AudioGen for environmental sound effects from text prompts, and EnCodec, a neural audio codec that compresses audio into discrete tokens. The architecture uses a single autoregressive language model operating over streams of compressed audio tokens, which simplifies training and inference compared to diffusion-based audio pipelines.
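To make "a language model over streams of compressed audio tokens" concrete, here is a toy sketch in plain Python. It does not use AudioCraft's code. The frame rate (50 Hz) and codebook count (4) are assumed values in the range reported for MusicGen, and the helper names are hypothetical. The "delay" interleaving shown is a simplified version of one way to line up several parallel codebook streams so that a single autoregressive model can predict them step by step.

```python
# Toy illustration only: assumed numbers and hypothetical helper names,
# not code taken from the AudioCraft repository.

def token_grid_shape(duration_s, frame_rate_hz=50, num_codebooks=4):
    """Return (codebooks, frames) for a clip after codec compression."""
    return num_codebooks, int(duration_s * frame_rate_hz)


def delay_interleave(codes, pad=-1):
    """Shift codebook k right by k steps, padding the gaps.

    After the shift, column t holds one token per codebook, so a single
    autoregressive model can emit a whole column at each step.
    """
    num_codebooks = len(codes)
    num_frames = len(codes[0])
    total_steps = num_frames + num_codebooks - 1
    delayed = []
    for k, stream in enumerate(codes):
        left = [pad] * k
        right = [pad] * (total_steps - num_frames - k)
        delayed.append(left + list(stream) + right)
    return delayed


def undo_delay(delayed, num_frames):
    """Invert delay_interleave to recover the aligned token grid."""
    return [row[k:k + num_frames] for k, row in enumerate(delayed)]


if __name__ == "__main__":
    print(token_grid_shape(10))          # grid size for a 10-second clip
    toy_codes = [[1, 2, 3], [4, 5, 6]]   # 2 codebooks, 3 frames
    shifted = delay_interleave(toy_codes)
    print(shifted)
    print(undo_delay(shifted, num_frames=3))
```

Under these assumed numbers, a 10-second clip becomes a grid of 4 x 500 discrete tokens, which is the kind of sequence the language model is trained to continue; the codec's decoder then turns generated tokens back into a waveform.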
This is squarely a research release, not a polished consumer product. There's no hosted web app, no pricing tier, and no managed API; you clone the GitHub repo (facebookresearch/audiocraft), bring your own GPU, and run it with PyTorch. That makes it a fit for ML researchers, audio engineers exploring generative pipelines, and developers who want to fine-tune or self-host text-to-music without paying per generation to a SaaS like Suno or Udio.
The code is released under the permissive MIT license, but the model weights are in most cases CC-BY-NC, a non-commercial license, so check the terms before shipping anything commercial. The codebase is widely referenced across the open-source audio ecosystem. If you want a click-and-go music generator, look elsewhere; if you want the underlying tech to build on, AudioCraft is one of the most cited starting points.
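As a rough idea of what "run it via PyTorch" looks like in practice, the sketch below follows the usage pattern shown in the AudioCraft README. It assumes the audiocraft package and PyTorch are installed and that the facebook/musicgen-small checkpoint can be downloaded. The wrapper function generate_clips, its parameters, and the output file prefix are hypothetical names chosen for this example.

```python
def generate_clips(prompts, duration_s=8,
                   model_name="facebook/musicgen-small", out_prefix="clip"):
    """Generate one audio file per text prompt and return the file paths.

    Hypothetical wrapper for illustration; the calls inside follow the
    usage shown in the AudioCraft README.
    """
    if not prompts:
        raise ValueError("prompts must contain at least one description")

    # Imported lazily so this module loads even where audiocraft is missing.
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    model = MusicGen.get_pretrained(model_name)
    model.set_generation_params(duration=duration_s)  # clip length in seconds

    wavs = model.generate(list(prompts))  # one waveform tensor per prompt

    paths = []
    for i, wav in enumerate(wavs):
        stem = f"{out_prefix}_{i}"
        # Writes "<stem>.wav" with loudness normalization.
        audio_write(stem, wav.cpu(), model.sample_rate, strategy="loudness")
        paths.append(stem + ".wav")
    return paths


if __name__ == "__main__":
    print(generate_clips(["lo-fi hip hop beat with warm vinyl crackle"]))
```

The first call downloads the checkpoint, and generation is slow without a GPU, so the smallest model is a sensible place to start.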
AudioCraft is the canonical open-source reference implementation for text-to-audio generation, and that's its main value. As a product it's barely one - this is a demo page for a research drop - but as a foundation for anyone building serious audio AI infrastructure, it's still hard to beat.
— The AI Tool Bible editorial team
Pros
- ✅ Fully open source with code and weights published by Meta
- ✅ Single-LM architecture is simpler than diffusion pipelines
- ✅ Covers music, sound effects, and neural codec in one repo
- ✅ Strong baseline used widely in audio ML research
- ✅ No usage fees once self-hosted
Cons
- ⚠️ No hosted product or managed API - you must run it yourself
- ⚠️ Model weights typically CC-BY-NC, limiting commercial use
- ⚠️ Requires GPU and ML tooling to operate
- ⚠️ Output quality trails newer commercial models like Suno v4
Compare with similar tools
ElevenLabs
The gold standard for AI voice cloning and TTS.
Suno
Text-to-song AI: full vocal tracks from a prompt.
Udio
Suno's main rival for AI-generated full songs.
AssemblyAI
Speech-to-text API with diarisation, summarisation, and topic detection.
Whisper
OpenAI's open-source speech-to-text — the de-facto baseline.
Resemble.ai
Enterprise voice cloning with deepfake-detection layer.