📖 The AI Tool Bible

Sesame

Conversational voice AI aiming to cross the uncanny valley with context-aware, emotionally aware speech.

Free · Free research preview; consumer product pricing not announced · Audio · Sesame CSM (1B / 3B / 8B)
Best for

Pick Sesame if you want state-of-the-art open-weight conversational speech and are willing to self-host CSM for voice agents or research.

Skip if

Skip it if you need a turnkey hosted TTS API with SLAs, multilingual coverage, or enterprise support today.

Sesame is an AI voice research company building conversational agents that sound like real people rather than the flat, robotic assistants most TTS systems still produce. Its centerpiece is the Conversational Speech Model (CSM), a multimodal transformer that jointly processes text and audio tokens through a semantic backbone and an acoustic decoder operating on RVQ (residual vector quantization) codes. The team has trained three sizes (1B, 3B, and 8B backbone parameters) and made key components available under Apache 2.0 on GitHub, alongside a research preview at app.sesame.com and a mobile signup for a broader consumer product.
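
For readers unfamiliar with RVQ codes, here is a minimal sketch of residual vector quantization in general. It is not Sesame's codec or the CSM code: the codebooks, their sizes, and all function names are hypothetical and chosen only to show the idea that each level quantizes whatever error the previous level left behind.

```python
from typing import List, Sequence

Codebook = Sequence[Sequence[float]]


def nearest_index(codebook: Codebook, vector: Sequence[float]) -> int:
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vector)),
    )


def rvq_encode(codebooks: Sequence[Codebook], vector: Sequence[float]) -> List[int]:
    """Quantize `vector` level by level; each level encodes the previous residual."""
    residual = list(vector)
    codes = []
    for codebook in codebooks:
        idx = nearest_index(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next level only sees the leftover error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codebooks: Sequence[Codebook], codes: Sequence[int]) -> List[float]:
    """Reconstruct the vector by summing the selected entry from every level."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: one coarse level, one finer level.
TOY_CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],
]

codes = rvq_encode(TOY_CODEBOOKS, [1.2, 0.9])
approx = rvq_decode(TOY_CODEBOOKS, codes)
```

In a real audio codec each frame of audio becomes one short list of such codes, with learned codebooks and far more entries per level than this toy example.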

The pitch is 'voice presence' - agents you can think out loud with, that pick up context and respond with human-like prosody. Sesame is aiming this at everyday users rather than call-center automation, and it has a longer-term hardware bet in the form of AI eyewear slated for 2027. Pricing isn't published; the research preview is free and the mobile app is invite-first.

For developers, the interesting part is the open weights and the paper on how CSM is trained (a compute-amortization scheme that trains the audio decoder on a random 1/16th of frames, plus homograph and pronunciation-consistency benchmarks). There is no public commercial API yet - if you want to build on Sesame today you're working from the open-source release, not a hosted endpoint.
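
To make the 1/16th-of-frames idea concrete, here is a rough sketch under stated assumptions. It assumes the amortization means the decoder loss is computed on a random subset of frames while the backbone loss still covers every frame, and that the two losses are simply added. The function names, the equal weighting, and the per-frame loss lists are hypothetical, not taken from Sesame's training code.

```python
import random
from typing import List, Optional, Sequence


def sample_decoder_frames(num_frames: int, fraction: float = 1 / 16,
                          seed: Optional[int] = None) -> List[int]:
    """Pick a random subset of frame indices for the decoder loss.

    Assumes num_frames >= 1; always keeps at least one frame.
    """
    rng = random.Random(seed)
    keep = max(1, round(num_frames * fraction))
    return sorted(rng.sample(range(num_frames), keep))


def amortized_loss(backbone_losses: Sequence[float],
                   decoder_losses: Sequence[float],
                   frame_ids: Sequence[int]) -> float:
    """Backbone loss averaged over all frames, decoder loss over the subset only."""
    backbone = sum(backbone_losses) / len(backbone_losses)
    decoder = sum(decoder_losses[i] for i in frame_ids) / len(frame_ids)
    # Equal weighting of the two terms is an assumption made for illustration.
    return backbone + decoder


frames = sample_decoder_frames(160, seed=0)
loss = amortized_loss([1.0] * 160, [2.0] * 160, frames)
```

The appeal of such a scheme is that the expensive per-frame decoder pass runs on a small fraction of positions, which cuts training memory and compute for that component.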

Editor's take

Sesame's demos are the first in voice AI in a while to make us do a double take - the prosody and back-channel timing feel genuinely alive. It's early, and there's no billable API yet, but the open-source CSM release makes it one of the more credible bets in the voice-agent space.

— The AI Tool Bible editorial team

Pros

  • Open-source weights under Apache 2.0 for the CSM speech model
  • Distinctly natural, context-aware prosody compared to typical TTS
  • Backed by serious original research with published benchmarks
  • Free research preview available at app.sesame.com

Cons

  • ⚠️ No public commercial API - you self-host the open weights
  • ⚠️ Pricing and productisation still vague; consumer app is invite-only
  • ⚠️ Hardware (AI glasses) not shipping until 2027
  • ⚠️ Small model catalogue focused on English voice quality

Use cases

conversational-voice, text-to-speech, voice-agents, ambient-ai
