Langfuse

Open-source LLM observability, prompt management, and evaluation in one platform.

Freemium · Free self-host & Hobby tier; Core $29/mo, Pro $199/mo, Enterprise $2,499/mo · Evaluation · Model-agnostic

Best for

Pick Langfuse if you want production-grade LLM tracing, prompt versioning, and evals in one open-source tool you can self-host.

Skip if

Skip it if you just need a no-code prompt playground, or if you aren't yet running enough LLM calls in production to be worth instrumenting.

Langfuse is an open-source AI engineering platform for teams building and operating LLM applications. It bundles three things that are usually sold separately: tracing/observability (multi-turn sessions, agent graphs, token and cost tracking), prompt management with versioning and a playground, and evaluation (LLM-as-a-judge, code evaluators, human annotation, user feedback). It's built on OpenTelemetry, ships native Python and JavaScript SDKs, and has first-class integrations with the OpenAI SDK, LangChain, LlamaIndex, and 100+ other libraries.

What sets Langfuse apart is the licensing and the price floor. The whole stack is self-hostable for free via Docker Compose or Kubernetes, which makes it the default choice for teams that don't want to pipe prompts and user data through a vendor. The Cloud version starts at a free Hobby tier (50k units/month, 2 users, 30-day retention) and climbs to Core at $29/mo, Pro at $199/mo, and Enterprise at $2,499/mo, which adds SOC 2, ISO 27001, and HIPAA compliance, SCIM, and audit logs. Pricing is metered in 'units' (roughly, observations plus scores), with graduated overage rates that fall from $8 to $6 per 100k units as volume grows.
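To make the graduated overage model concrete, here is a minimal Python sketch of how such a bill could be estimated. Only the $29 Core fee and the $8 and $6 per-100k rates come from the figures above; the included allowance (100k units) and the 1M-unit band boundary are hypothetical placeholders, and the real schedule may have more bands, so check the current pricing page before budgeting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    """One overage band: applies until cumulative overage reaches upper_limit."""
    upper_limit: float      # cumulative overage units where this band ends
    rate_per_100k: float    # USD per 100k units inside this band


# HYPOTHETICAL band boundaries; only the $8 and $6 rates come from the review.
ASSUMED_BANDS = [
    Band(upper_limit=1_000_000, rate_per_100k=8.0),
    Band(upper_limit=float("inf"), rate_per_100k=6.0),
]


def estimate_monthly_cost(
    units: int,
    base_fee: float = 29.0,          # Core plan price quoted above
    included_units: int = 100_000,   # HYPOTHETICAL included allowance
    bands: list[Band] = ASSUMED_BANDS,
) -> float:
    """Estimate a monthly bill under graduated (band-by-band) overage pricing."""
    overage = max(0, units - included_units)
    cost = base_fee
    lower = 0.0
    for band in bands:
        if overage <= lower:
            break
        # Only the slice of overage that falls inside this band is billed at its rate.
        units_in_band = min(overage, band.upper_limit) - lower
        cost += units_in_band / 100_000 * band.rate_per_100k
        lower = band.upper_limit
    return cost


if __name__ == "__main__":
    for usage in (50_000, 600_000, 2_100_000):
        print(f"{usage:>9,} units -> ${estimate_monthly_cost(usage):,.2f}")
```

The point of the graduated structure is that each band's rate applies only to the units inside that band, so crossing a threshold lowers the marginal price without repricing earlier usage.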

It's aimed at engineering teams running real LLM workloads in production, especially anyone wiring up agents, RAG, or multi-step chains that benefit from trace-level debugging. Competitors include LangSmith, Helicone, Arize Phoenix, and Weights & Biases Weave; Langfuse's edge is the open-source license, OTel grounding, and the fact that prompts and evals live alongside traces rather than in three different tools.

Editor's take

Langfuse has quietly become the default OSS answer to LangSmith. The OpenTelemetry foundation and unified traces+prompts+evals model are the right architectural calls, and the self-host path means you don't have to negotiate a contract to see your own data. If you're past the prototype stage, instrument your app with it.

— The AI Tool Bible editorial team