Langfuse
Open-source LLM observability, prompt management, and evaluation in one platform.
Pick Langfuse if you want production-grade LLM tracing, prompt versioning, and evals in one open-source tool you can self-host.
Skip it if you just need a no-code prompt playground or you don't yet have production LLM traffic worth instrumenting.
Langfuse is an open-source AI engineering platform for teams building and operating LLM applications. It bundles three things that are usually sold separately: tracing/observability (multi-turn sessions, agent graphs, token and cost tracking), prompt management with versioning and a playground, and evaluation (LLM-as-a-judge, code evaluators, human annotation, user feedback). It's built on OpenTelemetry, ships native Python and JavaScript SDKs, and has first-class integrations with the OpenAI SDK, LangChain, LlamaIndex, and 100+ other libraries.
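The tracing model is easier to picture with a small example. Below is a minimal, library-free sketch of a trace that groups nested observations (here, an agent turn that wraps a planning call, a retrieval span, and an answer generation) and rolls up token usage and cost. The class names, fields, and per-1k-token prices are hypothetical illustrations. They are not Langfuse's SDK or data schema, because in practice the SDKs and integrations capture this structure for you.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Observation:
    """One node in a trace: a span (any step) or a generation (an LLM call).

    Hypothetical structure for illustration only, not Langfuse's schema.
    """
    name: str
    input_tokens: int = 0
    output_tokens: int = 0
    children: list[Observation] = field(default_factory=list)

    def add(self, child: Observation) -> Observation:
        """Attach a nested observation and return it so callers can nest further."""
        self.children.append(child)
        return child

    def walk(self):
        """Yield this observation and all of its descendants, depth-first."""
        yield self
        for child in self.children:
            yield from child.walk()


@dataclass
class Trace:
    """A single request through the app, e.g. one agent turn within a session."""
    session_id: str
    root: Observation

    def token_totals(self) -> tuple[int, int]:
        """Sum input and output tokens across every observation in the tree."""
        nodes = list(self.root.walk())
        return (sum(n.input_tokens for n in nodes),
                sum(n.output_tokens for n in nodes))

    def cost_usd(self, input_price_per_1k: float, output_price_per_1k: float) -> float:
        """Estimate cost; the prices are caller-supplied assumptions, not real model rates."""
        tokens_in, tokens_out = self.token_totals()
        return (tokens_in / 1000 * input_price_per_1k
                + tokens_out / 1000 * output_price_per_1k)

    def observation_count(self) -> int:
        """Count every node, including the root span."""
        return sum(1 for _ in self.root.walk())


# An agent turn: a planning call, a retrieval span with a nested rerank call, and an answer.
agent = Observation("agent-turn")
agent.add(Observation("plan", input_tokens=400, output_tokens=50))
retrieval = agent.add(Observation("retrieve-docs"))
retrieval.add(Observation("rerank-llm", input_tokens=200, output_tokens=10))
agent.add(Observation("answer", input_tokens=1200, output_tokens=300))

trace = Trace(session_id="session-42", root=agent)
print(trace.token_totals())                 # (1800, 360)
print(trace.observation_count())            # 5
print(round(trace.cost_usd(0.5, 1.5), 4))   # 1.44
```

The observation count also hints at why Cloud billing tracks how finely you instrument: since units are roughly observations and scores, every nested step you trace adds to the meter.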
What sets Langfuse apart is the licensing and the price floor. The whole stack is self-hostable for free via Docker Compose or Kubernetes, which makes it the default choice for teams that don't want to pipe prompts and user data through a vendor. The Cloud version starts at a Hobby tier (50k units/month, 2 users, 30-day retention) and climbs to Core at $29/mo, Pro at $199/mo, and Enterprise at $2,499/mo with SOC2/ISO27001/HIPAA, SCIM, and audit logs. Pricing is metered by 'units' (roughly observations/scores) with graduated overage rates from $8 down to $6 per 100k.
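Because the bill depends on how many units you emit, a back-of-envelope estimate helps before you pick a tier. The sketch below makes several loud assumptions. It counts one unit per observation and one per score, since the pricing only says units are "roughly observations/scores". It treats the plan's included allowance as a parameter you must look up yourself. And rather than guess the graduated breakpoints, it brackets overage cost between the $6 and $8 per-100k endpoints quoted above. The function names are hypothetical.

```python
# Overage endpoints quoted in the pricing: $8 down to $6 per 100k units.
# The exact graduated breakpoints are not given, so we report a low/high bound.
OVERAGE_HIGH_PER_100K = 8.0
OVERAGE_LOW_PER_100K = 6.0


def monthly_units(traces_per_day: int, observations_per_trace: int,
                  scores_per_trace: int = 0, days: int = 30) -> int:
    """Rough monthly unit estimate, ASSUMING one unit per observation and per score."""
    return traces_per_day * days * (observations_per_trace + scores_per_trace)


def overage_bounds(units: int, included_units: int) -> tuple[float, float]:
    """Return (low, high) overage cost in USD for units beyond the plan allowance.

    Billing granularity is assumed to be fractional; real invoices may round.
    """
    extra = max(0, units - included_units)
    blocks = extra / 100_000
    return blocks * OVERAGE_LOW_PER_100K, blocks * OVERAGE_HIGH_PER_100K


units = monthly_units(traces_per_day=2_000, observations_per_trace=5, scores_per_trace=1)
print(units)  # 360000
# included_units=100_000 is a placeholder; check the current allowance for your plan.
low, high = overage_bounds(units, included_units=100_000)
print(f"${low:.2f} to ${high:.2f}")  # $15.60 to $20.80
```

Even a modest 2,000 traces a day lands well past the 50k-unit Hobby cap, which is why it pays to measure real traffic before committing to a tier.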
It's aimed at engineering teams running real LLM workloads in production, especially anyone wiring up agents, RAG, or multi-step chains that benefit from trace-level debugging. Competitors include LangSmith, Helicone, Arize Phoenix, and Weights & Biases Weave; Langfuse's edge is the open-source license, OTel grounding, and the fact that prompts and evals live alongside traces rather than in three different tools.
Langfuse has quietly become the default OSS answer to LangSmith. The OpenTelemetry foundation and unified traces+prompts+evals model are the right architectural calls, and the self-host path means you don't have to negotiate a contract to see your own data. If you're past the prototype stage, instrument it.
— The AI Tool Bible editorial team
Pros
- ✅ Fully open source and self-hostable at no cost
- ✅ Tracing, prompts, and evals in one platform instead of three
- ✅ Built on OpenTelemetry with SDKs for Python and JS
- ✅ Integrates with OpenAI SDK, LangChain, LlamaIndex, and 100+ libraries
- ✅ Generous free Hobby tier with no credit card required
Cons
- ⚠️ Cloud pricing by 'units' is opaque until you instrument and measure
- ⚠️ Self-hosting the Postgres + ClickHouse stack (plus Redis and S3-compatible blob storage) is non-trivial to operate
- ⚠️ Tier jumps (Core $29 to Pro $199 to Enterprise $2,499) leave a gap for mid-size teams
Compare with similar tools
Braintrust
Eval, monitor, and improve AI products end-to-end.
LangSmith
LangChain's eval + observability platform.
Weights & Biases
The ML experiment tracker, now with LLM eval features.
Helicone
Open-source LLM observability — one-line proxy install.
Humanloop
Prompt management + evals for collaborative AI teams.
PromptLayer
Lightweight prompt logging + management for OpenAI/Claude apps.