TruLens

Open-source evaluation and tracing framework for LLM apps and agents, built on OpenTelemetry.

Free · Free, open source (MIT-licensed Python package) · Evaluation · Multi-model (LLM-as-judge)

Best for

Pick TruLens if you want a code-first, open-source way to trace and score LLM apps or agents without sending eval data to a hosted vendor.

Skip if

Skip it if you need a turnkey managed eval SaaS with a hosted UI, non-Python SDKs, or zero infra work.

TruLens is a Python-based evaluation and observability toolkit for LLM applications, RAG pipelines, and agents. It emits OpenTelemetry traces of your app's execution and layers on a benchmarked library of feedback functions (groundedness, context relevance, answer coherence, plus any custom metrics you define), so you can score runs, compare app versions in a dashboard, and catch regressions before shipping.
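To make the idea of a feedback function concrete, here is a minimal, library-free sketch in plain Python. It does not use the TruLens API: `Record`, `toy_groundedness`, `score_version`, and `regressions` are hypothetical names invented for illustration, and the token-overlap score is a crude stand-in for the LLM-as-judge groundedness check a real setup would run.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Record:
    """One traced app call: the question, the retrieved context, the answer."""
    question: str
    context: str
    answer: str


# A feedback function maps one record to a score in [0, 1].
FeedbackFn = Callable[[Record], float]


def _tokens(text: str) -> set:
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,!?").lower() for w in text.split() if w.strip(".,!?")}


def toy_groundedness(record: Record) -> float:
    """Fraction of answer tokens that also appear in the context.

    Hypothetical proxy only; a real groundedness check would ask a judge model.
    """
    answer = _tokens(record.answer)
    if not answer:
        return 0.0
    return len(answer & _tokens(record.context)) / len(answer)


def score_version(records: List[Record],
                  feedbacks: Dict[str, FeedbackFn]) -> Dict[str, float]:
    """Average each feedback function over all records of one app version."""
    return {name: mean(fn(r) for r in records) for name, fn in feedbacks.items()}


def regressions(baseline: Dict[str, float], candidate: Dict[str, float],
                tolerance: float = 0.05) -> List[str]:
    """Names of metrics where the candidate drops more than `tolerance`."""
    return [n for n in baseline if candidate[n] < baseline[n] - tolerance]


context = "TruLens emits OpenTelemetry traces"
v1 = [Record("What does TruLens emit?", context, "TruLens emits OpenTelemetry traces")]
v2 = [Record("What does TruLens emit?", context, "TruLens emits JSON logs")]

feedbacks = {"groundedness": toy_groundedness}
baseline = score_version(v1, feedbacks)
candidate = score_version(v2, feedbacks)
flagged = regressions(baseline, candidate)
print(baseline, candidate, flagged)
```

The shape is the point here, not the metric: each feedback function is a scorer over a traced record, and version comparison is an aggregate over those scores. TruLens supplies its own benchmarked scorers and a dashboard in place of the toy pieces above.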

Originally built by TruEra and now maintained by Snowflake, which acquired TruEra, TruLens is aimed at engineering teams who want a code-first, self-hostable way to move 'from vibes to metrics.' It's free and open source (pip install trulens), which makes it a natural pick for teams already invested in the Snowflake or open observability ecosystem, or for anyone who wants to keep eval data in-house rather than sending traces to a hosted vendor.

Because it's SDK-driven and framework-agnostic, TruLens slots in alongside LangChain, LlamaIndex, or raw OpenAI/Anthropic calls, and its OTel foundation means traces can also flow into whatever backend you already use. The tradeoff is that you're operating a library, not a polished SaaS: dashboards, storage, and LLM-as-judge costs are your problem.
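Since judge-model spend lands on your own bill, a back-of-the-envelope estimate is worth running before you turn on every metric. The sketch below is plain arithmetic, not a TruLens feature. It assumes one judge call per feedback function per record, which may undercount for metrics that make several calls. Every number in it is a placeholder, not real pricing.

```python
def judge_cost_usd(records: int, feedback_fns: int, tokens_per_call: int,
                   usd_per_million_tokens: float) -> float:
    """Rough LLM-as-judge cost for one evaluation run.

    Assumes exactly one judge call per (record, feedback function) pair.
    """
    calls = records * feedback_fns
    return calls * tokens_per_call * usd_per_million_tokens / 1_000_000


# Placeholder inputs: substitute your own volumes and your provider's price.
estimate = judge_cost_usd(
    records=1_000,               # traced app calls to evaluate
    feedback_fns=3,              # e.g. three judge-based metrics
    tokens_per_call=1_500,       # prompt plus completion, per judge call
    usd_per_million_tokens=2.0,  # hypothetical blended price
)
print(f"${estimate:.2f}")
```

With these placeholder inputs the run makes 3,000 judge calls over 4.5 million tokens, so the cost scales linearly with both the number of records and the number of metrics you enable.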

Editor's take

TruLens is one of the more credible open-source picks in the LLM eval space, especially now that Snowflake is behind it. The OpenTelemetry foundation and benchmarked metrics library are genuinely useful, but treat it as a framework you operate, not a product that operates itself.

— The AI Tool Bible editorial team