Best AI tools for observability
21 tools in the Evaluation category, filtered to observability.
Braintrust
Eval, monitor, and improve AI products end-to-end.
LangSmith
LangChain's eval + observability platform.
Helicone
Open-source LLM observability with a one-line proxy install.
PromptLayer
Lightweight prompt logging + management for OpenAI/Claude apps.
Agenta
Open-source LLMOps platform for prompt engineering, evaluation, and observability in one workspace.
Arize AI
Enterprise observability and evaluation platform for LLM agents and generative AI applications.
Arthur
Open-source toolkit for testing, tracing, and monitoring production AI agents.
Athina AI
Collaborative LLM evaluation and observability platform for teams shipping AI features to production.
Fiddler AI
Enterprise AI observability and guardrails platform for monitoring agents, LLMs, and ML models in production.
HoneyHive
OpenTelemetry-native observability and evaluation platform for LLM agents in production.
LangFast
No-signup LLM playground for testing, comparing, and versioning prompts against your own API keys.
Langfuse
Open-source LLM observability, prompt management, and evaluation in one platform.
MLflow
Open-source platform for tracking, evaluating, and deploying ML models and LLM applications.
Maxim AI
End-to-end evaluation, simulation, and observability platform for shipping production-grade AI agents.
Opik
Open-source LLM observability and evaluation platform for debugging and monitoring AI agents in production.
Parea AI
LLM evaluation, observability, and prompt management platform for teams shipping production AI apps.
Phoenix
Open-source LLM and agent observability platform with tracing, evals, and experimentation built on OpenTelemetry.
Respan (formerly Keywords AI)
LLM engineering platform combining a multi-model gateway with tracing, evals, and prompt management.
Superwise
Agentic management platform for runtime guardrails, policy enforcement, and observability across LLM agents.
TruLens
Open-source evaluation and tracing framework for LLM apps and agents, built on OpenTelemetry.
W&B Weave
Production observability, tracing, and evaluation for LLM and agent systems from the Weights & Biases stack.