📖 The AI Tool Bible

Best AI tools for monitoring

10 tools in the Evaluation category, filtered to monitoring.

Braintrust

Featured
Evaluation · Platform (any LLM)
8.9

Eval, monitor, and improve AI products end-to-end.

Freemium · Free up to 1k events/day; team from $249/mo

Tags: evals, monitoring

Arize AI

Evaluation · Multi-model

Enterprise observability and evaluation platform for LLM agents and generative AI applications.

Freemium · Free tier and OSS Phoenix; paid/enterprise tiers via sales

Tags: llm-observability, agent-evaluation

Arthur

Evaluation · Multi-model

Open-source toolkit for testing, tracing, and monitoring production AI agents.

Freemium · Open-source (MIT) + free SaaS tier; paid/enterprise plans on request

Tags: agent-evaluation, prompt-management

Artificial Analysis

Evaluation · Multi-model

Independent benchmarking platform comparing AI models and inference providers across intelligence, speed, and cost.

Freemium · Free public leaderboards; paid plans for expanded data and reports (contact for pricing)

Tags: model-benchmarking, provider-comparison

Athina AI

Evaluation · Multi-model

Collaborative LLM evaluation and observability platform for teams shipping AI features to production.

Freemium · Starter free (10k logs/mo); Pro & Enterprise custom

Tags: llm-evaluation, prompt-management

Fiddler AI

Evaluation · Fiddler Centor (proprietary evaluators)

Enterprise AI observability and guardrails platform for monitoring agents, LLMs, and ML models in production.

Enterprise · Tiered plans; contact sales

Tags: llm-observability, agent-monitoring

Great Expectations

Evaluation

Open-source data quality framework for validating the datasets that feed your ML and analytics pipelines.

Freemium · GX Core free (Apache 2.0); GX Cloud paid tiers, contact sales

Tags: data-validation, pipeline-testing

Maxim AI

Evaluation · Multi-model

End-to-end evaluation, simulation, and observability platform for shipping production-grade AI agents.

Freemium · Free tier; 14-day trial on paid plans; custom enterprise pricing

Tags: agent-evaluation, llm-observability

Opik

Evaluation · Multi-model

Open-source LLM observability and evaluation platform for debugging and monitoring AI agents in production.

Freemium · Free open-source self-host; free Cloud tier (no card); Enterprise contact sales

Tags: llm-tracing, agent-evaluation

Respan (formerly Keywords AI)

Evaluation · Multi-model (500+ via gateway)

LLM engineering platform combining a multi-model gateway with tracing, evals, and prompt management.

Freemium · Free tier; paid plans (pricing not public); enterprise on request

Tags: llm-observability, prompt-management