📖 The AI Tool Bible

Weco AI

Autoresearch engine that iteratively rewrites code to optimize against a numeric evaluation metric.

Freemium · Open-source CLI; hosted/commercial pricing not published · Evaluation · Multi-model (LLM + AIDE tree search)
Best for

Pick Weco AI if you have a measurable objective (kernel speed, model accuracy, prompt score) and want an agent to iterate against it autonomously.

Skip if

Skip it if your problem has no single numeric metric, or if you just need a one-time refactor a normal coding copilot can do in one pass.

Weco AI is a code-optimization platform built around the AIDE algorithm, which pairs LLM-driven code proposals with tree search to iteratively improve a codebase against a user-supplied evaluation script. You hand it source code plus an eval that emits a number (latency, accuracy, memory, cost, throughput, quality score), and it loops: propose change, run eval, read metric, branch on what improved. The team describes the product as 'recursively self-improving AI' and ships a CLI (weco-cli) backed by docs at docs.weco.ai.
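To make the propose / evaluate / branch loop concrete, here is a toy sketch in Python. It is not Weco's or AIDE's implementation: the "solution" is a single float rather than source code, `propose` is a hypothetical deterministic stand-in for an LLM proposal, and `evaluate` is a made-up metric that peaks at 3.0. It only illustrates the shape of a greedy search that always branches from the best-scoring node found so far.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One candidate in the search tree (a float stands in for source code)."""
    solution: float
    score: float
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def evaluate(solution: float) -> float:
    """Toy eval: higher is better, with the optimum at solution == 3.0."""
    return -(solution - 3.0) ** 2


def propose(parent: Node, step: int) -> float:
    """Hypothetical stand-in for an LLM proposal: nudge the parent up or down."""
    delta = 0.5 if step % 2 == 0 else -0.5
    return parent.solution + delta


def optimize(initial: float, steps: int) -> Node:
    """Greedy tree search: always expand the best-scoring node seen so far."""
    root = Node(initial, evaluate(initial))
    nodes = [root]
    for step in range(steps):
        parent = max(nodes, key=lambda n: n.score)       # branch on what improved
        candidate = propose(parent, step)                 # propose change
        child = Node(candidate, evaluate(candidate), parent)  # run eval, read metric
        parent.children.append(child)
        nodes.append(child)
    return max(nodes, key=lambda n: n.score)


if __name__ == "__main__":
    best = optimize(initial=0.0, steps=12)
    print(best.solution, best.score)
```

Proposals that score worse than their parent stay in the tree but are never expanded, which is the sense in which the search "branches on what improved". A real system would replace `propose` with a model call and `evaluate` with the user's eval script.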

It's aimed at ML and systems engineers whose problems have no obvious optimum and reward brute-force experimentation: GPU kernel tuning (CUDA, Triton), model architecture tweaks, prompt engineering with measurable scoring, and general performance work. The same group is behind AIDE (the agent that posted human-level results on Kaggle-style data science competitions) and the Aiden agent that placed top in OpenAI's hiring challenge, so the research pedigree is real. Pricing isn't published on the marketing site; the CLI is open source, and the hosted autoresearch service appears to be the commercial layer.

It's language-agnostic and hardware-agnostic because the only contract is 'your eval prints a number.' That makes it powerful for the niche it serves and useless for tasks where success can't be expressed numerically or where a one-shot edit would do.
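As an illustration of the 'your eval prints a number' contract, here is a minimal Python sketch of an eval script that times a function and prints one metric line. The `workload` function, the `latency_ms` metric name, and the `name: value` output format are all assumptions made for this example; the format Weco actually parses is not stated here, so check docs.weco.ai before relying on it.

```python
import time


def workload(n: int) -> int:
    """Hypothetical code under optimization: sum of squares below n."""
    return sum(i * i for i in range(n))


def measure_latency_ms(fn, arg, repeats: int = 5) -> float:
    """Best-of-`repeats` wall-clock time for fn(arg), in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best * 1000.0


def format_metric(name: str, value: float) -> str:
    """Render the metric as a single 'name: value' line (assumed format)."""
    return f"{name}: {value:.4f}"


if __name__ == "__main__":
    # The only thing the optimizer needs from this script is the printed number.
    print(format_metric("latency_ms", measure_latency_ms(workload, 100_000)))
```

Taking the best of several runs rather than a single timing reduces noise, which matters when an optimizer is comparing candidates on small differences in the metric.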

Editor's take

Weco is one of the more intellectually honest 'agent' products out there: it refuses to pretend it can optimize what you can't measure. For ML and systems engineers with a real eval harness, the AIDE-driven loop is a credible alternative to hand-tuning. Outside that niche, it's not the tool you want.

— The AI Tool Bible editorial team

Pros

  • Metric-driven optimization loop is principled, not vibes-based
  • Language and hardware agnostic - only needs a numeric eval
  • Strong research pedigree (AIDE, Aiden, SpecBench)
  • Open CLI (weco-cli) lowers integration friction
  • Genuinely useful for GPU kernel and ML perf work

Cons

  • ⚠️ Only works when success can be expressed as a single number
  • ⚠️ Pricing for hosted product not publicly disclosed
  • ⚠️ Overkill for one-shot code edits or qualitative tasks
  • ⚠️ Smaller community than mainstream AI eval tools

Use cases

code-optimization · gpu-kernel-tuning · ml-experimentation · prompt-engineering · autoresearch
