📖 The AI Tool Bible

mini-SWE-agent

A 100-line open-source coding agent that scores 74%+ on SWE-bench Verified.

Free· Free and open source (Apache 2.0); you pay only for the underlying LLM tokens.AgentsMulti-model (via litellm, OpenRouter, Portkey)
Visit website →
Best for

Pick mini-SWE-agent if you want a tiny, readable, model-agnostic coding-agent baseline to benchmark, extend, or embed in a research pipeline.

Skip if

Skip it if you want a polished IDE assistant or a managed autonomous-engineer SaaS with dashboards and team features.

mini-SWE-agent is a deliberately minimal AI coding agent from the Princeton/Stanford team behind SWE-bench and SWE-agent. The entire core is roughly 100 lines of Python with a linear, append-only message history and a single tool: bash via subprocess. That brutal simplicity is the point — and it still cracks 74%+ on SWE-bench Verified, putting it in the same league as far more elaborate agent frameworks.

It's aimed at researchers, agent tinkerers, and engineers who want a hackable baseline rather than a productized assistant. Model-agnostic via litellm, OpenRouter, and Portkey, so you can plug in Claude, GPT, Gemini, or local models. Sandbox options span local, Docker, Podman, Singularity, and Bubblewrap, which makes it usable for batch inference on benchmark suites without root-level worries. Apache 2.0 and free.

There are Python bindings for embedding it in larger pipelines, but there's no hosted product, no UI, no SaaS billing, and no support contract. If you want a Cursor-style IDE or a managed autonomous engineer, look elsewhere; if you want a readable reference implementation that you can read in one sitting and extend in an afternoon, this is the one.

Editor's take

The most refreshing thing in agent-land in a while — a credible SWE-bench performer that fits on a single screen. We'd reach for this before any of the heavier autonomous-coder frameworks when the goal is to understand, modify, or benchmark agent behavior rather than ship a polished UX.

— The AI Tool Bible editorial team

Pros

  • Roughly 100 lines of Python — trivially readable and hackable
  • Scores 74%+ on SWE-bench Verified despite the minimal design
  • Model-agnostic via litellm, OpenRouter, and Portkey
  • Sandbox-friendly: Docker, Podman, Singularity, Bubblewrap, local
  • Apache 2.0; backed by the SWE-bench and SWE-agent authors

Cons

  • ⚠️ No GUI, IDE integration, or hosted product
  • ⚠️ Bash-only tool surface — no built-in browser, search, or planning tools
  • ⚠️ You build your own ops layer (logging, cost caps, retries)
  • ⚠️ Aimed at researchers more than day-to-day end users

Use cases

swe-bench-evaluationautonomous-codingagent-researchbatch-inferenceagent-baseline

Explore related

Compare with similar tools

All in Agents