llmfit

Terminal tool that scores hundreds of open LLMs against your actual CPU, RAM, and GPU and tells you which ones will run well.

Free, MIT-licensed · Evaluation · Multi-model


Best for

Pick llmfit if you self-host LLMs and want a data-driven answer to which GGUF, MLX, or Ollama model your specific machine should actually run.

Skip if

Skip it if you only use hosted APIs like OpenAI or Anthropic and never plan to run inference locally.

llmfit is a Rust-based terminal app that profiles your machine (CPU cores, system RAM, GPU model, VRAM, available backend) and then ranks hundreds of open-weight LLMs by how they will actually perform on that hardware. It scores each model across quality, speed, fit, and context dimensions, picks the best quantization, estimates tokens-per-second, and flags whether the model is a GPU fit, a CPU offload, an MoE expert-switch case, or a no-go. The default interface is a Vim-keyed TUI with filters by provider, use case, capability, license, and runtime; a classic CLI mode emits JSON for piping into jq.
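To make the "fit" idea concrete, here is a minimal Python sketch of a memory-based fit check. It is not llmfit's actual scoring code: the bits-per-weight table, the fixed overhead, and the names `Machine`, `weights_gb`, and `classify` are all assumptions chosen for illustration, and the MoE expert-switch case is left out.

```python
from dataclasses import dataclass

# Assumed average bits per weight for a few quantization levels.
# These are illustrative values, not figures taken from llmfit.
BITS_PER_WEIGHT = {"Q4": 4.5, "Q8": 8.5, "F16": 16.0}


@dataclass
class Machine:
    vram_gb: float  # GPU memory
    ram_gb: float   # system memory


def weights_gb(params_billions: float, quant: str) -> float:
    """Approximate weight size in GB: parameters * bits per weight / 8."""
    return params_billions * BITS_PER_WEIGHT[quant] / 8


def classify(params_billions: float, quant: str, machine: Machine,
             overhead_gb: float = 1.5) -> str:
    """Return 'gpu_fit', 'cpu_offload', or 'no_go' (hypothetical labels)."""
    need = weights_gb(params_billions, quant) + overhead_gb
    if need <= machine.vram_gb:
        return "gpu_fit"          # everything fits in VRAM
    if need <= machine.vram_gb + machine.ram_gb:
        return "cpu_offload"      # some layers spill into system RAM
    return "no_go"                # does not fit at all


if __name__ == "__main__":
    box = Machine(vram_gb=12, ram_gb=32)
    for size, quant in [(7, "Q4"), (13, "Q8"), (70, "F16")]:
        print(f"{size}B {quant}: {classify(size, quant, box)}")
```

A real tool would also account for context length, runtime backend, and measured rather than assumed overhead, which is where the quality, speed, and context scores come in.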

Where most "what model should I run" guides are blog posts that age out in a month, llmfit is data-driven and live. It integrates with Ollama, llama.cpp, MLX, Docker Model Runner, and LM Studio to mark which models you already have installed. A Plan mode inverts the question to "what hardware would I need for this config?", a Hardware Simulation mode tests specs you don't own, and a Community Leaderboard view powered by localmaxxing.com surfaces real measured tok/s, time to first token (TTFT), and VRAM usage from other users on the same GPU. It's MIT-licensed, free, and installable via Homebrew, Scoop, MacPorts, uv/pip, Docker, or cargo.
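The inverted "what hardware would I need?" question can be approximated by adding the weight size to the key-value cache size for the chosen context length. The sketch below uses the standard KV-cache size formula for a transformer; the function names and the example model shape are assumptions for illustration and do not describe how llmfit's Plan mode is implemented.

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_len: int, bytes_per_value: int = 2) -> float:
    """KV cache size in GB: one key and one value vector per layer,
    per KV head, per token (hence the leading factor of 2)."""
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_len * bytes_per_value)
    return total_bytes / 1e9


def required_memory_gb(params_billions: float, bits_per_weight: float,
                       n_layers: int, n_kv_heads: int, head_dim: int,
                       context_len: int) -> float:
    """Rough lower bound on memory for weights plus KV cache.
    Runtime buffers and activations are ignored in this sketch."""
    weights = params_billions * bits_per_weight / 8
    return weights + kv_cache_gb(n_layers, n_kv_heads, head_dim, context_len)


if __name__ == "__main__":
    # Hypothetical 8B model at ~4.5 bits per weight with an 8K context.
    need = required_memory_gb(8, 4.5, n_layers=32, n_kv_heads=8,
                              head_dim=128, context_len=8192)
    print(f"Estimated minimum memory: {need:.2f} GB")
```

Doubling the context length doubles the cache term while leaving the weight term unchanged, which is why long-context configurations can push a model that otherwise fits into offload territory.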

The project is actively developed by Alex Jones. A tunable scoring panel (efficiency, GPU/CPU/MoE offload factors) lets power users correct for known overestimation cases, and a download manager handles model fetching and deletion against the configured runtime. It pairs with sister tools llmserve and llama-panel for actually serving the chosen model.
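As a rough picture of what a tunable efficiency factor does, the Python sketch below uses a common rule of thumb: token generation is largely memory-bandwidth bound, so the ceiling on tokens per second is about the memory bandwidth divided by the bytes read per token (roughly the weight size). The function name, the default factor, and the example numbers are assumptions, not llmfit's actual formula or defaults.

```python
def estimate_tokens_per_second(bandwidth_gb_s: float, weights_gb: float,
                               efficiency: float = 0.6) -> float:
    """Bandwidth-bound decode estimate.

    bandwidth_gb_s: peak memory bandwidth of the device in GB/s
    weights_gb:     size of the quantized weights in GB
    efficiency:     assumed fraction of peak bandwidth achieved in practice;
                    lowering it corrects an overly optimistic estimate
    """
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return efficiency * bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    # Hypothetical GPU with 360 GB/s of bandwidth and a 4.5 GB model.
    for eff in (0.8, 0.6, 0.4):
        tps = estimate_tokens_per_second(360, 4.5, efficiency=eff)
        print(f"efficiency={eff}: ~{tps:.0f} tok/s")
```

If measured throughput on a given machine is consistently below the estimate, turning the factor down brings the estimates back in line, which is the kind of correction the scoring panel is described as supporting.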

Editor's take

This is the tool that should have replaced the "what can my 3060 run?" Reddit threads years ago. It is opinionated, hardware-aware, and refreshingly free of vendor spin, and the localmaxxing.com integration replaces guesswork with measured numbers. Easy recommendation for anyone serious about local inference.

— The AI Tool Bible editorial team