📖 The AI Tool Bible

Llama

✓ Editorially verified

Meta's open-weight LLM family, spanning 1B on-device models, the 405B Llama 3.1 flagship, and the natively multimodal Llama 4 models with context windows of up to 10M tokens.

Freemium · Weights free under Llama Community License; partner API inference ~$0.19-$0.49 per 1M tokens
Fine-tuning
Llama 4 (Maverick, Scout), Llama 3.3/3.2/3.1
Best for

Pick Llama if you need open weights you can fine-tune, quantise, and self-host with a license that survives commercial deployment at scale.

Skip if

Skip it if you want a turnkey hosted chatbot with first-class tool use and you don't care about owning the weights.

Llama is Meta's open-weight large language model family. It now spans Llama 4 (Maverick and Scout, both natively multimodal, with context windows reaching 10M tokens on Scout), the text-only Llama 3.3 70B, which is often used for synthetic data generation, the 3.2 line (1B/3B edge models plus 11B/90B vision models), and Llama 3.1 in 8B, 70B, and 405B sizes. The weights ship under Meta's community license, so teams can fine-tune, distill, quantise, and self-host the models anywhere from a single GPU to multi-node clusters.

The practical pitch is cost and control. You can run Llama locally via Ollama, llama.cpp, vLLM, or any Hugging Face-compatible runtime, or call it as a managed API on Bedrock, Azure AI, Groq, Together, Fireworks, or Meta's own Llama API (currently waitlisted at llama.developer.meta.com). Distributed inference pricing on partner clouds lands in the $0.19-$0.49 per million tokens range for Llama 4, well below what frontier closed models charge. It's the default choice for shops that need open weights, on-prem deployment, or a license that allows commercial fine-tuning.
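
To make the price band above concrete, here is a minimal Python sketch that turns a monthly token volume into a cost range. It assumes the $0.19-$0.49 figure applies as a flat rate to every token, which real providers may not do (input and output tokens are often priced differently); the function name and the 2B-token example volume are illustrative, not taken from any provider's documentation.

```python
# Price band quoted in this listing for Llama 4 on partner clouds (USD per 1M tokens).
LOW_USD_PER_M = 0.19
HIGH_USD_PER_M = 0.49


def monthly_cost_range(tokens_per_month: int) -> tuple[float, float]:
    """Return (low, high) monthly cost in USD, assuming a flat per-token rate."""
    millions = tokens_per_month / 1_000_000
    return (
        round(millions * LOW_USD_PER_M, 2),
        round(millions * HIGH_USD_PER_M, 2),
    )


# Hypothetical workload: 2 billion tokens per month.
low, high = monthly_cost_range(2_000_000_000)
print(f"${low:,.2f} to ${high:,.2f} per month")
```

Treat the output as a rough budgeting band rather than a quote; actual bills depend on the provider, the model, and the input/output mix.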

Caveats: the license isn't OSI-approved (it carries an acceptable-use policy, and companies with more than 700M monthly active users must request a separate license from Meta), Meta doesn't operate a polished first-party chat product the way OpenAI or Anthropic do, and tool-use/agentic behaviour still lags GPT-4o and Claude on the hardest benchmarks. But for raw weights you can own, nothing else at this scale comes close.

Editor's take

Llama is the gravitational centre of the open-weight ecosystem - every serious local-LLM tool, every fine-tuning recipe, every cheap inference provider points back here. Llama 4's multimodality and context windows of up to 10M tokens finally close most of the gap with closed frontier models, and if you're building anything where data sovereignty or per-token cost matters, this is the default.

— The AI Tool Bible editorial team

Pros

  • Open weights from 1B edge models to 405B frontier, under a license that permits commercial use
  • Natively multimodal Llama 4 with up to 10M-token context
  • Runs anywhere: Ollama, vLLM, llama.cpp, Bedrock, Groq, Together
  • Aggressive inference pricing on partner clouds (~$0.19-$0.49/M tokens)
  • Huge fine-tuning ecosystem and community tooling

Cons

  • ⚠️ License is source-available, not OSI-approved (700M MAU clause)
  • ⚠️ Tool-use and agentic reasoning still trail GPT-4o and Claude on the hardest tasks
  • ⚠️ No polished first-party chat product or hosted playground
  • ⚠️ Largest models require serious GPU budget to self-host
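
As a rough illustration of the GPU-budget point, the sketch below estimates the memory needed just to hold the weights of the Llama 3.1 sizes at a few common precisions. It is back-of-envelope arithmetic (parameters times bytes per parameter) and assumes no overhead: the KV cache, activations, and quantisation metadata all add to these numbers in practice. The precision labels and function name are illustrative.

```python
# Approximate storage cost per parameter at common precisions (bytes).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Weights-only memory in decimal GB: 1B params at 1 byte each is 1 GB."""
    return params_billions * BYTES_PER_PARAM[precision]


# Llama 3.1 sizes named in this listing.
for size in (8, 70, 405):
    row = ", ".join(
        f"{precision}: {weight_memory_gb(size, precision):g} GB"
        for precision in BYTES_PER_PARAM
    )
    print(f"{size}B -> {row}")
```

By this estimate the 405B model needs roughly 810 GB for fp16 weights alone, which is why it calls for a multi-GPU node, while an 8B model at 4-bit fits comfortably on a single consumer GPU.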

Use cases

self-hosted-llm · fine-tuning · multimodal-chat · synthetic-data · edge-inference · rag-backbone
