AI tools tagged Supports Multimodal
33 tools matching this tag.
GPT-4o
Featured: OpenAI's multimodal flagship behind ChatGPT.
AI-Flow
Visual node-based builder for chaining OpenAI, Anthropic, Replicate and other model APIs into content pipelines.
Agentset
Production-ready RAG infrastructure with agentic search, citations, and model-agnostic plumbing.
AstrBot
Open-source agentic AI assistant that bridges chat platforms like Telegram, Discord, and QQ with any LLM and a 1000+ plugin ecosystem.
DagsHub
GitHub-style collaboration platform for ML datasets, experiments, and models with MLflow and DVC under the hood.
Gemini
Google's flagship multimodal AI assistant with deep integration into Workspace and Android.
Google AI Studio
Browser-based playground and API console for prototyping with Google's Gemini models.
Google Flow
Google's AI filmmaking studio built around Veo 3.1, Gemini, and natural-language scene editing.
Haystack
Open-source Python framework from deepset for building production RAG pipelines and LLM agents.
Jan
Open-source desktop ChatGPT alternative that runs local LLMs and routes to cloud providers from one app.
Jina Serve
Open-source Python framework for serving multimodal AI models as scalable gRPC/HTTP microservices.
Kimi
Moonshot AI's chat assistant with long-context document analysis, coding agents, and deep research built in.
LLM Stats
Live leaderboard and side-by-side comparison hub for 300+ frontier LLMs across reasoning, coding, and multimodal benchmarks.
LanceDB
Open-source multimodal lakehouse and vector database built for AI training and retrieval at petabyte scale.
LangFast
No-signup LLM playground for testing, comparing, and versioning prompts against your own API keys.
Llama
Meta's open-weight LLM family, spanning 1B mobile models, a 405B frontier model, and natively multimodal Llama 4 variants with up to 10M-token context.
MMagic
OpenMMLab's research-grade toolbox for image and video generation, restoration, and editing.
MaxKB
Open-source enterprise RAG and agent platform with built-in workflow engine and multi-LLM support.
Maxim AI
End-to-end evaluation, simulation, and observability platform for shipping production-grade AI agents.
Microsoft Copilot
Microsoft's consumer AI assistant, formerly Bing Chat, now powered by GPT-4-class models with web grounding and image generation.
MiniMax
Chinese frontier-model lab shipping multimodal foundation models with a 1M-context coding/agent stack.
Mistral AI
European frontier-model lab with a deep bench of open-weight and premier models for text, code, voice, and OCR.
OlympicArena
Olympiad-level multi-discipline benchmark for stress-testing reasoning in LLMs and multimodal models.
PageAgent
An in-page JavaScript GUI agent that drives web interfaces with natural language, no headless browser required.
Pathway
Live data framework for production RAG and streaming ETL pipelines in Python.
Prompt Foundry
Prompt management and side-by-side LLM evaluation for OpenAI and Anthropic models.
Qwen
Alibaba's open-weight foundation model family covering chat, vision, image generation, translation, and safety classification.
Qwen Chat
Alibaba's flagship chatbot fronting the Qwen family of open-weight LLMs, with vision, code, and image generation in one UI.
SGLang
Open-source high-throughput inference engine for LLMs and multimodal models with OpenAI-compatible serving.
Seedance 2.0
ByteDance's multimodal video model with joint audio-video generation and director-level camera control.
Sesame
Conversational voice AI aiming to cross the uncanny valley with context-aware, emotionally aware speech.
Vidu
Multimodal AI video generator with strong reference-image consistency for characters and props.
VisualWebArena
Open benchmark for evaluating multimodal web agents on realistic visual browsing tasks.