OneKE
Open-source multi-agent framework for schema-guided knowledge extraction from documents.
Pick OneKE if you're building a domain knowledge graph and want a flexible, open-source extraction pipeline that runs against either API-hosted or local LLMs.
Skip it if you need a managed SaaS extraction API, English-first docs, or a turnkey solution without DevOps work.
OneKE is a dockerized, MIT-licensed knowledge extraction system from the ZJU-NLP lab and Ant Group's OpenSPG project. It uses a multi-agent LLM pipeline (schema agent, extraction agent, reflection agent) to pull structured facts out of plain text (TXT), HTML, PDF, Word, and JSON files. It covers named entity recognition (NER), relation extraction (RE), event extraction (EE), triple extraction, and open-ended information extraction, and its output can be assembled directly into a knowledge graph for visualization.
The project's differentiator is flexibility on both the model side and the schema side. On the model side, you can plug in OpenAI models or DeepSeek-R1 via API, or run fully locally against LLaMA3, Qwen2.5, ChatGLM4, MiniCPM3, or the bundled OneKE-13B model, with optional vLLM acceleration. On the schema side, schemas can be default, predefined, or self-deduced by the schema agent, and case retrieval and reflection loops let you trade speed for accuracy. It's aimed at researchers and engineers building domain-specific KGs who don't want to wire up extraction infrastructure from scratch.
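To make the schema, extraction, and reflection flow concrete, here is a minimal Python sketch of that style of loop. It is not OneKE's code or API: the function names (`schema_agent`, `extraction_agent`, `run_pipeline`), the pipe-delimited triple format, and the rule-based "reflection" check (rejecting triples whose relation falls outside the schema) are all hypothetical simplifications, whereas OneKE's reflection agent is itself LLM-driven. The `llm` argument is assumed to be any prompt-to-text callable; `fake_llm` is an offline stub so the sketch runs without a model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


@dataclass
class Triple:
    head: str
    relation: str
    tail: str


def schema_agent(llm: LLM, text: str, schema: Optional[list[str]] = None) -> list[str]:
    """Return relation types to extract: the predefined schema if given, else model-deduced."""
    if schema:
        return list(schema)
    reply = llm(f"List relation types in: {text}")
    return [r.strip() for r in reply.split(",") if r.strip()]


def extraction_agent(llm: LLM, text: str, relations: list[str], feedback: str = "") -> list[Triple]:
    """Ask the model for 'head|relation|tail' lines and parse them into triples."""
    prompt = f"Extract head|relation|tail lines using relations {relations}."
    if feedback:
        prompt += f" {feedback}"
    reply = llm(f"{prompt} Text: {text}")
    triples = []
    for line in reply.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append(Triple(*parts))
    return triples


def run_pipeline(llm: LLM, text: str, schema: Optional[list[str]] = None,
                 max_rounds: int = 2) -> list[Triple]:
    """Schema -> extraction -> reflection loop; each extra round costs another LLM call."""
    relations = schema_agent(llm, text, schema)
    triples = extraction_agent(llm, text, relations)
    for _ in range(max_rounds):
        # Simplified, rule-based "reflection": flag relations outside the schema.
        bad = sorted({t.relation for t in triples if t.relation not in relations})
        if not bad:
            break
        feedback = "Invalid relations: " + ", ".join(bad)
        triples = extraction_agent(llm, text, relations, feedback=feedback)
    # Drop anything that still violates the schema after the last round.
    return [t for t in triples if t.relation in relations]


def fake_llm(prompt: str) -> str:
    """Offline stub standing in for an API or local model."""
    if prompt.startswith("List relation types"):
        return "founded_by, located_in"
    if "Invalid relations" in prompt:
        return "Ant Group|located_in|Hangzhou"
    return "Ant Group|based_in|Hangzhou\nAnt Group|located_in|Hangzhou"


if __name__ == "__main__":
    for triple in run_pipeline(fake_llm, "Ant Group is headquartered in Hangzhou."):
        print(triple)
```

Here `max_rounds` plays the role of the speed-versus-accuracy knob: setting it to 0 skips re-extraction entirely, while higher values spend more model calls repairing schema violations. Replacing `fake_llm` with a real client is left to whichever API or local runtime you use.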
Deployment is via Docker or Conda, with a Streamlit web UI for interactive runs and a demo on Hugging Face Spaces. Because OneKE is an academic-led open-source project, its polish lags behind commercial extraction APIs, and the Yuque-hosted user guide is mostly in Chinese. Still, the breadth of supported tasks, models, and file types is rare for a free, MIT-licensed tool.
OneKE is one of the more serious open-source attempts at productionizing LLM-based information extraction, and the multi-agent schema/reflection design is genuinely useful. The catch is that it's an academic-flavored release; expect to read Chinese docs and do real integration work. Worth it if you'd otherwise glue together LangChain agents yourself.
— The AI Tool Bible editorial team
Pros
- ✅ Covers NER, RE, EE, and triple extraction in one framework
- ✅ Works with API models or fully local LLMs, with optional vLLM acceleration
- ✅ Ingests PDF, Word, HTML, JSON, and plain text out of the box
- ✅ Multi-agent schema + reflection loop improves extraction quality
- ✅ MIT-licensed, with Docker support and a Streamlit UI included
Cons
- ⚠️ Documentation is primarily Chinese and scattered across Yuque/GitHub
- ⚠️ Self-hosting and tuning agents is non-trivial for non-researchers
- ⚠️ No managed cloud offering; you bring the infrastructure
- ⚠️ Quality depends heavily on the underlying LLM you wire in
Compare with similar tools
Pinecone
Managed vector database for production-scale similarity search.
LlamaIndex
Data framework for connecting LLMs to your data.
Weaviate
Open-source vector DB with hybrid search and modules.
LangChain
The broad LLM application framework — chains, agents, retrievers.
Vespa
Yahoo's open-source search engine with vector + sparse retrieval.
Chroma
Embedded, developer-friendly vector store for Python.