CogVideoX

Open-source text-to-video and image-to-video diffusion transformer from Zhipu AI, runnable on consumer GPUs.

Free · Open-source weights; commercial API via bigmodel.cn | Video | CogVideoX / CogVideoX1.5 (diffusion transformer)

Best for

Pick CogVideoX if you want to self-host or fine-tune a capable text/image-to-video model on a single consumer or prosumer GPU.

Skip if

Skip it if you need real-time generation, minute-long clips, or a polished managed UI rather than a Python and ComfyUI workflow.

CogVideoX is Zhipu AI's open-source video generation model family, the open-weight sibling of the commercial QingYing video product. The lineup spans CogVideoX-2B, CogVideoX-5B, CogVideoX-5B-I2V (image-to-video), and the newer CogVideoX1.5-5B series, which extends output to 10-second clips at up to 1360x768. The models are diffusion transformers with a 3D causal VAE and support text-to-video, image-to-video, and video continuation, with Diffusers, SAT, and ComfyUI integrations available out of the box.
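As a rough illustration of the Diffusers route mentioned above, here is a minimal text-to-video sketch. It is not taken from the official docs. The Hugging Face model ID, the 49-frame / 8 fps settings, the step count, and the guidance scale are assumptions chosen for illustration, and the helper names are hypothetical. It assumes `torch` and `diffusers` are installed and that a CUDA GPU is available.

```python
def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Length of the exported clip in seconds (frames divided by playback rate)."""
    return num_frames / fps


def generate_clip(
    prompt: str,
    out_path: str = "output.mp4",
    model_id: str = "THUDM/CogVideoX-2b",  # assumed Hub ID; check the model card
    num_frames: int = 49,                  # assumed default for this sketch
    fps: int = 8,                          # assumed default for this sketch
) -> str:
    """Generate one clip from a text prompt and write it to out_path."""
    # Heavy imports are deferred so the helper above can be used without a GPU stack.
    import torch
    from diffusers import CogVideoXPipeline
    from diffusers.utils import export_to_video

    pipe = CogVideoXPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe.enable_model_cpu_offload()  # keep idle sub-models on the CPU to lower peak VRAM
    pipe.vae.enable_tiling()         # decode the video latents in tiles to save memory

    frames = pipe(
        prompt=prompt,
        num_frames=num_frames,
        num_inference_steps=50,  # assumed step count
        guidance_scale=6.0,      # assumed guidance value
    ).frames[0]

    export_to_video(frames, out_path, fps=fps)
    return out_path


if __name__ == "__main__":
    print(f"Expected clip length: {clip_duration_seconds(49, 8):.3f} s")
    # Uncomment on a machine with a GPU and the model weights available:
    # generate_clip("A panda playing guitar in a bamboo forest")
```

The offload and tiling calls trade speed for memory, which suits the low-VRAM use case; on a large GPU they can likely be dropped.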

What sets it apart is hardware reach: with TorchAO INT8 quantization, the 2B variant runs in under 4GB of VRAM and the 5B fits in roughly 5GB, so desktop cards like an RTX 3060 (and even free Colab T4s) can generate video. It is aimed at researchers, fine-tuners, and builders who want an open-weight alternative to closed APIs like Runway or Sora; the code and the 2B model are Apache 2.0, while the 5B weights ship under a custom CogVideoX license. Inference is slow: on a single A100, one generation takes roughly 90s for 2B, 180s for 5B, and 1000s for 1.5-5B, so it is more of a workbench than a production renderer.
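The arithmetic behind those numbers can be sketched in a few lines. This is a back-of-envelope estimate, not a measurement. It assumes the nominal parameter counts implied by the model names (2 billion and 5 billion) and counts only the transformer weights, ignoring the text encoder, the VAE, and activations, so real peak VRAM depends on offloading and other settings. The throughput figures use the single-A100 timings quoted above and assume each timing is for one generation.

```python
GIB = 1024 ** 3  # bytes per gibibyte

# Nominal parameter counts inferred from the model names (an assumption).
NOMINAL_PARAMS = {"CogVideoX-2B": 2e9, "CogVideoX-5B": 5e9}

# Single-A100 generation times in seconds, as quoted in the text.
A100_SECONDS_PER_CLIP = {
    "CogVideoX-2B": 90,
    "CogVideoX-5B": 180,
    "CogVideoX1.5-5B": 1000,
}


def weight_footprint_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory taken by the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


def clips_per_hour(seconds_per_clip: float) -> float:
    """Upper bound on clips per hour for one GPU generating back to back."""
    return 3600 / seconds_per_clip


if __name__ == "__main__":
    for name, params in NOMINAL_PARAMS.items():
        bf16 = weight_footprint_gib(params, 2)  # 2 bytes per parameter
        int8 = weight_footprint_gib(params, 1)  # 1 byte per parameter
        print(f"{name}: BF16 ~{bf16:.2f} GiB, INT8 ~{int8:.2f} GiB (weights only)")
    for name, seconds in A100_SECONDS_PER_CLIP.items():
        print(f"{name}: ~{clips_per_hour(seconds):.1f} clips/hour on one A100")
```

Under these assumptions, INT8 halves the weight footprint relative to 16-bit storage, and even the fastest variant tops out at about 40 clips per hour per GPU, which is why the model suits experimentation more than batch rendering.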

The ecosystem is unusually rich: cogvideox-factory enables LoRA fine-tuning on a single 4090, xDiT parallelizes inference across GPUs, ComfyUI-CogVideoXWrapper plugs it into existing workflows, and Zhipu's bigmodel.cn API offers a hosted commercial path for those who don't want to self-host.

Editor's take

The most credible open-weight video model line outside of the Wan and HunyuanVideo camps, and the one with the best low-VRAM story. Inference is slow and the 5B license is not fully open, but for researchers and tinkerers it is the obvious starting point. Pair it with cogvideox-factory for LoRAs.

— The AI Tool Bible editorial team

Pros

✅ Genuinely runs on consumer GPUs with INT8 quantization (under 5GB VRAM)
✅ Permissive Apache 2.0 license on code and the 2B model weights
✅ Strong ecosystem: Diffusers, ComfyUI, LoRA fine-tuning, xDiT parallel inference
✅ Supports text-to-video, image-to-video, and video continuation in one family
✅ Backed by Zhipu AI with active releases through 2025 (CogKit, DDIM Inverse)