Mixture of Experts (MoE)
Mixture of Experts (MoE) is an LLM architecture that activates only a subset of “expert” sub-networks for each token, rather than the entire model. A lightweight router (gating network) scores the experts for each token and dispatches it to the top-scoring few. Total parameter counts can be massive, but per-token compute (FLOPs) is determined only by the activated parameters.
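A minimal sketch of top-k expert routing, assuming a PyTorch-style setup; the layer sizes, expert count, and top_k value are illustrative placeholders, not taken from any particular model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, hidden_dim=512, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: scores each token against every expert.
        self.router = nn.Linear(hidden_dim, num_experts)
        # Experts: independent feed-forward sub-networks.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_dim, 4 * hidden_dim),
                nn.GELU(),
                nn.Linear(4 * hidden_dim, hidden_dim),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, hidden_dim)
        logits = self.router(x)                           # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; unselected experts cost no FLOPs.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```

With num_experts=8 and top_k=2, each token touches only a quarter of the expert parameters, which is why the activated parameter count, not the total, sets per-token cost.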
Notable 2026 examples include DeepSeek V4 Pro (1.6T total / 49B activated), DeepSeek V4 Flash (284B / 13B), Gemma 4 26B MoE, and Qwen 3.5 397B (17B activated). MoE has become the dominant approach in frontier model development for balancing capability against compute cost.