Definition: inference engine

The AI application people interact with. For example, ChatGPT and Gemini are inference engines. They rely on an AI language model that has been trained and fine-tuned. For example, ChatGPT uses GPT models. See AI training vs. inference.

Human Rules Were the First AI
In the 1960s, an AI was an "expert system" that relied on rules defined by people. Expert systems were also one application, not divided into model and inference like today. See expert system.

Context and Decode
To increase performance in today's AI datacenters, "disaggregated inference" executes in two stages. To analyze the question, the "context" stage requires a huge amount of computation. The "decode" stage requires fast data transfer and high-speed memory. See long-horizon context.

Disaggregated GPUs

This NVIDIA Vera Rubin compute tray combines context GPUs (CPX) and Rubin GPUs. Along with switching trays, one server rack holds as many as 18 of these trays (see Vera Rubin). (Image courtesy of NVIDIA.)

misc

AI Term of the Moment

accelerated computing

Look Up Another Term

Definition: inference engine