Definition: inference engine


The part of an AI system that generates answers. The inference engine is the software people interact with when they ask ChatGPT, Grok or Gemini a question. It relies entirely on an AI model that was previously trained and fine-tuned with data from the Internet, and it directs that model to produce a response. The model can reside in the same machine as the inference engine or in an AI datacenter (see AI training vs. inference).
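The relationship between the engine and the model can be sketched in a few lines of Python. This is a toy illustration only, not how any commercial product is built: TOY_MODEL, toy_next_token and run_inference are hypothetical names, and the "model" is a lookup table standing in for a trained neural network. The point is that the engine owns the loop and the stopping rule, while the model only predicts the next token.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a trained model: maps the last token to the next one.
TOY_MODEL: Dict[str, str] = {
    "<start>": "inference",
    "inference": "generates",
    "generates": "answers",
    "answers": "<end>",
}


def toy_next_token(tokens: List[str]) -> str:
    """Predict the next token from the tokens so far (toy lookup on the last token)."""
    return TOY_MODEL.get(tokens[-1], "<end>")


def run_inference(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    max_tokens: int = 16,
) -> List[str]:
    """Drive the model: ask for one token at a time until it signals the end."""
    tokens = list(prompt)          # copy so the caller's prompt is not modified
    generated: List[str] = []
    for _ in range(max_tokens):    # hard limit so generation always terminates
        token = next_token(tokens)
        if token == "<end>":       # the model signals that the answer is complete
            break
        tokens.append(token)       # feed the new token back in for the next step
        generated.append(token)
    return generated


answer = run_inference(toy_next_token, ["<start>"])
print(" ".join(answer))
```

Because run_inference accepts any next-token function, the same loop could call a model in the same machine or one reached over a network, which mirrors the local versus datacenter distinction above.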

Human Rules Were the First AI
In the 1960s, one of the first AI solutions was the "expert system," which relied on rules defined by people. An expert system ran as a single application; it was not divided into a trained model and a separate inference stage like most of today's AI. See expert system.

Disaggregated Inference: Context and Decode
To increase performance in today's AI datacenters, inference can be split into two stages, each running on its own type of GPU. The first "context" stage, also known as prefill, analyzes the user's question and requires a huge amount of computation. The second "decode" stage generates the answer one token at a time and requires fast data transfer and high-speed memory. See long-horizon context.
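The two stages can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: KVCache, context_stage and decode_stage are hypothetical names, and simple arithmetic stands in for the neural network. What the sketch does show is the handoff: the context stage processes the whole prompt in one pass and produces a cache, and the decode stage rereads that growing cache for every token it emits.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KVCache:
    """Hypothetical per-request state handed from the context stage to the decode stage."""
    entries: List[int] = field(default_factory=list)


def context_stage(prompt_ids: List[int]) -> KVCache:
    """Process the entire prompt in one pass (compute-heavy in a real system)."""
    # Toy computation: one cache entry per prompt token.
    return KVCache(entries=[token_id * 2 for token_id in prompt_ids])


def decode_stage(cache: KVCache, max_new_tokens: int) -> List[int]:
    """Generate tokens one at a time, rereading the whole cache at every step."""
    output: List[int] = []
    for _ in range(max_new_tokens):
        next_id = sum(cache.entries) % 100   # toy rule that reads every cache entry
        output.append(next_id)
        cache.entries.append(next_id * 2)    # the cache grows with each new token
    return output


# In a disaggregated setup, these two calls would run on different hardware,
# with the cache transferred between them.
cache = context_stage([1, 2, 3])
new_tokens = decode_stage(cache, max_new_tokens=3)
print(new_tokens)
```

The split makes sense because the two functions stress hardware differently: context_stage does all of its work up front, whereas decode_stage repeats a small amount of work many times and is dominated by how quickly the cache can be read.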




GPU Splitting: Disaggregated Solution
This NVIDIA Vera Rubin compute tray combines context GPUs (CPX) with Rubin GPUs. Along with switching trays, one server rack holds as many as 18 of these trays (see Vera Rubin). (Image courtesy of NVIDIA.)