NVIDIA has announced a brand-new graphics processor that, it hopes, will deliver the computational power required for "massive-context processing" in artificial intelligence systems, at a claimed million-token scale.
“The Vera Rubin platform will mark another leap in the frontier of AI computing, introducing both the next-generation Rubin GPU and a new class of processors called CPX,” Jensen Huang, NVIDIA founder and chief executive officer, says of the company’s latest launch. “Just as RTX revolutionized graphics and physical AI, Rubin CPX is the first CUDA GPU purpose-built for massive-context AI, where models reason across millions of tokens of knowledge at once.”
NVIDIA has unveiled a new class of GPU that it hopes will push LLMs, VLMs, and other models to a million-token context window: Rubin CPX. (📷: NVIDIA)
The large language models (LLMs) underpinning the current AI boom are statistical token manipulators: trained on vast troves of often-illegitimately-acquired data, they boil everything down into “tokens,” and then, when presented with an input prompt that has itself been turned into tokens, respond with the most statistically-likely tokens by way of continuation. If all has gone well, these tokens represent an answer to your query; otherwise, they represent an answer-shaped object that, the LLM being wholly incapable of anything resembling thought or reasoning regardless of marketing departments’ claims otherwise, may bear little resemblance to fact or reality.
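To make "statistically-likely continuation" concrete, here is a deliberately toy sketch, with no connection to NVIDIA's hardware or any real model: the hypothetical bigram count table stands in for billions of learned parameters, and greedy argmax decoding stands in for a real sampler.

```python
# Toy illustration of next-token continuation. The "model" here is just a
# made-up bigram count table; real LLMs learn these statistics at vast scale.
BIGRAM_COUNTS = {
    "the": {"cat": 4, "dog": 2},
    "cat": {"sat": 5, "ran": 1},
    "sat": {"down": 3},
}

def next_token(context):
    """Return the statistically-likeliest continuation of the last token."""
    candidates = BIGRAM_COUNTS.get(context[-1])
    if not candidates:
        return None  # the model has nothing to say
    # Normalize counts into probabilities, then take the argmax (greedy decoding).
    total = sum(candidates.values())
    probs = {tok: n / total for tok, n in candidates.items()}
    return max(probs, key=probs.get)

def continue_prompt(prompt_tokens, max_new=5):
    """Repeatedly append the likeliest next token, like an LLM generating text."""
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        tok = next_token(tokens)
        if tok is None:
            break
        tokens.append(tok)
    return tokens

print(continue_prompt(["the"]))  # ['the', 'cat', 'sat', 'down']
```

The output is an "answer-shaped" sequence in exactly the article's sense: statistically plausible given the table, with no understanding anywhere in the loop.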
The more tokens you can provide, the more likely it is that the answer-shaped token stream delivered will be of use, but the computational complexity increases with context length, leaving most models limited to relatively small “context windows.” That is where Rubin, named for astronomer and physicist Vera Rubin, comes in: NVIDIA claims it provides a way to scale LLMs and other generative AI models, including the image and video generation models that work similarly, to context windows of up to one million tokens.
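The reason context windows are costly is easy to see with back-of-envelope arithmetic: in standard self-attention, every token is compared against every other token, so the score matrix alone grows with the square of the context length. A minimal sketch (a simplification that ignores layers, heads, and the many optimizations real systems use):

```python
# Why long contexts are expensive: vanilla self-attention scores every
# token against every other token, an O(n^2) blow-up in context length n.
def attention_score_count(n_tokens):
    """Pairwise token comparisons for one pass of naive self-attention."""
    return n_tokens * n_tokens

for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9,} tokens -> {attention_score_count(n):,} comparisons")
```

Going from a thousand tokens to a million multiplies the comparison count by a million, from 10^6 to 10^12 per pass, which is the scaling wall that massive-context hardware is pitched at.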
The Rubin CPX, NVIDIA claims, delivers up to 30 peta floating-point operations per second (petaFLOPS) of NVFP4-precision compute, and includes 128GB of GDDR7 memory, trading the performance of high-bandwidth memory for the ability to cram more onto the board. Compared to NVIDIA’s Grace-Blackwell GB300 NVL72 systems, the company says it can deliver a tripling in attention performance, meaning a model’s ability to process context sequences.
A rack filled with 144 Rubin CPX, 144 Rubin, and 36 Vera chips will deliver a claimed eight exaFLOPS of NVFP4 compute. (📷: NVIDIA)
The company isn’t expecting anyone to make use of a single Rubin CPX, though: NVIDIA envisions the boards being combined with non-CPX Rubin GPUs and Vera CPUs, showing off a fully-stocked rack implementation dubbed the Vera Rubin NVL144 CPX, a combination of 144 Rubin CPX GPUs, 144 plain Rubin GPUs, and 36 Vera CPUs for a total of eight exaFLOPS of NVFP4 compute. While that’s unlikely to come cheap, NVIDIA makes a bold claim of profitability: $100 million spent on its Rubin-based hardware could deliver, the company claims, “as much as” $5 billion in revenue.
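A quick sanity check on the rack figures, using only the numbers quoted above (NVIDIA has not broken down the per-chip contributions, so the split shown is inference, not an official spec):

```python
# Back-of-envelope check on the Vera Rubin NVL144 CPX figures quoted above.
CPX_PER_RACK = 144      # Rubin CPX GPUs per rack
CPX_PETAFLOPS = 30      # claimed NVFP4 petaFLOPS per CPX

cpx_exaflops = CPX_PER_RACK * CPX_PETAFLOPS / 1_000  # petaFLOPS -> exaFLOPS
print(f"CPX boards alone: {cpx_exaflops:.2f} of the ~8 claimed exaFLOPS")
```

The 144 CPX boards thus account for roughly 4.3 of the claimed eight exaFLOPS, with the balance presumably coming from the 144 non-CPX Rubin GPUs, whose per-chip NVFP4 figure the article does not state.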
More information on the Rubin CPX is available on the NVIDIA Developer Technical Blog; hardware is expected to become available at the end of next year, at an as-yet-unannounced price point.
