On Monday, Nvidia announced the HGX H200 Tensor Core GPU, which uses the Hopper architecture to accelerate AI applications. The chip is a follow-up to last year's H100, previously Nvidia's most powerful AI GPU, and its deployment could lead to more capable AI models and faster response times for services like ChatGPT.
Experts say AI's progress has been hampered by a lack of computing power, with shortages of powerful AI GPUs largely to blame. The H200's added capability could give cloud providers a way to ease that crunch.
The H200 is designed primarily for data centers, not graphics. Its ability to perform large numbers of matrix multiplications in parallel makes it well suited to AI, both for training models and for running inference, as sketched below.
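For illustration only, the workload a data-center GPU like the H200 accelerates boils down to large, batched matrix multiplications. The sketch below uses PyTorch, which is an assumption on our part; Nvidia's announcement does not prescribe any particular framework, and the dimensions are arbitrary.

```python
# Illustrative sketch only: PyTorch and the tensor sizes are assumptions,
# not details from Nvidia's announcement.
import torch

# A transformer-style layer is dominated by matrix multiplications:
# activations (batch x hidden) times a weight matrix (hidden x hidden).
batch, hidden = 64, 4096
device = "cuda" if torch.cuda.is_available() else "cpu"

activations = torch.randn(batch, hidden, device=device, dtype=torch.float16)
weights = torch.randn(hidden, hidden, device=device, dtype=torch.float16)

# The GPU executes this multiply across thousands of cores in parallel;
# the same primitive dominates the forward pass used in inference and the
# forward-plus-backward passes used during training.
outputs = activations @ weights
print(outputs.shape)  # torch.Size([64, 4096])
```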
Ian Buck, Nvidia's vice president of hyperscale and HPC, highlighted the H200's ability to process vast amounts of data efficiently and at high speed, calling that capability essential for generative AI and HPC applications.
OpenAI has repeatedly cited GPU resource constraints that have slowed ChatGPT's performance. Additional H200 capacity could ease those limitations and let the service scale up.
The H200 stands out as the first GPU to ship with HBM3e memory, offering 141GB of capacity and 4.8 terabytes per second of bandwidth, nearly double the capacity and roughly 2.4 times the bandwidth of the A100. It will be available in several form factors, including Nvidia HGX H200 server boards and the Nvidia GH200 Grace Hopper Superchip, which combines a CPU and GPU in one package.
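As a rough illustration of why that bandwidth figure matters for generative AI: in the memory-bound decode phase of LLM inference, each generated token requires streaming the model's weights from GPU memory, so single-stream throughput is bounded by bandwidth divided by model size. The numbers below are back-of-envelope assumptions (a hypothetical 70-billion-parameter model in 16-bit precision), not Nvidia benchmarks.

```python
# Back-of-envelope estimate; model size and precision are assumptions, and
# real throughput also depends on batching, KV-cache traffic, and kernels.
bandwidth_bytes_per_s = 4.8e12   # H200: 4.8 TB/s, per Nvidia's spec sheet
params = 70e9                    # hypothetical 70B-parameter model
bytes_per_param = 2              # FP16/BF16 weights

model_bytes = params * bytes_per_param              # ~140 GB, fits in 141 GB
seconds_per_token = model_bytes / bandwidth_bytes_per_s
tokens_per_second = 1 / seconds_per_token

print(f"~{tokens_per_second:.0f} tokens/s ceiling for single-stream decode")
# ~34 tokens/s: an upper bound set purely by how fast weights can be read.
```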
Cloud giants Amazon Web Services, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure will be among the first to deploy H200-based instances starting next year, with the chip expected to become broadly available in the second quarter of 2024.
Meanwhile, Nvidia is navigating US government export restrictions that limit sales of its most powerful AI chips to China and Russia. The company has responded by introducing new chips tailored to the Chinese market, which accounts for a significant share of its data center chip revenue, in what has become an ongoing back-and-forth between the chipmaker and US regulators.