As generative AI and large language models (LLMs) continue to drive innovations, compute requirements for training and inference have grown at an astonishing pace.
To meet that need, Google Cloud announced the general availability of its new A3 instances, powered by NVIDIA H100 Tensor Core GPUs. These GPUs bring unprecedented performance to all kinds of AI applications with their Transformer Engine — purpose-built to accelerate LLMs.
NVIDIA named Google Cloud its Generative AI Partner of the Year, and the general availability of A3 instances showcases the companies’ collaboration to boost generative AI on Google Cloud.
The joint effort takes multiple forms, from infrastructure design to extensive software enablement, to make it easier to build and deploy AI applications on the Google Cloud platform.
At the Google Cloud Next conference, NVIDIA founder and CEO Jensen Huang joined Google Cloud CEO Thomas Kurian for the event keynote. The two celebrated the general availability of NVIDIA H100 GPU-powered A3 instances and discussed how Google uses NVIDIA H100 and A100 GPUs for internal research and inference in DeepMind and other divisions.
During the discussion, Huang pointed to the deeper levels of collaboration that enabled NVIDIA GPU acceleration for the PaxML framework for creating massive LLMs. This JAX-based machine learning framework is purpose-built to train large-scale models, allowing advanced and fully configurable experimentation and parallelization.
Google has used PaxML to build internal models for DeepMind as well as for research projects, and will now use NVIDIA GPUs with the framework. The companies also announced that PaxML is available immediately on the NVIDIA NGC container registry.
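To give a flavor of the JAX style that PaxML builds on, below is a minimal sketch of a compiled training step — not PaxML code itself, and the model, shapes and learning rate are illustrative assumptions. With a GPU backend installed, the same code runs unchanged on the accelerator via XLA.

```python
# Minimal JAX training-step sketch (illustrative; not PaxML itself).
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Simple linear model: predictions = x @ w + b
    preds = x @ params["w"] + params["b"]
    return jnp.mean((preds - y) ** 2)

@jax.jit  # XLA-compiles the step; on a GPU backend it runs on the accelerator
def train_step(params, x, y, lr=0.1):
    grads = jax.grad(loss_fn)(params, x, y)
    # Apply one gradient-descent update to every leaf of the parameter tree
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (32, 4))
y = x @ jnp.ones((4,)) + 1.0  # synthetic regression targets
params = {"w": jnp.zeros((4,)), "b": jnp.zeros(())}
for _ in range(100):
    params = train_step(params, x, y)
```

Frameworks like PaxML layer model definitions, checkpointing and multi-device parallelization on top of primitives of this kind.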
Generative AI Startups Abound
Today, there are over a thousand generative AI startups building next-generation applications, many using NVIDIA technology on Google Cloud. Some notable ones include Writer and Runway.
Writer uses transformer-based LLMs to enable marketing teams to quickly create copy for web pages, blogs, ads and more. To do this, the company harnesses NVIDIA NeMo, an application framework from NVIDIA AI Enterprise. It helps companies curate their training datasets, build and customize LLMs, and run them in production at scale.
With NeMo optimizations, Writer developers have scaled from models with hundreds of millions of parameters to 40-billion-parameter models. The startup’s customer list includes household names like Deloitte, L’Oreal, Intuit, Uber and many other Fortune 500 companies.
Runway uses AI to generate videos in any style. Its AI models imitate specific styles suggested by a reference image or a text prompt, and users can also generate new video content from existing footage. This flexibility enables filmmakers and content creators to explore and design videos in a whole new way.
Google Cloud was the first CSP to bring the NVIDIA L4 GPU to the cloud. In addition, the companies have collaborated to enable Google’s Dataproc service to leverage the RAPIDS Accelerator for Apache Spark. This collaboration provides significant performance boosts for ETL. The feature is available today with Dataproc on the Google Compute Engine and will soon be available for Serverless Dataproc.
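As a rough illustration of how a Spark job picks up the RAPIDS Accelerator, the sketch below shows the relevant Spark properties. This is a hypothetical fragment, not Dataproc-specific setup: the job name is an assumption, and it presumes a cluster where the RAPIDS Accelerator jar and GPU drivers are already in place.

```shell
# Hypothetical sketch: submitting an ETL job with the RAPIDS Accelerator
# plugin enabled. Assumes the rapids-4-spark jar is on the cluster classpath.
spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  my_etl_job.py   # illustrative job name
```

When the plugin is active, supported SQL and DataFrame operations are executed on the GPU, which is where the ETL speedups come from.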
Find more details about NVIDIA GPU instances on Google Cloud and how NVIDIA is powering generative AI. Sign up for generative AI news to stay up to date on the latest breakthroughs, developments and technologies. Additionally, read this technical blog to learn how organizations run mission-critical enterprise applications with NVIDIA NeMo on GPU-accelerated Google Cloud.