NVIDIA and Microsoft today unveiled product integrations designed to advance full-stack NVIDIA AI development on Microsoft platforms and applications.
At Microsoft Ignite, Microsoft announced the launch of the first cloud private preview of the Azure ND GB200 V6 VM series, based on the NVIDIA Blackwell platform. This new AI-optimized virtual machine (VM) series combines the NVIDIA GB200 NVL72 rack design with NVIDIA Quantum InfiniBand networking.
In addition, Microsoft revealed that Azure Container Apps now supports NVIDIA GPUs, enabling simplified and scalable AI deployment. Plus, the NVIDIA AI platform on Azure includes new reference workflows for industrial AI and an NVIDIA Omniverse Blueprint for creating immersive, AI-powered visuals.
At Ignite, NVIDIA also announced multimodal small language models (SLMs) for RTX AI PCs and workstations, enhancing digital human interactions and virtual assistants with greater realism.
Microsoft’s new Azure ND GB200 V6 VM series will harness the powerful performance of NVIDIA GB200 Grace Blackwell Superchips, coupled with advanced NVIDIA Quantum InfiniBand networking. This offering is optimized for large-scale deep learning workloads to accelerate breakthroughs in natural language processing, computer vision and more.
The Blackwell-based VM series complements previously announced Azure AI clusters with ND H200 V5 VMs, which provide increased high-bandwidth memory for improved AI inferencing. The ND H200 V5 VMs are already being used by OpenAI to enhance ChatGPT.
Serverless computing provides AI application developers increased agility to rapidly deploy, scale and iterate on applications without worrying about underlying infrastructure. This enables them to focus on optimizing models and improving functionality while minimizing operational overhead.
The Azure Container Apps serverless containers platform simplifies deploying and managing microservices-based applications by abstracting away the underlying infrastructure.
Azure Container Apps now supports NVIDIA-accelerated workloads with serverless GPUs, allowing developers to use the power of accelerated computing for real-time AI inference applications in a flexible, consumption-based, serverless environment. This capability simplifies AI deployments at scale while improving resource efficiency and application performance without the burden of infrastructure management.
Serverless GPUs allow development teams to focus more on innovation and less on infrastructure management. With per-second billing and scale-to-zero capabilities, customers pay only for the compute they use, helping ensure resource utilization is both economical and efficient. NVIDIA is also working with Microsoft to bring NVIDIA NIM microservices to serverless NVIDIA GPUs in Azure to optimize AI model performance.
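To make the pattern concrete, below is a minimal sketch of the kind of containerized inference endpoint that fits this serverless GPU model. The framework choice (FastAPI), the model and the route name are illustrative assumptions, not part of the Azure Container Apps announcement.

```python
# Minimal sketch of a containerized inference endpoint suited to serverless
# GPUs (illustrative: the framework, model and route are assumptions).
from contextlib import asynccontextmanager

import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

classifier = None

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load the model once per cold start. With scale-to-zero and per-second
    # billing, a lean startup path directly reduces cost.
    global classifier
    device = 0 if torch.cuda.is_available() else -1  # use the GPU when one is attached
    classifier = pipeline("sentiment-analysis", device=device)
    yield

app = FastAPI(lifespan=lifespan)

class InferRequest(BaseModel):
    text: str

@app.post("/infer")
def infer(req: InferRequest):
    # Real-time inference on the serverless GPU; the platform scales
    # replicas with traffic and back to zero when idle.
    return classifier(req.text)[0]
```

The design point is keeping cold starts cheap: because instances scale to zero, anything done at startup is paid for on every scale-up.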
NVIDIA announced reference workflows that help developers build 3D simulation and digital twin applications on NVIDIA Omniverse and Universal Scene Description (OpenUSD) — accelerating industrial AI and advancing AI-driven creativity.
A reference workflow for 3D remote monitoring of industrial operations is coming soon to enable developers to connect physically accurate 3D models of industrial systems to real-time data from Azure IoT Operations and Power BI.
These two Microsoft services integrate with applications built on NVIDIA Omniverse and OpenUSD to provide solutions for industrial IoT use cases. This helps remote operations teams accelerate decision-making and optimize processes in production facilities.
The Omniverse Blueprint for precise visual generative AI enables developers to create applications that let nontechnical teams generate AI-enhanced visuals while preserving brand assets. The blueprint supports models like SDXL and Shutterstock Generative 3D to streamline the creation of on-brand, AI-generated images.
Leading creative groups, including Accenture Song, Collective, GRIP, Monks and WPP, have adopted this NVIDIA Omniverse Blueprint to personalize and customize imagery across markets.
NVIDIA’s collaboration with Microsoft extends to bringing AI capabilities to personal computing devices.
At Ignite, NVIDIA announced its new multimodal SLM, NVIDIA Nemovision-4B Instruct, for understanding visual imagery in the real world and on screen. It’s coming soon to RTX AI PCs and workstations and will pave the way for more sophisticated and lifelike digital human interactions.
Plus, updates to NVIDIA TensorRT Model Optimizer (ModelOpt) offer Windows developers a path to optimize models for ONNX Runtime deployment. TensorRT ModelOpt enables developers to create AI models for PCs that are faster and more accurate when accelerated by RTX GPUs, allowing large models to fit within the constraints of PC environments while remaining easy to deploy across the PC ecosystem with ONNX Runtime.
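As a hedged illustration of that optimization path, the sketch below applies ModelOpt post-training INT8 quantization to a stand-in vision model and exports it for ONNX Runtime. The model, calibration data and config choice are assumptions, not a prescribed workflow.

```python
# Hedged sketch: post-training quantization with TensorRT Model Optimizer
# ("modelopt"), then export for ONNX Runtime deployment.
import torch
import modelopt.torch.quantization as mtq
from torchvision.models import resnet50

model = resnet50(weights="DEFAULT").eval().cuda()
# Illustrative calibration data; a real workflow uses representative samples.
calib_data = [torch.randn(8, 3, 224, 224, device="cuda") for _ in range(16)]

def forward_loop(m):
    # Calibration pass: run samples through the model to collect ranges.
    with torch.no_grad():
        for batch in calib_data:
            m(batch)

# Insert fake-quant ops and calibrate (INT8 here; FP8/INT4 configs also exist).
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

# Export with Q/DQ nodes so ONNX Runtime execution providers can accelerate it.
torch.onnx.export(model, calib_data[0], "resnet50_int8.onnx", opset_version=17)
```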
RTX AI-enabled PCs and workstations offer enhanced productivity tools, creative applications and immersive experiences powered by local AI processing.
NVIDIA’s extensive ecosystem of partners and developers brings a wealth of AI and high-performance computing options to the Azure platform.
SoftServe, a global IT consulting and digital services provider, today announced the availability of SoftServe Gen AI Industrial Assistant, based on the NVIDIA AI Blueprint for multimodal PDF data extraction, on the Azure Marketplace. The assistant addresses critical challenges in manufacturing by using AI to enhance equipment maintenance and improve worker productivity.
At Ignite, AT&T will showcase how it’s using NVIDIA AI and Azure to enhance operational efficiency, boost employee productivity and drive business growth through retrieval-augmented generation and autonomous assistants and agents.
Learn more about NVIDIA and Microsoft’s collaboration and sessions at Ignite.
See notice regarding software product information.
Cloud-native technologies have become crucial for developers to create and implement scalable applications in dynamic cloud environments.
This week at KubeCon + CloudNativeCon North America 2024, one of the most-attended conferences focused on open-source technologies, Chris Lamb, vice president of computing software platforms at NVIDIA, delivered a keynote outlining the benefits of open source for developers and enterprises alike — and NVIDIA offered nearly 20 interactive sessions with engineers and experts.
The Cloud Native Computing Foundation (CNCF), part of the Linux Foundation and host of KubeCon, is at the forefront of championing a robust ecosystem to foster collaboration among industry leaders, developers and end users.
As a member of CNCF since 2018, NVIDIA is working across the developer community to contribute to and sustain cloud-native open-source projects. Its open-source software and more than 750 NVIDIA-led open-source projects help democratize access to tools that accelerate AI development and innovation.
NVIDIA has benefited from the many open-source projects under CNCF and has contributed to dozens of them over the past decade. These contributions help developers build applications and microservice architectures suited to managing AI and machine learning workloads.
Kubernetes, the cornerstone of cloud-native computing, is undergoing a transformation to meet the challenges of AI and machine learning workloads. As organizations increasingly adopt large language models and other AI technologies, robust infrastructure becomes paramount.
NVIDIA has been working closely with the Kubernetes community to address these challenges. This includes:
The company’s open-source efforts extend beyond Kubernetes to other CNCF projects:
And NVIDIA has assisted with projects that address the observability, performance and other critical areas of cloud-native computing, such as:
NVIDIA engages the cloud-native ecosystem by participating in CNCF events and activities, including:
This translates into extended benefits for developers, such as improved efficiency in managing AI and ML workloads; enhanced scalability and performance of cloud-native applications; better resource utilization, which can lead to cost savings; and simplified deployment and management of complex AI infrastructures.
As AI and machine learning continue to transform industries, NVIDIA is helping advance cloud-native technologies to support compute-intensive workloads. This includes facilitating the migration of legacy applications and supporting the development of new ones.
These contributions to the open-source community help developers harness the full potential of AI technologies and strengthen Kubernetes and other CNCF projects as the tools of choice for AI compute workloads.
Check out NVIDIA’s keynote at KubeCon + CloudNativeCon North America 2024 delivered by Chris Lamb, where he discusses the importance of CNCF projects in building and delivering AI in the cloud and NVIDIA’s contributions to the community to push the AI revolution forward.
Consulting giants including Accenture, Deloitte, EY Strategy and Consulting Co., Ltd. (or EY Japan), FPT, Kyndryl and Tata Consultancy Services Japan (TCS Japan) are working with NVIDIA to establish innovation centers in Japan to accelerate the nation’s goal of embracing enterprise AI and physical AI across its industrial landscape.
The centers will use NVIDIA AI Enterprise software, local language models and NVIDIA NIM microservices to help clients in Japan advance the development and deployment of AI agents tailored to their industries’ respective needs, boosting productivity with a digital workforce.
Using the NVIDIA Omniverse platform, Japanese firms can develop digital twins and simulate complex physical AI systems, driving innovation in manufacturing, robotics and other sectors.
Like many nations, Japan is navigating complex social and demographic challenges, which is leading to a smaller workforce as older generations retire. Leaning into its manufacturing and robotics leadership, the country is seeking opportunities to solve these challenges using AI.
The Japanese government in April published a paper on its aims to become “the world’s most AI-friendly country.” AI adoption is strong and growing, as IDC reports that the Japanese AI systems market reached approximately $5.9 billion this year, with a year-on-year growth rate of 31.2%.1
The consulting giants’ initiatives and activities include:
Located in the Tokyo and Kansai metropolitan areas, these new consulting centers offer hands-on experience with NVIDIA’s latest technologies and expert guidance — helping accelerate AI transformation, solve complex social challenges and support the nation’s economic growth.
To learn more, watch the NVIDIA AI Summit Japan fireside chat with NVIDIA founder and CEO Jensen Huang.
Editor’s note: IDC figures are sourced to IDC, 2024 Domestic AI System Market Forecast Announced, April 2024. The IDC forecast amount was converted to USD by NVIDIA, while the CAGR (31.2%) was calculated based on JPY.
Artificial intelligence will be the driving force behind India’s digital transformation, fueling innovation, economic growth and global leadership, NVIDIA founder and CEO Jensen Huang said Thursday at NVIDIA’s AI Summit in Mumbai.
Addressing a crowd of entrepreneurs, developers, academics and business leaders, Huang positioned AI as the cornerstone of the country’s future.
India has an “amazing natural resource” in its IT and computer science expertise, Huang said, noting the vast potential waiting to be unlocked.
To capitalize on India’s talent and immense data resources, the country’s leading cloud infrastructure providers are rapidly accelerating their data center capacity. NVIDIA is playing a key role, with NVIDIA GPU deployments expected to grow nearly 10x by year’s end, creating the backbone for an AI-driven economy.
Together with NVIDIA, these companies are at the cutting edge of a shift Huang compared to the seismic change in computing introduced by IBM’s System/360 in 1964, calling it the most profound platform shift since then.
“This industry, the computing industry, is going to become the intelligence industry,” Huang said, pointing to India’s unique strengths to lead this industry, thanks to its enormous amounts of data and large population.
With this rapid expansion in infrastructure, AI factories will play a critical role in India’s future, serving as the backbone of the nation’s AI-driven growth.
“It makes complete sense that India should manufacture its own AI,” Huang said. “You should not export data to import intelligence,” he added, noting the importance of India building its own AI infrastructure.
Huang identified three areas where AI will transform industries: sovereign AI, where nations use their own data to drive innovation; agentic AI, which automates knowledge-based work; and physical AI, which applies AI to industrial tasks through robotics and autonomous systems. India, Huang noted, is uniquely positioned to lead in all three areas.
India’s startups are already harnessing NVIDIA technology to drive innovation across industries and are positioning themselves as global players, bringing the country’s AI solutions to the world.
Meanwhile, India’s robotics ecosystem is adopting NVIDIA Isaac and Omniverse to power the next generation of physical AI, revolutionizing industries like manufacturing and logistics with advanced automation.
Huang’s keynote also featured a surprise appearance by actor and producer Akshay Kumar.
Following Huang’s remarks, the focus shifted to a fireside chat between Huang and Reliance Industries Chairman Mukesh Ambani, where the two leaders explored how AI will shape the future of Indian industries, particularly in sectors like energy, telecommunications and manufacturing.
Ambani emphasized that AI is central to this continued growth. Reliance, in partnership with NVIDIA, is building AI factories to automate industrial tasks and transform processes in sectors like energy and manufacturing.
Both men discussed their companies’ joint efforts to pioneer AI infrastructure in India.
Ambani underscored the role of AI in public sector services, explaining how India’s data combined with AI is already transforming governance and service delivery.
Huang added that AI promises to democratize technology.
“The ability to program AI is something that everyone can do … if AI could be put into the hands of every citizen, it would elevate and put into the hands of everyone this incredible capability,” he said.
Huang emphasized NVIDIA’s role in preparing India’s workforce for an AI-driven future.
NVIDIA is partnering with India’s IT giants such as Infosys, TCS, Tech Mahindra and Wipro to upskill nearly half a million developers, ensuring India leads the AI revolution with a highly trained workforce.
“India’s technical talent is unmatched,” Huang said.
Ambani echoed these sentiments, stressing that “India will be one of the biggest intelligence markets,” pointing to the nation’s youthful, technically talented population.
As the session drew to a close, Huang and Ambani reflected on their vision for India’s AI-driven future.
With its vast talent pool, burgeoning tech ecosystem and immense data resources, the country, they agreed, has the potential to contribute globally in sectors such as energy, healthcare, finance and manufacturing.
“This cannot be done by any one company, any one individual, but we all have to work together to bring this intelligence age safely to the world so that we can create a more equal world, a more prosperous world,” Ambani said.
Huang echoed the sentiment, adding: “Let’s make it a promise today that we will work together so that India can take advantage of the intelligence revolution that’s ahead of us.”
India’s leading cloud infrastructure providers and server manufacturers are ramping up accelerated data center capacity. By year’s end, they’ll have boosted NVIDIA GPU deployment in the country by nearly 10x compared to 18 months ago.
Tens of thousands of NVIDIA Hopper GPUs will be added to build AI factories — large-scale data centers for producing AI — that support India’s large businesses, startups and research centers running AI workloads in the cloud and on premises. This will cumulatively provide nearly 180 exaflops of compute to power innovation in healthcare, financial services and digital content creation.
Announced today at the NVIDIA AI Summit, taking place in Mumbai through Oct. 25, this buildout of accelerated computing technology is led by data center provider Yotta Data Services, global digital ecosystem enabler Tata Communications, cloud service provider E2E Networks and original equipment manufacturer Netweb.
Their systems will enable developers to harness domestic data center resources powerful enough to fuel a new wave of large language models, complex scientific visualizations and industrial digital twins that could propel India to the forefront of AI-accelerated innovation.
Yotta Data Services is providing Indian businesses, government departments and researchers access to managed cloud services through its Shakti Cloud platform to boost generative AI adoption and AI education.
Powered by thousands of NVIDIA Hopper GPUs, these computing resources are complemented by NVIDIA AI Enterprise, an end-to-end, cloud-native software platform that accelerates data science pipelines and streamlines development and deployment of production-grade copilots and other generative AI applications.
With NVIDIA AI Enterprise, Yotta customers can access NVIDIA NIM, a collection of microservices for optimized AI inference, and NVIDIA NIM Agent Blueprints, a set of customizable reference architectures for generative AI applications. This will allow them to rapidly adopt optimized, state-of-the-art AI for applications including biomolecular generation, virtual avatar creation and language generation.
“The future of AI is about speed, flexibility and scalability, which is why Yotta’s Shakti Cloud platform is designed to eliminate the common barriers that organizations across industries face in AI adoption,” said Sunil Gupta, cofounder, CEO and managing director of Yotta. “Shakti Cloud brings together high-performance GPUs, optimized storage and a services layer that simplifies AI development from model training to deployment, so organizations can quickly scale their AI efforts, streamline operations and push the boundaries of what AI can accomplish.”
Yotta’s customers include Sarvam AI, which is building AI models that support major Indian languages; Innoplexus, which is developing an AI-powered life sciences platform for drug discovery; and Zoho Corporation, which is creating language models for enterprise customers.
Tata Communications is initiating a large-scale deployment of NVIDIA Hopper architecture GPUs to power its public cloud infrastructure and support a wide range of AI applications. The company also plans to expand its offerings next year to include NVIDIA Blackwell GPUs.
In addition to providing accelerated hardware, Tata Communications will enable customers to run NVIDIA AI Enterprise, including NVIDIA NIM and NIM Agent Blueprints, and NVIDIA Omniverse, a software platform and operating system that developers use to build physical AI and robotic system simulation applications.
“By combining NVIDIA’s accelerated computing infrastructure with Tata Communications’ AI Studio and global network, we’re creating a future-ready platform that will enable AI transformation across industries,” said A.S. Lakshminarayanan, managing director and CEO of Tata Communications. “Access to these resources will make AI more accessible to innovators in fields including manufacturing, healthcare, retail, banking and financial services.”
E2E Networks supports enterprises in India, the Middle East, the Asia-Pacific region and the U.S. with GPU-powered cloud servers.
It offers customers access to clusters featuring NVIDIA Hopper GPUs interconnected with NVIDIA Quantum-2 InfiniBand networking to help meet the demand for high-compute tasks including simulations, foundation model training and real-time AI inference.
“This infrastructure expansion helps ensure Indian businesses have access to high-performance, scalable infrastructure to develop custom AI models,” said Tarun Dua, cofounder and managing director of E2E Networks. “NVIDIA Hopper GPUs will be a powerful driver of innovation in large language models and large vision models for our users.”
E2E’s clients include AI4Bharat, a research lab at the Indian Institute of Technology Madras developing open-source AI applications for Indian languages — as well as members of the NVIDIA Inception startup program such as disease detection company Qure.ai, text-to-video generative AI company Invideo AI and intelligent voice agent company Assisto.
Netweb is expanding its range of Tyrone AI systems based on NVIDIA MGX, a modular reference architecture to accelerate enterprise data center workloads.
Offered for both on-premises and off-premises cloud infrastructure, the new servers feature NVIDIA GH200 Grace Hopper Superchips, delivering the computational power to support large hyperscalers, research centers, enterprises and supercomputing centers in India and across Asia.
“Through Netweb’s decade-long collaboration with NVIDIA, we’ve shown that world-class computing infrastructure can be developed in India,” said Sanjay Lodha, chairman and managing director of Netweb. “Our next-generation systems will help the country’s businesses and researchers build and deploy more complex AI applications trained on proprietary datasets.”
Netweb also offers customers Tyrone Skylus cloud instances that include the company’s full software stack, alongside the NVIDIA AI Enterprise and NVIDIA Omniverse software platforms, to develop large-scale agentic AI and physical AI.
NVIDIA’s roadmap features new platforms set to arrive on a one-year rhythm. By harnessing these advancements in AI computing and networking, infrastructure providers and manufacturers in India and beyond will be able to further scale the capabilities of AI development to power larger, multimodal models, optimize inference performance and train the next generation of AI applications.
Learn more about India’s AI adoption in the fireside chat between NVIDIA founder and CEO Jensen Huang and Mukesh Ambani, chairman and managing director of Reliance Industries, at the NVIDIA AI Summit.
NVIDIA is expanding its collaboration with Microsoft to support global AI startups across industries — with an initial focus on healthcare and life sciences companies.
Announced today at the HLTH healthcare innovation conference, the initiative connects the startup ecosystem by bringing together the NVIDIA Inception global program for cutting-edge startups and Microsoft for Startups to broaden innovators’ access to accelerated computing by providing cloud credits, software for AI development and the support of technical and business experts.
The first phase will focus on high-potential digital health and life sciences companies that are part of both programs. Future phases will focus on startups in other industries.
Microsoft for Startups will provide each company with $150,000 of Microsoft Azure credits to access leading AI models, up to $200,000 worth of Microsoft business tools, and priority access to its Pegasus Program for go-to-market support.
NVIDIA Inception will provide 10,000 ai.nvidia.com inference credits to run GPU-optimized AI models through NVIDIA-managed serverless APIs; preferred pricing on NVIDIA AI Enterprise, which includes the full suite of NVIDIA Clara healthcare and life sciences computing platforms, software and services; early access to new NVIDIA healthcare offerings; and opportunities to connect with investors through the Inception VC Alliance and with industry partners through the Inception Alliance for Healthcare.
Both companies will provide the selected startups with dedicated technical support and hands-on workshops to develop digital health applications with the NVIDIA technology stack on Azure.
Hundreds of companies are already part of both NVIDIA Inception and Microsoft for Startups, using the combination of accelerated computing infrastructure and cutting-edge AI to advance their work.
Artisight, for example, is a smart hospital startup using AI to improve operational efficiency, documentation and care coordination, reducing the administrative burden on clinical staff and improving the patient experience. Its smart hospital network includes over 2,000 cameras and microphones at Northwestern Medicine in Chicago and over 200 other hospitals.
The company uses speech recognition models that can automate patient check-in with voice-enabled kiosks and computer vision models that can alert nurses when a patient is at risk of falling. Its products use software including NVIDIA Riva for conversational AI, NVIDIA DeepStream for vision AI and NVIDIA Triton Inference Server to simplify AI inference in production.
“Access to the latest AI technologies is critical to developing smart hospital solutions that are reliable enough to be deployed in real-world clinical settings,” said Andrew Gostine, founder and CEO of Artisight. “The support of NVIDIA Inception and Microsoft for Startups has enabled our company to scale our products to help top U.S. hospitals care for thousands of patients.”
Another company, Pangaea Data, is helping healthcare organizations and pharmaceutical companies identify patients who remain undertreated or untreated despite available intelligence in their existing medical records. The company’s PALLUX platform supports clinicians at the point of care by finding more patients for screening and treatment. Deployed with NVIDIA GPUs on Azure’s HIPAA-compliant, secure cloud environment, PALLUX uses the NVIDIA FLARE federated learning framework to preserve patient privacy while driving improvement in health outcomes.
PALLUX helped one healthcare provider find 6x more cancer patients with cachexia — a condition characterized by loss of weight and muscle mass — for treatment and clinical trials. Pangaea Data’s platform achieved 90% accuracy and was deployed on the provider’s existing infrastructure within 12 weeks.
“By building our platform on a trusted cloud environment, we’re offering healthcare providers and pharmaceutical companies a solution to uncover insights from existing health records and realize the true promise of precision medicine and preventative healthcare,” said Pangaea Data CEO Vibhor Gupta. “Microsoft and NVIDIA have supported our work with powerful virtual machines and AI software, enabling us to focus on advancing our platform, rather than infrastructure management.”
Other startups participating in both programs and using NVIDIA GPUs on Azure include:
Microsoft earlier this year announced a collaboration with NVIDIA to boost healthcare and life sciences organizations with generative AI, accelerated computing and the cloud.
Aimed at supporting projects in clinical research, drug discovery, medical imaging and precision medicine, this collaboration brought together Microsoft Azure with NVIDIA DGX Cloud, an end-to-end, scalable AI platform for developers.
It also provides users of NVIDIA DGX Cloud on Azure access to NVIDIA Clara, including domain-specific resources such as NVIDIA BioNeMo, a generative AI platform for drug discovery; NVIDIA MONAI, a suite of enterprise-grade AI for medical imaging; and NVIDIA Parabricks, a software suite designed to accelerate processing of sequencing data for genomics applications.
Join the Microsoft for Startups Founders Hub and the NVIDIA Inception program.
AI can help solve some of the world’s biggest challenges — whether climate change, cancer or national security — U.S. Secretary of Energy Jennifer Granholm emphasized today during her remarks at the AI for Science, Energy and Security session at the NVIDIA AI Summit, in Washington, D.C.
Granholm went on to highlight the pivotal role AI is playing in tackling major national challenges, from energy innovation to bolstering national security.
“We need to use AI for both offense and defense — offense to solve these big problems and defense to make sure the bad guys are not using AI for nefarious purposes,” she said.
Granholm, who calls the Department of Energy “America’s Solutions Department,” highlighted the agency’s focus on solving the world’s biggest problems.
“Yes, climate change, obviously, but a whole slew of other problems, too … quantum computing and all sorts of next-generation technologies,” she said, pointing out that AI is a driving force behind many of these advances.
“AI can really help to solve some of those huge problems — whether climate change, cancer or national security,” she said. “The possibilities of AI for good are awesome, awesome.”
Following Granholm’s 15-minute address, a panel of experts from government, academia and industry took the stage to further discuss how AI accelerates advancements in scientific discovery, national security and energy innovation.
“AI is going to be transformative to our mission space.… We’re going to see these big step changes in capabilities,” said Helena Fu, director of the Office of Critical and Emerging Technologies at the Department of Energy, underscoring AI’s potential in safeguarding critical infrastructure and addressing cyber threats.
During her remarks, Granholm also stressed that AI’s increasing energy demands must be met responsibly.
“We are going to see about a 15% increase in power demand on our electric grid as a result of the data centers that we want to be located in the United States,” she explained.
However, the DOE is taking steps to meet this demand with clean energy.
“This year, in 2024, the United States will have added 30 Hoover Dams’ worth of clean power to our electric grid,” Granholm announced, emphasizing that the clean energy revolution is well underway.
The discussion then shifted to how AI is revolutionizing scientific research and national security.
Tanya Das, director of the Energy Program at the Bipartisan Policy Center, pointed out that “AI can accelerate every stage of the innovation pipeline in the energy sector … starting from scientific discovery at the very beginning … going through to deployment and permitting.”
Das also highlighted the growing interest in Congress to support AI innovations, adding, “Congress is paying attention to this issue, and, I think, very motivated to take action on updating what the national vision is for artificial intelligence.”
Fu reiterated the department’s comprehensive approach, stating, “We cross from open science through national security, and we do this at scale.… Whether they be around energy security, resilience, climate change or the national security challenges that we’re seeing every day emerging.”
She also touched on the DOE’s future goals: “Our scientific systems will need access to AI systems,” Fu said, emphasizing the need to bridge scientific reasoning with the new kinds of models that AI development will require.
Karthik Duraisamy, director of the Michigan Institute for Computational Discovery and Engineering at the University of Michigan, highlighted the power of collaboration in advancing scientific research through AI.
“Think about the scientific endeavor as 5% creativity and innovation and 95% intense labor. AI amplifies that 5% by a bit, and then significantly accelerates the 95% part,” Duraisamy explained. “That is going to completely transform science.”
Duraisamy further elaborated on the role AI could play as a persistent collaborator, envisioning a future where AI can work alongside scientists over weeks, months and years, generating new ideas and following through on complex projects.
“Instead of replacing graduate students, I think graduate students can be smarter than the professors on day one,” he said, emphasizing the potential for AI to support long-term research and innovation.
Learn more about how this week’s AI Summit highlights how AI is shaping the future across industries and how NVIDIA’s solutions are laying the groundwork for continued innovation.
Accelerated computing is sustainable computing, Bob Pette, NVIDIA’s vice president and general manager of enterprise platforms, explained in a keynote at the NVIDIA AI Summit on Tuesday in Washington, D.C.
NVIDIA’s accelerated computing isn’t just efficient. It’s critical to the next wave of industrial, scientific and healthcare transformations.
“We are in the dawn of a new industrial revolution,” Pette told an audience of policymakers, press, developers and entrepreneurs gathered for the event. “I’m just here to tell you that we’re designing our systems with not just performance in mind, but with energy efficiency in mind.”
NVIDIA’s Blackwell platform has achieved groundbreaking energy efficiency in AI computing, cutting the energy needed to train models like GPT-4 by up to 2,000x over the past decade.
NVIDIA accelerated computing is cutting energy use for token generation — the output from AI models — by 100,000x, underscoring the value of accelerated computing for sustainability amid the rapid adoption of AI worldwide.
“These AI factories produce product. Those products are tokens, tokens are intelligence, and intelligence is money,” Pette said. That’s what “will revolutionize every industry on this planet.”
NVIDIA’s CUDA libraries, which have been fundamental in enabling breakthroughs across industries, now power over 4,000 accelerated applications, Pette explained.
“CUDA enables acceleration…. It also turns out to be one of the most impressive ways to reduce energy consumption,” Pette said.
These libraries are central to the company’s energy-efficient AI innovations, driving significant performance gains while minimizing power consumption.
Pette also detailed how NVIDIA’s AI software helps organizations deploy AI solutions quickly and efficiently, enabling businesses to innovate faster and solve complex problems across sectors.
Pette discussed the concept of agentic AI, which goes beyond traditional AI by enabling intelligent agents to perceive, reason and act autonomously.
Agentic AI is capable of “reasoning, of learning, and taking action,” Pette said.
These AI agents are transforming industries by automating complex tasks and accelerating innovation in sectors like manufacturing, customer service and healthcare, he explained.
He also described how AI agents empower businesses to drive innovation in healthcare, manufacturing, scientific research and climate modeling.
With agentic AI, “you can do in minutes what used to take days,” Pette said.
NVIDIA, in collaboration with its partners, is tackling some of the world’s greatest challenges, including improving diagnostics and healthcare delivery, advancing climate modeling efforts and even helping find signs of life beyond our planet.
NVIDIA is collaborating with SETI to conduct real-time AI searches for fast radio bursts from distant galaxies, helping continue the exploration of space, Pette said.
Pette emphasized that NVIDIA is unlocking a $10 trillion opportunity in healthcare.
Through AI, NVIDIA is accelerating innovations in diagnostics, drug discovery and medical imaging, helping transform patient care worldwide.
Solutions like the NVIDIA Clara medical imaging platform are revolutionizing diagnostics, Parabricks is enabling breakthroughs in genomics research and the MONAI AI framework is advancing medical imaging capabilities.
Pette highlighted partnerships with leading institutions, including Carnegie Mellon University and the University of Pittsburgh, fostering AI innovation and development.
Pette also described how NVIDIA’s collaboration with federal agencies illustrates the importance of public-private partnerships in advancing AI-driven solutions in healthcare, climate modeling and national security.
Pette also announced a new NVIDIA NIM Agent Blueprint for cybersecurity, enabling organizations to safeguard critical infrastructure with AI-driven, real-time threat detection and analysis.
This blueprint reduces threat response times from days to seconds, representing a significant leap forward in protecting industries.
“Agentic systems can access tools and reason through full lines of thought to provide instant one-click assessments,” Pette said. “This boosts productivity by allowing security analysts to focus on the most critical tasks while AI handles the heavy lifting of analysis, delivering fast and actionable insights.”
NVIDIA’s accelerated computing solutions are advancing climate research by enabling more accurate and faster climate modeling. This technology is helping scientists tackle some of the most urgent environmental challenges, from monitoring global temperatures to predicting natural disasters.
Pette described how the NVIDIA Earth-2 platform enables climate experts to import data from multiple sources, fusing them together for analysis using NVIDIA Omniverse. “NVIDIA Earth-2 brings together the power of simulation, AI and visualization to empower the climate tech ecosystem,” Pette said.
Following Pette’s keynote, Greg Estes, NVIDIA’s vice president of corporate marketing and developer programs, underscored the company’s dedication to workforce training through initiatives like the NVIDIA AI Tech Community.
And through its Deep Learning Institute, NVIDIA has already trained more than 600,000 people worldwide, equipping the next generation with the critical skills to navigate and lead in the AI-driven future.
Throughout the week, industry leaders are exploring AI’s role in solving critical issues in fields like cybersecurity and sustainability.
Upcoming sessions will feature U.S. Secretary of Energy Jennifer Granholm, who will discuss how AI is advancing energy innovation and scientific discovery.
Other speakers will address AI’s role in climate monitoring and environmental management, further showcasing the technology’s ability to address global sustainability challenges.
Learn more about how this week’s AI Summit highlights how AI is shaping the future across industries and how NVIDIA’s solutions are laying the groundwork for continued innovation.
Enterprises are looking for increasingly powerful compute to support their AI workloads and accelerate data processing. The efficiency gained can translate to better returns for their investments in AI training and fine-tuning, and improved user experiences for AI inference.
At the Oracle CloudWorld conference today, Oracle Cloud Infrastructure (OCI) announced the first zettascale OCI Supercluster, accelerated by the NVIDIA Blackwell platform, to help enterprises train and deploy next-generation AI models using more than 100,000 of NVIDIA’s latest-generation GPUs.
OCI Superclusters allow customers to choose from a wide range of NVIDIA GPUs and deploy them anywhere: on premises, public cloud and sovereign cloud. Set for availability in the first half of next year, the Blackwell-based systems can scale up to 131,072 Blackwell GPUs with NVIDIA ConnectX-7 NICs for RoCEv2 or NVIDIA Quantum-2 InfiniBand networking to deliver an astounding 2.4 zettaflops of peak AI compute to the cloud. (Read the press release to learn more about OCI Superclusters.)
At the show, Oracle also previewed NVIDIA GB200 NVL72 liquid-cooled bare-metal instances to help power generative AI applications. The instances are capable of large-scale training with Quantum-2 InfiniBand and real-time inference of trillion-parameter models within the expanded 72-GPU NVIDIA NVLink domain, which can act as a single, massive GPU.
This year, OCI will offer NVIDIA HGX H200 — connecting eight NVIDIA H200 Tensor Core GPUs in a single bare-metal instance via NVLink and NVLink Switch, and scaling to 65,536 H200 GPUs with NVIDIA ConnectX-7 NICs over RoCEv2 cluster networking. The instance is available to order for customers looking to deliver real-time inference at scale and accelerate their training workloads. (Read a blog on OCI Superclusters with NVIDIA B200, GB200 and H200 GPUs.)
OCI also announced general availability of NVIDIA L40S GPU-accelerated instances for midrange AI workloads, NVIDIA Omniverse and visualization. (Read a blog on OCI Superclusters with NVIDIA L40S GPUs.)
For single-node to multi-rack solutions, Oracle’s edge offerings provide scalable AI at the edge accelerated by NVIDIA GPUs, even in disconnected and remote locations. For example, smaller-scale deployments with Oracle’s Roving Edge Device v2 will now support up to three NVIDIA L4 Tensor Core GPUs.
Companies are using NVIDIA-powered OCI Superclusters to drive AI innovation. Foundation model startup Reka, for example, is using the clusters to develop advanced multimodal AI models that power enterprise agents.
“Reka’s multimodal AI models, built with OCI and NVIDIA technology, empower next-generation enterprise agents that can read, see, hear and speak to make sense of our complex world,” said Dani Yogatama, cofounder and CEO of Reka. “With NVIDIA GPU-accelerated infrastructure, we can handle very large models and extensive contexts with ease, all while enabling dense and sparse training to scale efficiently at cluster levels.”
NVIDIA received the 2024 Oracle Technology Solution Partner Award in Innovation for its full-stack approach to innovation.
Oracle Autonomous Database is gaining NVIDIA GPU support for Oracle Machine Learning notebooks, allowing customers to accelerate their data processing workloads.
At Oracle CloudWorld, NVIDIA and Oracle are partnering to demonstrate three capabilities that show how the NVIDIA accelerated computing platform could be used today or in the future to accelerate key components of retrieval-augmented generation (RAG) pipelines for generative AI.
The first will showcase how NVIDIA GPUs can be used to accelerate bulk vector embeddings directly from within Oracle Autonomous Database Serverless to efficiently bring enterprise data closer to AI. These vectors can be searched using Oracle Database 23ai’s AI Vector Search.
The second demonstration will showcase a proof-of-concept prototype that uses NVIDIA GPUs, NVIDIA cuVS and an Oracle-developed offload framework to accelerate vector graph index generation, which significantly reduces the time needed to build indexes for efficient vector searches.
The third demonstration illustrates how NVIDIA NIM, a set of easy-to-use inference microservices, can boost generative AI performance for text generation and translation use cases across a range of model sizes and concurrency levels.
Together, these new Oracle Database capabilities and demonstrations highlight how NVIDIA GPUs can be used to help enterprises bring generative AI to their structured and unstructured data housed in or managed by an Oracle Database.
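For a sense of what the first demonstration’s search step looks like from application code, here is a minimal, hypothetical sketch of querying Oracle Database 23ai’s AI Vector Search from Python. The table, columns, credentials and embedding dimension are placeholders; the GPU-accelerated bulk embedding happens upstream.

```python
# Hedged sketch: similarity search against Oracle Database 23ai AI Vector
# Search. Table/column names, connection details and the 384-dim query
# embedding are hypothetical placeholders.
import array
import oracledb

conn = oracledb.connect(user="demo", password="demo", dsn="localhost/FREEPDB1")
query_vec = array.array("f", [0.1] * 384)  # embedding produced upstream, e.g. on GPU

sql = """
    SELECT doc_id, chunk_text
      FROM doc_chunks
  ORDER BY VECTOR_DISTANCE(embedding, :qv, COSINE)
     FETCH FIRST 5 ROWS ONLY
"""
with conn.cursor() as cur:
    # Nearest chunks by cosine distance to the query embedding.
    for doc_id, chunk in cur.execute(sql, qv=query_vec):
        print(doc_id, chunk[:80])
```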
NVIDIA and Oracle are collaborating to deliver sovereign AI infrastructure worldwide, helping address the data residency needs of governments and enterprises.
Brazil-based startup Wide Labs trained and deployed Amazônia IA, one of the first large language models for Brazilian Portuguese, using NVIDIA H100 Tensor Core GPUs and the NVIDIA NeMo framework in OCI’s Brazilian data centers to help ensure data sovereignty.
“Developing a sovereign LLM allows us to offer clients a service that processes their data within Brazilian borders, giving Amazônia a unique market position,” said Nelson Leoni, CEO of Wide Labs. “Using the NVIDIA NeMo framework, we successfully trained Amazônia IA.”
In Japan, Nomura Research Institute, a leading global provider of consulting services and system solutions, is using OCI’s Alloy infrastructure with NVIDIA GPUs to enhance its financial AI platform with LLMs operating in accordance with financial regulations and data sovereignty requirements.
Communication and collaboration company Zoom will be using NVIDIA GPUs in OCI’s Saudi Arabian data centers to help support compliance with local data requirements.
And geospatial modeling company RSS-Hydro is demonstrating how its flood mapping platform — built on the NVIDIA Omniverse platform and powered by L40S GPUs on OCI — can use digital twins to simulate flood impacts in Japan’s Kumamoto region, helping mitigate the impact of climate change.
These customers are among numerous nations and organizations building and deploying domestic AI applications powered by NVIDIA and OCI, driving economic resilience through sovereign AI infrastructure.
Enterprises can accelerate task automation on OCI by deploying NVIDIA software such as NIM microservices and NVIDIA cuOpt with OCI’s scalable cloud solutions. These solutions enable enterprises to quickly adopt generative AI and build agentic workflows for complex tasks like code generation and route optimization.
NVIDIA cuOpt, NIM, RAPIDS and more are included in the NVIDIA AI Enterprise software platform, available on the Oracle Cloud Marketplace.
Join NVIDIA at Oracle CloudWorld 2024 to learn how the companies’ collaboration is bringing AI and accelerated data processing to the world’s organizations.
Register for the event to watch sessions, see demos and join Oracle and NVIDIA for the solution keynote, “Unlock AI Performance with NVIDIA’s Accelerated Computing Platform” (SOL3866), on Wednesday, Sept. 11, in Las Vegas.
One of the world’s largest AI communities — comprising 4 million developers on the Hugging Face platform — is gaining easy access to NVIDIA-accelerated inference on some of the most popular AI models.
New inference-as-a-service capabilities will enable developers to rapidly deploy leading large language models such as the Llama 3 family and Mistral AI models with optimization from NVIDIA NIM microservices running on NVIDIA DGX Cloud.
Announced today at the SIGGRAPH conference, the service will help developers quickly prototype with open-source AI models hosted on the Hugging Face Hub and deploy them in production. Enterprise Hub users can tap serverless inference for increased flexibility, minimal infrastructure overhead and optimized performance with NVIDIA NIM.
The inference service complements Train on DGX Cloud, an AI training service already available on Hugging Face.
Developers facing a growing number of open-source models can benefit from a hub where they can easily compare options. These training and inference tools give Hugging Face developers new ways to experiment with, test and deploy cutting-edge models on NVIDIA-accelerated infrastructure. They’re made easily accessible using the “Train” and “Deploy” drop-down menus on Hugging Face model cards, letting users get started with just a few clicks.
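A minimal sketch of what that flow can look like from code follows, using the standard huggingface_hub client; the model ID and token are illustrative assumptions, and routing to NIM on DGX Cloud is selected from the model card’s deploy options.

```python
# Hedged sketch: trying a Hub-hosted model with the standard
# huggingface_hub InferenceClient. Model ID and token are placeholders.
from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Meta-Llama-3-8B-Instruct", token="hf_...")
resp = client.chat_completion(
    messages=[{"role": "user", "content": "Summarize what NIM microservices do."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```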
Get started with inference-as-a-service powered by NVIDIA NIM.
Beyond a Token Gesture — NVIDIA NIM Brings Big Benefits
NVIDIA NIM is a collection of AI microservices — including NVIDIA AI foundation models and open-source community models — optimized for inference using industry-standard application programming interfaces, or APIs.
NIM offers users higher efficiency in processing tokens — the units of data used and generated by a language model. The optimized microservices also improve the efficiency of the underlying NVIDIA DGX Cloud infrastructure, which can increase the speed of critical AI applications.
This means developers see faster, more robust results from an AI model accessed as a NIM compared with other versions of the model. The 70-billion-parameter version of Llama 3, for example, delivers up to 5x higher throughput when accessed as a NIM compared with off-the-shelf deployment on NVIDIA H100 Tensor Core GPU-powered systems.
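Because NIM exposes industry-standard APIs, calling a NIM-hosted model looks like any OpenAI-compatible request. The sketch below uses NVIDIA’s hosted API catalog endpoint and the Llama 3 70B model ID as illustrative values; a self-hosted NIM would swap in its own base URL and credentials.

```python
# Hedged sketch: an OpenAI-compatible chat request to a NIM endpoint.
# Base URL, model ID and API key are illustrative values.
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # or a self-hosted NIM, e.g. http://localhost:8000/v1
    api_key="nvapi-...",  # placeholder credential
)
completion = client.chat.completions.create(
    model="meta/llama3-70b-instruct",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    max_tokens=64,
)
print(completion.choices[0].message.content)
```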
Near-Instant Access to DGX Cloud Provides Accessible AI Acceleration
The NVIDIA DGX Cloud platform is purpose-built for generative AI, offering developers easy access to reliable accelerated computing infrastructure that can help them bring production-ready applications to market faster.
The platform provides scalable GPU resources that support every step of AI development, from prototype to production, without requiring developers to make long-term AI infrastructure commitments.
Hugging Face inference-as-a-service on NVIDIA DGX Cloud powered by NIM microservices offers easy access to compute resources that are optimized for AI deployment, enabling users to experiment with the latest AI models in an enterprise-grade environment.
More on NVIDIA NIM at SIGGRAPH
At SIGGRAPH, NVIDIA also introduced generative AI models and NIM microservices for the OpenUSD framework to accelerate developers’ abilities to build highly accurate virtual worlds for the next evolution of AI.
To experience more than 100 NVIDIA NIM microservices with applications across industries, visit ai.nvidia.com.
Businesses seeking to harness the power of AI need customized models tailored to their specific industry needs.
NVIDIA AI Foundry is a service that enables enterprises to use data, accelerated computing and software tools to create and deploy custom models that can supercharge their generative AI initiatives.
Just as TSMC manufactures chips designed by other companies, NVIDIA AI Foundry provides the infrastructure and tools for other companies to develop and customize AI models — using DGX Cloud, foundation models, NVIDIA NeMo software, NVIDIA expertise, as well as ecosystem tools and support.
The key difference is the product: TSMC produces physical semiconductor chips, while NVIDIA AI Foundry helps create custom models. Both enable innovation and connect to a vast ecosystem of tools and partners.
Enterprises can use AI Foundry to customize NVIDIA and open community models, including the new Llama 3.1 collection, as well as NVIDIA Nemotron, CodeGemma by Google DeepMind, CodeLlama, Gemma by Google DeepMind, Mistral, Mixtral, Phi-3, StarCoder2 and others.
Industry leaders Amdocs, Capital One, Getty Images, KT, Hyundai Motor Company, SAP, ServiceNow and Snowflake are among the first using NVIDIA AI Foundry. These pioneers are setting the stage for a new era of AI-driven innovation in enterprise software, technology, communications and media.
“Organizations deploying AI can gain a competitive edge with custom models that incorporate industry and business knowledge,” said Jeremy Barnes, vice president of AI Product at ServiceNow. “ServiceNow is using NVIDIA AI Foundry to fine-tune and deploy models that can integrate easily within customers’ existing workflows.”
NVIDIA AI Foundry is supported by the key pillars of foundation models, enterprise software, accelerated computing, expert support and a broad partner ecosystem.
Its software includes AI foundation models from NVIDIA and the AI community as well as the complete NVIDIA NeMo software platform for fast-tracking model development.
The computing muscle of NVIDIA AI Foundry is NVIDIA DGX Cloud, a network of accelerated compute resources co-engineered with the world’s leading public clouds — Amazon Web Services, Google Cloud and Oracle Cloud Infrastructure. With DGX Cloud, AI Foundry customers can develop and fine-tune custom generative AI applications with unprecedented ease and efficiency, and scale their AI initiatives as needed without significant upfront investments in hardware. This flexibility is crucial for businesses looking to stay agile in a rapidly changing market.
If an NVIDIA AI Foundry customer needs assistance, NVIDIA AI Enterprise experts are on hand to help. NVIDIA experts can walk customers through each of the steps required to build, fine-tune and deploy their models with proprietary data, ensuring the models tightly align with their business requirements.
NVIDIA AI Foundry customers have access to a global ecosystem of partners that can provide a full range of support. Accenture, Deloitte, Infosys, Tata Consultancy Services and Wipro are among the NVIDIA partners that offer AI Foundry consulting services that encompass design, implementation and management of AI-driven digital transformation projects. Accenture is first to offer its own AI Foundry-based offering for custom model development, the Accenture AI Refinery framework.
Additionally, service delivery partners such as Data Monsters, Quantiphi, Slalom and SoftServe help enterprises navigate the complexities of integrating AI into their existing IT landscapes, ensuring that AI applications are scalable, secure and aligned with business objectives.
Customers can develop NVIDIA AI Foundry models for production using AIOps and MLOps platforms from NVIDIA partners, including ActiveFence, AutoAlign, Cleanlab, DataDog, Dataiku, Dataloop, DataRobot, Deepchecks, Domino Data Lab, Fiddler AI, Giskard, New Relic, Scale, Tumeryk and Weights & Biases.
Customers can output their AI Foundry models as NVIDIA NIM inference microservices — which include the custom model, optimized engines and a standard API — to run on their preferred accelerated infrastructure.
Inferencing solutions like NVIDIA TensorRT-LLM deliver improved efficiency for Llama 3.1 models to minimize latency and maximize throughput. This enables enterprises to generate tokens faster while reducing the total cost of running the models in production. Enterprise-grade support and security are provided by the NVIDIA AI Enterprise software suite.
The broad range of deployment options includes NVIDIA-Certified Systems from global server manufacturing partners including Cisco, Dell Technologies, Hewlett Packard Enterprise, Lenovo and Supermicro, as well as cloud instances from Amazon Web Services, Google Cloud and Oracle Cloud Infrastructure.
Additionally, Together AI, a leading AI acceleration cloud, today announced it will enable its ecosystem of over 100,000 developers and enterprises to use its NVIDIA GPU-accelerated inference stack to deploy Llama 3.1 endpoints and other open models on DGX Cloud.
“Every enterprise running generative AI applications wants a faster user experience, with greater efficiency and lower cost,” said Vipul Ved Prakash, founder and CEO of Together AI. “Now, developers and enterprises using the Together Inference Engine can maximize performance, scalability and security on NVIDIA DGX Cloud.”
With NVIDIA NeMo integrated into AI Foundry, developers have at their fingertips the tools needed to curate data, customize foundation models and evaluate performance. NeMo technologies include:
Using the NeMo platform in NVIDIA AI Foundry, businesses can create custom AI models that are precisely tailored to their needs. This customization allows for better alignment with strategic objectives, improved accuracy in decision-making and enhanced operational efficiency. For instance, companies can develop models that understand industry-specific jargon, comply with regulatory requirements and integrate seamlessly with existing workflows.
“As a next step of our partnership, SAP plans to use NVIDIA’s NeMo platform to help businesses to accelerate AI-driven productivity powered by SAP Business AI,” said Philipp Herzig, chief AI officer at SAP.
Enterprises can deploy their custom AI models in production with NVIDIA NeMo Retriever NIM inference microservices. These help developers fetch proprietary data to generate knowledgeable responses for their AI applications with retrieval-augmented generation (RAG).
“Safe, trustworthy AI is a non-negotiable for enterprises harnessing generative AI, with retrieval accuracy directly impacting the relevance and quality of generated responses in RAG systems,” said Baris Gultekin, head of AI at Snowflake. “Snowflake Cortex AI leverages NeMo Retriever, a component of NVIDIA AI Foundry, to further provide enterprises with easy, efficient, and trusted answers using their custom data.”
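To ground the RAG pattern described here, the following is a minimal, hypothetical retrieve-then-generate sketch over OpenAI-compatible endpoints. The retriever and generator model IDs, the endpoint and the input_type field are assumptions modeled on NVIDIA’s hosted API catalog, not Snowflake’s or NeMo Retriever’s exact integration.

```python
# Hedged RAG sketch: embed with a retriever model, rank by similarity,
# then ground the generation. Model IDs and endpoint are placeholders.
import numpy as np
from openai import OpenAI

client = OpenAI(base_url="https://integrate.api.nvidia.com/v1", api_key="nvapi-...")

def embed(texts, input_type):
    resp = client.embeddings.create(
        model="nvidia/nv-embedqa-e5-v5",
        input=texts,
        extra_body={"input_type": input_type},  # retriever models distinguish query vs. passage
    )
    return np.array([d.embedding for d in resp.data])

docs = ["Refunds are processed within 5 business days.",
        "Premium support is available 24/7 by phone."]
doc_vecs = embed(docs, "passage")

query = "How long do refunds take?"
qv = embed([query], "query")[0]
# Dot product approximates cosine similarity, assuming normalized embeddings.
best = docs[int(np.argmax(doc_vecs @ qv))]

answer = client.chat.completions.create(
    model="meta/llama3-70b-instruct",
    messages=[{"role": "user", "content": f"Context: {best}\n\nQuestion: {query}"}],
)
print(answer.choices[0].message.content)
```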
One of the key advantages of NVIDIA AI Foundry is its ability to address the unique challenges faced by enterprises in adopting AI. Generic AI models can fall short of meeting specific business needs and data security requirements. Custom AI models, on the other hand, offer superior flexibility, adaptability and performance, making them ideal for enterprises seeking to gain a competitive edge.
Learn more about how NVIDIA AI Foundry allows enterprises to boost productivity and innovation.
If optimized AI workflows are like a perfectly tuned orchestra — where each component, from hardware infrastructure to software libraries, hits exactly the right note — then the long-standing harmony between NVIDIA and Microsoft is music to developers’ ears.
The latest AI models developed by Microsoft, including the Phi-3 family of small language models, are being optimized to run on NVIDIA GPUs and made available as NVIDIA NIM inference microservices. Other microservices developed by NVIDIA, such as the cuOpt route optimization AI, are regularly added to Microsoft Azure Marketplace as part of the NVIDIA AI Enterprise software platform.
In addition to these AI technologies, NVIDIA and Microsoft are delivering a growing set of optimizations and integrations for developers creating high-performance AI apps for PCs powered by NVIDIA GeForce RTX and NVIDIA RTX GPUs.
Building on the progress shared at NVIDIA GTC, the two companies are furthering this ongoing collaboration at Microsoft Build, an annual developer event, taking place this year in Seattle through May 23.
Microsoft is expanding its family of Phi-3 open small language models, adding small (7-billion-parameter) and medium (14-billion-parameter) models similar to its Phi-3-mini, which has 3.8 billion parameters. It’s also introducing a new 4.2-billion-parameter multimodal model, Phi-3-vision, that supports images and text.
All of these models are GPU-optimized with NVIDIA TensorRT-LLM and available as NVIDIA NIMs, which are accelerated inference microservices with a standard application programming interface (API) that can be deployed anywhere.
APIs for the NIM-powered Phi-3 models are available at ai.nvidia.com and through NVIDIA AI Enterprise on the Azure Marketplace.
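Because NIM endpoints follow the OpenAI API convention, a Phi-3 model can be called with a few lines of Python. The base URL and model ID below follow the API catalog's conventions but should be treated as assumptions and verified at ai.nvidia.com.

```python
# Calling a NIM-powered Phi-3 endpoint through the OpenAI-compatible API
# of the NVIDIA API catalog. Base URL and model ID are assumptions to verify.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],  # key generated at ai.nvidia.com
)

completion = client.chat.completions.create(
    model="microsoft/phi-3-mini-4k-instruct",  # assumed model ID
    messages=[{"role": "user", "content": "Explain what a NIM microservice is in two sentences."}],
    temperature=0.2,
    max_tokens=200,
)
print(completion.choices[0].message.content)
```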
NVIDIA cuOpt, a GPU-accelerated AI microservice for route optimization, is now available in Azure Marketplace via NVIDIA AI Enterprise. cuOpt features massively parallel algorithms that enable real-time logistics management for shipping services, railway systems, warehouses and factories.
cuOpt has set two dozen world records on major routing benchmarks, demonstrating the best accuracy and fastest times. It could save billions of dollars for the logistics and supply chain industries by optimizing vehicle routes, saving travel time and minimizing idle periods.
Through Azure Marketplace, developers can easily integrate the cuOpt microservice with Azure Maps to support real-time logistics management and other cloud-based workflows, backed by enterprise-grade management tools and security.
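For a flavor of what such an integration looks like, the sketch below posts a small vehicle-routing problem to a cuOpt microservice over REST. The endpoint path and payload schema are illustrative assumptions rather than the documented contract; consult the cuOpt service reference for exact field names.

```python
# Hedged sketch: submitting a small vehicle-routing problem to a cuOpt
# microservice over REST. Endpoint path and payload schema are assumptions.
import requests

payload = {
    # Travel-cost matrix between 4 locations (depot + 3 stops).
    "cost_matrix_data": {"data": {"0": [
        [0, 5, 4, 3],
        [5, 0, 6, 4],
        [4, 6, 0, 4],
        [3, 4, 4, 0],
    ]}},
    # Two vehicles, both starting and ending at the depot (location 0).
    "fleet_data": {"vehicle_locations": [[0, 0], [0, 0]]},
    # Three delivery stops to be assigned across the fleet.
    "task_data": {"task_locations": [1, 2, 3]},
}

resp = requests.post("http://cuopt-service:5000/cuopt/routes", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())  # optimized route assignment per vehicle
```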
The NVIDIA accelerated computing platform is the backbone of modern AI — helping developers build solutions for over 100 million Windows GeForce RTX-powered PCs and NVIDIA RTX-powered workstations worldwide.
NVIDIA and Microsoft are delivering new optimizations and integrations to Windows developers to accelerate AI in next-generation PC and workstation applications.
Conference attendees can visit NVIDIA booth FP28 to meet developer experts and experience live demos of NVIDIA NIM, NVIDIA cuOpt, NVIDIA Omniverse and the NVIDIA RTX AI platform. The booth also highlights the NVIDIA MONAI platform for medical imaging workflows and NVIDIA BioNeMo generative AI platform for drug discovery — both available on Azure as part of NVIDIA AI Enterprise.
Attend sessions with NVIDIA speakers to dive into the capabilities of the NVIDIA RTX AI platform on Windows PCs and discover how to deploy generative AI and digital twin tools on Microsoft Azure.
And sign up for the Developer Showcase, taking place Wednesday, to discover how developers are building innovative generative AI using NVIDIA AI software on Azure.
Large language models that power generative AI are seeing intense innovation — models that handle multiple types of data such as text, images and sound are becoming increasingly common.
However, building and deploying these models remains challenging. Developers need a way to quickly experience and evaluate models to determine the best fit for their use case, and then to optimize those models so that they are both cost-effective and deliver the best possible performance.
To make it easier for developers to create AI-powered applications with world-class performance, NVIDIA and Google today announced three new collaborations at Google I/O ‘24.
Using TensorRT-LLM, NVIDIA is working with Google to optimize two new models it introduced at the event: Gemma 2 and PaliGemma. These models are built from the same research and technology used to create the Gemini models, and each is focused on a specific area: Gemma 2 on high-performance text generation and PaliGemma on vision-language tasks.
Gemma 2 and PaliGemma will be offered with NVIDIA NIM inference microservices, part of the NVIDIA AI Enterprise software platform, which simplifies the deployment of AI models at scale. NIM support for the two new models is available from the API catalog, starting with PaliGemma today; they soon will be released as containers on NVIDIA NGC and GitHub.
Google also announced that RAPIDS cuDF, an open-source GPU dataframe library, is now supported by default on Google Colab, one of the most popular developer platforms for data scientists. It now takes just a few seconds for Google Colab’s 10 million monthly users to accelerate pandas-based Python workflows by up to 50x using NVIDIA L4 Tensor Core GPUs, with zero code changes.
With RAPIDS cuDF, developers using Google Colab can speed up exploratory analysis and production data pipelines. While pandas is one of the world’s most popular data processing tools due to its intuitive API, applications often struggle as their data sizes grow. With even 5-10GB of data, many simple operations can take minutes to finish on a CPU, slowing down exploratory analysis and production data pipelines.
RAPIDS cuDF is designed to solve this problem by seamlessly accelerating pandas code on GPUs where applicable, and falling back to CPU-pandas where not. With RAPIDS cuDF available by default on Colab, all developers everywhere can leverage accelerated data analytics.
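The zero-code-change behavior comes from cuDF's pandas accelerator mode, which intercepts pandas calls and runs supported operations on the GPU. A minimal Colab cell might look like the following, where the Parquet file name is a placeholder:

```python
# Run in a Google Colab notebook with a GPU runtime (e.g., an NVIDIA L4).
%load_ext cudf.pandas  # from here on, pandas calls are GPU-accelerated where possible

import pandas as pd  # unchanged import; no code changes needed below

df = pd.read_parquet("transactions.parquet")  # placeholder dataset
summary = df.groupby("merchant")["amount"].agg(["count", "mean", "sum"])
print(summary.head())
```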
Google and NVIDIA also announced a Firebase Genkit collaboration that enables app developers to easily integrate generative AI models, like the new family of Gemma models, into their web and mobile applications to deliver custom content, provide semantic search and answer questions. Developers can start workstreams on local AI PCs with NVIDIA RTX GPUs, then move their work seamlessly to Google Cloud infrastructure.
To make this even easier, developers can build apps with Genkit using JavaScript, a programming language mobile developers commonly use to build their apps.
NVIDIA and Google Cloud are collaborating in multiple domains to propel AI forward. From the upcoming Grace Blackwell-powered DGX Cloud platform and JAX framework support, to bringing the NVIDIA NeMo framework to Google Kubernetes Engine, the companies’ full-stack partnership expands the possibilities of what customers can do with AI using NVIDIA technologies on Google Cloud.
Following an announcement by Japan’s Ministry of Economy, Trade and Industry, NVIDIA will play a central role in developing the nation’s generative AI infrastructure as Japan seeks to capitalize on the technology’s economic potential and further develop its workforce.
NVIDIA is collaborating with key digital infrastructure providers, including GMO Internet Group, Highreso, KDDI Corporation, RUTILEA, SAKURA internet Inc. and SoftBank Corp., which the ministry has certified to spearhead the development of cloud infrastructure crucial for AI applications.
Over the last two months, the ministry announced plans to allocate $740 million, approximately ¥114.6 billion, to assist six local firms in this initiative. Building on a similar effort last year, the Japanese government is significantly expanding its subsidies for AI computing resources by increasing the number of companies involved.
With this move, Japan becomes the latest nation to embrace the concept of sovereign AI, aiming to fortify its local startups, enterprises and research efforts with advanced AI technologies.
Across the globe, nations are building up domestic computing capacity through various models. Some procure and operate sovereign AI clouds with state-owned telecommunications providers or utilities. Others are sponsoring local cloud partners to provide a shared AI computing platform for public and private sector use.
Japan’s initiative follows NVIDIA founder and CEO Jensen Huang’s visit last year, where he met with political and business leaders — including Japanese Prime Minister Fumio Kishida — to discuss the future of AI.
During his trip, Huang emphasized that “AI factories” — next-generation data centers designed to handle the most computationally intensive AI tasks — are crucial for turning vast amounts of data into intelligence. “The AI factory will become the bedrock of modern economies across the world,” Huang said during a meeting with the Japanese press in December.
The Japanese government plans to subsidize a significant portion of the costs for building AI supercomputers, which will facilitate AI adoption, enhance workforce skills, support Japanese language model development and bolster resilience against natural disasters and climate change.
Under the country’s Economic Security Promotion Act, the ministry aims to secure a stable supply of local cloud services, reducing the time and cost of developing next-generation AI technologies.
Japan’s technology powerhouses are already moving fast to embrace AI. Last week, SoftBank Corp. announced that it will invest ¥150 billion, approximately $960 million, for its plan to expand the infrastructure needed to develop Japan’s top-class AI, including purchases of NVIDIA accelerated computing.
The news follows Huang’s meetings with leaders in Canada, France, India, Japan, Malaysia, Singapore and Vietnam over the past year, as countries seek to supercharge their regional economies and embrace challenges such as climate change with AI.
Harnessing optimized AI models for healthcare is easier than ever as NVIDIA NIM, a collection of cloud-native microservices, integrates with Amazon Web Services.
NIM, part of the NVIDIA AI Enterprise software platform available on AWS Marketplace, enables developers to access a growing library of AI models through industry-standard application programming interfaces, or APIs. The library includes foundation models for drug discovery, medical imaging and genomics, backed by enterprise-grade security and support.
NIM is now available via Amazon SageMaker — a fully managed service to prepare data and build, train and deploy machine learning models — and AWS ParallelCluster, an open-source tool to deploy and manage high performance computing clusters on AWS. NIMs can also be orchestrated using AWS HealthOmics, a purpose-built service for biological data analysis.
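As a rough sketch of the SageMaker path, a NIM container can be wrapped in a SageMaker Model and deployed to a real-time endpoint. The container image URI, environment variable and instance type below are placeholders, since the actual packaging is delivered through AWS Marketplace listings.

```python
# Sketch: wrapping a NIM container as a SageMaker Model and deploying it
# to a real-time endpoint. Image URI, env var and instance type are placeholders.
import sagemaker
from sagemaker.model import Model

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # assumes execution inside SageMaker

nim_model = Model(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/nim-example:latest",  # placeholder
    role=role,
    env={"NGC_API_KEY": "<your-ngc-key>"},  # NIM containers authenticate to NGC
    sagemaker_session=session,
)

predictor = nim_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # an NVIDIA GPU-backed instance type
)
# predictor.predict(...) can then serve real-time inference requests.
```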
Easy access to NIM will enable the thousands of healthcare and life sciences companies already using AWS to deploy generative AI more quickly, without the complexities of model development and packaging for production. It’ll also help developers build workflows that combine AI models across different modalities, such as amino acid sequences, MRI images and plain-text patient health records.
Presented today at the AWS Life Sciences Leader Symposium in Boston, this initiative extends the availability of NVIDIA Clara accelerated healthcare software and services on AWS — which include fast and easy-to-deploy NIMs from NVIDIA BioNeMo for drug discovery, NVIDIA MONAI for medical imaging workflows and NVIDIA Parabricks for accelerated genomics.
BioNeMo is a generative AI platform of foundation models, training frameworks, domain-specific data loaders and optimized training recipes that support the training and fine-tuning of biology and chemistry models on proprietary data. It’s used by more than 100 organizations globally.
Amgen, one of the world’s leading biotechnology companies, has used the BioNeMo framework to train generative models for protein design, and is exploring the potential use of BioNeMo with AWS.
BioNeMo models for protein structure prediction, generative chemistry and molecular docking prediction are available as NIM microservices, pretrained and optimized to run on any NVIDIA GPU or cluster of GPUs. These models can be combined to support a holistic, AI-accelerated drug discovery workflow.
Biotechnology company A-Alpha Bio harnesses synthetic biology and AI to measure, predict and engineer protein-protein interactions. When its researchers moved from a generic version of the ESM-2 protein language model to a version optimized by NVIDIA running on NVIDIA H100 Tensor Core GPUs on AWS, they immediately saw a speedup of more than 10x. This lets the team sample a much more extensive field of protein candidates than they would have otherwise.
For organizations that want to augment these models with their own experimental data, NIM enables developers to enhance a model with retrieval-augmented generation, or RAG — known as a lab-in-the-loop design.
NVIDIA NIM includes genomics models from NVIDIA Parabricks, which are also available on AWS HealthOmics as Ready2Run workflows that enable customers to deploy pre-built pipelines.
Life sciences company Agilent used Parabricks genomics analysis tools running on NVIDIA GPU-powered Amazon Elastic Compute Cloud (EC2) instances to significantly improve processing speeds for variant calling workflows on the company’s cloud-native Alissa Reporter software. Integrating Parabricks with Alissa secondary analysis pipelines enables researchers to access rapid data analysis in a secure cloud environment.
In addition to models that can decode proteins and genomic sequences, NIM microservices offer optimized large language models for conversational AI and visual generative AI models for avatars and digital humans.
AI-powered digital assistants can enhance healthcare by answering patient questions and supporting clinicians with logistics. Grounded in healthcare organization-specific data using RAG, they could connect to relevant internal data sources to synthesize research, surface insights and improve productivity.
Generative AI startup Hippocratic AI is in the final stages of testing AI-powered healthcare agents that focus on a wide range of tasks including wellness coaching, preoperative outreach and post-discharge follow-up.
The company, which uses NVIDIA GPUs through AWS, is adopting NVIDIA NIM and NVIDIA ACE microservices to power a generative AI agent for digital health.
The team used NVIDIA Audio2Face facial animation technology, NVIDIA Riva automatic speech recognition and text-to-speech capabilities, and more to power a healthcare assistant avatar’s conversation.
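As a hedged example of the speech side of such an avatar, the snippet below transcribes a recorded question with a Riva ASR service using the Python client. The server address and audio file are placeholders, and a live conversational agent would use streaming recognition rather than offline transcription.

```python
# Offline transcription against a running Riva server (address is a placeholder).
import riva.client

auth = riva.client.Auth(uri="localhost:50051")  # gRPC endpoint of the Riva server
asr = riva.client.ASRService(auth)

config = riva.client.RecognitionConfig(
    language_code="en-US",
    max_alternatives=1,
    enable_automatic_punctuation=True,
)

with open("patient_question.wav", "rb") as f:  # placeholder audio file
    audio_bytes = f.read()

response = asr.offline_recognize(audio_bytes, config)
print(response.results[0].alternatives[0].transcript)
```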
Experiment with NVIDIA NIMs for healthcare and get started with NVIDIA Clara on AWS.
To help customers make more efficient use of their AI computing resources, NVIDIA today announced it has entered into a definitive agreement to acquire Run:ai, a Kubernetes-based workload management and orchestration software provider.
Customer AI deployments are becoming increasingly complex, with workloads distributed across cloud, edge and on-premises data center infrastructure.
Managing and orchestrating generative AI, recommender systems, search engines and other workloads requires sophisticated scheduling to optimize performance at the system level and on the underlying infrastructure.
Run:ai enables enterprise customers to manage and optimize their compute infrastructure, whether on premises, in the cloud or in hybrid environments.
The company has built an open platform on Kubernetes, the orchestration layer for modern AI and cloud infrastructure. It supports all popular Kubernetes variants and integrates with third-party AI tools and frameworks.
Run:ai customers include some of the world’s largest enterprises across multiple industries, which use the Run:ai platform to manage data-center-scale GPU clusters.
“Run:ai has been a close collaborator with NVIDIA since 2020 and we share a passion for helping our customers make the most of their infrastructure,” said Omri Geller, Run:ai cofounder and CEO. “We’re thrilled to join NVIDIA and look forward to continuing our journey together.”
The Run:ai platform provides AI developers and their teams a centralized interface to manage shared compute infrastructure, along with tools to pool, schedule and fractionally share GPU resources across workloads.
NVIDIA will continue to offer Run:ai’s products under the same business model for the immediate future. And NVIDIA will continue to invest in the Run:ai product roadmap, including enabling the platform on NVIDIA DGX Cloud, an AI platform co-engineered with leading clouds for enterprise developers, offering an integrated, full-stack service optimized for generative AI.
NVIDIA HGX, DGX and DGX Cloud customers will gain access to Run:ai’s capabilities for their AI workloads, particularly for large language model deployments. Run:ai’s solutions are already integrated with NVIDIA DGX, NVIDIA DGX SuperPOD, NVIDIA Base Command, NGC containers, and NVIDIA AI Enterprise software, among other products.
NVIDIA’s accelerated computing platform and Run:ai’s platform will continue to support a broad ecosystem of third-party solutions, giving customers choice and flexibility.
Together with Run:ai, NVIDIA will enable customers to have a single fabric that accesses GPU solutions anywhere. Customers can expect to benefit from better GPU utilization, improved management of GPU infrastructure and greater flexibility from the open architecture.
NVIDIA and Google Cloud have announced a new collaboration to help startups around the world accelerate the creation of generative AI applications and services.
The announcement, made today at Google Cloud Next ‘24 in Las Vegas, brings together the NVIDIA Inception program for startups and the Google for Startups Cloud Program to widen access to cloud credits, go-to-market support and technical expertise to help startups deliver value to customers faster.
Qualified members of NVIDIA Inception, a global program supporting more than 18,000 startups, will have an accelerated path to using Google Cloud infrastructure with access to Google Cloud credits — up to $350,000 for those focused on AI.
Google for Startups Cloud Program members can join NVIDIA Inception and gain access to technological expertise, NVIDIA Deep Learning Institute course credits, NVIDIA hardware and software, and more. Eligible members of the Google for Startups Cloud Program also can participate in NVIDIA Inception Capital Connect, a platform that gives startups exposure to venture capital firms interested in the space.
High-growth emerging software makers of both programs can also gain fast-tracked onboarding to Google Cloud Marketplace, co-marketing and product acceleration support.
This collaboration is the latest in a series of announcements the two companies have made to help ease the costs and barriers associated with developing generative AI applications for enterprises of all sizes. Startups in particular are constrained by the high costs associated with AI investments.
In February, Google DeepMind unveiled Gemma, a family of state-of-the-art open models. NVIDIA, in collaboration with Google, recently launched optimizations across all NVIDIA AI platforms for Gemma, helping to reduce customer costs and speed up innovative work for domain-specific use cases.
Teams from the companies worked closely together to accelerate the performance of Gemma — built from the same research and technology used to create Google DeepMind’s most capable model yet, Gemini — with NVIDIA TensorRT-LLM, an open-source library for optimizing large language model inference, when running on NVIDIA GPUs.
NVIDIA NIM microservices, part of the NVIDIA AI Enterprise software platform, together with Google Kubernetes Engine (GKE) provide a streamlined path for developing AI-powered apps and deploying optimized AI models into production. Built on inference engines including NVIDIA Triton Inference Server and TensorRT-LLM, NIM supports a wide range of leading AI models and delivers seamless, scalable AI inferencing to accelerate generative AI deployment in enterprises.
The Gemma family of models, including Gemma 7B, RecurrentGemma and CodeGemma, is available from the NVIDIA API catalog for users to try from a browser, prototype with the API endpoints and self-host with NIM.
Google Cloud has made it easier to deploy the NVIDIA NeMo framework across its platform via GKE and Google Cloud HPC Toolkit. This enables developers to automate and scale the training and serving of generative AI models, allowing them to rapidly deploy turnkey environments through customizable blueprints that jump-start the development process.
NVIDIA NeMo, part of NVIDIA AI Enterprise, is also available in Google Cloud Marketplace, providing customers another way to easily access NeMo and other frameworks to accelerate AI development.
Further widening the availability of NVIDIA-accelerated generative AI computing, Google Cloud also announced that A3 Mega instances will be generally available next month. The instances expand its A3 virtual machine family, powered by NVIDIA H100 Tensor Core GPUs, and will double the GPU-to-GPU network bandwidth of A3 VMs.
Google Cloud’s new Confidential VMs on A3 will also help customers protect the confidentiality and integrity of their sensitive data, and secure applications and AI workloads during training and inference — with no code changes while accessing H100 GPU acceleration. These GPU-powered Confidential VMs will be available in preview this year.
NVIDIA’s newest GPUs based on the NVIDIA Blackwell platform will be coming to Google Cloud early next year in two variations: the NVIDIA HGX B200 and the NVIDIA GB200 NVL72.
The HGX B200 is designed for the most demanding AI, data analytics and high performance computing workloads, while the GB200 NVL72 is designed for next-frontier, massive-scale, trillion-parameter model training and real-time inferencing.
The NVIDIA GB200 NVL72 connects 36 Grace Blackwell Superchips, each with two NVIDIA Blackwell GPUs combined with an NVIDIA Grace CPU over a 900GB/s chip-to-chip interconnect, supporting up to 72 Blackwell GPUs in one NVIDIA NVLink domain and 130TB/s of bandwidth. It overcomes communication bottlenecks and acts as a single GPU, delivering 30x faster real-time LLM inference and 4x faster training compared to the prior generation.
NVIDIA GB200 NVL72 is a multi-node rack-scale system that will be combined with Google Cloud’s fourth generation of advanced liquid-cooling systems.
NVIDIA announced last month that NVIDIA DGX Cloud, an AI platform for enterprise developers that’s optimized for the demands of generative AI, is generally available on A3 VMs powered by H100 GPUs. DGX Cloud with GB200 NVL72 will also be available on Google Cloud in 2025.
From humble beginnings as a university spinoff to an acquisition by the leading global medtech company in its field, Odin Vision has been on an accelerated journey since its founding less than five years ago.
An alum of the NVIDIA Inception program for cutting-edge startups, Odin Vision builds cloud-connected AI software that helps clinicians detect and characterize areas of concern during endoscopy, a procedure where a tiny camera mounted on a tube is inserted into the gastrointestinal tract.
Network-connected devices in the endoscopy room capture and stream real-time video data to the cloud, where powerful NVIDIA GPUs run AI inference. The models’ results are then streamed back to the endoscopy room so that clinicians can see the AI insights overlaid on the live video feed with minimal latency.
The startup was acquired in 2022 by Japanese medtech leader Olympus, which has a 70% global market share in gastrointestinal endoscopic equipment.
“We believe the acquisition brings us much closer to achieving our vision to revolutionize endoscopy through AI and cloud technology,” said Daniel Toth, cofounder and chief technology officer of Odin Vision. “Our software can reach Olympus’ global customer base, enabling us to bring our solutions to as many patients as possible.”
Olympus is also collaborating with NVIDIA on Olympus Office Hours, an advisory program that connects Inception startups with the medical device company’s experts, who will offer deep industry expertise and guidance to help the startups build AI solutions in key areas including gastroenterology, urology and surgery.
Eight leading AI startups have joined the inaugural cohort of the program — which is part of the NVIDIA Inception Alliance for Healthcare, an initiative that brings together medical AI startups with NVIDIA and its healthcare industry partners — to help accelerate their product development and go-to-market goals.
Around a quarter of precancerous polyps are missed during colonoscopies, a kind of endoscopy procedure that examines the lower digestive tract.
While some are missed because the endoscope doesn’t capture video footage of every angle, others remain undetected by clinicians. That’s where AI can help provide a second set of eyes to support clinical decision-making.
Seamless AI integration into the video feeds that medical professionals view during an endoscopy provides an extra data source that can help doctors detect and remove polyps sooner, helping prevent cancer development.
“Polyps develop slowly, and can take five or 10 years to appear as cancer,” Toth said. “If a clinician can detect and remove them in time, it can help save lives.”
CADDIE, the company’s AI software for detecting and classifying polyps, has received the CE Mark of regulatory approval in Europe and is deployed across hospitals in the U.K., Spain, Germany, Poland and Italy — with plans for use in the U.S. as well.
Odin Vision also has AI software that has received the CE Mark to assist gastroscopy, where doctors inspect the esophagus for signs of throat cancer.
Odin Vision began as a research project by two professors and a Ph.D. student at University College London who were developing AI techniques for polyp detection. In 2019, they teamed with Toth and Odin’s CEO, Peter Mountney, both from Siemens Healthineers, to commercialize their work.
“NVIDIA GPUs were part of our work from the start — they’ve been essential to train our AI models and were part of our first product prototypes for inference, too,” Toth said. “Since moving to a cloud-based deployment, we’ve begun using the NVIDIA Triton Inference Server for dynamic processing in the cloud.”
The team uses NVIDIA Tensor Core GPUs for accelerated inference — most recently transitioning to NVIDIA L4 GPUs. Adopting NVIDIA Triton Inference Server software and the NVIDIA TensorRT software development kit enabled them to meet the low-latency threshold needed for real-time video-processing AI applications.
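For a sense of how a client talks to such a deployment, here is a minimal sketch using Triton's HTTP client to send one video frame for inference. The model name and tensor names are hypothetical, not Odin Vision's actual interfaces; the real I/O contract comes from the model's Triton configuration.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton Inference Server (address is a placeholder).
client = httpclient.InferenceServerClient(url="localhost:8000")

# One preprocessed video frame in NCHW layout; shape and dtype are assumptions.
frame = np.random.rand(1, 3, 512, 512).astype(np.float32)

inputs = [httpclient.InferInput("INPUT__0", list(frame.shape), "FP32")]
inputs[0].set_data_from_numpy(frame)

# "polyp_detector" is a hypothetical model name for illustration only.
result = client.infer(model_name="polyp_detector", inputs=inputs)
scores = result.as_numpy("OUTPUT__0")  # hypothetical output tensor name
print(scores.shape)
```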
In addition to supporting doctors during specific procedures, Odin Vision plans to develop generative AI models that can automate a first draft of the clinical notes doctors prepare afterward — as well as models that can aggregate data across procedures. These would allow endoscopy teams to review analytics and assess how well a procedure is performed compared to clinical guidelines.
“Once you get to a point where there are dozens of AI models tracking different elements of these procedures, we can see if a healthcare professional is inspecting a particular area of the digestive tract for only three minutes, when it’s supposed to take six minutes,” Toth said. “The system can provide a nudge to remind the clinician to follow the guidelines.”
Membership in NVIDIA Inception provided the Odin Vision team access to technical expertise from NVIDIA and cloud credits through leading cloud service providers.
“Cloud credits helped us massively speed up our technology development and deployment, enabling us to release our products to market months earlier than initially planned,” Toth said. “NVIDIA experts also validated our product concept from a technology perspective and provided consultation about GPU and accelerated software optimizations.”
The team found that a cloud-based solution made it easier to push software updates over the air to deployments across hospital customers.
“Some AI companies are sending boxes that need to sit in clinical sites and require regular maintenance, which can prevent normal clinical workflows from running smoothly,” Toth said. “With network-connected devices, we can instead update a single server and the changes reach all end users at the same time.”
Learn more about NVIDIA Inception and subscribe to NVIDIA healthcare news.
Generative AI promises to revolutionize every industry it touches — all that’s been needed is the technology to meet the challenge.
NVIDIA founder and CEO Jensen Huang on Monday introduced that technology — the company’s new Blackwell computing platform — as he outlined the major advances that increased computing power can deliver for everything from software to services, robotics to medical technology and more.
“Accelerated computing has reached the tipping point — general purpose computing has run out of steam,” Huang told more than 12,000 GTC attendees gathered in-person — and many tens of thousands more online — for his keynote address at Silicon Valley’s cavernous SAP Center arena.
“We need another way of doing computing — so that we can continue to scale so that we can continue to drive down the cost of computing, so that we can continue to consume more and more computing while being sustainable. Accelerated computing is a dramatic speedup over general-purpose computing, in every single industry.”
Huang spoke in front of massive images on a 40-foot-tall, 8K screen the size of a tennis court to a crowd packed with CEOs and developers, AI enthusiasts and entrepreneurs, who walked together 20 minutes to the arena from the San Jose Convention Center on a dazzling spring day.
Delivering a massive upgrade to the world’s AI infrastructure, Huang introduced the NVIDIA Blackwell platform to unleash real-time generative AI on trillion-parameter large language models.
Huang presented NVIDIA NIM — a reference to NVIDIA inference microservices — a new way of packaging and delivering software that connects developers with hundreds of millions of GPUs to deploy custom AI of all kinds.
And bringing AI into the physical world, Huang introduced Omniverse Cloud APIs to deliver advanced simulation capabilities.
Huang punctuated these major announcements with powerful demos, partnerships with some of the world’s largest enterprises and more than a score of announcements detailing his vision.
GTC — which in 15 years has grown from the confines of a local hotel ballroom to the world’s most important AI conference — is returning to a physical event for the first time in five years.
This year’s event has over 900 sessions — including a panel discussion on transformers moderated by Huang with the eight pioneers who first developed the technology — more than 300 exhibits and 20-plus technical workshops.
It’s an event that’s at the intersection of AI and just about everything. In a stunning opening act to the keynote, Refik Anadol, the world’s leading AI artist, showed a massive real-time AI data sculpture with wave-like swirls in greens, blues, yellows and reds, crashing, twisting and unraveling across the screen.
As he kicked off his talk, Huang explained that the rise of multi-modal AI — able to process diverse data types handled by different models — gives AI greater adaptability and power. By increasing their parameters, these models can handle more complex analyses.
But this also means a significant rise in the need for computing power. And as these collaborative, multi-modal systems become more intricate — with as many as a trillion parameters — the demand for advanced computing infrastructure intensifies.
“We need even larger models,” Huang said. “We’re going to train it with multimodality data, not just text on the internet, we’re going to train it on texts and images, graphs and charts, and just as we learned watching TV, there’s going to be a whole bunch of watching video.”
In short, Huang said “we need bigger GPUs.” The Blackwell platform is built to meet this challenge. Huang pulled a Blackwell chip out of his pocket and held it up side-by-side with a Hopper chip, which it dwarfed.
Named for David Harold Blackwell — a University of California, Berkeley mathematician specializing in game theory and statistics, and the first Black scholar inducted into the National Academy of Sciences — the new architecture succeeds the NVIDIA Hopper architecture, launched two years ago.
Blackwell delivers 2.5x its predecessor’s performance in FP8 for training, per chip, and 5x with FP4 for inference. It features a fifth-generation NVLink interconnect that’s twice as fast as Hopper and scales up to 576 GPUs.
And the NVIDIA GB200 Grace Blackwell Superchip connects two Blackwell NVIDIA B200 Tensor Core GPUs to the NVIDIA Grace CPU over a 900GB/s ultra-low-power NVLink chip-to-chip interconnect.
Huang held up a board with the system. “This computer is the first of its kind where this much computing fits into this small of a space,” Huang said. “Since this is memory coherent, they feel like it’s one big happy family working on one application together.”
For the highest AI performance, GB200-powered systems can be connected with the NVIDIA Quantum-X800 InfiniBand and Spectrum-X800 Ethernet platforms, also announced today, which deliver advanced networking at speeds up to 800Gb/s.
“The amount of energy we save, the amount of networking bandwidth we save, the amount of wasted time we save, will be tremendous,” Huang said. “The future is generative … which is why this is a brand new industry. The way we compute is fundamentally different. We created a processor for the generative AI era.”
To scale up Blackwell, NVIDIA built a new chip called NVLink Switch. Each can connect four NVLink interconnects at 1.8 terabytes per second and eliminate traffic by doing in-network reduction.
The NVLink Switch and GB200 are key components of what Huang described as “one giant GPU,” the NVIDIA GB200 NVL72, a multi-node, liquid-cooled, rack-scale system that harnesses Blackwell to offer supercharged compute for trillion-parameter models, with 720 petaflops of AI training performance and 1.4 exaflops of AI inference performance in a single rack.
“There are only a couple, maybe three exaflop machines on the planet as we speak,” Huang said of the machine, which packs 600,000 parts and weighs 3,000 pounds. “And so this is an exaflop AI system in one single rack. Well let’s take a look at the back of it.”
Going even bigger, NVIDIA today also announced its next-generation AI supercomputer — the NVIDIA DGX SuperPOD powered by NVIDIA GB200 Grace Blackwell Superchips — for processing trillion-parameter models with constant uptime for superscale generative AI training and inference workloads.
Featuring a new, highly efficient, liquid-cooled rack-scale architecture, the new DGX SuperPOD is built with NVIDIA DGX GB200 systems and provides 11.5 exaflops of AI supercomputing at FP4 precision and 240 terabytes of fast memory — scaling to more with additional racks.
“In the future, data centers are going to be thought of … as AI factories,” Huang said. “Their goal in life is to generate revenues, in this case, intelligence.”
The industry has already embraced Blackwell.
The press release announcing Blackwell includes endorsements from Alphabet and Google CEO Sundar Pichai, Amazon CEO Andy Jassy, Dell CEO Michael Dell, Google DeepMind CEO Demis Hassabis, Meta CEO Mark Zuckerberg, Microsoft CEO Satya Nadella, OpenAI CEO Sam Altman, Oracle Chairman Larry Ellison, and Tesla and xAI CEO Elon Musk.
Blackwell is being adopted by every major global cloud services provider, pioneering AI companies, system and server vendors, and regional cloud service providers and telcos all around the world.
“The whole industry is gearing up for Blackwell,” Huang said, predicting it would be the most successful launch in the company’s history.
Generative AI changes the way applications are written, Huang said.
Rather than writing software, he explained, companies will assemble AI models, give them missions, give examples of work products, review plans and intermediate results.
Those AI models will be packaged and delivered as NVIDIA NIMs, built from NVIDIA’s accelerated computing libraries and generative AI models, Huang explained.
“How do we build software in the future? It is unlikely that you’ll write it from scratch or write a whole bunch of Python code or anything like that,” Huang said. “It is very likely that you assemble a team of AIs.”
The microservices support industry-standard APIs so they are easy to connect, work across NVIDIA’s large CUDA installed base, are re-optimized for new GPUs, and are constantly scanned for security vulnerabilities and exposures.
Huang said customers can use NIM microservices off the shelf, or NVIDIA can help build proprietary AI and copilots, teaching a model specialized skills only a specific company would know to create invaluable new services.
“The enterprise IT industry is sitting on a goldmine,” Huang said. “They have all these amazing tools (and data) that have been created over the years. If they could take that goldmine and turn it into copilots, these copilots can help us do things.”
Major tech players are already putting it to work. Huang detailed how NVIDIA is already helping Cohesity, NetApp, SAP, ServiceNow and Snowflake build copilots and virtual assistants. And industries are stepping in, as well.
In telecom, Huang announced the NVIDIA 6G Research Cloud, a generative AI and Omniverse-powered platform to advance the next communications era. It’s built with NVIDIA’s Sionna neural radio framework, NVIDIA Aerial CUDA-accelerated radio access network and the NVIDIA Aerial Omniverse Digital Twin for 6G.
In semiconductor design and manufacturing, Huang announced that, in collaboration with TSMC and Synopsys, NVIDIA is bringing its breakthrough computational lithography platform, cuLitho, to production. This platform will accelerate the most compute-intensive workload in semiconductor manufacturing by 40-60x.
Huang also announced the NVIDIA Earth Climate Digital Twin. The cloud platform — available now — enables interactive, high-resolution simulation to accelerate climate and weather prediction.
The greatest impact of AI will be in healthcare, Huang said, explaining that NVIDIA is already in imaging systems, in gene sequencing instruments and working with leading surgical robotics companies.
NVIDIA is launching a new type of biology software: more than two dozen new microservices, announced today, that allow healthcare enterprises worldwide to take advantage of the latest advances in generative AI from anywhere and on any cloud. They offer advanced imaging, natural language and speech recognition, and digital biology generation, prediction and simulation.
The next wave of AI will be AI learning about the physical world, Huang said.
“We need a simulation engine that represents the world digitally for the robot so that the robot has a gym to go learn how to be a robot,” he said. “We call that virtual world Omniverse.”
That’s why NVIDIA today announced that NVIDIA Omniverse Cloud will be available as APIs, extending the reach of the world’s leading platform for creating industrial digital twin applications and workflows across the entire ecosystem of software makers.
The five new Omniverse Cloud application programming interfaces enable developers to easily integrate core Omniverse technologies directly into existing design and automation software applications for digital twins, or their simulation workflows for testing and validating autonomous machines like robots or self-driving vehicles.
To show how this works, Huang shared a demo of a robotic warehouse — using multi-camera perception and tracking — watching over workers and orchestrating robotic forklifts, which are driving autonomously with the full robotic stack running.
Huang also announced that NVIDIA is bringing Omniverse to Apple Vision Pro, with the new Omniverse Cloud APIs letting developers stream interactive industrial digital twins into the VR headsets.
Some of the world’s largest industrial software makers are embracing Omniverse Cloud APIs, including Ansys, Cadence, Dassault Systèmes for its 3DEXCITE brand, Hexagon, Microsoft, Rockwell Automation, Siemens and Trimble.
Everything that moves will be robotic, Huang said. The automotive industry will be a big part of that. NVIDIA computers are already in cars, trucks, delivery bots and robotaxis.
Huang announced that BYD, the world’s largest electric vehicle company, has selected NVIDIA’s next-generation computer for its AV, building its next-generation EV fleets on DRIVE Thor.
To help robots better see their environment, Huang also announced the Isaac Perceptor software development kit with state-of-the-art multi-camera visual odometry, 3D reconstruction and occupancy map, and depth perception.
And to help make manipulators, or robotic arms, more adaptable, NVIDIA is announcing Isaac Manipulator — a state-of-the-art robotic arm perception, path planning and kinematic control library.
Finally, Huang announced Project GR00T, a general-purpose foundation model for humanoid robots, designed to further the company’s work driving breakthroughs in robotics and embodied AI.
Supporting that effort, Huang unveiled a new computer, Jetson Thor, for humanoid robots based on the NVIDIA Thor system-on-a-chip and significant upgrades to the NVIDIA Isaac robotics platform.
In his closing minutes, Huang brought on stage a pair of diminutive NVIDIA-powered robots from Disney Research.
“The soul of NVIDIA — the intersection of computer graphics, physics, artificial intelligence,” he said. “It all came to bear at this moment.”