The AI field is getting more crowded by the day. While most of the coverage we see is around GPUs and large language models, plenty of other parts of the AI stack are also critical to performance, such as how best to keep those expensive GPUs fed with data during training.
VAST Data, never a company to sit still, recently launched a new datacenter architecture for what it terms “the AI factory” by fully leveraging the capabilities of NVIDIA’s BlueField-3 data processing unit (DPU). Additionally, the company announced a partnership to deliver a full-stack, end-to-end AI solution for hyperscalers and service providers looking at large-scale deployments.
In the following sections, I’ll break down VAST’s moves and what these mean for both GPU cloud players and enterprise IT organizations.
VAST and NVIDIA Remove the AI Bottlenecks
One significant challenge for AI performance and scalability is the set of bottlenecks that traditional architectures hit when tasked with AI workloads: GPUs sit idle while they wait for training data to arrive. This can result in wasted energy and resources, as well as potential challenges around quality of service and security.
These GPUs can be fed faster and at scale thanks to the VAST and NVIDIA partnership. GPU servers from vendors such as Supermicro will ship with a dedicated BlueField DPU powering a container that runs the VAST parallel services operating system. This new architecture combines storage management and data processing, embedding the AI data pipeline directly into the AI server. Essentially, the data path terminates on the DPU inside the GPU server itself, so there is no need for a separate fleet of servers to handle much of the data processing.
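To make that data path concrete, here is a minimal, purely illustrative sketch of what a training job might see in such a setup: the shared namespace served by the DPU appears as an ordinary local mount, so the loader is just doing file I/O against a path. The mount point, file layout, and choice of PyTorch are my assumptions for illustration, not details from the VAST or NVIDIA announcement.

```python
# Illustrative sketch only: the mount path, file layout, and framework choice
# are assumptions, not part of the announced product. The point is that the
# shared namespace served by the BlueField DPU shows up as a local mount, so
# the training job's data loader is ordinary file I/O with no separate
# storage-gateway tier in the path.
from pathlib import Path

import torch
from torch.utils.data import DataLoader, Dataset


class ShardDataset(Dataset):
    """Reads pre-tokenized training shards from the DPU-served shared namespace."""

    def __init__(self, root: str) -> None:
        # Hypothetical mount point exposed locally by the DPU.
        self.shards = sorted(Path(root).glob("*.pt"))

    def __len__(self) -> int:
        return len(self.shards)

    def __getitem__(self, idx: int) -> torch.Tensor:
        # Each shard is assumed to hold one fixed-shape tensor of token IDs.
        return torch.load(self.shards[idx])


if __name__ == "__main__":
    loader = DataLoader(ShardDataset("/mnt/vast/dataset"), batch_size=8, num_workers=4)
    for batch in loader:
        batch = batch.cuda(non_blocking=True)  # hand the batch to the GPU
```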
Why does this matter? Because it enables cloud service providers to deliver AI services in a zero trust environment, at scale. The customer experience is improved and power consumption is reduced, as is the complexity. And all of this is done at a lower operating cost. For any cloud provider looking to differentiate its AI services, this partnership is something to look at.
Further, VAST claims that the parallelism of its disaggregated, shared-everything (DASE) architecture, when combined with BlueField, removes much of the networking and compute its functions previously required. Essentially, the architecture’s extreme parallelism allows the DPU to read directly from the shared namespace, removing the need to coordinate I/O across containers.
Integrating storage and data management directly into the DPU eliminates much of the latency that would otherwise arise from resource contention; ultimately, this enables a smaller infrastructure footprint. In fact, VAST estimates that organizations can reduce their VAST power footprint by 70% through this integration. To be clear, this is not total power savings, but the power savings associated with deploying VAST. However, it still translates into a net power savings of about 5% across the entire NVIDIA-VAST environment. As cloud providers (and every datacenter) face a power crunch caused by increasingly power-hungry chips, this reduction in rack space and energy consumption translates into more deployed servers and real cost savings.
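For readers who want to see how those two percentages fit together, here is a quick back-of-envelope calculation. The 70% and roughly 5% figures come from the paragraph above; the implied share of total environment power attributable to VAST infrastructure (about 7%) and the 1 MW example are my own illustrative inferences, not numbers from VAST or NVIDIA.

```python
# Back-of-envelope check of the power figures cited above. The 70% and ~5%
# values are VAST's estimates; the implied ~7% share and the 1 MW example
# environment are illustrative inferences, not vendor numbers.
vast_power_reduction = 0.70      # stated reduction in the VAST-specific power footprint
net_environment_savings = 0.05   # stated net savings across the whole NVIDIA-VAST environment

# Implied share of total environment power drawn by the VAST infrastructure.
implied_vast_share = net_environment_savings / vast_power_reduction
print(f"Implied VAST share of total power: {implied_vast_share:.1%}")   # ~7.1%

# Sanity check against a hypothetical 1 MW deployment.
total_kw = 1_000
vast_kw = total_kw * implied_vast_share
saved_kw = vast_kw * vast_power_reduction
print(f"VAST footprint: {vast_kw:.0f} kW; saved: {saved_kw:.0f} kW "
      f"({saved_kw / total_kw:.0%} of the environment)")
```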
In sum, the cloud/hyperscaler value prop of the NVIDIA-VAST offering goes like this: Deliver your customers a better AI experience in a zero trust security environment at a lower cost. And while you’re at it, go ahead and make significant progress toward your net-zero carbon goals. It truly seems like a no-brainer.
VAST and Supermicro Delivering at (Hyper)Scale
When looking at the acceleration that VAST and NVIDIA provide, all that goodness needs a delivery vehicle. Infrastructure, meaning the servers and storage that house these GPUs and the VAST software, is the most critical element in this equation. And cloud providers deploying at scale need to work with a server company that has the agility of an ODM, the supply chain of an OEM, and a deep understanding of how the cloud datacenter operates.
Enter Supermicro, which, among its other merits, is the only tech company whose stock has recently outperformed NVIDIA’s. Supermicro has an unusual history. When major server vendors ran from the hyperscale and cloud markets, Supermicro leaned in. Not only did the company lean in, but it also further embraced its white-box roots, building a portfolio that caters to the needs of the largest datacenters. One look at Supermicro’s server and storage offerings leaves no question about which customers it has in mind when designing its products.
Because of this, VAST’s partnership with Supermicro makes perfect sense. On the one hand, you have a company striving to establish itself as the AI cloud’s standard data management platform; on the other, you have a company that has already established itself as the infrastructure leader for the cloud and hyperscale space.
Much like the NVIDIA partnership, this VAST-Supermicro partnership has a lot of potential. VAST currently has a number of GPU cloud customers, including CoreWeave (already running the NVIDIA-VAST solution), Lambda Labs, Genesis, Core42, and others. Likewise, the list of cloud providers deploying their services on Supermicro hardware reads like a Who’s Who of the industry.
What These Partnerships Mean for the Market
AI brings with it a complex set of workloads that enterprise organizations find difficult to leverage fully, largely because those workloads are hard to deploy and optimize. And with that complexity comes cost. So, while the promise of AI is real, organizations of all sizes are challenged to deploy it at scale.
A lot of the complexity and cost associated with AI training has driven organizations to the cloud. This cloud rush has led to the rise of some of the GPU cloud players I mentioned above. These are cloud providers delivering AI services to thousands of customers simultaneously. If you think deploying AI in the enterprise is difficult, consider how an organization delivers these services at an extreme scale—and the inefficiencies and security challenges that are introduced.
This is what the NVIDIA-VAST partnership resolves. Put customer data closer to the compute by embedding VAST storage and data management inside the DPU. Performance, complexity, security—all of these are accounted for, and then some.
Finally, it only makes sense to work with the server vendor that has the most cloud reach to maximize effectiveness. Supermicro has the depth and breadth of experience to deliver what is shaping up to be a converged AI stack for the hyperscale market.
Closing Thoughts
There’s a reason why NVIDIA is valued at over $2 trillion. And there is a reason why Supermicro is the only tech stock that outperformed NVIDIA in the last year or so. Finally, there’s a reason why VAST—this data management “startup”—is valued at more than $9 billion.
It would have been easy for any of these companies to put together some hyped “partnership” before NVIDIA’s customer conference (GTC, which starts today) to garner some coverage. Many companies do so to take advantage of the press cycle. These three didn’t go that route. Instead, this partnership is the result of some big-brain ideas and deep technology integration.
I’m curious to see how this solution lands beyond CoreWeave (currently in testing). I’m also curious to see how VAST’s competitors will respond. The AI market is white-hot, as we all know, and the data management space is a particularly hyped area within it.