The post MI&S Weekly Analyst Insights — Week Ending January 24, 2025 appeared first on Moor Insights & Strategy.
The post MI&S Weekly Analyst Insights — Week Ending January 17, 2025 appeared first on Moor Insights & Strategy.
However, as the need to accelerate and alter modernization efforts to support this new wave of AI increases, IT budgets are only rising incrementally at best. Sustainability is another variable in the equation. While AI initiatives require more compute, storage, and other resources, CIOs are tasked with lowering power footprints to drive sustainability goals.
How can enterprise IT organizations simultaneously achieve modernization, AI, and sustainability goals, which seem to directly contradict one another? Moor Insights & Strategy (MI&S) sees the solution as rooted in infrastructure.
In some cases, outdated operating stacks running on aging hardware and processors that cannot deliver the required performance, agility, security, and targeted acceleration are used as the building blocks for the AI-driven workloads that run the modern business. This is a recipe for failure.
This research brief will explore enterprise IT organizations’ technical and operational challenges and how technology vendors are responding with hybrid cloud environments powered by modern AI-ready infrastructure. Further, it will evaluate how Nutanix, Dell, and Intel have partnered to deliver the Dell XC Plus running the Nutanix Cloud Platform (NCP) and GPT-in-a-Box powered by AI-accelerated Intel Xeon CPUs.
Click the logo below to download the report:
Table of Contents
Companies Cited:
The post RESEARCH PAPER: AI in the Modern Enterprise appeared first on Moor Insights & Strategy.
Watch the video here:
Listen to the audio here:
3:38 Can Extreme Networks Vie for Share in 2025?
12:51 Do We Really Know How To Do It?
19:21 Oracle Exadata X11M – The Real Data Platform
28:51 Microsoft Betting Big on AI Data Centers in 2025
36:50 Entangled Ambitions
42:23 Dell Embraces OCP
50:16 Getting To Know The Team
Can Extreme Networks Vie for Share in 2025?
https://www.extremenetworks.com/resources/blogs/introducing-extreme-platform-one
Do We Really Know How To Do It?
https://blog.samaltman.com/reflections
Oracle Exadata X11M – The Real Data Platform
https://www.oracle.com/news/announcement/oracle-introduces-exadata-x11m-platform-2025-01-07/
Microsoft Betting Big on AI Data Centers in 2025
https://blogs.microsoft.com/on-the-issues/2025/01/03/the-golden-opportunity-for-american-ai/
Entangled Ambitions
https://investors.ionq.com/news/news-details/2025/IonQ-Completes-Acquisition-of-Qubitekk-Solidifying-Leadership-in-Quantum-Networking/default.aspx
Dell Embraces OCP
https://moorinsightsstrategy.com/research-papers/evaluation-of-open-compute-modular-hardware-specification/
Disclaimer: This show is for information and entertainment purposes only. While we discuss publicly traded companies on this show, the contents of this show should not be taken as investment advice.
The post Datacenter Podcast: Episode 35 – Talking Extreme Networks, OpenAI, Oracle, Microsoft, IonQ, Dell appeared first on Moor Insights & Strategy.
The post MI&S Weekly Analyst Insights — Week Ending January 10, 2025 appeared first on Moor Insights & Strategy.
While many datacenter professionals look to GPUs and CPUs as the key contributors, they often overlook the role of storage in this power consumption equation. This Moor Insights & Strategy (MI&S) pulse brief will explore this power challenge and how Solidigm’s new D5-P5336 SSD with a capacity of 122.88 TB helps datacenter operators solve for both performance and power consumption.
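To illustrate the density argument in rough numbers, the sketch below compares drive counts for a petabyte of raw capacity. The 122.88 TB figure comes from the brief; the smaller comparison drive and the per-drive wattage are illustrative assumptions, not Solidigm specifications.

```python
import math

# Back-of-envelope only: the 122.88 TB capacity comes from the brief above;
# the 15.36 TB comparison drive and the 20 W per-drive figure are
# illustrative assumptions, not vendor specifications.
TARGET_TB = 1000.0          # 1 PB of raw capacity, decimal terabytes
DRIVE_POWER_W = 20.0        # assumed active power per SSD (hypothetical)

for capacity_tb in (15.36, 122.88):
    drives = math.ceil(TARGET_TB / capacity_tb)
    print(f"{capacity_tb:7.2f} TB drives: {drives:3d} needed "
          f"(~{drives * DRIVE_POWER_W:.0f} W at {DRIVE_POWER_W:.0f} W/drive)")

# Typical output:
#   15.36 TB drives:  66 needed (~1320 W at 20 W/drive)
#  122.88 TB drives:   9 needed (~180 W at 20 W/drive)
```

Fewer drive slots for the same capacity is where the power and density argument comes from, though real sizing would also account for endurance, usable-versus-raw capacity, and failure domains.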
Click the logo below to download the research paper and read more.
Table of Contents
Companies Cited:
The post RESEARCH PAPER: Sustainable Performance in the Datacenter appeared first on Moor Insights & Strategy.
The post MI&S Weekly Analyst Insights — Week Ending January 3, 2025 appeared first on Moor Insights & Strategy.
Axiado, a startup in the cybersecurity space, has announced that it has closed its Series C funding round to enable the company to expand strategic partnerships and scale its operations. According to Axiado, the funding, led by Maverick Silicon, signals a recognition of the market need for stronger resilience across the platforms that power AI—particularly in the datacenter. I want to put this announcement in context by digging deeper into platform security and what has made Axiado an interesting play for investors.
By now, cybersecurity is an obvious and critical priority for every IT organization. As a general rule, complexity scales with size—meaning the bigger the datacenter, the more challenging it is to secure. Many people still think of security through the lens of firewalls and the tools that protect against intrusion into an operating environment—the walls put up to keep bad actors out. However, many overlook the platform-level protections that ensure infrastructure is secured from the moment the proverbial power button is pushed.
Platform security delivers this protection through a combination of trusted platform modules (TPMs), a hardware root of trust (RoT), baseboard management controllers, and other tools that ensure a system boots and operates in a known, secure state. This is important because some of the most malicious tools can burrow below the operating environment and slowly siphon data for months before being detected. Known as rootkit attacks, these threats are extremely difficult to detect and counteract.
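To make the platform-security concept concrete, here is a deliberately simplified sketch of what a root-of-trust check does at boot: measure (hash) each firmware component and refuse to proceed unless it matches a known-good value. This is a conceptual illustration with hypothetical image names and contents, not Axiado's implementation or any vendor's actual API.

```python
import hashlib

# Conceptual sketch only: a root-of-trust check measures (hashes) each firmware
# component and refuses to continue unless it matches a known-good value.
# Real platform security anchors these measurements in hardware (RoT/TPM) and
# verifies cryptographic signatures; this toy version shows only the idea.

def measure(blob: bytes) -> str:
    """Return the SHA-256 measurement of a firmware blob."""
    return hashlib.sha256(blob).hexdigest()

# Hypothetical "golden" measurements, as they might be recorded at manufacturing.
KNOWN_GOOD = {
    "bmc_firmware": measure(b"trusted BMC image v1.2"),
    "bios": measure(b"trusted BIOS image v3.4"),
}

def verify_boot(components: dict[str, bytes]) -> bool:
    """Allow 'boot' only if every component matches its recorded measurement."""
    for name, blob in components.items():
        if KNOWN_GOOD.get(name) != measure(blob):
            print(f"halt: {name} failed measurement")
            return False
    return True

# A tampered BIOS image is caught before the system is allowed to boot.
print(verify_boot({"bmc_firmware": b"trusted BMC image v1.2",
                   "bios": b"rootkit-modified BIOS image"}))  # halt: bios ... False
```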
Standard servers populating the enterprise datacenter from the likes of Dell, HPE and Lenovo have platform security tools (both silicon and software) built onto the motherboard; these are managed through their respective consoles (e.g., OpenManage from Dell).
However, when looking at the hyperscale market, the game changes a little bit. In this setting, different servers from a variety of vendors are deployed across hundreds of datacenters. Further, those servers tend to be from lesser-known vendors. To address this reality, the Open Compute Project released a standard for developing a secure control module so that platform security functionality can be offloaded to a dedicated card. This means that a datacenter can deploy different servers from various vendors and still have a single platform security tool—with no vendor-specific security chips or other tools that must be configured and managed separately.
This is where Axiado comes into play.
Axiado addresses these challenges with its trusted control/compute unit (TCU). This card takes all of the disparate platform security functions and pieces of silicon that may reside on a motherboard and puts them on a dedicated system-on-a-chip. It also utilizes AI to scan for threats and abnormal behavior across the system.
As the diagram at the head of this article shows, the Secure Vault is where platform integrity begins. It is in this vault that validated and signed firmware, immutable code and other security components come together with RoT and TPM to ensure a secure and pristine boot environment. The Secure Vault is also designed to provide critical capabilities for re-establishing a known and trusted state when it’s necessary to recover after being compromised.
While the Secure Vault aims to enable a trusted environment, Secure AI is an inference processor that is tasked with detecting and blocking attacks by looking for suspicious behavior, such as the very low-level rootkit attacks previously mentioned. The combination of these capabilities delivers a system security posture that is both broad and deep. And it does this in a way that is simple for datacenter operators because it works across server platforms, from ODMs to OEMs.
In addition to these security capabilities, Axiado has now embedded its Dynamic Thermal Management (DTM) technology into the TCU. As AI and other compute-intensive workloads increasingly populate the hyperscale datacenter, power and cooling have become significant challenges. DTM looks at application and system performance in real time and adjusts cooling accordingly. This could mean substantial cost savings if it shaves off even a few percentage points of power use across a hyperscale datacenter.
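To show why even a few percentage points matter at hyperscale, here is a rough back-of-envelope calculation; the facility size, electricity price, and savings rate are hypothetical inputs, not Axiado figures.

```python
# Hypothetical inputs: a 100 MW facility, $0.08/kWh, and a 2% reduction in
# power draw from smarter thermal management. None of these are Axiado figures.
facility_mw = 100
price_per_kwh = 0.08
savings_rate = 0.02

hours_per_year = 24 * 365
kwh_per_year = facility_mw * 1000 * hours_per_year
annual_savings = kwh_per_year * price_per_kwh * savings_rate

print(f"~${annual_savings:,.0f} saved per year")  # ~$1,401,600
```

Even under these conservative assumptions, a low-single-digit efficiency gain translates into seven figures per facility per year, which is why thermal tuning is worth silicon real estate.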
The typical commercial enterprise will not likely deploy Axiado TCUs anytime soon. That customer segment tends to acquire servers from OEMs such as Dell, HPE and Lenovo. As touched on earlier, those servers are equipped with vendor-specific silicon and tools for secure boot and management.
The ideal customer for Axiado is a hyperscaler that’s operating datacenters deploying heterogeneous servers across a global environment. These environments scale in the hundreds of thousands of servers, powered by a variety of CPUs and GPUs. In fact, the more diverse the environment, the stronger the value prop of Axiado becomes as the potential TCO savings grow more significant.
So do hyperscalers actually need Axiado? That's a tricky question. One could rightly argue that hyperscalers already have tools that achieve what Axiado does through its TCU. It's up for debate which is more comprehensive, but security rooted in hardware is a critical building block for any hyperscaler that wants to stay in business. However, Axiado does quite well in integrating all of these security elements into a single SoC. This integration brings performance, power and cost benefits that make it an interesting play.
Are other companies delivering hardware solutions that compete with Axiado? Certainly. The security market continues to grow at a torrid pace, and new entrants jump in seemingly daily. However, Axiado’s focus on solving these challenges around security, resilience and power at scale makes it unique. Maybe that’s why this funding round was oversubscribed.
Axiado is in an interesting place. It has created a solution sorely needed in the hyperscale market. However, hyperscale is a tough market to sell into. It’s very engineering-driven and cost-sensitive. I suspect that Axiado will look to use some of this $60 million to ramp up its sales and marketing organization to bring what is a very compelling solution to the hyperscale market in a full-throated way.
What Axiado brings to bear with its TCU is complementary to the major chip companies such as AMD, Nvidia and Intel. Even though there is some overlap in terms of capabilities, there seems to be a lot of potential for strategic partnerships. Either way, the company’s technology is certain to find a home in the cloud.
The post Platform Security Startup Axiado Secures Series C Funding appeared first on Moor Insights & Strategy.
The post MI&S Weekly Analyst Insights — Week Ending December 13, 2024 appeared first on Moor Insights & Strategy.
Watch the video here:
Listen to the audio here:
4:15 HPE Q4FY24 Earnings
10:49 Willow: A Window On The Multiverse?
18:05 Cloud – The New Silicon Giants
26:19 Observability As A VMware Ripcord?
31:58 It Sees What I See
35:42 Storage Is Cool Again
40:21 Getting To Know Us – Ghost of Christmas Past
HPE Q4FY24 Earnings
https://x.com/WillTownTech/status/1864787679906291962
Willow: A Window On The Multiverse?
https://blog.google/technology/research/google-willow-quantum-chip/
Cloud – The New Silicon Giants
https://moorinsightsstrategy.com/research-notes/some-thoughts-on-aws-reinvent-aws-silicon-and-q-developer/
Observability As A VMware Ripcord?
https://www.forbes.com/sites/moorinsights/2024/12/05/the-power-of-deep-observability-in-facilitating-vmware-migrations/
It Sees What I See
https://www.microsoft.com/en-us/microsoft-copilot/blog/2024/12/05/copilot-vision-now-in-preview-a-new-way-to-browse/
Storage Is Cool Again
https://moorinsightsstrategy.com/research-notes/modernizing-your-datacenter-start-with-storage/
Disclaimer: This show is for information and entertainment purposes only. While we discuss publicly traded companies on this show, the contents of this show should not be taken as investment advice.
The post Datacenter Podcast: Episode 34 – Talking HPE, Google, AWS, VMware, Microsoft, Storage appeared first on Moor Insights & Strategy.
Of the three MHS projects, the datacenter MHS (DC-MHS) is particularly interesting because it significantly impacts major server vendors servicing both the hyperscale and enterprise server market segments. This project focuses on delivering a modular server hardware specification that enables hardware vendors to more easily and quickly source and manufacture the server infrastructure that powers the datacenter.
The progression of DC-MHS is noteworthy, given the accelerated pace of semiconductor and hardware innovation in response to the AI explosion. This Moor Insights & Strategy (MI&S) research brief will explore three key areas:
Click the logo below to download the research paper to read more.
Table of Contents
Companies Cited:
The post RESEARCH PAPER: An Evaluation of the Open Compute Modular Hardware Specification appeared first on Moor Insights & Strategy.
Amazon Web Services held its annual re:Invent customer event last week in Las Vegas. With over 200 analysts in attendance, the event focused on precisely what one would expect: AI, and how the largest cloud service provider on the planet is building infrastructure, models, and tools to enable AI in the enterprise.
While my Moor Insights & Strategy colleagues Robert Kramer and Jason Andersen have their own thoughts to share about data and AI tools (for example here), this research note will explore a few areas that I found interesting, especially regarding AWS chips and the Q Developer tool.
AWS designs and builds a lot of its own silicon. Its journey began with the Nitro System, which handles networking, security, and a bit of virtualization offload within an AWS-specific virtualization framework. Effectively, Nitro offloads a lot of the low-level work that connects and secures AWS servers.
From there, the company moved into the CPU space with Graviton in 2018. Since its announcement, this chip has matured to its fourth generation and now supports about half the workloads running in AWS.
AWS announced Inferentia and Trainium in 2019 and 2020, respectively. The functionality of each AI accelerator is easy to deduce from its name. While both pieces of silicon have been available for some time now, we haven’t heard as much about them—especially in comparison to the higher-profile Graviton. Despite not being as well-known as Graviton, Inferentia and Trainium have delivered tangible value since their respective launches. The first generation of Inferentia focused on deep learning inference, boasting 2.3x higher throughput and 70% lower cost per inference compared to the other inference-optimized instances on EC2 at the time.
Inferentia2 targeted generative AI (Inf2 instances in EC2) with a finer focus on distributed inference. Architectural changes to the silicon, combined with features such as sharding (splitting models and distributing the work), allowed the deployment of large models across multiple accelerators. As expected, performance numbers were markedly higher—including 4x the throughput and up to 10x lower latency relative to Inferentia1.
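To see why distributed inference and sharding matter, consider the rough memory math for a large model. The parameter count, precision, per-accelerator memory, and overhead factor below are illustrative assumptions rather than published Inferentia2 specifications.

```python
import math

# Illustrative memory math for why large models must be sharded across
# accelerators. The parameter count, bytes per parameter, per-device memory,
# and activation overhead are assumptions for illustration only.
params_billion = 70          # e.g., a 70B-parameter model
bytes_per_param = 2          # FP16/BF16 weights
device_memory_gb = 32        # assumed HBM per accelerator (hypothetical)
overhead = 1.2               # rough allowance for activations/KV cache

weights_gb = params_billion * 1e9 * bytes_per_param / 1e9   # 140 GB of weights
devices = math.ceil(weights_gb * overhead / device_memory_gb)
print(f"~{weights_gb:.0f} GB of weights -> at least {devices} accelerators")
# ~140 GB of weights -> at least 6 accelerators
```

Once a model's weights exceed what a single device can hold, splitting it across accelerators is not an optimization but a requirement, which is what features like sharding address.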
Based on what we’ve seen through the Graviton and Inferentia evolutions, the bar has been raised for Trainium2, which AWS just released into general availability. The initial results look promising.
As it has done with other silicon, the Annapurna Labs team at AWS has delivered considerable gains in Trainium2. While architectural details are scant (which is normal for how AWS talks about its silicon), we do know that the chip is designed for big, cutting-edge generative AI models—both training and inference.
Further, AWS claims a price-performance advantage for Trainium2 instances (Trn2) of 30% to 40% over the GPU-based EC2 P5e and P5en instances (powered by NVIDIA H200s). It is worth noting that AWS also announced new P6 instances based on NVIDIA’s hot new Blackwell GPU. A point of clarification is worth a mention here. Unlike Blackwell, Trainium2 is not a GPU. It is a chip designed for training and inference only. It is important to note this because such chips, though narrow in functionality, can deliver significant power savings relative to GPUs.
A single Trn2 instance delivers 20.8 petaflops of compute. For enterprise customers looking to train and deploy a large language model with billions of parameters, these Trn2 instances are ideal, according to AWS. (A Trn2 instance bundles 16 Trainium2 chips with 1.5TB of high-bandwidth memory, 192 vCPUs, and 2TB of RAM.)
Going up the performance ladder, AWS also announced Trainium2 UltraServers—effectively 64 Trainium2 chips across four instances to deliver up to 83.2 petaflops of FP8 precision compute. These chips, along with 6TB of HBM and 185 TBps of memory bandwidth, position the UltraServers to support larger foundational models.
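The scaling arithmetic behind the UltraServer numbers is worth making explicit; the sketch below simply combines the per-instance figures cited in this note.

```python
# Sanity check of the Trn2 scaling figures cited above.
chips_per_instance = 16
petaflops_per_instance = 20.8        # FP8 compute per Trn2 instance
hbm_tb_per_instance = 1.5            # high-bandwidth memory per Trn2 instance
instances_per_ultraserver = 4

print(chips_per_instance * instances_per_ultraserver)        # 64 chips
print(petaflops_per_instance * instances_per_ultraserver)    # 83.2 petaflops (FP8)
print(hbm_tb_per_instance * instances_per_ultraserver)       # 6.0 TB of HBM
```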
To connect these chips, AWS developed NeuronLink—a high-speed, low-latency chip-to-chip interconnect. For a parallel to this back-end network, think of NVIDIA’s NVLink. Interestingly, AWS is part of the UALink Consortium, so I’m curious as to whether NeuronLink is tracking to the yet-to-be-finalized UALink 1.0 specification.
Finally, AWS is partnering with Anthropic to build an UltraCluster named Project Rainier, which will scale to hundreds of thousands of chips to train Anthropic’s current generation of models.
What does all of this mean? Is AWS suddenly taking on NVIDIA (and other GPU players) directly? Is this some big move where AWS will push—or even nudge—its customers toward Trn2 instances instead of P5/P6 instances? I don’t think so. I believe AWS is following the Graviton playbook, which is simple: put out great silicon that can deliver value and let customers choose what works best for them. For many, having the choice will mean they continue to consume NVIDIA because they have built their entire stacks around not just Hopper or Blackwell chips, but also NVIDIA software. For some, using Trn2 instances along with Neuron (the AWS SDK for AI) will be the optimal choice. Either way, the customer benefits.
Over time, I believe we will see Trainium’s adoption trend align with that of Graviton. Yes, more and more customers will select this accelerator as the foundation of their generative AI projects. But so too will many for Blackwell and the NVIDIA chips that follow. As this market continues to grow at a torrid pace, everybody wins.
It’s worth mentioning that AWS also announced Trainium3, which will be available in 2025. As one would expect, this chip will yet again be a significant leap forward in terms of performance and power efficiency. The message being sent is quite simple—AWS is going to deliver value on the enterprise AI journey, and the company is taking a long-term approach to driving that value.
One of the other areas that I found very interesting was the use of AI agents for modernizing IT environments. Q is AWS's generative AI assistant, which one could consider similar to the better-known Microsoft Copilot. Q Developer is the version of Q aimed at software development and IT transformation tasks.
While Q Developer is interesting for several reasons, digital transformation is the area of assistance I found most compelling. At re:Invent, AWS rolled out Q Developer to modernize three environments: Microsoft .NET, mainframe applications, and VMware environments. In particular, the VMware transformation is of great interest to me as there has been so much noise about VMware in the market since its acquisition by Broadcom. With Q Developer, AWS has built agents to migrate VMware virtual machines to EC2 instances, removing dependencies. The process starts by collecting (on-premises) server and network data and dropping it into Q Developer. Q Developer then outputs suggestions for migration waves that an IT staff can accept or modify as necessary. This is followed by Q Developer building out and continuously testing an AWS network. And then the customer selects the waves to start migrating.
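To make the notion of migration waves concrete, here is a simplified sketch of the kind of dependency-aware grouping such a planner performs. This is generic illustrative logic with hypothetical VM names; it is not Q Developer's actual interface, algorithm, or output format.

```python
# Illustrative wave planner: given VM -> dependencies, emit migration waves in
# which every VM's dependencies have already been migrated in an earlier wave.
# This is a generic topological grouping, not AWS Q Developer's actual logic.

def plan_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    remaining = dict(deps)
    migrated: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        # A VM is ready once everything it depends on has already moved.
        wave = sorted(vm for vm, d in remaining.items() if d <= migrated)
        if not wave:
            raise ValueError("circular dependency; needs manual review")
        waves.append(wave)
        migrated.update(wave)
        for vm in wave:
            del remaining[vm]
    return waves

# Hypothetical inventory discovered from the on-premises environment.
inventory = {
    "db01": set(),
    "app01": {"db01"},
    "app02": {"db01"},
    "web01": {"app01", "app02"},
}
print(plan_waves(inventory))
# [['db01'], ['app01', 'app02'], ['web01']]
```

The value of an agent here is less the grouping algorithm itself and more the discovery, continuous testing, and human review loop wrapped around it.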
While Q Developer is not going to be perfect or remove 100% of the work for decoupling from VMware, it will help get through some of the most complex elements of migrating. This can save months of time and significant dollars for an enterprise IT organization. This is what makes Q Developer so provocative and disruptive. I could easily see the good folks at Azure and Google Cloud looking to build similar agents for the same purpose.
Q Developer for .NET and mainframe are also highly interesting, albeit far less provocative. Of the two, I think the mainframe modernization effort is quite compelling as it deconstructs monolithic mainframe applications and refactors them into Java. I like what AWS has done with this and can see the value—especially given the COBOL skills gap that exists in many organizations. Or, more importantly, the skills gap that exists in understanding and attempting to run both mainframe and cloud environments in parallel.
With all this said, don’t expect Q Developer to spell the end of the mainframe. Mainframes are still employed for a reason—70 years after the first commercial mainframes, and 60 years after IBM’s System/360 revolutionized the computing market. It is not because they are hard to migrate away from. The reason is tied to security and performance, especially in transaction processing at the largest scales. However, Q Developer for mainframe modernization is pretty cool and can certainly help, especially as a code assistant for a workforce that is trying to understand and maintain COBOL code written decades ago.
I’ve been attending tech conferences for a long time. My first was Macworld in 1994. Since then, I’ve attended every major conference at least a few times. AWS re:Invent 2024 was by far the largest and busiest conference I’ve attended.
While there was so much news to absorb, I found these announcements around Trainium and Q Developer to be the most interesting. Again, my Moor Insights & Strategy colleagues Robert Kramer and Jason Andersen, along with our founder and CEO Patrick Moorhead, will have their own perspectives. Be sure to check them out.
The post RESEARCH NOTE: Some Thoughts On AWS re:Invent — AWS Silicon and Q Developer appeared first on Moor Insights & Strategy.
The post MI&S Weekly Analyst Insights — Week Ending December 6, 2024 appeared first on Moor Insights & Strategy.
When discussing modernizing the datacenter, storage is one of the foundational elements that, while critical to success, is often overlooked. Legacy storage infrastructure can and will impact the performance of data-driven environments. In fact, I’ll go so far as to say that storage must be the first consideration of any modernization effort.
This research looks at the role of block storage in the cloud environment and how companies like Lightbits Labs deliver performance, scale, and cost savings realized by some of the largest organizations.
The acquisition of VMware by Broadcom nearly a year ago kicked off a discussion around modernization. This highly disruptive act has caused many organizations to reconsider the future state of their datacenter, with or without VMware.
There are many different estimates for how many enterprise IT organizations are having these internal conversations. Based on the estimates I’ve seen, it’s safe to say the vast majority are at least considering significant datacenter modernization projects. In fact, I can say that every IT leader I’ve spoken with is considering what their move-forward plan in this area looks like. By this point, it’s less about VMware specifically and more about the broader need for modernization and cloud-native environments. It’s undoubtedly a healthy and necessary debate for internal IT organizations, as a sense of complacency and incrementalism seems to have crept in over the last 10 years or so.
There are two key questions facing the enterprise: What does our modernization plan look like? And what technologies should we deploy to meet the needs of today and tomorrow across the organization?
For those organizations that have decided to embark on the modernization journey, the first decision is whether to build a cloud or deploy a cloud. In other words, is it best to build a cloud from the ground up using OpenStack, or deploy a cloud environment on Nutanix, Red Hat OpenShift, or some other solution stack? In either case, virtualized and containerized environments are only as performant and responsive as the supporting infrastructure. In turn, infrastructure is only as performant as its storage environment.
Unfortunately, storage is often treated as a secondary consideration, and many organizations fail to realize the full potential of their modernization efforts because of slower spinning disks and the lack of a storage OS designed for performance and scale.
While it is fairly obvious that storage performance can (and will) impact application performance, it’s important to consider whether the application in question is an e-commerce site performing tens of thousands of transactions per minute or an AI cloud delivering services in real time to its customer base. Scale matters, and performant scale is even more important.
Lightbits Labs is a software-defined storage (SDS) provider that powers some of the largest and most demanding environments—everything from e-commerce to cloud service providers. It achieves this through NVMe/TCP, a technology the company invented and has received several patents for. In this environment, the NVMe protocol is routed over Ethernet using the TCP/IP protocol suite. This allows high-performance clusters without the need for specialized networking and hardware.
Alternative approaches have their limitations. Direct-attached storage (DAS) and storage area networks (SAN) are popular models; however, each comes with a set of challenges. In the case of DAS, it’s an inflexibility that can lead to inefficiencies as applications become wedded to servers and storage. In the case of SANs, it’s a matter of cost, as proprietary hardware and specialized networking come at a premium.
These challenges are avoided with SDS in general and Lightbits in particular. Compared to DAS, Lightbits delivers higher utilization for lower TCO and uses flash more efficiently, extending the endurance of QLC media. Compared to SANs, Lightbits NVMe/TCP delivers high performance without the proprietary hardware stack.
Performance is a big deal for Lightbits Labs. In fact, its claim of scaling up to 75 million IOPS (input/output operations per second) at sub-1ms latency puts it in a performance leadership position. This SDS solution outperforms Ceph, the more broadly deployed open-source block solution for data-intensive cloud environments. Like Ceph, Lightbits can be seamlessly integrated into OpenStack and managed with Cinder, Nova, and Glance through the Cinder API.
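For context on what a number like 75 million IOPS implies, a quick conversion to raw throughput helps; the 4 KiB I/O size assumed below is a common benchmarking convention, not a Lightbits-published figure.

```python
# Rough context for the 75M IOPS figure: at an assumed 4 KiB block size,
# sustained throughput works out to roughly 286 GiB/s across the cluster.
iops = 75_000_000
block_bytes = 4 * 1024            # assumed 4 KiB I/O size (illustrative)
throughput_gib_s = iops * block_bytes / (1024 ** 3)
print(f"~{throughput_gib_s:.0f} GiB/s")  # ~286 GiB/s
```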
Even for legacy virtualization, there is a Lightbits play. The company’s certified solution supports VMware and KVM environments as the back-end SDS. In fact, Lightbits can even run alongside vSAN and be used as a vMotion target in vSphere. Whether this support will continue as the changes in Broadcom’s portfolio impact existing implementations is not yet known.
The enterprise IT infrastructure market is constantly changing. However, the confluence of several factors is causing organizations to consider how best to support the business’s needs today and in the future. The cloud operating model is still how organizations can achieve the agility required to meet the needs of the data-driven workloads populating the datacenter. How that cloud operating model is constructed—what the underlying compute, networking, and storage environments are comprised of—matters.
Storage, in particular, is the building block upon which everything depends—performance, resilience, and both of these at scale. Fast, resilient storage can help deliver results faster, be it for AI inferencing or tens of thousands of financial transactions per second.
Companies like Lightbits Labs are, in many ways, the innovation engines that drive change in the industry. While they may not have the brand awareness of some of the bigger players in the market, they nonetheless power some of the largest and most performant clouds and enterprise datacenters in the market. In other words, the most performance- and scale-sensitive organizations deploy Lightbits because of its performance, scale, and cost. Which means it’s probably worth taking a look at for any organization with these high-end needs.
The post RESEARCH NOTE: Modernizing Your Datacenter? Take a Look at Your Storage appeared first on Moor Insights & Strategy.
Watch the video here:
Listen to the audio here:
2:41 Little AI Guys
7:10 Supercomputing Goes Mainstream
13:20 2 Dozen Logical Qubits
22:43 Chips & Chips & Chips At Microsoft Ignite
32:18 Getting To Know Us – The Thanksgiving Edition
Little AI Guys
https://www.microsoft.com/en-us/research/articles/magentic-one-a-generalist-multi-agent-system-for-solving-complex-tasks/
Supercomputing Goes Mainstream
https://www.linkedin.com/feed/update/urn:li:activity:7264651754347085825/
https://www.linkedin.com/feed/update/urn:li:activity:7265018456003997696/
2 Dozen Logical Qubits
https://atom-computing.com/high-fidelity-gates-and-the-worlds-largest-entangled-logical-qubit-state/
https://www.forbes.com/sites/moorinsights/2024/01/25/microsoft-uses-ai-and-hpc-to-analyze-32-million-new-materials/
Chips & Chips & Chips At Microsoft Ignite
https://www.datacenterknowledge.com/cloud/microsoft-ignite-2024-new-azure-data-center-chips-unveiled
Disclaimer: This show is for information and entertainment purposes only. While we discuss publicly traded companies on this show, the contents of this show should not be taken as investment advice.
The post Datacenter Podcast: Episode 33 – Talking Microsoft, Supercomputing 2024, Atom Computing, Microsoft Ignite 2024 appeared first on Moor Insights & Strategy.
The Ultra Accelerator Link Consortium has recently incorporated, giving companies the opportunity to join, and it has announced that the UALink 1.0 specification will be available for public consumption in Q1 2025. The Consortium’s “Promoter” members include AMD, Astera Labs, AWS, Cisco, Google, HPE, Intel, Meta and Microsoft.
The UALink Consortium aims to deliver specifications and standards that allow industry players to develop high-speed interconnects for AI accelerators at scale. In other words, it addresses the GPU clusters that train the largest of large language models and solve the most complex challenges. Much like Nvidia developed its proprietary NVLink to address GPU-to-GPU connectivity, UALink looks to broaden this capability across the industry.
The key to the UALink Consortium is the partnership among the biggest technology companies—many of whom compete with one another—to better enable the future of AI and other accelerator-dependent workloads. Let’s explore this initiative and what it could mean for the market.
High-performance computing was perhaps the first workload classification that highlighted that CPUs were not always the best processor for the job. The massive parallelism and high data throughput of GPUs enable tasks like deep learning, genomic sequencing and big data analytics to perform far better than they would on a CPU. These architectural differences and programmability have made GPUs the accelerator of choice for AI. In particular, the training of LLMs that double in size every six months or so happens far more efficiently and much faster on GPUs.
However, in a server architecture, the CPU (emphasis on the “C”—central) is the brain of the server, with all functions routing through it. If a GPU is to be used for a function, it connects to a CPU over PCIe. Regardless of how fast that GPU can perform a function, system performance is limited by how quickly a CPU can route traffic to and from it. This limitation becomes glaringly noticeable as LLMs and datasets become ever larger, requiring a large number of GPUs to train them in concert in the case of generative AI. This is especially true for hyperscalers and other large organizations training AI frontier models. Consider a training cluster with thousands of GPUs spread across several racks, all dedicated to training GPT-4, Mistral or Gemini 1.5. The amount of latency introduced into the training period is considerable.
This is not just a training issue, however. As enterprise IT organizations begin to operationalize generative AI, performing inference at scale is also challenging. In the case of AI and other demanding workloads such as HPC, the CPU can significantly limit system and cluster performance. This can have many implications in terms of performance, cost and accuracy.
The UALink Consortium was formed to develop a set of standards that enables accelerators to communicate with one another (bypassing the CPU) in a fast, low-latency way—and at scale. The specification defines an I/O architecture that enables speeds of up to 200 Gbps (per lane), scaling up to 1,024 AI accelerators. This specification delivers considerably better performance than that of Ethernet and connects considerably more GPUs than Nvidia’s NVLink.
To better contextualize UALink and its value, think about connectivity in three ways: front-end network, scale-up network and scale-out network. Generally, the front-end network is focused on connecting the hosts to the broader datacenter network for connectivity to compute and storage clusters as well as the outside world. This network is connected through Ethernet NICs on the CPU. The back-end network is focused on GPU-to-GPU connectivity. This back-end network is composed of two components: the scale-up fabric and the scale-out fabric. Scale-up connects hundreds of GPUs at the lowest latency and highest bandwidth (which is where UALink comes in). Scale-out is for scaling AI clusters beyond 1,024 GPUs—to 10,000 or 100,000. This is enabled using scale-out NICs and Ethernet and is where Ultra Ethernet will play.
When thinking about a product like the Dell PowerEdge XE9680, which can support up to eight AMD Instinct or Nvidia HGX GPUs, a UALink-enabled cluster would support well over 100 of these servers in a pod where GPUs would have direct, low-latency access to one another.
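That pod math is easy to verify; the sketch below uses the 1,024-accelerator scale-up ceiling from the UALink specification discussion and the eight-GPU configuration of the XE9680 example above.

```python
# Sanity check: how many 8-GPU servers fit inside one UALink scale-up domain?
ualink_max_accelerators = 1024
gpus_per_server = 8              # e.g., a Dell PowerEdge XE9680-class system

servers_per_pod = ualink_max_accelerators // gpus_per_server
print(servers_per_pod)  # 128
```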
As an organization’s needs grow, Ultra Ethernet Consortium-based connectivity can be used for scale-out. In 2023, industry leaders including Broadcom, AMD, Intel and Arista formed the UEC to drive performance, scale and interoperability for bandwidth-hungry AI and HPC workloads. In fact, AMD just launched the first UEC-compliant NIC, the Pensando Pollara 400, a few weeks ago. (Our Moor Insights & Strategy colleague Will Townsend has written about it in detail.)
Getting back to UALink, it is important to understand that this is not simply some pseudo-standard being used to challenge the dominance of Nvidia and NVLink. This is a real working group developing a genuine standard with actual solutions being designed.
In parallel, we see some of the groundwork being laid by UALink Promoter companies like Astera Labs, which recently introduced its Scorpio P-Series and X-Series fabric switches. While the P-Series switch enables GPU-to-CPU connectivity over PCIe Gen 6 (which can be customized), the X-Series is a switch aimed at GPU-to-GPU connectivity. Given that the company has already built the underlying fabric, one can see how it could support UALink soon after the specification is published.
It is important to understand that UALink is agnostic about accelerators and the fabrics, switches, retimers and other technology that enable accelerator-to-accelerator connectivity. It doesn’t favor AMD over Nvidia, nor does it favor Astera Labs over, say, Broadcom (if that company chooses to contribute). It’s about building an open set of standards that favors innovation across the ecosystem.
While the average enterprise IT administrator, or even CIO, won’t care much about UALink, they will care about what it will deliver to their organization: faster training and inference on platforms that consume less power and can be somewhat self-managed and tuned. Putting a finer point on it—faster results at lower cost.
It’s easy to regard what UALink is doing as an attempt to respond to Nvidia’s stronghold. And at some level, it certainly is. However, in the bigger picture this is less about copying what Nvidia does and more about ensuring that critical capabilities like GPU-to-GPU connectivity don’t fall under the purview of one company with a vested interest in optimizing for its own GPUs.
It will be interesting to watch how server vendors such as Dell, HPE, Lenovo and others choose to support both UALink and NVLink. (Lenovo is a “Contributor” member of the UALink Consortium, but Dell has not joined as yet.) NVLink uses a proprietary signaling interconnect to support Nvidia GPUs. Alternatively, UALink will support accelerators from a range of vendors, with switching and fabric from any vendor that adheres to the UALink standard.
There is a real and significant cost to these server vendors—from design to manufacturing and through the qualification and sales/support process. On the surface, it’s easy to see where UALink would appeal to, say, Dell or HPE. However, there is a market demand for Nvidia that cannot and will not be ignored. Regardless of one’s perspective on the ability of “the market” to erode Nvidia’s dominance, we can all agree that its dominance will not fade fast.
The UALink Consortium (and forthcoming specification) is a significant milestone for the industry as the challenges surrounding training AI models and operationalizing data become increasingly complex, time-consuming and costly.
If and when we see companies like Astera Labs and others develop the underlying fabric and switching silicon to drive accelerator-to-accelerator connectivity, and when companies like Dell and HPE build platforms that light all of this up, the downmarket impact will be significant. This means the benefits realized by hyperscalers like AWS and Meta will also benefit enterprise IT organizations that look to operationalize AI across business functions.
Ideally, we would have a market with one standard interconnect specification for all accelerators—all GPUs. And maybe at some point that day will come. But for now, it’s good to see rivals like AMD and Intel or Google and AWS coalesce around a standard that is beneficial to all.
The post Digging Into The Ultra Accelerator Link Consortium appeared first on Moor Insights & Strategy.
The post MI&S Weekly Analyst Insights — Week Ending November 22, 2024 appeared first on Moor Insights & Strategy.
When Pure Storage was founded in 2009, it made its mark by focusing on flash as the only storage medium for the enterprise. It did so at a time when flash storage was still limited in adoption, primarily due to cost. Fast-forward 15 years, and the company’s strategy has proven wise. Flash is dominant in enterprise primary storage, and adoption continues to grow. Further, even as the storage market has been fairly flat over the last couple of years, Pure has continued to see double-digit growth quarter after quarter.
Although the company has firmly established itself in the enterprise, it has not gained the same momentum in the small and mid-market segments. In response to this lack of market penetration, the company has just launched its FlashArray//C20 platform. This research note will look at Pure’s push into the mid-market and what the company needs to do to penetrate this segment.
Every IT organization—regardless of company size—wants to extract as much value as possible from the solutions it deploys, especially on the storage front. While “extracting value” can mean different things to different organizations, cost, capacity, and performance are three consistent elements of the value equation.
When it first hit the market, flash storage (NAND flash) was exclusive to performance-sensitive workloads due to its high cost per gigabyte. However, as this cost started to curve down over time, flash storage became more affordable for broad use across the enterprise. Yes, different types of flash—QLC versus SLC versus TLC—have different price points, so some are more expensive than others. And yes, the price of flash is somewhat volatile, given the glut/scarcity cycles that impact this market. Still, if one were to plot an average cost per gigabyte over time, there would be a significant downward trend.
As flash has come down in price per gigabyte, its capacity has increased. For example, Pure’s largest-capacity flash storage—its DirectFlash Module—is 150TB and will ship by the end of this year. Further, the company intends to ship a 300TB module by 2026.
Based on the above, you can see how even at a very low price point—say 7 cents per gigabyte—flash-based storage solutions can still be price-prohibitive for a mid-sized company. This is unfortunate because the ease of deploying and managing Pure’s storage solution is ideal for a typical mid-sized company that probably doesn’t have the depth of technical expertise that many enterprise IT organizations have.
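A quick worked example shows how even pennies per gigabyte add up at enterprise capacities; the $0.07-per-gigabyte figure comes from the paragraph above, while the array capacities are hypothetical.

```python
# Illustrative cost math at $0.07/GB (the price point cited above).
# The array capacities below are hypothetical examples, not Pure list prices.
price_per_gb = 0.07

for capacity_tb in (150, 500, 1000):
    cost = capacity_tb * 1000 * price_per_gb
    print(f"{capacity_tb:5d} TB raw -> ~${cost:,.0f} in flash media alone")

#   150 TB raw -> ~$10,500 in flash media alone
#   500 TB raw -> ~$35,000 in flash media alone
#  1000 TB raw -> ~$70,000 in flash media alone
```

For an enterprise, those numbers are routine; for a mid-sized business buying a full array plus controllers, software, and support, they can be the difference between buying flash and settling for something slower.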
In an attempt to bridge this gap between the needs of the mid-market and the economics of storage, this week Pure announced its FlashArray//C20. This storage solution comes with lower capacity to enable a lower overall price point for mid-market IT organizations. However, this is the same Pure Storage architecture that has benefitted the enterprise, with features such as:
With this launch, Pure is bringing the entire enterprise storage experience to the mid-market. For example, mid-market customers that deploy the //C20 can still benefit from Pure’s Evergreen architecture. This guarantees that the customer’s storage infrastructure is always the most modern through non-disruptive upgrades. This effectively brings a white-glove upgrade experience to the mid-market.
The //C20 uses the same Pure-designed flash modules as its enterprise offerings. While it would likely be cheaper to drop in commodity flash to drive down system costs, Pure is willing to sacrifice a little bit of margin to deliver enterprise quality and performance.
The last point on this enterprise experience theme is integration into the Pure storage platform. The //C20 uses the same management plane used across the portfolio. Want to use Pure1 management features or Pure Fusion services? The entirety of Pure’s control plane for managing data and consolidation is available for all customers.
If I were still an IT leader, I might find this simple, somewhat automated approach to managing my storage environment the most significant benefit for a mid-market organization. The modern mid-sized business may not have the breadth of an enterprise, yet it still struggles with many of the same challenges as the enterprise. The workloads being deployed are complex, and the hybrid environments where they are deployed can be a challenge, as is the relentless focus on data—data generation, data collection, data management, data utilization. Pure has enabled feature parity of its storage solution to account for this reality—simply at a lower capacity.
There are three consistent elements of value in storage: performance, price, and capacity. Pure is delivering on all of them.
Mind you, there’s also a play for the //C20 in the enterprise. For remote office/branch office (ROBO) or edge deployments, this lower-capacity and more affordable storage box can be ideal for powering something like a retail location or a bank branch that requires local storage but needs to be managed centrally. This is another example of how the architectural consistency and single control plane of Pure enable flexibility that empowers IT architects and administrators.
Building a great product for a target market is only half of a winning equation. The other half is go-to-market. In other words, how do you find your target audience, tell the right story, and create a frictionless buying experience?
The selling model for the mid-market segment is indirect. These companies tend to buy through channels and have little loyalty to specific technology vendors. CDW, Connections, SHI, and the like are all common resellers that serve this market.
Pure is a channel-friendly company, and its positioning, messaging, and overall GTM machine are well-suited for this market segment. However, given the transactional nature of the mid-market, the company will have to double down on its channel engagement and enablement efforts, ensuring that those reseller account reps are quick to suggest the //C20 whenever a customer calls needing storage.
Overall, I believe the company has the assets and ability to effectively come downmarket with its messaging for the //C20 and for Pure Storage itself in short order.
The FlashArray//C20 is a fairly significant expansion of Pure Storage’s reach. This enterprise storage company is bringing the power and efficiency of its all-flash technology to a new market segment. In doing so, it is also competing with a company (NetApp) that has been established in this segment for a while.
I am a big fan of the parity in features and capabilities between the //C20 and its enterprise siblings up the stack. It makes for an easy marketing campaign, but more importantly it delivers much-needed capabilities to a segment that is sometimes an afterthought for IT solutions vendors.
Stay tuned for updates on the company’s mid-market penetration in upcoming quarters.
The post RESEARCH NOTE: Pure Storage Comes Downmarket with FlashArray//C20 appeared first on Moor Insights & Strategy.
The post MI&S Weekly Analyst Insights — Week Ending November 8, 2024 appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending November 1, 2024 appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending November 1, 2024 appeared first on Moor Insights & Strategy.
]]>The post Datacenter Podcast: Episode 32 – Talking Juniper, AMD & Intel, IBM, Cisco, Oracle, Google appeared first on Moor Insights & Strategy.
]]>Watch the video here:
Listen to the audio here:
2:36 Juniper’s Mining For GenAI Gold
10:49 What To Make Of Semiconductor Earnings?
18:42 Granite 3.0 Rocks
26:43 Open The Cisco AI POD Bay Doors Hal
34:59 OCI’s Unique Path To Growth
45:01 Dry Watermarks For AI
50:14 Our Top 3 List
Juniper’s Mining For GenAI Gold
https://x.com/WillTownTech/status/1852296047806066826
What To Make Of Semiconductor Earnings?
https://www.linkedin.com/feed/update/urn:li:activity:7257130388483919872/
https://www.intc.com/news-events/press-releases/detail/1716/intel-reports-third-quarter-2024-financial-results
Granite 3.0 Rocks
https://www.forbes.com/sites/moorinsights/2024/10/25/ibms-new-granite-30-ai-models-show-strong-performance-on-benchmarks/
Open The Cisco AI POD Bay Doors Hal
https://x.com/WillTownTech/status/1851493750461075694
OCI’s Unique Path To Growth
https://www.linkedin.com/feed/update/urn:li:activity:7255962768703434752/
Dry Watermarks For AI
https://www.nature.com/articles/d41586-024-03462-7
Disclaimer: This show is for information and entertainment purposes only. While we will discuss publicly traded companies on this show, the contents of this show should not be taken as investment advice.
The post Datacenter Podcast: Episode 32 – Talking Juniper, AMD & Intel, IBM, Cisco, Oracle, Google appeared first on Moor Insights & Strategy.
]]>The post Analyzing AMD’s Next-Generation CPU, GPU And DPU appeared first on Moor Insights & Strategy.
]]>AMD held its Advancing AI 2024 event last week, where it launched its latest datacenter silicon—the 5th Generation EPYC processor (codenamed “Turin”) and the MI325X AI accelerator. On the networking front, the company introduced Pensando Salina and Pensando Pollara to address front-end and back-end networking, respectively. As the silicon market gets hotter and hotter, AMD’s launches have become increasingly anticipated. Let’s dig into what AMD launched and what it means for the industry.
For those who thought the AI hype cycle was at its peak, guess again. This trend is stronger than ever, and with good reason. As the AI market starts to move from frontier models and LLMs to operationalizing AI in the enterprise, virtually every IT organization is focused on how to best support these workloads. That is, how does IT take a model or models, integrate and tune them using organizational data and use the output in enterprise applications?
Further, organizations that have already operationalized AI to some degree are now exploring the concept of agentic AI, where AI agents learn from each other and become smarter. This trend is still a bit nascent, but we can expect it to grow rapidly.
The point is that AI in the enterprise is already here for many companies and right around the corner for many more. With this comes the need for compute platforms tailored for AI’s unique performance requirements. In addition to handling traditional workloads, CPUs are required to handle the AI data pipeline, and GPUs are required to perform the tasks of training and inference. (CPUs can also be used to perform the inference task.)
Because of this, AI silicon market leader Nvidia has designed its own CPU (Grace) to tightly integrate and feed its GPUs. While the company’s GPUs, such as Hopper and Blackwell, will run with any CPU, their tight integration with Grace is designed to deliver the best performance. Similarly, Intel has begun to enter the AI space more aggressively as it builds tight integration among its Xeon CPUs, Gaudi AI accelerators and forthcoming GPU designs.
For AMD, the integration of CPU with GPU (and GPUs connected by DPUs) is the company’s answer to the challenges faced by enterprise IT and hyperscalers alike. This integration accelerates the creation, cleansing, training and deployment of AI across the enterprise.
To meet the entire range of datacenter needs, AMD designed two EPYC Zen 5 cores—the Zen 5 and Zen 5c. The Zen 5, built on a 4nm process, is the workhorse CPU designed for workloads such as database, data analytics and AI. The Zen 5c is designed with efficiency in mind. This 3nm design targets scale-out cloud and virtualized workloads.
AMD has held a performance leadership position in the datacenter throughout the last few generations of EPYC. There are more than 950 cloud instances based on this CPU, and the reason is quite simple. Thanks to AMD’s huge advantages in terms of number of cores and performance of those cores, cloud providers can put more and more of their customers’ virtual machines on each server. Ultimately, this means the CSP can monetize those servers and processors in a much more significant way.
In the enterprise, even though servers are a budget line item instead of a contributor to revenue (and margin), the math still holds: those high-core-count servers can accommodate more virtual machines, which means less IT budget goes to infrastructure so that more can go to other initiatives like AI.
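To put rough numbers on that consolidation argument, here is a back-of-the-envelope sketch in Python. The VM count, VM sizing and oversubscription ratio are assumptions chosen for illustration, not vendor sizing guidance.

```python
# Back-of-the-envelope consolidation math. All numbers are assumptions for
# illustration only, not vendor sizing guidance.
vms_needed     = 1200   # virtual machines the business must host (assumed)
vcpus_per_vm   = 4      # assumed VM sizing
vcpus_per_core = 2      # assumed oversubscription ratio

for cores_per_server in (64, 128, 192):
    vms_per_server = cores_per_server * vcpus_per_core // vcpus_per_vm
    servers_needed = -(-vms_needed // vms_per_server)   # ceiling division
    print(f"{cores_per_server}-core servers: {servers_needed} needed for {vms_needed} VMs")
```

Under these assumptions, the same VM estate shrinks from 38 servers at 64 cores per socket to 13 at 192, which is the budget argument in miniature.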
Having lots of cores doesn’t mean anything if they don’t perform well. In this regard, AMD has also delivered with Turin. Instructions per cycle is a measure of how many instructions a chip can process every clock cycle. This tells us how performant and efficient a CPU is. The fact that Turin has been able to deliver double-digit percentage increases in IPC—large ones—over its predecessor is significant.
Because the new EPYC launched a couple of weeks after Intel’s Xeon 6P CPU (see my deep analysis on Forbes), we haven’t yet seen head-to-head comparisons in terms of performance. However, we can do a couple of things to get a feel for how EPYC and Xeon compare. The first is to look at the side-by-side “billboard” specifications. When comparing these chips for scale-out workloads, the 5c CCD-based CPUs have up to 192 cores with 12 DDR5 memory channels (6,400 MT/s) and 128 lanes of PCIe Gen 5.
By comparison, Intel’s Xeon 6E (efficiency core) scales up to 144 cores with 12 DDR5 memory channels and 96 lanes of PCIe Gen 5. However, in the first quarter of 2025, Intel will launch its second wave of Xeon 6E, which will scale up to 288 cores.
It’s clear that on the performance side of the equation, EPYC and Xeon are close on specs: 128 cores, 12 channels of memory and lots of I/O (128 lanes of PCIe for EPYC, 96 for Xeon). There are differences between the two as well, though, once you look past those topline numbers.
AMD provided many benchmarks to demonstrate Turin’s performance. I focus on the SPEC suite because it most closely and objectively measures a CPU’s core performance. In this test, the 5th Gen EPYC significantly outperforms the 5th Gen Xeon.
As I always say with any benchmark a vendor provides, take these results with a grain of salt. In the case of this benchmark, the numbers themselves are accurate. However, Xeon’s performance took a significant leap between 5th Gen and Xeon 6P, making it hard to truly know what the performance comparison looks like until both chips can be independently benchmarked. Mind you, AMD couldn’t test against Xeon 6P, so I do not fault the company for this. However, I’d like to see both companies perform this testing in the very near future.
The market is responding positively to EPYC, no doubt about it. In the five generations that EPYC has been on the market, AMD’s datacenter CPU share has climbed from less than 2% to about 34%. Given the slow (yet accelerating) growth of EPYC in the enterprise, this tells me that the CPU’s market share just for the cloud and hyperscale space must be well north of 50%. Meta, for instance, recently disclosed that it has surpassed 1.5 million EPYC CPUs deployed globally—and that’s before we get to the CSPs.
I expect that Turin will find greater adoption in the enterprise datacenter, further increasing EPYC’s market share. In the last couple of quarters, I’ve noticed AMD CEO Lisa Su saying that enterprise adoption is beginning to accelerate for EPYC. Additionally, the rising popularity of the company’s Instinct MI300X series GPUs should help EPYC deepen its appeal. Which brings us to our next topic.
While we look to the CPU to perform much of the work in the AI data pipeline, the GPU is where the training and inference magic happens. The GPU’s architecture—lots of little cores that enable parallelism, combined with high-bandwidth memory and the ability to perform matrix multiplications at high speeds—delivers efficiency. Combined with optimized libraries and software stacks, these capabilities make for an entire AI and HPC stack that developers and data scientists can employ more easily.
While Nvidia has long been the leader in the HPC and AI space, AMD has quietly made inroads with its Instinct MI300 Series GPUs. Launched at the inaugural Advancing AI event in 2023, the MI300X posed the first legitimate alternative to the Nvidia H100 and H200 GPUs for AI training through a combination of its hardware architecture and ROCm 6.0 software stack (competing with Nvidia’s CUDA).
Over the following few quarters, AMD went on to secure large cloud-scale wins with the likes of Meta, Microsoft Azure, Oracle Cloud Infrastructure and the largest independent cloud provider, Vultr, to name a few. This is important because these cloud providers modified their software stacks to begin the effort of supporting Instinct GPUs out of the box. No more optimizing for CUDA and “kind of” supporting ROCm—this is full-on native support for the AMD option. The result is training and inference on the MI300 and MI325 that rival Nvidia’s H100 and H200.
The Instinct MI325X is the next step in closing the gap with Nvidia. This GPU, built on AMD’s CDNA 3 architecture and boasting 256GB of HBM3E memory, is claimed to deliver substantially better performance than the previous generation as well as leadership over Nvidia.
As mentioned, hardware is only part of the equation in the AI game. A software stack that can natively support the most broadly deployed frameworks is critical to training models and operationalizing AI through inference. On this front, AMD has just introduced ROCm 6.2. With this release, the company is making bold claims about performance gains, including a doubling of performance and support for over a million models.
Bringing it all together is networking, which requires both connecting AMD’s AI cluster to the network and connecting all of this AI infrastructure on the back end. First, the company introduced its third-generation DPU—the Pensando Salina. Salina marries high-performance network interconnect capabilities and acceleration engines aimed at providing critical offload to improve AI and ML functions. Among the new enhancements are 2x400G transceiver support, 232 P4 match processing units, 2x DDR5 memory and 16 Arm Neoverse N1 cores.
Combined, these features should facilitate improved data transmission, enable programming for more I/O functions and provide compute density and scale-out—all within a lower power-consumption envelope—for hyperscale workloads. AMD claims that Salina will provide a twofold improvement in overall performance compared to its prior DPU generations; if it delivers on this promise, it could further the company’s design wins with public cloud service providers eager to capitalize on the AI gold rush.
Second, the AMD Pensando Pollara 400 represents a leap forward in the design of NICs. It is purpose-built for AI workloads, with an architecture based on the latest version of RDMA that can directly connect to host memory without CPU intervention. AMD claims that this new NIC, which employs unique P4 programmability and supports 400G interconnect bandwidth, can provide up to 6x improvement in performance when compared to legacy solutions using RDMA over Converged Ethernet version 2. Furthermore, the Pollara 400 is one of the industry’s first Ultra Ethernet-ready AI NICs, supported by an open and diverse ecosystem of partners within the Ultra Ethernet Consortium, including AMD, Arista, Cisco, Dell, HPE, Juniper and many others.
AMD’s new NIC design could position it favorably relative to Broadcom 400G Thor, especially since the company is the first out of the gate with a UEC design. Both the Salina DPU and Pollara 400 NIC are currently sampling with cloud service and infrastructure providers, with commercial shipments expected in the first half of 2025.
One of the understated elements of AMD’s AI strategy is its acquisition of Silo AI. This Finnish company, the largest private AI lab in Europe, is filled with AI experts who spend all their time helping organizations build and deploy AI.
When looking at what AMD has done over the last year or so, it has built an AI franchise by bringing all of the critical elements together. At the chip level, the company delivered 5th Gen EPYC for compute, MI325X for GPU and Salina and Pollara for front-end and back-end networking. ROCm 6.2 creates the software framework and stack that enables the ISV ecosystem. The acquisition of ZT Systems last month delivers rack-scale integration that Silo AI can use to deliver the last (very long) mile to the customer.
In short, AMD has created an AI factory.
As I say again and again in my analyses of this market, AI is complex—and even that is an understatement. Different types of compute engines are required to effectively generate, collect, cleanse, train and use AI across hyperscalers, the cloud and the enterprise. This translates into a need for CPU, GPU and DPU architectures that are not only complementary, but indeed optimized to work with one another.
Over time, AMD has acquired the pieces that enable it to deliver this end-to-end AI experience to the market. At Advancing AI 2024, the company delivered what could be called its own AI factory. It is important to note that this goes beyond simply providing an alternative to Nvidia. AMD is now a legitimate competitor to Nvidia.
At the same time, AMD demonstrated a use for all of this technology outside of the AI realm, too. With the new EPYC, it has delivered a generation of processors that demonstrates continued value in the enterprise. And in the MI325X, we also see excellent performance across the HPC market.
Here is my final takeaway from the AMD event: The silicon market is more competitive than ever. EPYC and Xeon are both compelling for the enterprise and the cloud. On the AI/HPC front, the MI325X and H100/H200/B200 GPUs are compelling platforms. However, if I were to create a Venn diagram, AMD would be the only company strongly represented in both of these markets.
Game on.
The post Analyzing AMD’s Next-Generation CPU, GPU And DPU appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending October 25, 2024 appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending October 25, 2024 appeared first on Moor Insights & Strategy.
]]>The post VAST Data Deepens Its AI Enablement With InsightEngine appeared first on Moor Insights & Strategy.
]]>VAST Data launched its Data Platform about a year ago, aiming to unify storage, compute and data. The company’s bigger goal is to remove the complexity of connecting all of an enterprise’s data to the applications and tools that turn that data into intelligence.
In its latest move, the company and AI giant Nvidia have partnered to announce InsightEngine, which is designed to deliver real-time retrieval-augmented generation. Let’s take a deeper look at this announcement and consider what this means for enterprise IT organizations and the industry as a whole.
First, it’s worth revisiting the underlying problem that VAST addresses. Saying that AI is complex is not original or controversial. It’s complex for many reasons, including technical, operational and organizational aspects. One of the biggest challenges comes from the data used for AI. Data resides everywhere, from the edge to on-premises datacenters to the cloud. Data also resides in the applications that power the business—ERP, CRM, HRM and the like. Finally, data exists in many different formats, both structured (e.g., database tables) and unstructured (documents, pictures, etc.).
Here’s the long-running challenge: how does an enterprise that wants to extract value from its data do that easily? Historically, the answer has been: it doesn’t. That’s what VAST has tried to address with its Data Platform, which resolves many of the challenges in this area through a set of tightly integrated components, including its disaggregated DASE architecture, the VAST DataBase and the VAST DataEngine.
So to recap, the introduction of the VAST Data Platform was aimed directly at the challenge of how IT organizations can more easily collect, prepare and train on the large amounts of data that feed large language models for use in AI applications.
But the challenge continues to evolve. As AI ages a little, we have started to see the discussion shift from frontier models to enterprise inference. As the discussion shifts, so does the challenge of how we make this trained data work in the enterprise beyond simple chatbots and the like. How does inference work to drive business outcomes? And is RAG the answer? For the latter question, VAST would argue: not in its current state.
InsightEngine is where VAST has trained its focus to help enterprises extract full value from AI inference. Working with Nvidia, InsightEngine delivers more accurate, more contextualized responses to the queries that a user or another application may initiate. NIM (which stands for “Nvidia inference microservices”) is Nvidia’s framework that enables an enterprise to take trained data and use it more precisely and efficiently in each application.
By working with NIM, InsightEngine can create vector and graph embeddings in VAST’s DataBase product. Whenever new data arrives, vector embeddings are created to update the database in real time. These vectors, graphs and tables are then used in RAG. The result is an implementation of RAG that is highly accurate and delivered in real time from VAST’s vector database, which can scale up to trillions of embeddings.
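To illustrate the general pattern of real-time retrieval-augmented generation described here, the following is a minimal, self-contained Python sketch. The toy embedding function and in-memory store are stand-ins for illustration only; this is not VAST’s InsightEngine or Nvidia’s NIM APIs.

```python
# Minimal real-time RAG sketch (illustrative only). A real system would call an
# embedding model and a scalable vector database instead of these toys.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: character-trigram counts stand in for a real embedding model."""
    return Counter(text[i:i + 3].lower() for i in range(max(len(text) - 2, 1)))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class VectorStore:
    """Stands in for a vector database; ingest makes new data queryable immediately."""
    def __init__(self):
        self.rows = []  # list of (embedding, source_text)

    def ingest(self, text: str):
        self.rows.append((embed(text), text))  # no batch re-indexing step

    def retrieve(self, query: str, k: int = 2):
        q = embed(query)
        ranked = sorted(self.rows, key=lambda r: cosine(q, r[0]), reverse=True)
        return [text for _, text in ranked[:k]]

store = VectorStore()
store.ingest("Order 1042 shipped from the Austin warehouse on Tuesday.")
store.ingest("Order 1042 was delayed by a parts shortage.")  # new event, instantly searchable

context = store.retrieve("What happened to order 1042?")
prompt = "Answer using this context:\n" + "\n".join(context) + "\nQ: What happened to order 1042?"
print(prompt)  # this grounded prompt would then be sent to an LLM for generation
```

The point of the sketch is the ingest path: newly generated data is embedded and becomes retrievable immediately, with no batch re-indexing step, which is the "real-time" property described above.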
Depending on how inference is used, real-time RAG’s benefit may not be as critical to a specific organization. However, for mission- and business-critical applications that are driven by AI agents—and interact with other AI agents—a lack of real-time data can be a serious issue. If you think this agentic model (i.e., one in which AI agents interact with one another across the enterprise) is a little futuristic, it’s not. Or maybe more precisely put, it is futuristic—but the future is now.
How is all of this possible? VAST employs a disaggregated, shared-everything (DASE) architecture. This takes a standard storage architecture and makes it broad and shallow. This removes the notion of data tiering, so essentially all data is “hot.” Because of this, InsightEngine can quickly ingest data from enterprise applications and vectorize it in the VAST DataBase. Object, file, table, graph—all of it gets stored in this transactional/analytical database for retrieval. And whenever real-time RAG is enabled, InsightEngine also fine-tunes your large language models.
The less-covered element of VAST’s announcement is arguably the most valuable to enterprise IT today. Cosmos is a community where VAST directly connects AI practitioners with AI experts. While every organization would love to hire 20 Ph.D.s to design and deploy AI across the enterprise, the reality is that AI talent is scarce—and pricey. While the VAST Data Platform and InsightEngine are intended to simplify the process of deploying and operationalizing AI, the term “simplify” is relative. For many IT organizations, it’s still going to be really hard—and the skills gap is real.
With Cosmos, IT professionals can join a forum and interact with each other and with experts to better understand best practices and work through challenges that may otherwise seem impossible to tackle. This isn’t simply connecting a user to a VAST support person; it connects them to other users facing the same challenges, along with folks from the big consulting firms and the hardware and software vendors.
Of course, communities like Cosmos are constrained by how much they are used and how well they are moderated. If this community becomes nothing but a sales vehicle for Accenture, Deloitte and others, it will quickly lose its appeal. However, there is real potential here.
When VAST announced the Data Platform last year, it was the only vendor bringing this kind of data management to storage. With InsightEngine, it has further differentiated itself. However, NetApp recently announced AI capabilities for its ONTAP storage and data management platform, including an AI engine that performs many of the functions of the AI data pipeline.
Perhaps VAST’s biggest competitor in the high-performance storage space is Weka, which has its own data platform for generative AI. Weka’s cloud-native architecture might be the closest to VAST’s, in that the company has designed its solution from the ground up for high performance.
The addition of InsightEngine with Nvidia to VAST’s architecture delivers an advantage for VAST because it expands coverage along not just the AI data pipeline but the whole AI journey, from training to inference. VAST’s customers are a Who’s Who of data- and performance-driven organizations, such as Zoom, NASA, Pixar and GPU cloud provider CoreWeave.
VAST is a data management company. Though its early years were spent designing high-performance storage, that was clearly done to build a foundation for its data management play. Further, the company has successfully built out its storage and data management platforms—otherwise, it would not have a valuation of over $9 billion.
Here are two things to consider about VAST. The first is that it caters to the needs of companies with significant data management challenges—the cream of the crop, if you will. VAST will undoubtedly continue to find success in this space, but there are questions about whether its technology can successfully come downmarket to find a larger addressable market. For that matter, does VAST even want to?
The second consideration comes from imagining myself as a VAST customer. Deploying the VAST Data Platform is a deep engagement. Once I jump in, it’s not easy to move away from it. This isn’t a bad thing, but it is undoubtedly something to weigh for any enterprise IT organization considering vendors to support its AI journey.
VAST’s evolution has been fun to watch. From a storage company that took the HPC world by storm to claiming the AI OS title, it has been a bold company that hasn’t been afraid to be the first mover.
When the company introduced the Data Platform last year, conceptualizing it was a little hard. This was partly because the company was out in front of the market, talking about AI pipelines, global namespaces, DASE and DataEngine while everybody else was talking about LLMs and ChatGPT. InsightEngine brings the VAST Data Platform into sharper focus and shows how the company is making itself an integral part of the entire AI journey—from finding and preparing data to training and inference.
The one bit of advice I would leave you with is this: AI is still complex. While VAST has removed a lot of the complexity, the AI market has seen far more failures than successes to date. Look to Cosmos and other communities to engage with experts and ensure you lay down the right foundation.
The post VAST Data Deepens Its AI Enablement With InsightEngine appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending October 18, 2024 appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending October 18, 2024 appeared first on Moor Insights & Strategy.
]]>The post Xeon 6P And Gaudi 3 — What Did Intel Deliver? appeared first on Moor Insights & Strategy.
]]>Intel just continued the execution of its aggressive “five nodes in four years” strategy with the launch of the Xeon 6P (for performance) CPU and the Gaudi 3 AI accelerator. This launch comes at a time when Intel’s chief competitor, AMD, has been steadily claiming datacenter market share with its EPYC CPU.
It’s not hyperbolic to say that a successful launch of Xeon 6P is important to Intel’s fortunes, because the Xeon line has lagged in terms of performance and performance per watt for the past several generations. While Xeon 6E set the tone this summer by responding to the core density its competition has been touting, Xeon 6P needed to hit AMD on the performance front.
Has Xeon 6P helped Intel close the gap with EPYC? Is Gaudi 3 going to put Intel into the AI discussion? This article will dig into these questions and more.
Yes and no—and it’s worth taking a moment to explain what’s going on. Xeon 6 represents the first time in recent history that Intel has delivered two different CPUs to address the range of workloads in the datacenter. In June 2024, Intel launched its Xeon 6E (i.e., Intel Xeon 6700E), which uses the Xeon 6 efficiency core. This CPU, codenamed “Sierra Forest,” ships with up to 144 “little cores,” as Intel calls them, and focuses on cloud-native and scale-out workloads. Although Intel targets these chips for cloud build-outs, I believe that servers with 6E make for great virtualized infrastructure platforms in the enterprise.
In this latest launch, Intel delivered its Xeon 6P CPU (Intel Xeon 6900P), which uses the Xeon 6 performance core. These CPUs, codenamed “Granite Rapids,” are at the high end of the performance curve with high core counts, a lot of cache and the full range of accelerators. Specifically, Xeon 6P utilizes Advanced Matrix Extensions to boost AI significantly. This CPU is Intel’s enterprise data workhorse supporting database, data analytics, EDA and HPC workloads.
The company will release the complementary Xeon 6900E and 6700P series CPUs in Q1 2025. The 6900E will expand on the 6700E by targeting extreme scale-out workloads with up to 288 cores. Meanwhile, the 6700P will be a lower-performance Xeon with fewer cores and a smaller cache. It is still great for enterprise workloads, just without the extreme specs of the 6900P.
Effectively, Intel has launched the lowest end of the Xeon 6 family (6700E) and the highest end (6900P). In Q1 2025, it will fill in the middle with the 6700P and 6900E.
The other part of Intel’s datacenter launch was Gaudi 3, the company’s AI accelerator. Like Xeon 6, Gaudi 3 has been talked about for some time. CEO Pat Gelsinger announced it at the company’s Vision conference in April, where we provided a bit of coverage that’s still worth reading for more context. At Computex in June, Gelsinger offered more details, including pricing. The prices he cited suggest a significant advantage for Intel compared to what we suspect Nvidia and AMD are charging for their comparable products (neither company has published its prices). However, for as much as Gaudi 3 has already been discussed, it has only now officially launched.
Xeon 6P is a chiplet design built on two processes. The compute die, consisting of cores, cache, mesh fabric (how the cores connect) and memory controllers, is built on the Intel 3 process. As the name implies, this is a 3nm-class process. The chip’s I/O dies are built on the older Intel 7 process—at 7nm. These dies contain PCIe, CXL and Xeon’s accelerator engines (more on those later).
The result of the process shrink in Xeon 6 is a significant performance-per-watt advantage over its predecessor. When looking at the normal range of average utilization rates, Xeon 6P demonstrates a 1.9x increase in performance per watt relative to the 5th Gen Xeon.
Intel’s testing compared its top-of-bin (highest-performing) CPU—the 6890P with 128 cores and a 500-watt TDP—against the Xeon Platinum 8592+ CPU, a top-of-bin 5th Gen Xeon with a TDP of 350 watts. Long story short, Intel has delivered twice the cores with a roughly 7% increase in per-core performance and a considerably lower power draw per core.
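A quick back-of-the-envelope check of that per-core power math, using only the TDP figures cited above (TDP is a coarse proxy for real power draw at typical utilization):

```python
# Per-core power math using only the TDP figures cited above. TDP is a rough
# proxy; measured power at typical utilization will differ.
chips = {
    "Xeon 6890P (Xeon 6P top bin)":          {"cores": 128, "tdp_w": 500},
    "Xeon Platinum 8592+ (5th Gen top bin)": {"cores": 64,  "tdp_w": 350},
}
for name, c in chips.items():
    print(f"{name}: {c['tdp_w'] / c['cores']:.1f} W per core")
# Roughly 3.9 W/core vs. 5.5 W/core -- about 29% less power per core, on top of
# the ~7% per-core performance gain Intel claims.
```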
It’s what’s inside the Xeon 6P that delivers a significant performance boost and brings it back into the performance discussion with its competition. Packed alongside those 128 performant cores is a rich memory configuration, lots of I/O and a big L3 cache. Combine these specs with the acceleration engines that Intel started shipping two generations ago, and you have a chip that is in a very competitive position against AMD.
Xeon 6P is listed with two memory speeds (6400 MT/s and 8800 MT/s) because it supports MRDIMM technology, or multiplexed ranked DIMMs. With this technology, memory modules can operate two ranks simultaneously, effectively doubling how much data the memory can transfer to the CPU per clock cycle (128 bytes versus 64 bytes). Bandwidth increases dramatically when using MRDIMMs, meaning that more data per second can be fed to those 128 cores. Xeon 6P is the first CPU to ship with this technology.
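For a sense of what those transfer rates mean in aggregate, here is a simple peak-bandwidth estimate across the 12 memory channels. It assumes 8 bytes per transfer per channel and ignores real-world efficiency losses, so treat it as a ceiling rather than a sustained figure.

```python
# Simple peak-bandwidth estimate for 12 memory channels at the two cited speeds.
# Assumes 8 bytes per transfer per channel; sustained bandwidth will be lower.
channels, bytes_per_transfer = 12, 8
for label, transfers_per_sec in (("DDR5-6400", 6_400_000_000), ("MRDIMM-8800", 8_800_000_000)):
    peak_gb_s = channels * transfers_per_sec * bytes_per_transfer / 1e9
    print(f"{label}: ~{peak_gb_s:,.0f} GB/s peak across the socket")
# ~614 GB/s vs. ~845 GB/s of theoretical feed for those 128 cores.
```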
I point out this memory capability to give an example of the architectural design points that have led to some of Intel’s performance claims for Xeon 6P. Despite what some may say, performance is not just about core counts. Nor is it simply about how much memory or I/O a designer can stuff into a package. It’s about how quickly a chip can take data (and how much data), process it and move on to the next clock cycle.
When I covered Intel’s launch of its 4th Gen Xeon (codenamed “Sapphire Rapids”), I talked about how I thought the company had found its bearings. This was not because of Xeon’s performance. Frankly, from a CPU perspective, it fell short. However, the company designed and dropped in a number of acceleration engines to deliver better real-world performance across the workloads that power the datacenter.
The design of Xeon 6P, building on what Intel introduced with Sapphire Rapids, sets it up to handle AI, analytics and other workloads well beyond what the (up to) 128 Redwood Cove cores can handle. And frankly, the Xeon 6P delivers. The company makes strong claims in its benchmarking along the computing spectrum—from general-purpose to HPC to AI. In each category, Intel claims significant performance advantages compared to AMD’s 4th Gen EPYC processors. In particular, Intel focused its benchmarks on AI and how Xeon stacks up.
As I say with every benchmark I ever cite, these should be taken with a grain of salt. These are Intel-run benchmarks on systems configured by its own testing teams. When AMD launches its “Turin” CPU in a few weeks, we’ll likely see results that contradict what is shown above and favor AMD. However, it is clear that Intel is back in the performance game with Xeon 6P. Further, I like that the company compared its performance against a top-performing AMD EPYC of the latest available generation, instead of cherry-picking a weaker AMD processor to puff up its own numbers.
One last note on performance and how Xeon 6P stacks up. In a somewhat unusual move, Intel attempted to show its performance relative to what AMD will launch soon. Based on AMD’s presentations at the Hot Chips and Computex conferences, AMD has made some bold performance claims relative to Intel. In turn, Intel used this data to show Xeon 6P’s projected performance relative to Turin when the stack is tuned for Intel CPUs.
Again, I urge you to take these numbers and claims with a grain of salt. However, Intel’s approach with these comparisons speaks to its confidence in Xeon 6P’s performance relative to the competition.
As mentioned above, we covered the specifications and performance of Gaudi 3 in great detail in an earlier research note. So, I will forgo recapping those specs and get straight to the heart of the matter: Can Gaudi compete with Nvidia and AMD? The answer is: It depends.
From an AI training perspective, I believe Nvidia and to a lesser extent AMD currently have a lock on the market. Their GPUs have specifications that simply can’t be matched by the Gaudi 3 ASIC.
From an AI inference perspective, Intel does have a play with Gaudi 3, showing significant price/performance advantages (up to 2x) versus Nvidia’s H100 GPU on a Llama 2 70B model. On the Llama 3 8B model, the advantage fell to 1.8x performance per dollar.
This means that, for enterprise IT organizations moving beyond training and into inference, Gaudi 3 has a role, especially given the budget constraints many of those IT organizations are facing.
More importantly, Gaudi 3 will give way over the next year or so to “Falcon Shores,” the first Intel GPU for the AI (and HPC) market. All of Intel’s important work in software will move along with it. Why does that matter? Because organizations that have spent time optimizing for Intel won’t have to start from scratch when Falcon Shores launches.
While I don’t expect Falcon Shores to bring serious competition to Nvidia or AMD, I do expect it will lead to a next-generation GPU that will properly put Intel in the AI training game. (It’s worth remembering that this is a game in its very early innings.)
Intel needed to make a significant statement in this latest datacenter launch. With Xeon 6P, it did just that. From process node to raw specs to real-world performance, the company was able to demonstrate that it is still a leader in the datacenter market.
While I expect AMD to make a compelling case for itself in a few weeks with its launch of Turin, it is good to see these old rivals on more equal footing. They make each other better, which in turn delivers greater value to the market.
The post Xeon 6P And Gaudi 3 — What Did Intel Deliver? appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending October 11, 2024 appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending October 11, 2024 appeared first on Moor Insights & Strategy.
]]>The post Datacenter Podcast: Episode 31 – Talking AMD UEC NIC & EPYC & MI300, OpenAI, Qualcomm, IBM appeared first on Moor Insights & Strategy.
]]>Watch the video here:
Listen to the audio here:
2:22 AMD Is Bringing Sexy Back To Networking
10:49 OpenAI o1 Is PhD Smart
20:38 Mo’ Cores Mo’ Cache – AMD PI
28:26 Qualcomm Gets Edgy With Campus & Branch Connectivity Infrastructure
32:48 Europe Gets A Quantum Data Center
39:55 Mind The (AI) Gap – AMD PII
AMD Is Bringing Sexy Back To Networking
https://x.com/WillTownTech/status/1844465726226301209
OpenAI o1 Is PhD Smart
https://openai.com/index/learning-to-reason-with-llms/
Mo’ Cores Mo’ Cache – AMD PI
https://www.amd.com/en/products/processors/server/epyc/9005-series.html
Qualcomm Gets Edgy With Campus & Branch Connectivity Infrastructure
https://x.com/WillTownTech/status/1843375274781749430
Europe Gets A Quantum Data Center
https://www.ibm.com/quantum/blog/europe-quantum-datacenter-software
Mind The (AI) Gap – AMD PII
https://www.amd.com/en/products/accelerators/instinct/mi300.html
Disclaimer: This show is for information and entertainment purposes only. While we will discuss publicly traded companies on this show, the contents of this show should not be taken as investment advice.
The post Datacenter Podcast: Episode 31 – Talking AMD UEC NIC & EPYC & MI300, OpenAI, Qualcomm, IBM appeared first on Moor Insights & Strategy.
]]>The post Pure Storage Keeps Removing Complications From Enterprise Data Storage appeared first on Moor Insights & Strategy.
]]>This week’s Pure Accelerate London event kicked off with a bang. Pure Storage released a number of updates across its portfolio aimed at driving improvements to performance, cost and simplicity. And, of course, AI had to be part of this update release lest the tech gods be upset.
There was quite a bit in this release cycle to unpack and explore—and the following few sections will do precisely that.
Since its founding in 2009, Pure Storage has been focused on modernizing the enterprise storage environment. It was the first storage company to support only flash storage, and it pioneered storage-as-a-service and the cloud operating model. The company has also been at the forefront of the shifting economics of storage consumption with its Evergreen program.
In a nutshell, Pure is the embodiment of the modern storage company. For folks in the IT business for a while, some of the changes Pure has driven can seem to be borderline heresy: No spinning media? No tape? By Zeus, what will we do?
Yet Pure’s approach is how IT consumes storage now—cloud connectivity, cloud operating models and cloud economics (the way cloud economics is supposed to operate). The days of dedicated IT teams performing very specific functions in the datacenter are firmly in the rearview mirror. When an embedded development team in a business unit requires a development environment to be created, they want it now—not in four weeks after six different specialists meet to spin up the environment. Otherwise, they will simply go to the public cloud.
Adding to this tension is a modern IT workforce that consumes and interacts with technology differently than the generation that preceded it. These are smart IT folks who grew up on apps and the cloud.
Pure seems almost singularly focused on abstracting all the complexity away from storage management. This is critical, as storage is a foundational building block for our IT environments. And Pure attacks this challenge from every angle.
Given this context, it’s no surprise that Pure’s latest updates cover hardware, software and services.
Here’s the setup. The legacy way of file storage is really legacy—like 20-plus years old. Teams would design and build a storage architecture and grow it over time. In this scenario, what inevitably happens is that silos grow and are managed independently of one another. One day, IT realizes just how inflexible this is.
In today’s world, storage has to be more flexible. An enterprise’s AI and analytics apps want access to all of the available data that exists across the enterprise, regardless of where it resides and regardless of where the apps using it are running. What’s needed is a single architecture that accesses data around the enterprise with a single control plane. This, in a nutshell, is what Pure’s Real-time Enterprise File does.
With Real-time Enterprise File, all storage is seen as a global pool (think clustering with no limitations). This is all managed as a single architecture from a single control plane. What the company has introduced is a realization of its cloud vision for storage—only it’s sitting on your premises.
As new workloads and applications are introduced into the environment, Pure’s implementation of zero-move tiering will be extremely helpful in improving resource utilization and efficiency. So, what is zero-move tiering? It’s better to start with what tiering is.
Storage tiering is a way of prioritizing your storage, data and applications so that the most mission-critical applications have access to the fastest storage, while less critical applications get appropriately performant (and less expensive) storage. For example, a latency-sensitive transactional database might sit on the fastest flash, while backups and archives land on slower, cheaper media.
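To make the legacy tiering model concrete, here is a purely illustrative sketch of the kind of manual tier map an architect might maintain. The tier names, media types and workload assignments are hypothetical, not drawn from Pure’s documentation.

```python
# Purely illustrative tier map for the legacy model described above.
# Tier names, media and workload assignments are hypothetical.
tiers = {
    "tier-0": {"media": "NVMe flash",  "workloads": ["OLTP database", "trading apps"]},
    "tier-1": {"media": "SAS SSD",     "workloads": ["VM farms", "analytics"]},
    "tier-2": {"media": "HDD / cloud", "workloads": ["backups", "archives"]},
}

def placement(workload: str) -> str:
    # The manual mapping exercise that zero-move tiering is meant to eliminate.
    for tier, spec in tiers.items():
        if workload in spec["workloads"]:
            return f"{workload} -> {tier} ({spec['media']})"
    return f"{workload} -> unclassified (needs an architect's decision)"

print(placement("OLTP database"))
print(placement("new AI inference service"))
```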
Tiering can vary from organization to organization, but the concept is the same: get more important workloads and applications connected to the fastest and best storage. In the past, IT required a lot of work to do this. With zero-move tiering, that work disappears.
Thanks to the single-layer architecture and global storage pool, all data is already together. In other words, there are no data store tiers. In this case, Pure’s FlashBlade product intelligently prioritizes mission-critical workloads (and data) for processing, and no data moves from one storage class to another. Instead, the compute and networking resources dictate the tiering class.
To make this a little easier to deploy and manage, Pure has extended its AI copilot (announced at Accelerate in Las Vegas) to manage file services. This goes more directly to the earlier point about the modern IT organization consisting of a lot of smart people, not just specialists. With Pure’s AI copilot, IT folks can manage their Pure storage environment through natural language rather than arcane syntax. I am a fan of the copilot concept in general and of how Pure has developed its own. It makes everybody a specialist and can turn specialists into experts through good prompt engineering.
Pure also announced the availability of a VM assessment tool to help admins better manage their virtualized environments. Virtualized environments have forever promised to drive up utilization and overall datacenter efficiency. For many organizations, the reality is far different. Too many virtual machines run on servers that are not even close to being utilized to their full extent. This tool, when available, will be a good way for organizations to become more efficient.
Given the recent VMware turbulence, this could be a great help for organizations in the midst of figuring out their go-forward strategy. Not necessarily for moving away from VMware’s VCF offering, but certainly for rationalizing licensing and deployments.
Finally, Pure has introduced Universal Credits to the market. Here’s the scenario: as an IT organization, I oversubscribe to one service and undersubscribe to another. This happens all the time. In one case, I’ve got to shake the couch cushions to find budget; in the other, I’m throwing money out the window. With Universal Credits, I can use my credits across the Pure portfolio—Evergreen//One, Pure Cloud Block Store and Portworx. Further, if I end my subscription term with extra credits, I can carry those credits forward (with some conditions). This is pretty cool.
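For a sense of how pooled credits change the math, here is a small hypothetical illustration. The credit amounts, consumption figures and carry-forward terms are assumptions, not Pure’s actual contract terms.

```python
# Hypothetical credit-pooling math. The services are Pure's, but the credit
# amounts, consumption figures and carry-forward terms are assumptions.
credits_purchased = 100_000
consumed = {"Evergreen//One": 55_000, "Pure Cloud Block Store": 20_000, "Portworx": 15_000}

total_used = sum(consumed.values())
leftover = credits_purchased - total_used
for service, amount in consumed.items():
    print(f"{service}: {amount:,} credits consumed")
print(f"Leftover: {leftover:,} credits, usable on any of the services or carried forward (with conditions)")
```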
Here’s what I would like to see at some point. For some organizations, IT budgets come from different funding buckets and are managed separately. A good example is from when I was an IT executive in state government: there were 39 or so agencies with 39 or so IT budgets. It would be great if I could share my credits with a sister agency to leverage Pure’s services even more effectively. But hey, that’s just wishful thinking on my part.
At the bottom of every Pure PowerPoint deck is “Uncomplicate Data Storage, Forever.” From my perspective, this is exactly what the company is doing in every release of updates and services across its portfolio: making life easier for IT. While the majority of my words here have described Pure’s Real-time Enterprise File solution, it’s the combination of all of these services (plus the launch of the entry-level FlashBlade//S100) that delivers a lot of value to IT across operations, organization and finances.
There is a reason why Pure’s revenue was up significantly year over year while others (apart from NetApp) saw down quarters in their storage portfolios. That reason is simple: IT wants its storage consumption to be like its cloud consumption—frictionless and easy. Further, it wants to do so with the promise of cloud-style economics.
It is fair to say that Pure’s strategy is spot-on, and its message is landing with the market. The only question is, what’s next?
The post Pure Storage Keeps Removing Complications From Enterprise Data Storage appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending October 4, 2024 appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending October 4, 2024 appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending September 27, 2024 appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending September 27, 2024 appeared first on Moor Insights & Strategy.
]]>The post Datacenter Podcast: Episode 30- Talking Infoblox, PensionDanmark, Intel, HPE, Google, Pure Storage appeared first on Moor Insights & Strategy.
]]>Watch the video here:
Listen to the audio here:
2:07 Did Infoblox Crack The Code On Hybrid Multi-Cloud Management?
8:08 Qubits For Kroners
15:15 Intel Makes A Statement In The Datacenter
26:29 HPE Super Sizes AI With Aruba Central Updates
33:10 CAPTCHA If You Can
37:16 Making Storage Simple 101
46:49 The Top 3 List – Getting To Know Us
Did Infoblox Crack The Code On Hybrid Multi-Cloud Management?
https://x.com/WillTownTech/status/1839033352797495571
Qubits For Kroners
Intel Makes A Statement In The Datacenter
HPE Super Sizes AI With Aruba Central Updates
https://x.com/WillTownTech/status/1838629239857320341
CAPTCHA If You Can
https://tik-db.ee.ethz.ch/file/7243c3cde307162630a448e809054d25/
Making Storage Simple 101
https://www.networkworld.com/article/3538618/pure-storage-brings-storage-as-a-service-to-files.html
Disclaimer: This show is for information and entertainment purposes only. While we will discuss publicly traded companies on this show, the contents of this show should not be taken as investment advice.
The post Datacenter Podcast: Episode 30- Talking Infoblox, PensionDanmark, Intel, HPE, Google, Pure Storage appeared first on Moor Insights & Strategy.
]]>The post Oracle Cloud Infrastructure And AWS Form Strategic Partnership appeared first on Moor Insights & Strategy.
]]>Oracle and AWS have entered into a strategic relationship, announced this week at the Oracle CloudWorld conference in Las Vegas, in which Oracle’s cloud infrastructure will be deployed and run in AWS datacenters. This partnership, modeled after Oracle’s existing relationships with Microsoft Azure and Google Cloud Platform (GCP), will see Oracle Autonomous Database and Exadata infrastructure physically reside in AWS datacenters and integrate with the entirety of the AWS portfolio of technologies and services.
This announcement is significant for enterprise IT organizations that consume both Oracle and AWS services—meaning virtually every large enterprise. However, it may be even bigger for the industry as a whole because it indicates a move toward native multicloud integration to better meet customers’ needs. Let’s dig into why this partnership between OCI and AWS is such a big deal for customers and the industry.
We live in a multicloud world. This is so obvious—almost a cliché—that it is easy to lose sight of what this actually means. Unless, of course, you happen to be an IT pro responsible for connecting applications and data for the business, or an application developer tasked with building a cloud app fueled by data that resides everywhere.
In many ways, however, the multicloud that we’ve seen to date has meant nothing more than consuming services from multiple cloud providers. But shouldn’t it also mean cloud-to-cloud connectivity that is performant, secure and frictionless? Unfortunately, that really hasn’t been the case in practical terms. More than that, the cost of moving data from cloud to cloud can be prohibitive. In some cases, even moving data from region to region—within the same cloud!—can become prohibitively expensive.
Some CSPs have addressed this through dedicated interconnects. In the case of OCI, Oracle has already developed partnerships with Azure (which I covered here) and Google Cloud (which I wrote about here) to enable low-latency, highly secure connections between the cloud environments. This allows customers to move data from cloud to cloud and from app to database fast and without those dreaded egress costs.
The concept of Oracle Database@CSP has taken this multicloud enablement to new levels. Under this model, Oracle deploys its Exadata infrastructure and Autonomous Database in another CSP’s datacenter. This means that the database is fully connected to the CSP network and natively accessible by the portfolio of services in that datacenter.
In this model, customers buy, consume and manage Oracle database services through the console of the host CSP. It is effectively a first-party service that a consumer can spin up like any other service, so it is very simple. However, the Oracle Cloud team still maintains the Oracle environment.
Over the past few years, OCI has partnered with Azure and GCP to deliver this Database@CSP model (Oracle Database@Google Cloud was just made generally available at the time of this writing). In the case of Azure, we know that the partnership was delivered for enterprise customers that standardized on Oracle and Microsoft many years ago. While the GCP flavor of this was just recently released, I have no doubt this partnership will see similar success. That said, the GCP partnership differs from the Azure one because the GCP customer profile is different. While Azure is very popular with enterprise IT, GCP tends to be more attractive to smaller organizations. GCP also has a rich history in areas of advanced computing such as AI.
The one missing piece of the Database@CSP strategy has been the biggest CSP of all: AWS. While this may seem a little surprising on its surface, it really isn’t. AWS is the largest CSP on the planet by a considerable margin and is pretty strong in its opinions about having third-party infrastructure in its datacenters—especially infrastructure from a competitor, and even more so from a competitor as aggressive as Oracle.
But here’s the deal: the largest CSP and the largest database vendor are sure to have many customers in common. Those customers want to easily and cost-effectively marry AWS’s goodness with all the data in their Oracle environments. To take one example, imagine seamlessly feeding the AWS Bedrock development platform for generative AI with decades of your enterprise data residing in Oracle. This is what customers want, and this is what AWS and Oracle can uniquely deliver—but only through a thoughtfully constructed partnership.
Oracle Database@AWS is precisely what was described previously for Azure and GCP, but tailored to AWS. Oracle’s Autonomous Database and Exadata infrastructure are deployed in AWS datacenters and made available for AWS customers to consume just like they would any other AWS service. From selection to billing to monitoring, the Oracle database environment looks like every other AWS service from the customer’s perspective.
Once stood up, Database@AWS also integrates directly with other AWS offerings—as in the Bedrock example already given. Companies (mostly enterprises) that have invested in Oracle for their database needs will find this integration especially compelling, as they will be able to make that data available to AWS services, once again in a highly secure and low-latency environment. If a customer has technical issues with their instance, AWS handles first-level support. If the problem isn’t resolved, AWS and Oracle work together to resolve it.
I believe that enterprise IT organizations will find it compelling to be able to remove the extract-transform-load process when using tools such as AWS Analytics. This kind of streamlining is the very definition of speed and simplicity in our data-driven era. Likewise, the ability to connect AWS Bedrock with all that rich data sitting in the Oracle Database immediately makes GenAI in the enterprise easier, faster and more secure.
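To make that pattern concrete, here is a heavily hedged Python sketch of pulling rows from an Oracle database and passing them to Amazon Bedrock as grounding context. The connection details, table, model ID and request schema are placeholders, and this SDK-level sketch is not the Oracle Database@AWS integration itself, which is provisioned and consumed through the AWS console.

```python
# Hedged sketch: feed Oracle data to a Bedrock model as grounding context.
# Connection details, table, model ID and request schema are placeholders.
import json
import boto3        # AWS SDK for Python
import oracledb     # python-oracledb driver

conn = oracledb.connect(user="app", password="***", dsn="db.example.com/orclpdb")
with conn.cursor() as cur:
    cur.execute("SELECT note_text FROM customer_notes FETCH FIRST 20 ROWS ONLY")
    context = "\n".join(row[0] for row in cur)

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 300,
    "messages": [{"role": "user",
                  "content": f"Summarize these customer notes:\n{context}"}],
}
resp = bedrock.invoke_model(modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
                            body=json.dumps(body))
print(json.loads(resp["body"].read())["content"][0]["text"])
```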
It’s important to restate that this setup is not simply Oracle’s database running in AWS as a service. Rather, this is Oracle Cloud Infrastructure residing and running in AWS datacenters, with Autonomous Database and supporting services (networking etc.) along for the ride—a cloud region running inside a cloud region. Like with Azure and GCP, Oracle’s play with AWS is completely differentiated from any other vendor. No other cloud provider deploys a region in another cloud provider’s datacenter.
This is a crucial detail to tease out because it speaks to a couple of things. First and foremost, it delivers guarantees for performance, reliability and resiliency that are aligned with Oracle’s standards. This is not to imply that AWS is a less reliable cloud. However, Exadata and the Autonomous Database infrastructure are designed and tuned specifically for the Oracle Database environment and, as such, deliver better performance than third-party hardware ever could.
The second thing to note is that these OCI deployments build plumbing between clouds. Oracle Database@Azure and Oracle Database@GCP are OCI regions. These OCI regions can distribute data among themselves, effectively enabling organizations to move data easily from one cloud to another—with, let me remind you once again, low latency and strong security.
This is a significant win for Oracle for several reasons. First, it allows the company to meet its customers on their own terms. For any enterprise that made AWS its primary CSP years ago and now wants to migrate its Oracle environment, this partnership finally enables it to happen. AWS has tons of enterprise applications and Oracle has tons of enterprise data; as previously mentioned, this move allows customers to bring all that data to all those applications.
This partnership is also important for Oracle because it enables the company to drive toward market expansion of its database platform. Many existing Oracle customers are large enterprises that have been using the platform for decades—very many of them since the 20th century. Some of those cited in the Oracle press release are Vodafone, Fidelity and State Street Bank. While these sizeable organizations are on the leading edge of technology, Oracle is trying to educate and bring a new generation of companies and developers into its community as well. The partnership with AWS (like the existing GCP partnership) should help Oracle drive this market expansion strategy.
If one were to draw a Venn diagram of AWS and Oracle customers, its intersection would be large. Given that Oracle seems to be in virtually every Fortune 1000 company, it is fair to say that AWS’s biggest customers are also overwhelmingly Oracle customers. This partnership enables AWS to better meet the needs of these customers that want to take advantage of an Oracle Autonomous Database but consume it through AWS—and the budget already allotted to AWS.
Not just incidentally, I believe this could also be a good defensive tack for AWS. Azure has established a strong “enterprise cloud” position thanks to Microsoft’s legacy in on-prem enterprise IT environments. This new partnership with Oracle enables AWS to maintain parity with Microsoft from an enterprise serviceability perspective.
Oracle has been quite aggressive with OCI since it launched its Gen 2 back in 2018—and the company has seen considerable success with it. In fact, in its latest earnings, Oracle saw its cloud revenue grow 21% year over year and its IaaS revenue grow a staggering 45% YoY. That is partly tied to the company’s footprint in the enterprise.
Oracle has been building what I call a native multicloud offering for some time. It started with building dedicated interconnects with Azure and Google and has expanded to deploying its cloud within the CSPs to deliver performance, security and value on customers’ terms. This kind of cooperation makes today’s version of Oracle hardly recognizable as the company I used to write checks to when I was in enterprise IT leadership.
How will this all play out? Will Oracle succeed in turning the next generation of app developers and businesses into customers? Will AWS, Azure and Google aggressively position their Oracle offering?
Time will tell. It’s very early in the game, and I expect the first surge of business will come from existing customers migrating Oracle databases to the cloud. The real work begins after that, with Oracle’s outreach efforts wrapped in awareness and education campaigns.
One thing is for certain: Oracle has positioned itself well.
The post Oracle Cloud Infrastructure And AWS Form Strategic Partnership appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending September 20, 2024 appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending September 20, 2024 appeared first on Moor Insights & Strategy.
]]>The post RESEARCH NOTE: Is Lenovo’s AI Strategy Working? appeared first on Moor Insights & Strategy.
]]>Stop me if you’ve heard this one before: AI is top-of-mind for virtually every IT organization. And GenAI is the elixir that will cure all inefficiencies that slow down businesses. (It must be true; I read it on X.) Don’t believe me? Just look at any marketing literature from both old and new companies that have reoriented their positioning to win in this AI gold rush.
Lenovo is one of the many enterprise IT solutions companies chasing the AI pot of gold; fortunately for it and its customers, Lenovo actually has the products and the know-how to deliver practical results. Like its competitors, it is combining hardware, software, and services to deliver differentiated value.
As part of its AI strategy, the company has just announced a number of new offerings to help ease the cost and the operational and complexity challenges presented by AI. Do these strike a chord? Are they relevant? Let’s start by setting the relevant context for what’s going on with enterprise AIOps, then dig into what Lenovo is doing about it and what it means for customers.
Before GenAI can solve all the world’s problems, enterprise IT first has to figure out how to deploy, power, manage, and pay for the hardware and software stacks that make GenAI’s magic happen. I didn’t read this on X—I’ve heard it from every IT executive I’ve spoken with on the topic.
As I touched on above, the challenges of GenAI span three buckets: financial, operational, and organizational. In other words, it’s costly, it’s complex, and it requires a lot of people. From planning to deploying to using and managing, there is not much about GenAI that adheres to traditional IT practices.
Because of this, organizations struggle to activate GenAI in the enterprise. Probably most of my readers here have seen the stats about GenAI projects, but they bear repeating. For example, a recent RAND National Security Research Division study calls out AI project abandonment rates as high as 80%. While I believe this number is on the very high side, the spirit of RAND’s message still resonates. Organizations tend to treat AI projects like other IT projects, then quickly realize they are anything but ordinary because of their costs, complexity, power consumption, people needs, and other factors.
Naturally, IT solutions companies have focused on removing some of these barriers by introducing integrated stacks, partnerships, services, and the like. As evidence of this, NVIDIA CEO Jensen Huang seems to have been on stage for every major tech conference in 2024. Additionally, we’ve seen the introduction of cool-sounding names that promote server vendors’ solutions to the market. Yet after all the hype and cool names, the challenges still remain. GPUs are prohibitively expensive and consume all of the available power in the rack and the datacenter; solution stacks—once operational—are now hard to manage; a huge skills gap exists; and so on. This is how the market gets to 80% abandonment rates, quickly descending from inflated expectations to the depths of disillusionment.
This is where Lenovo comes in. In its latest announcement, Lenovo attempts to address some of these challenges with a few subtly impactful product and service announcements. The first is for the company’s GPU-as-a-service (GPUaaS) offering, which allows customers to better leverage expensive GPUs across the enterprise.
Let’s say you are a state government IT executive with dozens of agencies that operate as separate shops—individual teams, individual budgets, etc. The state CIO, on a directive from the governor, makes implementing AI a top priority for every agency. GPUaaS allows all of these agencies to leverage the same farm of GPUs, with usage metering and billback built in, via Lenovo Intelligent Computing Orchestration (LiCO). Organization-wide costs come down, and each agency has the necessary horsepower to train and tune its AI models.
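Lenovo does not spell out LiCO’s metering mechanics in this announcement, so the following is purely an illustrative sketch (the agency names, record format and chargeback rate are all hypothetical) of how GPU-hour metering and billback works in principle:

```python
from collections import defaultdict

# Hypothetical usage records exported from a GPU orchestration layer:
# (agency, gpu_hours) pairs collected over one billing period.
usage_records = [
    ("dept_of_transportation", 420.0),
    ("dept_of_health", 1310.5),
    ("dept_of_revenue", 87.25),
]

RATE_PER_GPU_HOUR = 2.10  # assumed internal chargeback rate, in dollars

def billback(records, rate):
    """Aggregate GPU-hours per agency and price them at a flat internal rate."""
    totals = defaultdict(float)
    for agency, hours in records:
        totals[agency] += hours
    return {agency: round(hours * rate, 2) for agency, hours in totals.items()}

if __name__ == "__main__":
    for agency, charge in billback(usage_records, RATE_PER_GPU_HOUR).items():
        print(f"{agency}: ${charge:,.2f}")
```

However it is implemented under the covers, this is the basic contract a shared GPU farm has to honor: meter who used what, and push the cost back to the consuming team.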
As somebody who has lived in this world—I have been that state government IT exec—I can immediately see the benefits of GPUaaS. While there are still challenges around how budgets and cross-agency utilization are prioritized and managed, this solution can deliver real value to organizations standing up AI in their datacenters. More than that, GPUaaS addresses all three of the big challenges facing IT mentioned earlier—cost, ops, organization.
Lenovo’s second announcement, about AIOps, goes right to the heart by directly addressing the operational and complexity challenges of enterprise IT. (Cost is more of an indirect benefit.) The substance of it is that Lenovo’s XClarity One hybrid cloud management platform will incorporate predictive analytics and GenAI to deliver greater levels of reliability and cyber resilience for Lenovo infrastructure.
AIOps is an IT trend that has been around for some time. While Lenovo’s move is somewhat of a catch-up play, it does allow the company to check the box for an element of enterprise IT readiness that is critical for achieving broad adoption in this segment. Further, while much of the competition’s capabilities in this area have come via acquisition, Lenovo’s XClarity One is the fruit of in-house design.
As a techie who grew up in the server/network management space (Want to talk about managing Novell NLMs and why IPX is better than IP? I’m your guy), I like what Lenovo has done with XClarity One. In fact, I wish the company would lean into this goodness more. For instance, the cloud-based nature of XClarity One makes it simple to deploy and consume. Further, this model enables IT organizations to manage their Lenovo infrastructure through the proverbial single pane of glass.
Finally, Lenovo has built on Neptune, its big winner in HPC and AI, by announcing some modest enhancements to the liquid-cooling technology. Specifically, Lenovo reported that Neptune now has built-in real-time energy-efficiency monitoring. This enables organizations to better understand how efficiently their infrastructure is operating, allowing for proactive tweaks and tuning to drive down the all-important power usage effectiveness (PUE) rating.
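For readers who have not worked with the metric, PUE is simply total facility power divided by the power consumed by the IT equipment alone, so real-time visibility into the IT-side draw is exactly what lets operators see whether their tuning is working. A minimal worked example with made-up readings:

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power divided by IT equipment power."""
    return total_facility_kw / it_equipment_kw

# Hypothetical readings: the facility draws 1,500 kW in total,
# of which 1,150 kW goes to servers, storage and networking.
print(round(pue(1500, 1150), 2))  # -> 1.3; a perfect (and unreachable) facility would score 1.0
```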
Frankly, Lenovo’s challenge is not whether it has a legitimate and differentiated play in enterprise IT in general and AI in particular. In both cases, the answer is a simple yes. The real challenge is telling Lenovo’s story to the enterprise.
The company has done an excellent job of building a business that dominates in the hyperscale and HPC markets. These are two highly competitive markets in terms of performance and resilience/reliability. For whatever reason, though, the company has seemed a little hesitant about aggressively pursuing the commercial enterprise market. Strangely, Lenovo’s business in this area is largely the same business that was run for decades by IBM—perhaps the all-time most trusted brand in enterprise IT.
Lenovo was ahead of the market in building and enabling the AI ecosystem (check out the company’s AI innovator program). Further, its infrastructure is deployed for brands and retailers that most people use and visit on a daily basis. Yet despite all of this innovation, most IT professionals don’t know just how rich a portfolio the company has.
Given Lenovo’s new leadership, I expect that will change. If the company leverages all of the pieces of its technical and business portfolio, it will be a force to reckon with.
The post RESEARCH NOTE: Is Lenovo’s AI Strategy Working? appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending September 13, 2024 appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending September 13, 2024 appeared first on Moor Insights & Strategy.
]]>The post VMware Explore Brings Broadcom’s Private Cloud Strategy Into Focus appeared first on Moor Insights & Strategy.
]]>Since Broadcom closed its acquisition of VMware in November 2023, there has been much noise from industry pundits and the press. Some of the noise focused on licensing and pricing changes. Some of it concentrated on changes to the legacy VMware channel program. Some of it was around portfolio consolidation strategies and the potential end-of-life of certain products. And some of it was general FUD instigated by competitors who saw an opportunity to capitalize on the situation.
Mind you, some of this noise was undoubtedly warranted. Or maybe “warranted” is too strong, and “understandable” is better. Yes, there was a lot of change—a lot of disruption. And in fairness, the company was not exactly crisp in its messaging around these changes.
Lost in all this, however, was a vision that Broadcom laid out regarding VMware Cloud Foundation (VCF) and how the company wanted to transform it into a platform that would enable customers to achieve a true cloud operating model—meaning a single stack and single control plane that customers could use to achieve the cloud on-premises.
The VMware Explore 2024 event this week marked the first opportunity Broadcom had to speak directly with its customers and dig deeper into its strategy. Granted a few days of our collective attention, what did the company finally have to say? Did Broadcom lay out a compelling and differentiated story? Have questions been answered? Let’s dig in.
Before getting into Broadcom’s announcements, let’s peel back the layers of the onion on this private cloud thing. When many people think of “private cloud,” they hearken back to circa 2009 and a term given to what amounts to a VM cluster that is walled off, with maybe some base self-service capability. It seems that this term quickly fell out of fashion, giving way to “hybrid” and, eventually, “hybrid multi-cloud.” However, as enterprise organizations keep using the public cloud for some functions, the needs that drove the concept of the private cloud in the first place persist. This means that the apps and data—and the environment that runs them—that need to be on-premises . . . really must be on-premises.
But there is a tension here, because the legitimate privacy and security needs of the business can conflict with what works best for IT. Developers, DevOps, data science folks and others want to quickly spin up the compute, storage and networking required for the tasks at hand. They also want to do this complete with curated services for security, load balancing, AI and so on so they can do their jobs faster and easier. All of these needs have traditionally been served by the public cloud, albeit at a huge cost. In private cloud’s original form, the most advanced implementations almost managed to satisfy both the business and the technical requirements—but not quite.
This dynamic has partially fueled the continued growth of cloud service providers including AWS, Microsoft Azure, Google, and Oracle. While managing a company’s public cloud estate is both very complex and very costly, IT organizations have traditionally endured those pains to enable business agility.
Against this backdrop, Broadcom introduced VCF 9—the full-stack, multi-tenant private cloud that can be run anywhere: on-premises, in a colocation facility or even on a public cloud. Yes, an IT organization can take its entire VCF stack and move it from on-prem to the public cloud and back.
We could think of VCF 9 as Private Cloud 3.0. Meaning, it is effectively the public cloud brought on-premises through the integration of technology that existed across the VMware portfolio. It is not simply a bunch of virtual machines or siloed environments managed by different teams. It is infrastructure provisioned for multi-tenancy and consumed through a cloud portal. It’s also a curated (and growing) list of services that address the enterprise’s most common set of needs.
In many ways, VCF 9 is a radical departure from what the legacy VMware portfolio has delivered to the market. But in another sense, it’s not so different. Most of the pieces of this puzzle have been in the VMware portfolio; the creation of VCF 9 was more of an exercise of bringing it all together coherently. The great technology the company has been developing over the years is now integrated into a single stack with a single control plane to deliver the cloud as described above.
Moving to VCF 9 will not be easy for enterprise IT organizations. It is a full cloud migration—just not to a public provider like AWS or Azure. However, Broadcom has created a set of services to ease this migration and help IT organizations build and maintain the skills to support this new environment.
I’m a fan of what Broadcom is doing with these services for several reasons. First and foremost, it creates an opportunity for its partners to add real value to the equation—not simply managing licenses or volume agreements, but playing an important role in what is arguably the largest IT transformation project many organizations will experience.
Second, this approach creates stickiness for Broadcom with its customers. Customers may get VCF 9 as part of their VMware license, but deploying and using it builds an entirely new dynamic between customers and Broadcom. Effectively, Broadcom is commercializing the cloud and becoming that provider to the enterprise.
I speak with IT practitioners and executives regularly. I also used to run a couple of IT shops before I became an analyst. Remove the term “private cloud” and the perceptions that folks have of it, and I believe that VCF 9 is precisely what customers want.
This is not to say that IT organizations are looking to abandon the public cloud (although Broadcom CEO Hock Tan flashed a slide during his keynote indicating that upwards of 83% of enterprise IT organizations are looking to repatriate some applications). Rather, it’s to say that while the public cloud has its place and utility, an enterprise organization’s on-prem datacenter—its data estate—needs to be consumed in the way organizations have grown accustomed to with the public cloud, but without the cost and complexity that come with it. Tan, at one point, referenced the “public cloud PTSD” suffered by many enterprise IT organizations.
So, do I believe customers want VCF 9? Yes. Do I believe customers realize they want VCF 9? Not yet, but I suspect Broadcom’s go-to-market team is going to resolve this.
When talking with IT folks regarding VMware and potential moves to other alternatives, Nutanix and Red Hat tend to be the two vendors most mentioned. Nutanix Cloud Platform and Red Hat OpenStack tend to be the products that are in the discussion.
There are similarities and differences between VCF and these competitors. We can generally lump all three into the cloud operating model that IT organizations hope to achieve. NCP is the solution I hear referenced more as companies discuss exploring alternatives. While Nutanix has done an excellent job leveraging partnerships with OEMs, I haven’t yet seen NCP land in large enterprise accounts. I am curious to see how the company’s partnership with Dell, which makes Nutanix AHV available with PowerFlex storage, plays out. This external storage support is critical to achieving market traction.
When looking at Red Hat, I have not seen the same level of interest as I have regarding Nutanix. These offerings, Red Hat’s commercialized versions of open-source projects, face a challenge similar to the one the legacy VMware portfolio presented to enterprise organizations—a cobbling-together of multiple solutions that gets customers only part of the way to achieving cloud on-prem. While Red Hat’s RHV, OpenStack and OpenShift solutions can be good for customers who want more customization, that flexibility has a cost: complexity.
There has been much noise surrounding Broadcom and VMware since November of 2023. The Explore conference this week was a pivotal event for the company, because it was important for Hock Tan and the team to demonstrate to a skeptical market that the company is focused on delivering value to its customers.
Did Broadcom succeed? Yes. Whether one agrees or disagrees with Broadcom’s vision of private cloud is irrelevant. The company has built a compelling vision that helps enterprise organizations reduce complexity and cost by creating their own cloud that can run anywhere.
I suspect the company will see attrition among its smaller customers who cannot realize the full value of VCF or VMware vSphere Foundation. However, given the changes to the VMware portfolio, this is to be expected.
I’ll be following Broadcom’s progress with VCF 9 closely, looking for actual deployments and consumption as the true indicator of its market success. Stay tuned.
The post VMware Explore Brings Broadcom’s Private Cloud Strategy Into Focus appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending September 6, 2024 appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending September 6, 2024 appeared first on Moor Insights & Strategy.
]]>The post Datacenter Podcast: Episode 29 – We’re Talking Zscaler, AI, Broadcom, HPE, xAI, Dell appeared first on Moor Insights & Strategy.
]]>Watch the video here:
Listen to the audio here:
3:03 Zscaler’s Strong Earnings Don’t Land With Bubble Bears
8:55 AI Can Read Your Tongue
15:23 Broadcom Goes Back To The Future With VCF 9
26:46 HPE Q3FY24 Earnings Are A Tale Of Two Portfolios
32:09 World’s Most Powerful AI Training System
36:36 Server Vendors Had A Banner Quarter
44:02 Getting To Know The Team
Zscaler’s Strong Earnings Don’t Land With Bubble Bears
https://x.com/WillTownTech/status/1831782382854357010
AI Can Read Your Tongue
HPE Q3FY24 Earnings Are A Tale Of Two Portfolios
https://x.com/WillTownTech/status/1831785559141835020
World’s Most Powerful AI Training System
https://x.com/elonmusk/status/1830650370336473253
Server Vendors Had A Banner Quarter
https://www.linkedin.com/feed/update/urn:li:activity:7235270767356100608/
Disclaimer: This show is for information and entertainment purposes only. While we will discuss publicly traded companies on this show, the contents of this show should not be taken as investment advice.
The post Datacenter Podcast: Episode 29 – We’re Talking Zscaler, AI, Broadcom, HPE, xAI, Dell appeared first on Moor Insights & Strategy.
]]>The post RESEARCH NOTE: Looking at AI Benchmarking from MLCommons appeared first on Moor Insights & Strategy.
]]>Although several AI benchmarking organizations exist, MLCommons has quickly become the body that has gained the most mindshare. Its MLPerf benchmark suite covers AI training, various inference scenarios, storage, and HPC.
The organization recently released MLPerf Inference v4.1, which examines inference performance for several AI accelerators targeting datacenter and edge computing. In this research note, I attempt to give more context to the results and discuss what I consider some interesting findings.
Generative AI is a magical and mystical workload for many IT organizations that instinctively know there’s value in it, but aren’t entirely clear what that value is or where it applies across an organization. Yes, more traditional discriminative AI uses, such as computer vision, can deliver direct benefits in specific deployments. However, GenAI can have far broader applicability across an organization, though those use cases and deployment models are sometimes not as obvious.
Just as AI is known yet unfamiliar to many organizations, learning what comprises the right AI computing environment is even more confusing for many of them. If I train, tune, and use, let’s say, Llama 3.1 across my organization for multiple purposes, how do I know what that operating environment looks like? What is the best accelerator for training? What about when I integrate this trained model into my workflows and business applications? Are all inference accelerators pretty much the same? If I train on, say, NVIDIA GPUs, do I also need to deploy NVIDIA chips for inference?
Enterprise IT and business units grapple with these and about 82 other questions as they start to plan their AI projects. The answer to each question is highly dependent on a number of factors, including (but not limited to) performance requirements, deployment scenarios, cost, and power.
If you listen to the players in the market, you will quickly realize that each vendor—AMD, Cerebras, Intel, NVIDIA, and others—is the absolute best platform for training and inference. Regardless of your requirements, each of these vendors claims supremacy. Further, each vendor will happily supply its own performance numbers to show just how apparent its supremacy is.
And this is why benchmarking exists. MLCommons and others make an honest attempt to provide an unbiased view of AI across the lifecycle. And they do so across different deployment types and performance metrics.
MLPerf Inference v4.1 takes a unique approach to inference benchmarking in an attempt to be more representative of the diverse use of AI across the enterprise. AI has many uses, from developers writing code to business analysts tasked with forecasting to sales and support organizations providing customer service. Because of this, many organizations employ mixture-of-experts (MoE) models. An MoE essentially consists of multiple, smaller, gated expert models that are invoked as necessary. So, if natural language processing is required, the gate activates the NLP expert. Likewise for anomaly detection, computer vision, etc.
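To make the gating idea concrete, here is a minimal, framework-free sketch of an MoE forward pass. The dimensions and weights are toy values, not Mixtral’s actual architecture: a router scores the experts for each input, only the top-k experts run, and their outputs are blended using the normalized router scores.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 4, 2

# Toy "experts": each is a single weight matrix standing in for a feed-forward block.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))  # the gating network

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector through the top-k experts and mix their outputs."""
    logits = x @ router
    chosen = np.argsort(logits)[-top_k:]                             # indices of the winning experts
    weights = np.exp(logits[chosen]) / np.exp(logits[chosen]).sum()  # softmax over the winners
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)  # (16,) -- only 2 of the 4 experts did any work
```

The payoff is that a model can carry the capacity of all its experts while paying the compute cost of only the few that the gate activates for a given request.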
In addition to its traditional testing of different inference scenarios, the MLPerf team selected Mistral’s Mixtral 8x7B as its MoE model for use in v4.1. This enables testing that demonstrates the broader applicability of inference across the enterprise. In Mixtral, the MLPerf team chose to test against three tasks in particular: Q&A (powered by the Open Orca dataset), math reasoning (powered by the GSM8K dataset), and coding (powered by the MBXP dataset).
As seen in the table below, MLPerf Inference v4.1 looks at inferencing scenarios that span uses across the enterprise, with tests that show variances for latency and accuracy.
There are a couple of other things worth mentioning related to MLPerf that I believe show why it’s a credible benchmark. First, all results are reviewed by a committee, which includes other submitters. For example, when AMD submits testing results for its MI300, NVIDIA can review and raise objections (if applicable). Likewise, when NVIDIA submits its results, other contributing companies can review and object as they see fit.
Additionally, chip vendors can only submit silicon that is either released or will be generally available within six months of submission. This leads to results that are more grounded in reality—either what’s already on the truck or what will be on the truck shortly.
For this benchmark, chips from AMD, Google, Intel, NVIDIA, and UntetherAI were evaluated, with results submitted by 22 contributors spanning server vendors, cloud providers, and other platform companies. Chips from Qualcomm, Cerebras, Groq, and AWS were surprisingly absent from the sample. It is also important to note that while Intel submitted its “Granite Rapids” Xeon 6 chip for testing, its Gaudi accelerator was not submitted.
There are many reasons why an organization might not submit. It could be resource constraints, cost, or a number of other reasons. The point is, we shouldn’t read too much into a company’s choice to not submit—other than that there’s no comparative performance measurement for the chips that weren’t submitted.
One final consideration if you choose to review the results on your own: not every test was run on every system. For instance, when looking at the inference datacenter results, NeuralMagic submitted results for the NVIDIA L40S running in the Crusoe Cloud for Llama 2-70B (Q&A). This was the only test (out of 14) run for that submission. So, use the table above to decide what kind of testing you would like to review (image recognition, language processing, medical imaging, etc.) and the configuration you’d like (number of accelerators, processor type, etc.) to be sure you are looking at relevant results. Otherwise, the numbers will have no meaning.
If appropriately used, MLPerf Inference v4.1 can be quite telling. However, it would likely be unfair for me to summarize the results based on what I’ve reviewed. Why? Precisely because there are so many different scenarios by which we can measure which chip is “best” in terms of performance. Raw performance versus cost versus power consumption are just a few of the factors.
I strongly recommend visiting the MLCommons site and reviewing your inference benchmark of choice (datacenter versus edge). Further, take advantage of the Tableau option at the bottom of each results table to create a filter that displays what is relevant to you. Otherwise, the data becomes overwhelming.
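If you would rather script the narrowing than click through Tableau, the same filtering is a few lines of pandas. The snippet below assumes you have exported the datacenter results to a CSV; the file name and column names are illustrative rather than MLCommons’ exact schema, so adjust them to match the actual download:

```python
import pandas as pd

# Hypothetical export of the datacenter inference results.
df = pd.read_csv("mlperf_inference_datacenter_v4_1.csv")

comparable = df[
    (df["benchmark"] == "llama2-70b")      # pick one workload...
    & (df["scenario"] == "Offline")        # ...one scenario...
    & (df["accelerator_count"] == 8)       # ...and one system size
]

# Rank the remaining, directly comparable systems by reported throughput.
print(comparable.sort_values("tokens_per_second", ascending=False)
      [["submitter", "accelerator", "tokens_per_second"]])
```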
While it is impossible to provide a detailed analysis of all 14 tests in datacenter inference and all six tests in edge inference, I can give some quick thoughts on both. On the datacenter front, NVIDIA appears to dominate. When looking at the eight H200 accelerators versus eight AMD MI300X accelerators in an offline scenario, the tokens/second for Llama 2-70B (the only test submitted for the MI300X) showed a sizable advantage for NVIDIA (34,864 tokens/second versus 24,109 tokens/second). Bear in mind that this comparison does not account for performance per dollar or performance per watt—this is simply a raw performance comparison.
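Since the paragraph above deliberately sticks to raw throughput, here is what the perf-per-watt view might look like once board power is folded in. The token rates are the MLPerf figures cited above; the TDPs are commonly published accelerator specs (roughly 700 W for an H200 SXM module and 750 W for an MI300X) and should be treated as assumptions rather than MLPerf data:

```python
# Throughput from the 8-accelerator Llama 2-70B offline results cited above.
systems = {
    "8x NVIDIA H200": {"tokens_per_s": 34_864, "tdp_w_per_accel": 700},  # assumed TDP
    "8x AMD MI300X":  {"tokens_per_s": 24_109, "tdp_w_per_accel": 750},  # assumed TDP
}

for name, s in systems.items():
    accel_power_w = 8 * s["tdp_w_per_accel"]  # accelerators only; ignores CPUs, fans, etc.
    print(f"{name}: {s['tokens_per_s'] / accel_power_w:.2f} tokens/s per watt")
```

Even this is a simplification, since full-system power, utilization and pricing all shift the picture, but it shows how a raw throughput gap can narrow once efficiency enters the equation.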
When looking at NVIDIA’s B200 (in preview), the performance delta is even more significant, with offline performance coming in at 11,264 tokens/second versus 3,062 tokens/second for the MI300X. Interestingly, this performance advantage is realized despite the B200 shipping with less high bandwidth memory (HBM).
When looking at inference on the edge, UntetherAI’s speedAI240 is worth considering. The company submitted test results for ResNet (vision/image recognition), and its numbers relative to the NVIDIA L40S are stunning in terms of latency, with the speedAI240 coming in at 0.12 ms and the L40S coming in at 0.33 ms for a single stream. It’s worth noting that the speedAI240 has a TDP of 75 watts, and the L40S has a TDP of 350 watts.
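Using only the figures cited in this paragraph, the edge comparison is stark on both axes:

```python
# Single-stream ResNet latency and TDP figures cited above.
speedai240 = {"latency_ms": 0.12, "tdp_w": 75}
l40s = {"latency_ms": 0.33, "tdp_w": 350}

print(f"Latency: {l40s['latency_ms'] / speedai240['latency_ms']:.1f}x lower on the speedAI240")
print(f"Power:   {l40s['tdp_w'] / speedai240['tdp_w']:.1f}x smaller TDP envelope")
```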
The work of the MLCommons team yields many more interesting results, which are certainly worth investigating if you are scoping an AI project. One thing I would recommend is using the published results, along with published power and pricing estimates (neither NVIDIA nor AMD publish pricing), to determine the best fit for your organization.
I’ve been in the IT industry longer than I care to admit. AI is undoubtedly the most complex IT initiative I’ve seen, as it is a combination of so many unknowns. One of the toughest challenges is choosing the right hardware platforms to deploy. This is especially true today, when power and budget constraints place hard limits on what can and can’t be done.
MLCommons and the MLPerf benchmarks provide a good starting point for IT organizations to determine which building blocks are best for their specific needs because they allow comparison of performance in different deployment scenarios across several workloads.
MLPerf Inference v4.1 is eye-opening because it shows what the post-training world requires, along with some of the more compelling solutions in the market to meet those requirements. While I expected NVIDIA to do quite well (which it did), AMD had a strong showing in the datacenter, and UntetherAI absolutely crushed on the edge.
Keep an eye out for the next training and inference testing round in the next six months or so. I’ll be sure to add my two cents.
The post RESEARCH NOTE: Looking at AI Benchmarking from MLCommons appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending August 30, 2024 appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending August 30, 2024 appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending August 23, 2024 appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending August 23, 2024 appeared first on Moor Insights & Strategy.
]]>The post Ep. 28: MI&S Datacenter Podcast: Talking Cisco, IBM, Dell & Nutanix, Black Hat USA 2024, AI, HPE appeared first on Moor Insights & Strategy.
]]>Watch the video here:
Listen to the audio here:
2:42 Cisco 4Q Earnings & Leadership Shake-up
11:23 What Is An ML-KEM?
18:24 Dell & Nutanix Get Serious-er
25:03 Black Hat USA 2024 Insights
30:46 Professor Bot
36:44 Morpheus – The God Of Dreams – & HPE’s Latest Acquisition
Cisco 4Q Earnings & Leadership Shake-up
https://x.com/WillTownTech/status/1824156801711083707
What Is An ML-KEM?
https://research.ibm.com/blog/nist-pqc-standards
Dell & Nutanix Get Serious-er
https://www.linkedin.com/feed/update/urn:li:activity:7229484227182911488/
Black Hat USA 2024 Insights
https://x.com/WillTownTech/status/1824121727456026696
Professor Bot
https://sakana.ai/ai-scientist/
https://arxiv.org/pdf/2408.06292
Morpheus – The God Of Dreams – & HPE’s Latest Acquisition
https://www.linkedin.com/feed/update/urn:li:activity:7229842227567411200/
Disclaimer: This show is for information and entertainment purposes only. While we will discuss publicly traded companies on this show, the contents of this show should not be taken as investment advice.
The post Ep. 28: MI&S Datacenter Podcast: Talking Cisco, IBM, Dell & Nutanix, Black Hat USA 2024, AI, HPE appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending August 16, 2024 appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending August 16, 2024 appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending August 9, 2024 appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending August 9, 2024 appeared first on Moor Insights & Strategy.
]]>The post RESEARCH PAPER: Digital Transformation Starts with a Digital Experience Platform appeared first on Moor Insights & Strategy.
]]>Digital transformation is a term that has existed for some time. It is also a practice (and trend) that is evergreen. In fact, technology has been used to drive better business outcomes for decades. What’s new is the focus on data feeding artificial intelligence (AI) models and analytics engines as key enablers of automated business processes.
The most recent wave of digital transformation has seen a second trend that has caused many organizations to reconsider their efforts — generative AI (GAI). The use of foundational models and large language models (LLMs) to drive all facets of business operations has become essential. As a result, many organizations have rescoped transformation efforts to optimize deployments.
With such a focus on data-driven outcomes, the expectations across an organization are understandably high. Faster, better, and higher quality are not just platitudes; they are key metrics that determine success, regardless of whether an organization delivers a new product to the market or provides public services.
Indeed, digital transformation is the monetization of data.
The challenges many enterprises face when undergoing digital transformation can be mapped across four vectors — culture (people), operational (processes, procedures), technology, and data. Each vector is a critical element to the success of any transformational effort.
This research brief explores the tensions organizations face across these success factors while driving toward an AI-enabled, digitally transformed state. Further, this paper introduces the Iron Mountain InSight Digital Experience Platform (DXP) and explains how this SaaS-based platform is critical to the digital transformation process.
You can download the paper by clicking on the logo below:
Table of Contents
Companies Cited:
The post RESEARCH PAPER: Digital Transformation Starts with a Digital Experience Platform appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending August 2, 2024 appeared first on Moor Insights & Strategy.
]]>The post MI&S Weekly Analyst Insights — Week Ending August 2, 2024 appeared first on Moor Insights & Strategy.
]]>The post Ep.27 of the MI&S Datacenter Podcast: Talking CrowdStrike, AI, AMD, HPE & Juniper, Quantinuum, Arm appeared first on Moor Insights & Strategy.
]]>Watch the video here:
Listen to the audio here:
3:00 CrowdStrike IT Outage Post Mortem
11:37 AI Immunity Warriors
18:07 AMD Crushes The Datacenter With EPYC and MI300
26:11 HPE Achieves EU Unconditional Regulatory Approval For Juniper Acquisition
33:37 A New Quantum Toolbox
37:49 The Secret Weapon Of The Datacenter
CrowdStrike IT Outage Post Mortem
https://x.com/WillTownTech/status/1818628352749580549
AI Immunity Warriors
AMD Crushes The Datacenter With EPYC and MI300
https://www.linkedin.com/feed/update/urn:li:activity:7224395591462612992/
HPE Achieves EU Unconditional Regulatory Approval For Juniper Acquisition
https://www.networkworld.com/article/3480325/eu-clears-hpes-14-billion-juniper-acquisition.html
A New Quantum Toolbox
The Secret Weapon Of The Datacenter
https://moorinsightsstrategy.com/research-notes/is-arm-neoverse-the-datacenters-secret-weapon/
Disclaimer: This show is for information and entertainment purposes only. While we will discuss publicly traded companies on this show, the contents of this show should not be taken as investment advice.
The post Ep.27 of the MI&S Datacenter Podcast: Talking CrowdStrike, AI, AMD, HPE & Juniper, Quantinuum, Arm appeared first on Moor Insights & Strategy.
]]>The post Mistral NeMo: Analyzing Nvidia’s Broad Model Support appeared first on Moor Insights & Strategy.
]]>The promise of AI in the enterprise is huge—as in, unprecedentedly huge. The speed at which a company can get from concept to value with AI is unmatched. This is why, despite its perceived costs and complexity, AI and especially generative AI are a top priority for virtually every organization. It’s also why the market has witnessed AI companies emerge from everywhere in an attempt to deliver easy AI solutions that can meet the needs of businesses, both large and small, in their efforts to fully maximize AI’s potential.
In this spirit of operationalizing AI, tech giant Nvidia has focused on delivering an end-to-end experience by addressing this potential along with the vectors of cost, complexity and time to implementation. For obvious reasons, Nvidia is thought of as a semiconductor company, but in this context it’s important to understand that its dominant position in AI also relies on its deep expertise in the software needed to implement AI. This is why Nvidia NeMo is the company’s response to these challenges; it’s a platform that enables developers to quickly bring data and large language models together and into the enterprise.
As part of enabling the AI ecosystem, Nvidia has just announced a partnership with Mistral AI, a popular LLM provider, to introduce the Mistral NeMo language model. What is this partnership, and how does it benefit enterprise IT? I’ll unpack these questions and more in this article.
As part of the Nvidia-Mistral partnership, the companies worked together to train and deliver Mistral NeMo, a 12-billion-parameter language model in the FP8 data format for accuracy, performance and portability. This low-precision format is extremely useful in that it enables Mistral NeMo to fit into the memory of an Nvidia GPU. Further, this FP8 format is critical to using the Mistral NeMo language model across various use cases in the enterprise.
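A rough back-of-the-envelope calculation shows why precision matters for fitting the model on a single accelerator. This counts weights only; activations, KV cache and runtime overhead would add to it:

```python
params = 12e9  # Mistral NeMo's parameter count

for fmt, bytes_per_param in [("FP32", 4), ("FP16/BF16", 2), ("FP8", 1)]:
    gib = params * bytes_per_param / 2**30
    print(f"{fmt}: ~{gib:.0f} GiB of weights")
# FP32: ~45 GiB, FP16/BF16: ~22 GiB, FP8: ~11 GiB -- the FP8 copy leaves an
# 80 GB-class GPU plenty of headroom for context and batching.
```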
Mistral NeMo features a 128,000-token context length, which enables a greater level of coherency, contextualization and accuracy. Consider a chatbot that provides online service. The 128,000-token length enables a longer, more complete interaction between customer and company. Or imagine an in-house security application that manages access to application data based on a user’s privileged access control. Mistral NeMo’s context length enables the complete dataset to be displayed in an automated fashion.
The 12-billion-parameter size is worth noting as it speaks to something critical to many IT organizations: data locality. While enterprise organizations require the power of AI and GenAI to drive business operations, several considerations including cost, performance, risk and regulatory constraints prevent them from doing this on the cloud. These considerations are why most enterprise data sits on-premises even decades after the cloud has been embraced.
Many organizations prefer a deployment scenario that involves training a model with company data and then inferencing across the enterprise. Mistral NeMo’s size enables this without substantial infrastructure costs (a 12-billion-parameter model can run efficiently on a laptop). Combined with its FP8 format, this model size enables Mistral NeMo to run anywhere in the enterprise—from an access control point to the edge. I believe this portability and scalability will make the model quite attractive to many organizations.
Mistral NeMo was trained on the Nvidia DGX Cloud AI platform, utilizing Megatron-LM running on 3,072 of Nvidia’s H100 80GB Tensor Core GPUs. Megatron-LM, part of the NeMo platform, is an advanced model-parallelism framework designed for scaling large language models. It effectively reduces training times by splitting computations across GPUs. In addition to speeding up training, Megatron-LM trains models for performance, accuracy and scalability. This is important when considering the broad use of this LLM within an organization in terms of function, language and deployment model.
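Megatron-LM’s core trick, splitting the large weight matrices of each layer across GPUs so that no single device has to hold or compute the whole thing, can be illustrated in a few lines. This is a conceptual sketch with NumPy arrays standing in for per-GPU shards; the real implementation coordinates thousands of devices with NCCL collectives:

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, n_gpus = 8, 12, 4

x = rng.standard_normal((2, d_in))       # a tiny batch of activations
W = rng.standard_normal((d_in, d_out))   # one full weight matrix of a layer

# Column-parallel split: each "GPU" holds a slice of the output dimension.
shards = np.split(W, n_gpus, axis=1)
partial_outputs = [x @ w_shard for w_shard in shards]  # computed independently per device
y_parallel = np.concatenate(partial_outputs, axis=1)   # the all-gather step

assert np.allclose(y_parallel, x @ W)  # same answer, one quarter of the math per device
```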
When it comes to AI, the real value is realized in inferencing—in other words, where AI is operationalized in the business. This could be through a chatbot that can seamlessly and accurately support customers from around the globe in real time. Or it could be through a security mechanism that understands a healthcare worker’s privileged access level and allows them to see only the patient data that is relevant to their function.
In response, Mistral NeMo has been curated to deliver enterprise readiness more completely, more easily and more quickly. The Mistral and Nvidia teams utilized Nvidia TensorRT-LLM to optimize Mistral NeMo for real-time inferencing and thus ensure the best possible performance.
While it may seem obvious, the collaborative focus on ensuring the best, most scalable performance across any deployment scenario speaks to the understanding both companies seem to have around enterprise deployments. Meaning, it is understood that Mistral NeMo will be deployed across servers, workstations, edge devices and even client devices to leverage AI fully. In any AI deployment like this, models tuned with company data have to meet stringent requirements around scalable performance. And this is precisely what Mistral NeMo does. In line with this, Mistral NeMo is packaged as an Nvidia NIM inference microservice, which makes it straightforward to deploy AI models on any Nvidia-accelerated computing platform.
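NIM microservices generally expose an OpenAI-compatible HTTP endpoint once the container is up, so invoking the model from an application is a short exercise. The host, port and model identifier below are placeholders rather than confirmed values for the Mistral NeMo image, so check the NIM documentation before relying on them:

```python
import requests

# Assumes a NIM container is already running locally and listening on port 8000.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "mistral-nemo-12b-instruct",  # placeholder model id
        "messages": [{"role": "user", "content": "Summarize our return policy in two sentences."}],
        "max_tokens": 128,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```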
I started this analysis by noting the enterprise AI challenges of cost and complexity. Security is also an ever-present challenge for enterprises, and AI can create another attack vector that organizations must defend. With these noted, I see some obvious benefits that Mistral NeMo and NeMo as a framework can deliver for organizations.
As an ex-IT executive, I understand the challenge of adopting new technologies or aligning with technology trends. It is costly and complex and usually exposes a skills gap within an organization. As an analyst who speaks with many former colleagues and clients on a daily basis, I believe that AI is perhaps the biggest technology challenge enterprise IT organizations have ever faced.
Nvidia continues to build its AI support with partnerships like the one with Mistral by making AI frictionless for any organization, whether it’s a large government agency or a tiny start-up looking to create differentiated solutions. This is demonstrated by what the company has done in terms of enabling the AI ecosystem, from hardware to tools to frameworks to software.
The collaboration between Nvidia and Mistral AI is significant. Mistral NeMo can become a critical element of an enterprise’s AI strategy because of its scalability, cost and ease of integration into the enterprise workflows and applications that are critical for transformation.
While I expect this partnership to deliver real value to organizations of all sizes, I’ll especially keep an eye on the adoption of Mistral NeMo across the small-enterprise market segment, where I believe the AI opportunity and challenge is perhaps the greatest.
The post Mistral NeMo: Analyzing Nvidia’s Broad Model Support appeared first on Moor Insights & Strategy.
]]>