RESEARCH NOTE: Addressing The Memory Optimization Challenge

By Matt Kimball, Patrick Moorhead - July 16, 2024

Optimizing Memory with ZeroPoint Technologies

Data has transformed the modern business landscape. At every point and with every interaction, data fuels AI models and analytics engines, with the goal of helping organizations operate faster and more efficiently.

However, this unprecedented data deluge brings unique challenges. The primary one is the operational tension between doing more and speeding up time to value on one hand, and the ever-present mandate to reduce costs on the other. This tension amplifies the limitations inherent in a server’s memory architecture, a fundamental component of turning data into action. Even with advancements in memory technology, optimizing for cost through compression can introduce unacceptable application latency, which, ironically, can increase the total cost of ownership.

In the next few sections, I’ll explore the memory challenge and how companies including ZeroPoint Technologies are bridging this memory optimization/performance tension in a way that can meaningfully impact total cost of ownership (TCO). For a detailed analysis of how ZeroPoint delivers on performance and TCO, please read the research brief.

The Memory Challenge

To best appreciate the technology that companies like ZeroPoint bring to bear, it’s essential to understand the challenges in server memory architecture. Data-driven workloads, especially those rooted in high-performance computing (HPC), analytics, and AI, are constrained by existing memory designs that have not evolved at the same pace as CPU and GPU innovations. These limitations manifest as significant latency and power consumption, hindering overall application performance.

Data-fetching delays, a direct consequence of these memory challenges, increase CPU idle times. While organizations invest more and more in chip technologies to deliver the best performance, the full potential of these chips isn’t realized. Further, as chip manufacturers advance to smaller process nodes (e.g., TSMC moving from 5nm to 3nm), this performance gap due to latency will only widen.
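To put a rough number on those idle cycles, consider a simple stall model. The Python sketch below is back-of-envelope arithmetic with inputs I have chosen for illustration (miss rate, DRAM latency, clock speed, stall-free IPC); real cores overlap outstanding misses, so treat the result as directional rather than as a measurement.

    # Back-of-envelope model (all inputs are illustrative assumptions):
    # fraction of core cycles spent stalled on memory, given a last-level
    # cache miss rate and DRAM latency. Real cores overlap misses, so this
    # overstates the stall; it is directional, not a benchmark.

    def stall_fraction(misses_per_kilo_instructions: float,
                       memory_latency_ns: float,
                       clock_ghz: float,
                       ipc_without_stalls: float) -> float:
        compute_cycles = 1000 / ipc_without_stalls        # cycles to retire 1,000 instructions
        stall_cycles = (misses_per_kilo_instructions
                        * memory_latency_ns * clock_ghz)  # cycles lost waiting on DRAM
        return stall_cycles / (compute_cycles + stall_cycles)

    # Assumed: 5 misses per 1,000 instructions, 90 ns DRAM latency, 3 GHz core, IPC of 2
    print(f"{stall_fraction(5, 90, 3.0, 2.0):.0%} of cycles stalled on memory")

Even with these modest assumptions, the model suggests a core can spend most of its cycles waiting on DRAM, which is exactly the gap that faster process nodes cannot close on their own.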

Along with these latency challenges, memory is also expensive. For memory-intensive applications in AI, genomics, financial modeling, weather modeling, and the like, memory capacity and bandwidth are critical to house large datasets and move them in and out of the CPU. This means higher-capacity memory DIMMs feeding CPUs with many cores and as large a cache as possible. Those cores, as mentioned previously, often sit idle as they wait for data to crunch.

The Power Tax

In addition to the high capital expenditures associated with memory, operational expenditures increase with increased power consumption. While CPUs and GPUs are the primary contributors to a server’s electricity use, richly populated memory configurations also contribute significantly. For organizations that are increasingly power-constrained in this AI era, every watt lost to inefficiency is a watt painfully felt.

The International Energy Agency (IEA) estimates that datacenters consume roughly 2% to 3% of the global power footprint, equivalent to the entire power consumption of Australia. The growth in energy use underscores the urgent need for more power-efficient memory solutions, especially with predictions of datacenter energy consumption increasing to 8% by 2030.

Traditional Compression Techniques: A Mixed Bag

One way to address this memory challenge is through compression. However, not all compression techniques and technologies are created equal. Compression has long been employed to mitigate memory inefficiencies, traditionally in software, which is perhaps the easiest method to employ. However, software-based compression introduces variability in performance and significant latency (up to 10,000 nanoseconds) because of the CPU overhead required for the compression tasks.
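To make that CPU overhead concrete, here is a minimal timing sketch in Python. The zstandard package, compression level, and 4 KiB page size are my choices for illustration; the article does not name a specific software codec, and results will vary by machine.

    # Minimal sketch: time software compression of a 4 KiB "page" using the
    # zstandard package (pip install zstandard). Codec and level are
    # illustrative choices, not something the article specifies.
    import os
    import time

    import zstandard as zstd

    page = os.urandom(2048) * 2          # 4 KiB of semi-compressible sample data
    cctx = zstd.ZstdCompressor(level=3)

    cctx.compress(page)                  # warm-up
    n = 10_000
    start = time.perf_counter_ns()
    for _ in range(n):
        cctx.compress(page)
    elapsed = time.perf_counter_ns() - start

    print(f"avg software compression latency: {elapsed / n:,.0f} ns per 4 KiB page")

On typical server hardware this lands in the microsecond range per page, consistent with the up-to-10,000-nanosecond figure above, and every one of those cycles is taken from the application.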

While more efficient than software solutions, hardware-based compression still falls short by introducing 2,000 to 3,000 nanoseconds of added latency. While many applications can tolerate this level of latency, it is still substantial for workloads that are extremely performance-intensive, such as AI inferencing, high-frequency trading, and any application with real-time responsiveness requirements.

Compute Express Link (CXL) is the most significant advancement in memory technology in decades. This high-speed interconnect delivers significantly improved performance between CPUs and memory, and between accelerators and memory. CXL-enabled servers can see a much more manageable 100 to 200 nanoseconds of latency.

However, there are better alternatives.

Enter ZeroPoint Technologies

ZeroPoint offers a game-changing solution through its low-latency, hardware-based memory optimization IP. This technology boasts near latency-free performance (3 to 5 nanoseconds), representing a 1,000-fold improvement compared to existing standards. ZeroPoint’s comprehensive approach to memory optimization sets it apart, addressing inefficiencies across the entire memory hierarchy—from on-chip SRAM to fast storage (NVMe).

ZeroPoint’s technology portfolio is broad, addressing the performance challenges along the entire data journey, from NVMe-based storage to memory to cache. This holistic approach is critical to the near-native performance the technology delivers. The ZeroPoint portfolio includes:

  • Cache optimization (CacheMX): CacheMX compresses on-chip SRAM, effectively doubling or even quadrupling cache capacity without adverse latency impacts. This expansion translates to a 30% improvement in computational performance.
  • Near-memory solutions:
    • ZiptilionBW: A hardware accelerator that compresses data on the fly, enhancing DRAM bandwidth and capacity without requiring modifications to existing workloads (see the sketch after this list for how a compression ratio translates into effective capacity and bandwidth).
    • SuperRAM: Integrated into the system-on-chip (SoC), this accelerator optimizes the compression algorithms used by Linux zram and zswap, significantly boosting performance and reducing power consumption.
  • Far-memory expansion: Leveraging the Compute Express Link (CXL) standard, ZeroPoint’s technology provides efficient memory bandwidth and capacity expansion, addressing the needs of latency-sensitive workloads through hardware-accelerated compressed-memory tiers.
  • NVMe expansion (FlashMX): FlashMX accelerates data compression between the CPU and NVMe storage using the zstd algorithm, seamlessly integrating into existing systems for improved storage efficiency.
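As a rough way to see what the near-memory and far-memory items in this list accomplish conceptually, the sketch below shows how a transparent compression ratio translates into effective DRAM capacity and bandwidth. The 2:1 ratio and baseline figures are assumptions of mine, not ZeroPoint specifications; achievable ratios always depend on the data.

    # Hypothetical arithmetic, not a ZeroPoint specification: a transparent
    # compression ratio of R means each physical byte stored or transferred
    # carries roughly R bytes of application data.

    def effective_memory(physical_capacity_gib: float,
                         physical_bandwidth_gbs: float,
                         compression_ratio: float) -> tuple[float, float]:
        return (physical_capacity_gib * compression_ratio,
                physical_bandwidth_gbs * compression_ratio)

    cap, bw = effective_memory(512, 300, 2.0)   # assumed: 512 GiB, 300 GB/s, 2:1
    print(f"effective capacity: {cap:.0f} GiB, effective bandwidth: {bw:.0f} GB/s")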

By implementing ZeroPoint memory optimization, organizations can realize significant upside in terms of cost and productivity. Some of the specific benefits the company touts are:

  • Energy efficiency: Improved performance per watt by approximately 50%, leading to significant reductions in power consumption.
  • Enhanced CPU productivity: Up to 80% increase in CPU productivity by minimizing idle times arising from memory latency.
  • TCO and carbon savings: Estimated 25% reduction in server TCO and considerable savings in the total cost of carbon ownership (TCCO).
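To see how a number like that 25% could arise, consider purely illustrative arithmetic: if DRAM accounted for half of a server’s TCO and 2:1 compression let an operator provision half the DRAM, the saving would be 25%. Both inputs below are assumptions of mine, not ZeroPoint’s accounting.

    # Purely illustrative: TCO saving if compression lets you provision
    # proportionally less DRAM. The DRAM share and compression ratio are
    # assumed inputs, not figures from ZeroPoint.

    def tco_saving(dram_share_of_tco: float, compression_ratio: float) -> float:
        return dram_share_of_tco * (1 - 1 / compression_ratio)

    # assumed: DRAM is 50% of server TCO, 2:1 effective compression
    print(f"~{tco_saving(0.50, 2.0):.0%} TCO reduction")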

Seamless Implementation for Real Savings, with a Catch

ZeroPoint’s technology is versatile, seamlessly integrating into various environments—whether it’s enhancing the performance of cache, near-memory (DRAM), far-memory (CXL-connected DRAM), or storage (NVMe). This adaptability ensures organizations can leverage ZeroPoint’s solutions across a broad spectrum of data-intensive applications, from HPC to AI.

While ZeroPoint’s technology is quite versatile and “just works,” its benefits can only be realized when silicon providers partner with the company to embed its IP. I believe companies such as AMD and Intel could create competitive differentiation through such a partnership, perhaps through a premium chip offering targeted directly at HPC and AI workloads.

Final Thoughts

In a world increasingly driven by data, the need for efficient, high-performance memory solutions has never been more critical. ZeroPoint Technologies stands out as a pioneer in this domain, offering compression and memory optimization solutions that deliver unprecedented gains in performance, energy efficiency, and cost savings.

For enterprises grappling with the twin imperatives of driving performance and reducing costs, ZeroPoint provides a compelling solution. If your memory vendor, CPU provider, or cloud service is not leveraging ZeroPoint’s technology, it’s worth asking why. I expect ZeroPoint to become a key partner for silicon providers and systems designers aiming to create differentiation in latency-sensitive markets such as HPC and AI.

Matt Kimball

Matt Kimball is a Moor Insights & Strategy senior datacenter analyst covering servers and storage. Matt brings 25-plus years of real-world high-tech experience spanning hardware and software, in roles including product manager, product marketer, engineer, and enterprise IT practitioner. This experience has led to a firm conviction that an offering’s success lies, of course, in being profitable, unique, and targeted, but most importantly in the ability to position and communicate it effectively to the target audience.

Patrick Moorhead

Patrick founded the firm based on his real-world technology experience and his understanding of what he wasn’t getting from analysts and consultants. Ten years later, Patrick is ranked #1 among technology industry analysts in terms of “power” (ARInsights) and in “press citations” (Apollo Research). Moorhead is a contributor at Forbes and frequently appears on CNBC. He is a broad-based analyst covering a wide variety of topics, including cloud, enterprise SaaS, collaboration, client computing, and semiconductors. He has 30 years of experience, including 15 years of executive experience at high-tech companies (NCR, AT&T, Compaq, now HP, and AMD) leading strategy, product management, product marketing, and corporate marketing, including three industry board appointments.