Karl Freund, Author at Moor Insights & Strategy
MI&S offers unparalleled advisory and insights to businesses navigating the complex technology industry landscape.

RESEARCH PAPER: Qualcomm: A New Force In Cloud AI https://moorinsightsstrategy.com/research-papers/research-paper-qualcomm-a-new-force-in-cloud-ai/ Tue, 15 Dec 2020

Qualcomm announced the first shipments of the Cloud AI 100 family of products, thus entering the data center, cloud edge, edge appliance and 5G infrastructure markets for accelerated artificial intelligence (AI) processing. Qualcomm has demonstrated its AI and power-efficiency capabilities for many years with the Snapdragon family of SoCs for mobile, networking and embedded markets. The company has now scaled up this technology to compete with NVIDIA, Intel, Xilinx, Amazon Web Services and scores of well-funded startups entering the current Cambrian Explosion of rapid advancements in AI technology. Like that geologic event, in which nearly all animal phyla appear suddenly and diversify rapidly in the fossil record, this period of AI evolution will reveal which solutions thrive and which fall by the wayside as the technology continues to evolve.

You can download the paper by clicking on the logo below:

Table Of Contents:

  • Summary
  • The Qualcomm Go-To-Market (GTM) Strategy For Entering The Cloud
  • The Cloud AI 100 Family Of AI SoCs And Cards
  • Performance, Latency And Efficiency Comparisons
  • Qualcomm’s AI Software Stack
  • Competitive Landscape
  • Overall Assessment And Conclusions
  • Figure 1: Qualcomm Serviceable Market (SAM) Estimates
  • Figure 2: Qualcomm’s Vision For Distributed Intelligence
  • Figure 3: The Cloud AI 100 Block Diagram
  • Figure 4: The Cloud AI 100 Edge Development Kit
  • Figure 5: Qualcomm Initial Benchmarks For Image Recognition
  • Figure 6: Cloud AI 100 Power Efficiency
  • Figure 7: The Software Development Stack For The Cloud AI 100

Companies Cited:

  • Alibaba
  • Amazon
  • Amazon Web Services (AWS)
  • Baidu
  • Blaize
  • Facebook
  • Flex Logix
  • Google
  • Groq
  • Gyrfalcon
  • Huawei
  • Intel
  • Microsoft
  • NVIDIA
  • Qualcomm
  • SambaNova
  • Tencent
  • Tenstorrent
  • Xilinx

RESEARCH PAPER: Comprehensive Silicon Lifecycle Management https://moorinsightsstrategy.com/research-papers/research-paper-comprehensive-silicon-lifecycle-management/ Mon, 14 Dec 2020

While the global semiconductor industry has matured on many fronts, producing faster, cheaper and more power-efficient devices in vast quantities, lifecycle management for silicon remains lacking. Until now, there has been little thought given to the need for and benefits of an end-to-end, data-centric solution. Hopefully, this situation is about to change.

You can download the paper by clicking on the logo below

Table Of Contents

  • Summary
  • An Introduction to Silicon Lifecycle Management (SLM)
  • Potential Benefits Of SLM
  • The Synopsys SLM Platform
  • Conclusions And Recommendations
  • Figure 1: Key Market Requirements And Solutions
  • Figure 2: SLM Market Size Estimates
  • Figure 3: Silicon Management Value Chain
  • Figure 4: Synopsys Silicon Lifecycle Management Platform

Companies Cited

  • Synopsys

RESEARCH PAPER: Tenstorrent’s Holistic Stack Of AI Innovation https://moorinsightsstrategy.com/research-papers/research-paper-tenstorrents-holistic-stack-of-ai-innovation/ Thu, 22 Oct 2020

The explosive growth of AI processing in data center and edge environments has induced AI startups and established firms alike to develop silicon to handle the massive processing demands of neural networks. Inference processing, in particular, is an emerging opportunity, wherein a trained deep neural network is processed to predict characteristics of new data samples. This processing is typically performed on CPUs. However, that situation will have to change to handle the exponential growth in model size and new applications that depend on multiple neural networks to solve complex problems. We believe that the market for inference processing will exceed that of data center AI training in 3-4 years, surpassing $5B in annual chip sales by 2025.

You can download the paper by clicking on the logo below:

Table Of Contents:

  • Introduction
  • Tenstorrent’s Holistic Strategy
  • Tenstorrent Product Roadmap
  • Conclusions: The Holistic Approach Holds Tremendous Promise And Challenges
  • Figure 1: Tenstorrent Grayskull Processing Element (Single Core)
  • Figure 2: Packet Manager
  • Figure 3: O(N) Matrix Multiplication
  • Figure 4: ML vs Moore’s Law (Optimistic)
  • Figure 5: Flexible Scheduling & Parallelization
  • Figure 6: Tenstorrent Silicon Roadmap
  • Table 1: 65W Grayskull BERT Inference Performance

Companies Cited

  • AMD
  • Graphcore
  • NVIDIA
  • OpenAI
  • Qualcomm
  • Tenstorrent

RESEARCH PAPER: Blaize: AI For The Edge https://moorinsightsstrategy.com/research-papers/research-paper-blaize-ai-for-the-edge/ Tue, 13 Oct 2020

While NVIDIA dominates the market for AI-specific silicon accelerating the training of neural networks, many AI startups are developing silicon to accelerate inference processing, both for data center and edge applications. CPUs have typically been the choice for inference processing, but this is changing rapidly as the size of neural networks grows exponentially and applications are emerging that require multiple neural networks to solve complex problems. These workloads far surpass a CPU’s processing power. One of the critical challenges for inference processors is the selection of the right balance of performance, cost and power consumption for a specific set of applications – one size will not fit all. Into this mix jumps California-based startup Blaize with the announcement of its first generation of production-ready platforms that, it contends, provide that balance for targeted edge applications.

You can download the paper by clicking on the logo below:

Table Of Contents

  • Introduction
  • Blaize Target Markets And Products
  • The Blaize Software Suite: Picasso And AI Studio
  • Early Blaize Customer Projects
  • Conclusions And Recommendations
  • Figure 1: Blaize’s Packaging Options For Its GSP Chip For Standalone And Host-Connected Applications
  • Figure 2: Blaize Picasso Software Stack
  • Figure 3: Industrial Monitoring
  • Figure 4: Smart City Applications For Traffic Flow And Public Safety
  • Figure 5: Retail Applications

Companies Cited

  • Arm
  • Blaize
  • Daimler
  • Denso
  • NVIDIA
  • Samsung

RESEARCH PAPER: The Graphcore Second-Generation IPU https://moorinsightsstrategy.com/research-papers/research-paper-the-graphcore-second-generation-ipu/ Wed, 15 Jul 2020

Graphcore, the U.K.-based startup that launched the Intelligence Processing Unit (IPU) for AI acceleration in 2018, has introduced the IPU-Machine. This second-generation platform has greater processing power, more memory and built-in scalability for handling extremely large parallel processing workloads. The well-funded startup has a blue-ribbon pedigree of engineers, advisers and investors, and enjoys a valuation approaching $2 billion. Its first-generation hardware is now available on the Microsoft Azure cloud as well as in Dell-EMC servers. Both companies are investors. Graphcore is now betting its future on this second-generation platform, a plug-and-play building block for massive scalability that is currently unique in the industry.

You can download the paper by clicking on the logo below:

Table Of Contents:

  • Introduction
  • The Colossus MK2 IPU (GC200)
  • The IPU-Machine (M2000)
  • The Second-Gen IPU-Fabric
  • Updates To The Graphcore Software Stack
  • Performance
  • Conclusions
  • Figure 1: The Graphcore IPU-Machine
  • Figure 2: IPU-Machine M2000 Architectural Diagram
  • Figure 3: IPU-Machine Stacking Options
  • Figure 4: IPU-POD For Super Computing Scale
  • Figure 5: IPU-POD64 Reference Architecture
  • Figure 6: POPLAR SDK
  • Figure 7: IPU-POD Support For Multi-Tenancy
  • Figure 8: Colossus MK2 Performance

Companies Cited

  • Dell EMC
  • Graphcore
  • Microsoft Azure
  • NVIDIA

RESEARCH PAPER: The Graphcore Software Stack: Built To Scale https://moorinsightsstrategy.com/research-papers/research-paper-the-graphcore-software-stack-built-to-scale/ Tue, 26 May 2020

Software for new processor designs is critical to enabling application deployment and optimizing performance. UK-based startup Graphcore, a provider of silicon for application acceleration, places significant emphasis on software, dedicating roughly half its engineering staff to the challenge. Graphcore’s Intelligence Processing Unit (IPU) utilizes the expression of an algorithm as a directed graph, and the company’s Poplar software stack translates models and algorithms into those graphs for execution. The software simplifies adoption of the chip for AI and parallel computing, making it vital to the company’s success. This paper explores the benefits provided by the company’s software and discusses how these capabilities could speed development and deployment of applications that run on Graphcore IPUs.

You can download the paper by clicking on the logo below:

Table Of Contents

  • Introduction
  • A Brief Overview Of The Intelligence Processing Unit
  • The Graphcore Software Platform
  • Deployment: Industry Standard Tools And Platforms
  • Figure 1: The Graphcore Intelligent Processor
  • Figure 2: The Graphcore Software Platform
  • Figure 3: Graphcore’s Poplar Libraries
  • Figure 4: Graphcore Ecosystem Support

Companies Cited

  • Cirrascale
  • Dell Technologies
  • Graphcore
  • Microsoft

RESEARCH PAPER: AI Could Be The Next Killer App In Semiconductor Design https://moorinsightsstrategy.com/research-papers/research-paper-ai-could-be-the-next-killer-app-in-semiconductor-design/ Mon, 20 Apr 2020

It goes without saying that designing complex semiconductors is an extremely challenging engineering process. While the chips themselves are small, and the individual features on the chips are tiny, as small as 7 nanometers (seven billionths of a meter), the intricacy of the design process is astronomical.

You can download the paper by clicking on the logo below:

Table Of Contents:

  • Introduction
  • The Challenges Of Optimizing The Broader Design Space
  • The Evolution of ML in Electronic Design
  • The Path Forward For AI In Chip Design
  • Conclusions And Recommendations
  • Figure 1: Traditional Physical Design Space Exploration
  • Figure 2: Machine Learning In Physical Design
  • Figure 3: The Synopsys DSO.ai Design Space Optimization System
  • Figure 4: Synopsys DSO.ai Customer Results

Companies Cited:

  • Synopsys

RESEARCH PAPER: Qualcomm’s New Distributed Intelligence Platform https://moorinsightsstrategy.com/research-papers/research-paper-qualcomms-new-distributed-intelligence-platform/ Tue, 14 Jan 2020

Qualcomm has been innovating artificial intelligence (AI) features for years in its Snapdragon products, and the next-generation Snapdragon 865 & 765 5G Mobile Platforms, due in 2020, are designed to take AI performance to the next level and enable distributed intelligence over 5G wireless. AI is becoming ubiquitous on smartphones, and although many consumers may be unaware their phones are running AI, it is becoming indispensable in primary applications such as photography, voice recognition, extended reality, gaming and even real-time spoken language translation. As a result, the battle for smartphone market share is shifting from displays to AI, and Qualcomm intends to extend its lead with these new fast chips and software.

You can download the paper by clicking on the logo below:

Table Of Contents:

  • Distributed Intelligence Begins With A Great Smartphone
  • Introducing The New Snapdragon Family
  • Software To Make It All Work
  • A Few Interesting Examples
  • Distributed Intelligence Becomes A Reality
  • Conclusions
  • Figure 1: Qualcomm’s Vision For Distributed Intelligence
  • Figure 2: The Snapdragon 865 Features
  • Figure 3: Accelerators To Optimize Performance
  • Figure 4: Qualcomm’s Hardware And Software Development Suite
  • Figure 5: Snapdragon 865 AI Performance Benchmarks
  • Figure 6: Loom.ai Avatar Demonstration

Companies Cited:

  • ARM
  • Google
  • Qualcomm
  • Snap, Inc.

Arm Adds More AI Firepower https://moorinsightsstrategy.com/arm-adds-more-ai-firepower/ Fri, 08 Nov 2019

As I covered in a recent article on the status of startups building chips for AI, the AI accelerator market will become quite crowded over the next 12-18 months. Not to be outdone, Arm is now extending its neural network processor family, adding logic designs for mid-range and entry-level devices to the flagship Ethos-N77, which targets higher-end smart phones. For companies needing AI logic for their own Arm-based mobile and embedded chips, the three Ethos designs will offer good performance, small size and very low power options for them to consider.

The question I often field, however, is whether Arm is too late to the party, since large Arm partners such as Qualcomm have already designed their own AI accelerator engines. The bottom line, in my opinion, is that the AI chip industry is only in the 2nd inning, and there’s lots of game left to play. However, Arm will face partners who are already competitors in the AI design space. Let’s look at Arm’s design, which of course is licensable IP, not a chip manufactured by Arm.

The Arm Ethos design

Since many larger mobile chip designers have already added an AI engine to their application and modem SoCs, it looks like Arm decided to build IP for the rest of the market, providing blocks for those partners to add to their own or licensed designs. The Ethos design comes complete with a scalable MAC Engine for the math, a Programmable Layer Engine (PLE) for activations and vector processing, and a Network Control Unit (NCU) to manage the neural network workflow. The MAC + PLE is a scalable unit, available in 4, 8 and 16 blocks in the Ethos-N37, N57 and N77 IP, respectively. The company projects that these designs will attain 1, 2 and 4 trillion operations per second (TOPS), respectively, at 1 GHz. The MAC supports the 8- and 16-bit integer math now commonly used for efficient inference processing. Importantly, the PLE and the NCU do the work often handled by an application CPU, so this design should deliver low latency at low power and low cost.
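As a back-of-envelope check on those TOPS figures, the sketch below works out how many parallel MACs each Ethos tier implies, assuming the common convention that one multiply-accumulate counts as two operations; the derived MAC counts are illustrative, not Arm’s published specifications.

# Back-of-envelope check of the Ethos TOPS figures quoted above, assuming the
# common convention that one multiply-accumulate (MAC) counts as two operations.
# The derived MAC counts are illustrative, not Arm's published specifications.
def implied_macs(tops, freq_ghz=1.0, ops_per_mac=2):
    """Parallel MACs needed to sustain `tops` trillion ops/s at `freq_ghz`."""
    return (tops * 1e12) / (ops_per_mac * freq_ghz * 1e9)

for name, tops in [("Ethos-N37", 1), ("Ethos-N57", 2), ("Ethos-N77", 4)]:
    print(f"{name}: ~{implied_macs(tops):,.0f} MACs at 1 GHz")
# Prints roughly 500, 1,000 and 2,000 MACs -- the same 1:2:4 ratio as the
# 4/8/16-block scaling described above.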

To help solve the memory bottleneck AI chips face, these designs include on-die SRAM memory, from 512KB to 4MB, reducing traffic to external memory. In addition, Arm equipped the devices with on-the-fly compression and memory management to further reduce memory consumption—nice! Finally, the chip supports scaling out to handle larger jobs, interconnecting up to 16 Ethos chips.

Figure 1: All three Ethos products feature the same rich set of power-efficient and scalable logic. (Source: Arm)

To help partners building chips for a wide variety of AI-enabled applications, Arm now offers IP across a fairly wide performance spectrum, all supported by the same software stack based on Arm NN. The devices Arm envisions for these AI engines span a broad range of popular AI tasks.

Figure 2: The Ethos N37, N57, and N77 cover a broad range of applications, devices, and price points. (Source: Arm)

Frankly, I’ve been wondering when Arm would come out with a competitive series of designs for AI acceleration; there is no doubt it is late to the party and all mobile SOCs need AI. But as I mentioned, this party is just getting started. AI is no longer a “nice to have” option—every smartphone will need to perform computational photography, language translation and voice recognition, even at the low end. The Ethos design is well thought out, with important features that will help ensure high performance and low battery consumption for these computationally intensive tasks.

Arm’s success here will not come without challenges. I would point out that Arm’s apparent delay created an opportunity for companies like Qualcomm, whose Snapdragon 855 delivers 7 TOPS today and powers the Google Pixel 4, among others. The amazing photography everyone is raving about in the Pixel 4 and the ability to perform voice processing without network connectivity are both enabled by Google software running on the Hexagon AI Engine.

Who Is The Leader In AI Hardware? https://moorinsightsstrategy.com/who-is-the-leader-in-ai-hardware/ Tue, 05 Nov 2019

A few months ago, I published a blog that highlighted Qualcomm’s plans to enter the data center market with the Cloud AI 100 chip sometime next year. While preparing the blog, our founder and principal analyst, Patrick Moorhead, called to point out that Qualcomm, not NVIDIA, probably has the largest market share in AI chip volume thanks to its leadership in devices for smartphones. Turns out, we were both right; it just depends on what you are counting. In the mobile and embedded space, Qualcomm powers hundreds of consumer and embedded devices running AI; it has shipped well over one billion Snapdragons and counting, all of which support some level of AI today. In the data center, however, NVIDIA likely has well over 90% share of the market for training. Meanwhile, Intel rightly claims the lion’s share of the chips for inference processing in the world’s largest data centers. I’ve written extensively about NVIDIA and Intel, so let’s take a look into Qualcomm Technologies, Inc. (QTI).

The path to distributed intelligence

Smartphones today are the pervasive interface for nearly 3 billion people to communicate, take photos and videos, and access personal data and applications. You knew that, but you may be unaware of how much AI those phones are processing. If you have ever taken a photo using an Android phone, you have probably used AI from QTI.

Qualcomm intends to use its leadership in power-efficient mobile processing to expand beyond the handset to create “Distributed Intelligence,” where AI processing can be performed as close to the user as possible, offloading to the cloud when needed. So, the processing or pre-processing can occur on the device, in the cloud edge and/or in the data center, depending on power, data and latency requirements. By interconnecting these three processing tiers via 5G networking, each tier can collaborate to improve understanding and deliver advanced functionality for the user.

The QTI AI engine overview

QTI takes a heterogeneous computing approach to deliver AI and application performance at low power and cost. It places its Hexagon Processor, Adreno GPU and Kryo CPU, as well as the modem, security processor and other logic, on the same die. The latest update to Hexagon includes a dedicated Tensor accelerator, akin to the TensorCores found on NVIDIA’s latest GPUs. A few AI-enabled applications on QTI-equipped mobile phones that make use of these cores include Dual and Single-camera Bokeh, Secure 3D Face Authentication, Scene Detection, Super Resolution and a myriad of computational photography enhancements. Benchmarks published by Anandtech and PCMag verify that the Snapdragon 855, which can deliver over 7 trillion operations per second (TOPS), outperforms the Huawei Kirin and Samsung Exynos mobile processors.

Figure 2: The QTI Snapdragon platform supports a CPU, GPU, and AI engine. All blocks can be used by AI programmers, and the device is available across a wide spectrum of power, performance and price points. (Source: QTI)

Software: frameworks, tools and libraries

Of course, any chip needs software to be useful, and QTI built a robust stack for AI applications. This includes the popular neural network frameworks like PyTorch and TensorFlow, and also the frameworks developed by Microsoft, Amazon, Facebook and Baidu. QTI supports the Open Neural Network Exchange (ONNX), a common data format for importing neural networks, and runs them on the processors found in Snapdragon. This illustrates an important strategic thrust for QTI: the company wants to support practically every style of AI directly on Snapdragon and has the software to meet the needs of a diverse development community.
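As a concrete illustration of the ONNX hand-off described above, here is a minimal sketch assuming a stock PyTorch and torchvision install; the model choice and file name are placeholders, and the device-specific compile-and-run step belongs to the vendor’s own tools, which are not shown.

# Minimal sketch of exporting a trained model to ONNX, the vendor-neutral
# interchange format described above. The model and file name are
# placeholders; the device-specific compile/run step is left to the
# vendor's own tooling and is not shown.
import torch
import torchvision

model = torchvision.models.mobilenet_v2(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)   # one NCHW image-shaped example input

torch.onnx.export(
    model,
    dummy_input,
    "mobilenet_v2.onnx",
    input_names=["image"],
    output_names=["logits"],
    opset_version=13,
)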

Use cases for AI in 5G

5G will utilize AI processing for network optimization and enable new AI-enabled applications, thanks to faster processors and the technology’s 1ms-or-less latency, higher bandwidth at 1 Gbps and massive connectivity. QTI believes the 5G future will both require and enable “distributed intelligence.”

The application of AI in 5G wireless transmission and management is in its infancy but will become essential in optimizing these networks. AI will assist in transitioning the management of wireless networks from a human-centric model to an automated model, improving signal quality and service levels. Specific applications that reside in base stations include the following:

  • Predicting base-station switching handoffs to minimize quality degradation and dropped calls
  • Planning and provisioning beams for frequently traveled paths to optimize signal strength and quality of service
  • Enabling beamforming in massive MIMO arrays and millimeter-wave antennae to optimize transmission quality by identifying the most efficient data delivery route for a particular user (this provides a higher quality of service at lower power and bandwidth consumption)

Conclusions

QTI believes that in order for AI to realize its potential as a transformative technology, AI must become pervasive and collaborative across mobile, edge and cloud computing resources. The company embraces this strategy as “Distributed Intelligence” and recognizes that this will require new products to extend its technology footprint far beyond mobile and embedded devices. To accomplish this, the company must do the following:

  • Continue to innovate in AI acceleration and software on mobile devices
  • Expand that technology in edge device markets, including smart IoT and self-guided devices such as autonomous vehicles, drones, etc.
  • Develop platforms for edge and cloud computing service providers to complement and extend the intelligence that requires more processing power than what is available on the device

Going forward, the company is investing in the technologies and market development programs that will help it expand further into intelligent edge cloud and data center environments. In fact, the stated goal for the Cloud AI 100 of reaching 350 TOPS may well put it at or near the lead position when it comes to market in 2020, assuming the company can translate that performance potential into real application benefits (real apps and the MLPerf benchmarks, for example).

Of course, QTI will face challenges in this market expansion into the data center and cloud edge—from established companies like Intel, NVIDIA and Xilinx, as well as from a vast array of startups. In fact, NVIDIA has been making significant progress in extending its lead in training to the inference market, as I noted in this blog last week. Moor Insights & Strategy believes that when QTI launches its cloud and next-generation 5G products, the company will emerge as a significant player in the AI computing revolution. We look forward to hearing more details of the company’s plans and products at the annual QTI Tech Summit in December.

AI Hardware: Harder Than It Looks https://moorinsightsstrategy.com/ai-hardware-harder-than-it-looks/ Mon, 28 Oct 2019

The second AI HW Summit took place in the heart of Silicon Valley on September 17-18, with nearly fifty speakers presenting to over 500 attendees (almost twice the size of last year’s inaugural audience). While I cannot possibly cover all the interesting companies on display in a short blog, there are a few observations I’d like to share.

John Hennessy’s keynote

Computer architecture legend John Hennessy, Chairman of Alphabet and former President of Stanford University, set the stage for the event by describing how historical semiconductor trends, including the untimely demise of Moore’s Law and Dennard scaling, led to the demand and opportunity for “Domain-Specific Architectures.” This “DSA” concept applies not only to novel hardware designs but to the new software architecture of deep neural networks. The challenge is to create and train massive neural networks and then optimize those networks to run efficiently on a DSA, be it a CPU, GPU, TPU, ASIC, FPGA or ACAP, for “inference” processing of new input data. Most startups wisely decided to focus on inference processing instead of the training market, avoiding the challenge of tackling the 800-pound gorilla that is NVIDIA.

The new approach to software, where the software creates “software” (aka, “models”) through an iterative learning process, demands supercomputing performance. To make the problem even more challenging, the size of these network models is increasing exponentially, doubling every 3.5 months, creating an insatiable demand for ever more performance. As a result, there are now well over 100 companies developing new architectures to bring the performance up and the cost of computing down. However, they have their work cut out for them. Intel’s Naveen Rao points out that to achieve the required 10X improvement every year it will take 2X advances in architecture, silicon, interconnect, software and packaging.
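That pairing of a 3.5-month doubling time with a roughly 10X-per-year compute target is easy to verify with a couple of lines of arithmetic; the sketch below uses only the numbers quoted above.

# If compute demand doubles every 3.5 months, the implied annual growth is:
doublings_per_year = 12 / 3.5              # about 3.4 doublings per year
annual_growth = 2 ** doublings_per_year
print(f"~{annual_growth:.1f}x per year")   # ~10.8x, in line with the ~10X figure above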

Figure 1: Intel’s Naveen Rao says that the compute capacity needed to handle increasing model complexity will need to improve by 10X every year. (Source: Intel)

Observation #1: 20 guys in a garage cannot out-engineer the leaders

The startups can and will invent novel architectures that could beat the incumbents in performance, but they will require partnerships with large customers to bring these technologies to market at scale. And while the rich set of architectural approaches is pretty amazing, the pace of development of both the hardware and the prerequisite software is frustratingly slow. A year ago, dozens of startups presented their plans in PowerPoint at the Summit event. This year, dozens of startups presented updated PowerPoints. Where’s the hardware?

The fact is that few new chips have entered volume production since the last summit. The Qualcomm Snapdragon 855 and Alibaba’s Hanguang 800 are notable exceptions; Snapdragon is, of course, a mobile SOC, and Hanguang is only for Alibaba’s internal use. In part, the delay is because this stuff is a lot harder than it initially looks (isn’t all silicon?). But let’s also be realistic: 20, 50 or even 100 engineers are not going to out-engineer companies like NVIDIA, Google, Xilinx, Microsoft, Amazon AWS and Intel. They can innovate amazing new architectures, but execution is the science of engineering, not the art of architectural design. While many can build a fast chip with lots of TOPS, it will “take a village” of researchers, engineers, university professors, internet datacenters and social networking companies to turn those TOPS into usable performance and to build and optimize models for these new chips.

Israeli startup Habana Labs offers a good example of the challenge. Habana launched its first impressive chip, Goya, for data center inference processing at the inaugural AI HW Summit event. Yet, a full year later, there are no public endorsements or deployments of Goya in spite of the chip’s exceptional performance and very low power. This is not because Goya doesn’t work; it’s because the “rest of the story” will just take some time and effort to play out.

Another prime example is Intel’s Nervana neural network processor. Even armed with an innovative design and a world-class engineering team, that chip was shelved after 3 years of work. Intel wisely went back to the drawing boards with additional experience and customer feedback about a year ago to figure out how it could compete with NVIDIA’s now 3-year-old V100 TensorCore technology, still the industry’s fastest AI chip. Unlike a startup, Intel can afford to wait until it can deliver a winner: Intel’s Nervana processors (NNP-T and NNP-I) are now expected to be sampling later this year. However, NVIDIA isn’t standing still—we should see its new 7nm designs sometime soon (perhaps at SC19 in November, but more likely at GTC ‘20 next spring).

Going forward, the pace of production deployment for new chips will be gated by the depth and breadth of the ecosystem investments, in addition to the completion of the chips themselves. Keep in mind that while data centers are embracing heterogeneity, they prefer what I would call homogeneous heterogeneity—selecting a minimum number of chip architectures that can cover the widest range of workloads. To do otherwise would be unprofitable, due to the low utilization of fragmented compute realms, and costly to manage.

Observation #2: There are many avenues to improve performance

As I listened to the presenters at the summit, I was amazed by the rich landscape of innovations they outlined. Here are a few highlights, beyond the use of lower precision, tensor cores and arrays of MACs (multiply-accumulate cores). These are not orthogonal approaches, by the way.

Figure 2: A short list of some of the innovations being pursued in the search for faster and more… (Source: Moor Insights & Strategy)

There are two primary categories for these architectures. Von Neumann massively parallel designs use code (kernels) that process matrix operations in the traditional realm of digital computers (do this, then do this, …). More radical approaches typically take the form of melding compute and memory on a chip, either using digital representations for weights and activations that comprise the neural networks or using analog techniques that more closely resemble the biological functions of the human brain. The analog approach is higher risk, but could hold significant promise.

Many of the digital in-memory designs use data flow computing architectures, including Cerebras and Xilinx Versal, where AI cores are embedded in fabric with on-die memory that pipes activations to and from successive network layers. To make any of these designs work well in inference, the players will need to develop custom compiler technology to optimize the network, trim the unused parts of the network, and eliminate multiplication by zero (where of course the answer is zero).
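To make the zero-elimination point concrete, the sketch below shows the basic idea in miniature; it illustrates the general technique, not any vendor’s compiler, and the weight values are made up.

# Illustrative sketch of the zero-skipping idea described above (not any
# vendor's compiler): a weight of zero contributes nothing, so a compiler can
# pack only the non-zero terms and skip those multiplies entirely.
def dense_dot(weights, activations):
    return sum(w * a for w, a in zip(weights, activations))

def sparse_dot(packed_weights, activations):
    # packed_weights holds (index, value) pairs built once, "at compile time"
    return sum(w * activations[i] for i, w in packed_weights)

weights = [0.0, 0.7, 0.0, 0.0, -1.2, 0.0, 0.3, 0.0]   # made-up pruned weights
acts = [1.0] * len(weights)
packed = [(i, w) for i, w in enumerate(weights) if w != 0.0]

assert dense_dot(weights, acts) == sparse_dot(packed, acts)
# Eight multiplies collapse to three; real compilers apply the same idea to
# whole pruned layers and structured-sparse weight blocks.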

Figure 3: A useful and simple taxonomy to help put the companies and architectural styles into… (Source: Mythic)

Don’t get me wrong, most of these companies, big and small, are going to deliver some pretty amazing designs. Let’s keep in mind, though, the time and magnitude of investments needed to build useful scalable solutions from a novel DSA device. To put that investment in perspective, I suspect that NVIDIA spends hundreds of millions of dollars every year to foster innovation around the world for AI research and development on its chips. No startup can afford this, so they will need to attract some big design wins to help carry them across the chasm.

Observation #3: NVIDIA is still on top

Ian Buck, VP and GM of NVIDIA’s Data Center business unit, bravely took the stage as the event’s last presenter, standing in front of hundreds of hungry wolves dedicated to taking NVIDIA down a notch. NVIDIA has made progress in extending its technology for inference through faster software and DNN research supported by its Saturn V Supercomputer (#22 on the Top 500 list). Buck pointed to design wins for inference, including some big names and a wide range of use cases.

Figure 4: NVIDIA was able to show a dozen companies that have adopted GPUs for inference, as well as all the major cloud vendors. (Source: NVIDIA)

To help drive inference adoption on GPUs, NVIDIA announced Version 6 of TensorRT—software that includes an optimizer and run-time support to deploy trained neural networks for inference processing on the range of NVIDIA hardware. It supports the $99 Jetson for embedded processing, Xavier for autonomous vehicles, the Turing T4 for data center applications, and more.

Second, Amazon AWS announced support for the NVIDIA TensorCore T4 GPU, a 75-watt PCIe card that can support complex inference processing for images, speech, translation and recommendations. NVIDIA T4 will be a common comparison target for startups such as Habana Labs and established companies like Intel Nervana. While I assume new chips will come along with outstanding metrics, NVIDIA will rightly argue that the usefulness of these devices in a cloud will depend on the amount of available software and a user base comfortable with running a variety of models on these accelerators.

Finally, demonstrating that GPUs can continually evolve in place (counter to what many startups claim), NVIDIA announced the 8.3-billion-parameter Megatron-LM transformer network for language processing. Developed on NVIDIA’s Saturn V using 512 GPUs, this also shows what you can do when you have your own AI supercomputer. Note that NVIDIA also doubled the performance of its existing V100 GPU in just 7 months, as measured by the MLPerf benchmark.
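A rough sizing exercise shows why a model of that scale becomes a multi-GPU project; the FP16 weights and the Adam-style optimizer-state breakdown below are common mixed-precision training conventions, not figures disclosed by NVIDIA.

# Rough memory sizing for an 8.3-billion-parameter model. FP16 weights and the
# mixed-precision Adam state breakdown are assumptions, not NVIDIA's figures.
params = 8.3e9
weight_gb = params * 2 / 1e9                   # FP16 weights only: ~17 GB
training_gb = params * (2 + 4 + 4 + 4) / 1e9   # + FP32 master copy and two Adam moments
print(f"weights ~{weight_gb:.0f} GB, training state ~{training_gb:.0f} GB")
# Even before activations and gradients, the training state dwarfs a single
# GPU's memory, hence the 512-GPU run mentioned above.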

Some still think inference is for lightweights. NVIDIA showed that modern inference use cases require multiple models at real-time latencies to meet users’ expectations, with 20-30 containers collaborating to answer a simple verbal query.

Figure 5: This slide depicts the workflow for answering a simple verbal query. (Source: NVIDIA)

Conclusions

The coming Cambrian Explosion in domain-specific architectures is exciting, but it is still “coming soon to a server near you.” By the time most startups reach the starting gate, many of their potential customers like Google, Amazon AWS, Baidu and Alibaba will have their own designs in production. Additionally, the big semiconductor vendors will have new silicon ready to crunch even bigger networks (like Megatron-LM) or power energy-efficient inference designs.

This doesn’t mean startups should simply give up and return their capital to their investors, but the startups will have a very high bar to clear, and they will need to clear it by a substantial margin. Either that or they will need to target niche markets where they can win with better power efficiency and lower prices.

Of course, another option for them is to Go Big, or Go Home, as Cerebras is attempting to do with its Wafer-Scale AI Engine recently announced at Hot Chips. However, this is not an approach I would recommend for the faint of heart! I look forward to seeing the domain-specific architecture landscape develop further.

RESEARCH PAPER: Qualcomm: Ubiquitous AI For 5G https://moorinsightsstrategy.com/research-papers/research-paper-qualcomm-ubiquitous-ai-for-5g/ Mon, 14 Oct 2019

Smartphones today are the pervasive interface for some three billion people to communicate, take photos and videos, and access personal data and applications—all of which are increasingly dependent on AI and Deep Learning. Consequently, mobile processors must accelerate a wide range of AI features in applications, including image processing, voice recognition, translation and gaming. These mobile semiconductors must be fast, but they must also be extremely power-efficient to help conserve battery life.

You can download the paper by clicking on the logo below:

Table Of Contents

  • Introduction: The Road To Distributed Intelligence
  • Background
  • QTI AI Technology Overview
  • QTI’s AI Market Strategy
  • Conclusions
  • Figure 1: The Importance Of On-Device Computing
  • Figure 2: QTI SoCs With AI Support
  • Figure 3: The Snapdragon 855
  • Figure 4: Qualcomm’s Vision Intelligence Platform
  • Figure 5: Qualcomm’s Software Stack For AI
  • Figure 6: Smart Phone Use Cases For AI

Companies Cited:

  • Amazon
  • Anandtech
  • Baidu
  • Facebook
  • Microsoft
  • NVIDIA
  • PCMag
  • Qualcomm Technologies Inc (QTI)

RESEARCH PAPER: NovuMind: An Early Entrant In AI Silicon https://moorinsightsstrategy.com/research-papers/research-paper-novumind-an-early-entrant-in-ai-silicon/ Wed, 29 May 2019

NovuMind is a Silicon Valley startup that builds full-stack Artificial Intelligence (AI) solutions: AI models, algorithms, boards, chips, and intellectual property for both cloud and edge applications. The company focuses on AI inference, where a trained deep neural network is used to process images, sound, speech and other types of information. The market for inference processing is forecast to grow rapidly as AI applications become pervasive in robotics, cloud applications, autonomous vehicles, and smart edge devices. This white paper explores the company’s strategy, technology and ability to differentiate in this fast-moving and soon-to-be crowded marketplace.

You can download the paper by clicking on the logo below:

Table Of Contents

  • Executive Overview
  • Company Overview And Strategy
  • The 1st Generation Of NovuTensor
  • The Company’s Technology Roadmap
  • NovuMind Customer Examples
  • Conclusions
  • Figure 1: NovuMind’s View Of The AI Market
  • Figure 2: NovuMind’s 3D Operation
  • Figure 3: Upscaling Lower Resolution Images To 8K

Companies Cited

  • AMD
  • Baidu
  • Hewlett Packard
  • NCS Pte. Ltd.
  • NovuMind
  • NVIDIA
  • SingTel

 

NVIDIA GTC 2019: Datacenter Ecosystem, But No New Chips https://moorinsightsstrategy.com/nvidia-gtc-2019-datacenter-ecosystem-but-no-new-chips/ Fri, 12 Apr 2019

Article by Karl Freund.

Last week at GTC 2019, Jensen Huang, the high-energy and immensely entertaining CEO and founder of NVIDIA, took the stage to give his keynote to the event’s 6,000+ attendees. However, this was anything but his usual keynote. We have all been spoiled for years by NVIDIA’s dependable yearly flood of new products for graphics and AI. This year, though, Huang spent a lot of time describing a dizzying myriad of relatively small announcements. While some may have been disappointed, this shouldn’t have surprised anyone who has been paying attention. NVIDIA’s GPUs for graphics and data science applications were all refreshed over the last couple of years, and they remain clear performance leaders in their segments.

While NVIDIA’s hardware engineers have been hard at work designing their next-generation chips, their software and partner engineering counterparts have been busy deepening the formidable moat that the NVIDIA ecosystem provides as a defense against newcomers. Jensen’s keynote clocked in at 2:40, and it would take hours to read and understand the myriad of announcements. If interested, his full keynote can be found here. For the sake of this column, we’ll stick to the highlights.

Figure 1: CEO Jensen Huang clearly enjoyed his time on the stage, hamming it up with the Jensen Lego doll image holding the new $99 Jetson Nano. (Photo: Karl Freund)

NVIDIA GPUs becoming ubiquitous in the datacenter

Jensen shared a slew of announcements geared towards making GPUs more widely available and consumable, in a variety of hardware and cloud platforms for running games, workstation applications, and AI. This included new servers with T4 GPUs from Cisco, Dell EMC, Fujitsu, Hewlett Packard Enterprise, Inspur, Lenovo, and Sugon. T4s are also now available as cloud instances from AWS, adding to the existing Google Beta support for T4 announced in January. This is significant because NVIDIA must successfully fight off the coming hordes of startups that are bringing inference chips to market later this year. The affordable and fast T4 is essentially a mass-market, multi-purpose Data Center GPU. It can be used for AI inference, gaming, remote workstations (VDI), ray tracing for rendering, and even AI training, according to Ian Buck, NVIDIA’s VP of Data Center products.

If you are an NVIDIA customer, this is all very good news. If you are intending to compete with NVIDIA, your job just got a lot more difficult. NVIDIA’s broad ecosystem will be difficult if not impossible to match; simply having a better chip for limited uses won’t cut it, except for a few very large markets such as vision processing (think smart surveillance cameras), autonomous vehicles, and industrial automation (think robots).

Inference isn’t always easy

One of the most impressive demos Jensen shared was the Microsoft Bing conversational search engine, powered by NVIDIA GPUs. Many people think that inference is a lot easier than training, which is true; for this reason, they believe that low-cost inference chips will rule the roost, which is debatable. Simplicity is only true for relatively easy jobs, like object detection and recognition in images. Truly intelligent services like Bing combine many inference tasks to create a natural interface that understands what the user is really looking for. For example, the workflow could look like this:

  1. understanding the spoken query in a number of languages
  2. translating that query into text
  3. submitting that query to a search engine
  4. determining the most optimal response, perhaps in the context of a multi-query conversation
  5. synthesizing the spoken result, and
  6. displaying the top-ranked results.

According to Microsoft and Jensen Huang, it really takes a GPU to deliver both the computational complexity and programmability to process the different types of neural network each step requires (such as DNNs, CNNs, GANs, RNNs, and Reinforcement Learning).
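A hypothetical sketch of that workflow appears below; every stage function is a placeholder for a separately trained model rather than a real API, which underscores why both horsepower and programmability matter here.

# Hypothetical sketch of the six-step workflow above; every argument is a
# placeholder for a separately trained model, not a real API.
def answer_spoken_query(audio, asr, translate, search, rank, tts):
    text = asr(audio)                # 1: understand the spoken query
    query = translate(text)         # 2: translate the query into text for the engine
    candidates = search(query)      # 3: submit the query to the search engine
    best = rank(query, candidates)  # 4: determine the most optimal response
    speech = tts(best)              # 5: synthesize the spoken result
    return speech, candidates[:10]  # 6: plus the top-ranked results to display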

One software platform to rule them all

Another key announcement that might be a little confusing is the new “CUDA-X” AI Ecosystem. Basically, this is a rebranding of the data analytics, graph processing, Machine Learning (RAPIDS), and Deep Learning training and inference libraries that span workstations, servers, and cloud platforms. Just like the NVIDIA GPU Cloud, this restructuring and combining of software stacks should ease deployment of compatible software componentry. From an industry perspective, CUDA-X widens and deepens the defensive moat NVIDIA enjoys today, potentially protecting NVIDIA from the dozens of AI chip startups and giants who are preparing their own devices and software for introduction later this year (see my three-part series on the Cambrian Explosion of AI Chips).

Figure 2: CEO Jensen Huang introduces the concept of CUDA-X at GTC, combining the various AI platforms from NVIDIA into a comprehensive library under a single brand. (Source: NVIDIA)

Autonomous vehicles on parade

The GTC show floor provided live interaction with thousands of NVIDIA-powered devices from hundreds of vendors. The most impressive was undoubtedly a massive “TuSimple” semi-truck, based on a customized Peterbilt model. As a startup, TuSimple has attracted $178M in venture funding, and provides full Level 4 autonomous long and short-haul trucking as a service. It is deployed on 3 to 5 delivery trips daily in Arizona, and soon in Texas. Until the event, I had not realized that anyone had a Level 4 vehicle running autonomous, revenue-generating routes on public roads. The TuSimple truck has over a dozen sensors, including cameras that, along with radar and lidar, make up a perception system that “sees” a standard-setting 1,000 meters ahead, providing 35 seconds to respond to hazards and obstructions. This is over three times as far as LIDAR can provide and is ahead of Google’s Waymo and Tesla.
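Those two numbers imply a particular cruising speed, which makes for a quick sanity check; the conversion below is plain arithmetic on the figures quoted above.

# Sanity check: 1,000 meters of perception range giving 35 seconds to respond
# implies the following cruising speed (arithmetic only, no new data).
range_m, reaction_s = 1000, 35
speed_ms = range_m / reaction_s
print(f"{speed_ms:.1f} m/s = {speed_ms * 3.6:.0f} km/h = {speed_ms / 0.44704:.0f} mph")
# ~28.6 m/s, about 103 km/h or 64 mph -- consistent with U.S. highway truck speeds.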

Figure 3: The TuSimple truck is a fully autonomous Level 4 hauler that can “see” a full kilometer ahead for safer and more efficient hauling. (Photo: Karl Freund)

Conclusions

While many were disappointed (though not surprised) that no new chips were announced, I, for one, still came away impressed. NVIDIA’s ecosystem appears unstoppable in AI, garnering support in just about every University, every cloud, every server vendor, and every Hollywood post-production studio on the planet. As its position continues to strengthen, and it gets past the Crypto-fed inventory hangover, NVIDIA remains an undisputed leader in what is about to become a far more crowded and competitive market for AI silicon.

Karl Freund is a Moor Insights & Strategy Senior Analyst for deep learning & HPC.

RESEARCH PAPER: AI And HPC: Cloud Or On-Premises Hosting https://moorinsightsstrategy.com/research-papers/research-paper-ai-and-hpc-cloud-or-on-premises-hosting/ Mon, 04 Feb 2019

Artificial Intelligence (AI) and High-Performance Computing (HPC) are both computationally-intensive workloads. They demand fast central processing units (CPUs), accelerators, very large data sets, and fast networking to support the high degree of scaling typically required. All this fast hardware can be difficult to manage and expensive. AI and HPC adopters must try to minimize costs while delivering the performance and agility demanded by the organization’s mission. Chief among the decisions that must be made is whether to build and host the application on a public cloud or build an on-premises infrastructure. While the industry trend is clearly to move new applications to the cloud, AI and HPC workloads have performance, data requirements, and utilization characteristics that could lead one to go in the opposite direction.

You can download the paper by clicking the logo below:

Table Of Contents

  • Introduction
  • The Cloud Computing Landscape For AI And HPC
  • The Dell EMC Computing Portfolio For AI And HPC
  • Common Considerations
  • Conclusions And Recommendations
  • Figure 1: The Dell EMC PowerEdge C4140

Companies Cited

  • AWS
  • Amazon
  • Dell EMC
  • Facebook
  • Google
  • Microsoft
  • NVIDIA
  • RightScale Cloud Management
  • Xilinx

RESEARCH PAPER: The Journey To AI-Enabled SAAS https://moorinsightsstrategy.com/research-papers/research-paper-the-journey-to-ai-enabled-saas/ Thu, 17 Jan 2019

Machine Learning (ML) and Artificial Intelligence (AI) will impact practically every application in every industry in the coming few years. Modern applications are increasingly delivered as cloud-based services and as a result, Software-as-a-Service (SaaS) and cloud-based application vendors can now deliver the competitive benefits of AI to their customers.

Click on the logo to download the paper:

Table Of Contents

  • Introduction
  • Taxonomy: Artificial Intelligence (AI), Machine Learning (ML), And Deep Learning (DL)
  • Project Brainstorming: Finding Pots Of Gold (And Avoiding Rat Holes)
  • Organization And Talent Considerations
  • Final Project Selections And Priorities
  • Technology Selection And Acquisition
  • Data Preparation
  • Model Development And Deployment
  • Conclusions And Recommendations
  • Figure 1: The Relationship Between AI, ML, And DL
  • Figure 2: Linear Regression Fits A Line Or A Curve To Data Points Using Sum Of Least Squares Differences
  • Figure 3:  A Typical Deep Neural Network
  • Figure 4: AI Is Ushering In A New Era, Which Many Call Software 2.0
  • Figure 5: AIRI From Pure Storage

Companies Cited

  • Amazon AWS
  • Baidu
  • Coursera
  • Facebook
  • Google
  • Intel
  • Microsoft
  • MIT
  • NVIDIA
  • Pure Storage
  • Tesla
  • Udacity
  • Udemy

The post RESEARCH PAPER: The Journey To AI-Enabled SAAS appeared first on Moor Insights & Strategy.

]]>
RESEARCH PAPER: Dell Technologies: Ready For Artificial Intelligence Leadership https://moorinsightsstrategy.com/research-papers/research-paper-dell-technologies-ready-for-artificial-intelligence-leadership/ Tue, 13 Nov 2018 06:00:00 +0000 https://staging3.moorinsightsstrategy.com/research-paper-dell-technologies-ready-for-artificial-intelligence-leadership/ Artificial Intelligence (AI) is one of today’s fastest growing technologies. Used to derive valuable insights from mountains of data, AI solves problems far more efficiently and accurately than was previously possible with traditional programming techniques. AI is transforming science and businesses around the world from cancer research to virtual assistants to autonomous vehicles. However, many […]

The post RESEARCH PAPER: Dell Technologies: Ready For Artificial Intelligence Leadership appeared first on Moor Insights & Strategy.

]]>

Artificial Intelligence (AI) is one of today’s fastest growing technologies. Used to derive valuable insights from mountains of data, AI solves problems far more efficiently and accurately than was previously possible with traditional programming techniques. AI is transforming science and businesses around the world, from cancer research to virtual assistants to autonomous vehicles. However, many enterprises still struggle to capitalize on this trend and are looking for a vendor that can help them find the right projects and build solutions that deliver business value quickly. This paper looks at Dell Technologies and concludes that its breadth of solutions and extensive expertise make Dell a very good partner for enterprise AI.

You can download the paper here:

Table Of Contents:

  • Executive Summary
  • A Practical Approach To The AI Journey
  • AI For Everyone
  • Dell Technologies Portfolio For AI
  • Conclusions And Recommendations

Companies Cited:

  • AeroFarms
  • Dell
  • Dell EMC
  • Dell Technologies
  • MIT Lincoln Labs
  • Mastercard
  • Mastercard Advisors
  • NVIDIA
  • OTTO Motors
  • University Of Cambridge
  • Zenuity

 

 

 

The post RESEARCH PAPER: Dell Technologies: Ready For Artificial Intelligence Leadership appeared first on Moor Insights & Strategy.

]]>
NVIDIA Bets On Turing For Datacenter AI https://moorinsightsstrategy.com/nvidia-bets-on-turing-for-datacenter-ai/ Thu, 27 Sep 2018 05:00:00 +0000 https://staging3.moorinsightsstrategy.com/nvidia-bets-on-turing-for-datacenter-ai/ NVIDIA ’s datacenter business has been on a tear lately, roughly doubling every year for the past several years. It hit $1.93 billion for the 2018 fiscal year, an increase of nearly 130% over the previous year. This growth has been largely driven by the pervasive use of NVIDIA GPUs in HPC and in neural […]

The post NVIDIA Bets On Turing For Datacenter AI appeared first on Moor Insights & Strategy.

]]>
NVIDIA's datacenter business has been on a tear lately, roughly doubling every year for the past several years. It hit $1.93 billion for the 2018 fiscal year, an increase of nearly 130% over the previous year. This growth has been largely driven by the pervasive use of NVIDIA GPUs in HPC and in neural network training for Artificial Intelligence research and development.

However, common sense says that at some point, the need to run AI applications will become larger than the demand to build them (assuming these AI tools will indeed be useful). With this in mind, there are now scores of companies, large and small, designing silicon for inference processing, including Google, Intel, Wave Computing, and GraphCore (many of these firms will be presenting their technology on Sept. 18-19 at the inaugural AI HW Summit in Silicon Valley).

Enter the Turing-based Tesla T4 and TensorRT 5 software

When NVIDIA announced the Turing GPU, targeting visualization and real-time rendering, it included some very interesting specs indicating it could make a darned good inference engine. Industry observers have wondered whether NVIDIA GPUs are the right technology to lead this transition to “production AI,” so it was vital for Jensen Huang, NVIDIA’s CEO, to demonstrate the company’s place in inference processing. Not one to disappoint, Mr. Huang announced the new Turing-based Tesla T4 at the GTC-Japan keynote this week—the company’s first GPU to specifically target inference processing in the datacenter.

NVIDIA’s inference platforms to date have been focused on robotics and autonomous driving, such as the Xavier SOC used in DrivePX for autos and in Jetson for robotics (which I covered here). As far as inference processing in the datacenter goes, NVIDIA says its P4 and P40 GPUs have been very popular for AI in the cloud—providing image recognition in video, voice processing, running recommendation engines for eCommerce, and natural language processing for analyzing and translating speech into text. One example NVIDIA shared was Microsoft Bing, which uses these GPUs to power its visual search capability 60 times faster than it could using CPUs. Additionally, each P4 GPU can process 30 simultaneous streams of video running at 30 frames per second.

The new NVIDIA Tesla T4 GPU will effectively replace the P4 and is packaged in a low-profile PCIe card shown in Figure 1. Burning only 75 watts, the new chip features 320 “Turing Tensorcores” optimized for integer calculations popular in inferencing jobs. It can crank out 130 trillion 8-bit integer and 260 trillion 4-bit integer operations per second (or TOPS). If you need floating point operations, such as what is required in neural network training, the T4 can handle 65 TFLOPS for 16-bit calculations—about half the performance of the NVIDIA Volta GPU, while only burning 1/4th the power. The net result is a 2X speedup in processing the video streams I mentioned earlier; while the P4 could handle 30, the T4 can handle 60.
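For readers who like to check the arithmetic, the short Python snippet below turns the figures quoted above into efficiency numbers. The Volta values are derived from the "half the FP16 performance at one quarter of the power" comparison in the paragraph rather than from a spec sheet, so treat them as rough.

```python
# Back-of-the-envelope efficiency math using only the figures quoted above.
t4_power_w = 75
t4_int8_tops = 130
t4_fp16_tflops = 65

volta_fp16_tflops = t4_fp16_tflops * 2   # "about half the performance" of Volta
volta_power_w = t4_power_w * 4           # T4 burns "1/4th the power" of Volta

print(f"T4 INT8:  {t4_int8_tops / t4_power_w:.2f} TOPS per watt")
print(f"T4 FP16:  {t4_fp16_tflops / t4_power_w:.2f} TFLOPS per watt")
print(f"Volta FP16 (implied): {volta_fp16_tflops / volta_power_w:.2f} TFLOPS per watt")
# The T4 comes out roughly 2x more power-efficient than Volta for FP16 work;
# separately, the article cites a 2x jump in video streams per card vs. the
# P4 (30 -> 60).
```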

Figure 1: The new NVIDIA T4 GPU for datacenter AI is based on the same Turing architecture NVIDIA recently unveiled for real-time ray tracing using AI. Source: NVIDIA

The software side of the story is based on the 5th release of NVIDIA TensorRT, which provides the preprocessing of the neural network to optimize its execution (branch trimming, sparse matrix optimization, etc.) on the new device, as well as run-time libraries to support the execution. TensorRT 5 also supports Kubernetes containerization, load balancing, dynamic batching, and turnkey resource management to help cloud service providers put these new GPUs into their infrastructure. TensorRT 5 also features support for Google Neural Machine Translation (GNMT).

Conclusions

NVIDIA has been struggling to establish its place in AI inference processing in the datacenter for two reasons:

  1. Inference at scale is just getting started, and much or most of that processing can be handled today by Intel Xeon (or AMD EPYC) CPUs. The primary use case has been for low-resolution still images, such as those uploaded by Facebook users, so there has been little need for the power of a GPU in inference processing.
  2. NVIDIA does not break down its datacenter business by AI vs. HPC vs. Virtual Desktop Infrastructure, much less AI training vs. inference. It can’t or won’t say how many GPUs are already being used for inference.

As more applications for processing streaming video for branding, security, and marketing are developed, the first challenge should fade. Additionally, now that NVIDIA has a dedicated inference GPU, we can hopefully look forward to use cases. Perhaps we’ll even get an indication of the volume of inference processing the company is able to capture.

Finally, I would point out that there are dozens of startups targeting inference, with the potential to match (and maybe exceed) the performance and efficiency of the Tesla T4. Unlike AI training, this will not likely be a one-horse race. For now though, most of these startups only have PowerPoint. NVIDIA now has a real dedicated inference engine to sell.

The post NVIDIA Bets On Turing For Datacenter AI appeared first on Moor Insights & Strategy.

]]>
RESEARCH PAPER: Wave Computing: Designed To Scale https://moorinsightsstrategy.com/research-papers/research-paper-wave-computing-designed-to-scale/ Tue, 18 Sep 2018 05:00:00 +0000 https://staging3.moorinsightsstrategy.com/research-paper-wave-computing-designed-to-scale/ Modern data scientists have an insatiable appetite for more performance to train and run deep neural networks (DNNs) for artificial intelligence (AI). In fact, research by Open.ai has shown DNNs are doubling their performance requirements every three and a half months compared to the traditional Moore’s Law rate for central processing units (CPUs), which have […]

The post RESEARCH PAPER: Wave Computing: Designed To Scale appeared first on Moor Insights & Strategy.

]]>

Modern data scientists have an insatiable appetite for more performance to train and run deep neural networks (DNNs) for artificial intelligence (AI). In fact, research by OpenAI has shown DNNs are doubling their performance requirements every three and a half months, compared to the traditional Moore's Law rate for central processing units (CPUs), which have historically doubled every 18 months. While NVIDIA graphics processing units (GPUs) have largely enabled this advancement, some wonder if a new, ground-up approach to silicon and system design might be better suited for this task. Given the growth prospects for AI, it's no surprise there are scores of startups and large companies like Intel readying new silicon to enter the race. Wave Computing ("Wave") believes its early time to market and novel "dataflow" architecture will pave its way to success. In particular, Wave's system design has the potential to improve scalability, which is essential for large-model training for AI. This paper looks at Wave's architectural foundation for performance and scalability.
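To put that cadence in perspective, a quick bit of Python compounds the two doubling rates cited above over a two-year horizon; the 3.5-month and 18-month figures come straight from the paragraph, and the rest is just arithmetic.

```python
# Compare compute demand growing by 2x every 3.5 months with Moore's Law
# doubling every 18 months, over a 24-month window.
months = 24
dnn_demand_growth = 2 ** (months / 3.5)
moores_law_growth = 2 ** (months / 18)

print(f"DNN compute demand after {months} months: ~{dnn_demand_growth:.0f}x")
print(f"Moore's Law transistor budget:            ~{moores_law_growth:.1f}x")
```

The gap between those two curves (roughly 100x versus 2.5x over two years) is the opening that accelerator vendors like Wave are chasing.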

You can download the paper here:

Table Of Contents

  • Introduction
  • A Dataflow Primer
  • Beyond Dataflow: System-Level Scalability
  • Putting It All Together
  • Conclusions
  • Figure 1: A Typical Neural Network For Deep Learning
  • Figure 2: Slack-Matching Buffers
  • Figure 3: Distributed Agent Management
  • Figure 4: Dataflow Processing Units Interconnected Through Fabric

Companies Cited

  • Broadcom
  • MIPS Technologies
  • MIT
  • NVIDIA
  • Wave Computing

The post RESEARCH PAPER: Wave Computing: Designed To Scale appeared first on Moor Insights & Strategy.

]]>
RESEARCH PAPER: The Artificial Intelligence Starter Guide For IT Leaders https://moorinsightsstrategy.com/research-papers/research-paper-the-artificial-intelligence-starter-guide-for-it-leaders/ Mon, 07 May 2018 05:00:00 +0000 https://staging3.moorinsightsstrategy.com/research-paper-the-artificial-intelligence-starter-guide-for-it-leaders/ Artificial intelligence (AI) is a transformative technology that will change the way organizations interact and will add intelligence to many products and services through new insights currently hidden in vast pools of data. In 2017 alone, venture capitalists invested more than $11.7 billion in the top 100 Artificial Intelligence startups, according to CB Insights, and […]

The post RESEARCH PAPER: The Artificial Intelligence Starter Guide For IT Leaders appeared first on Moor Insights & Strategy.

]]>

Artificial intelligence (AI) is a transformative technology that will change the way organizations interact and will add intelligence to many products and services through new insights currently hidden in vast pools of data. In 2017 alone, venture capitalists invested more than $11.7 billion in the top 100 Artificial Intelligence startups, according to CB Insights, and the breadth of Artificial Intelligence applications continues to grow. While human-like intelligence will remain the stuff of fantasy novels and movies for the near future, most organizations can and should explore practical Artificial Intelligence projects. This technology has the real potential to:

  • improve productivity of internal applications,
  • increase revenue through enhanced customer interaction and improved customer acquisition,
  • reduce costs by optimizing operations,
  • and enhance products and services with “smart” functionality such as vision and voice interaction and control.

You can download the paper here.

Table Of Contents:

  • Executive Summary
  • Business Drivers For Artificial Intelligence
  • Getting Started
  • Artificial Intelligence, Machine Learning And Deep Learning
  • The Machine Learning Software Ecosystem
  • Dell Technologies Platforms And Assistance For Artificial Intelligence Projects
  • Conclusions And Next Steps
  • Figure 1: The Relationship Between Artificial Intelligence, Machine Learning And Deep Learning

Companies Cited:

  • Amazon AWS
  • Coursera
  • Dell EMC
  • Dell Technologies
  • Google
  • IDC
  • Udacity
  • Udemy
  • Microsoft
  • NVIDIA
  • Oracle
  • SAP
  • SAS

The post RESEARCH PAPER: The Artificial Intelligence Starter Guide For IT Leaders appeared first on Moor Insights & Strategy.

]]>
RESEARCH PAPER: A Practitioner’s Guide To Artificial Intelligence https://moorinsightsstrategy.com/research-papers/research-paper-a-practitioners-guide-to-artificial-intelligence/ Mon, 07 May 2018 05:00:00 +0000 https://staging3.moorinsightsstrategy.com/research-paper-a-practitioners-guide-to-artificial-intelligence/ Artificial intelligence (AI) is delivering new insights − previously hidden in vast pools ofdata − to add intelligence to many products and services that ultimately transform the way organizations and machines interact. While human-like intelligence will remain the stuff of fantasy novels and movies for the near future, most organizations should explore incorporating AI into […]

The post RESEARCH PAPER: A Practitioner’s Guide To Artificial Intelligence appeared first on Moor Insights & Strategy.

]]>

Artificial intelligence (AI) is delivering new insights − previously hidden in vast pools of data − to add intelligence to many products and services that ultimately transform the way organizations and machines interact. While human-like intelligence will remain the stuff of fantasy novels and movies for the near future, most organizations should explore incorporating AI into their business, products, and IT projects. Our firm's research concludes that AI can improve productivity of internal applications, increase revenue, reduce costs, and improve products and services with added functionality or communication modes.

You can download the paper here.

Table Of Contents:

  • Objectives Of This Guide
  • Business Drivers For AI
  • Getting Started
  • Artificial Intelligence (AI), Machine Learning (ML) And Deep Learning (DL)
  • The Machine Learning Ecosystem
  • Dell Technologies Platforms And Assistance For AI Projects
  • Conclusions And Next Steps
  • Figure 1- Developing And Using A Deep Neural Network
  • Figure 2- Popular Deep Learning Frameworks

Companies Cited

  • Amazon AWS
  • Clarifai
  • Coursera
  • Dell EMC
  • Dell Technologies
  • Google
  • IDC
  • Intel
  • Microsoft
  • NVIDIA
  • Udacity
  • Udemy

The post RESEARCH PAPER: A Practitioner’s Guide To Artificial Intelligence appeared first on Moor Insights & Strategy.

]]>
Ten Predictions For AI Silicon In 2018 https://moorinsightsstrategy.com/research-notes/ten-predictions-for-ai-silicon-in-2018/ Fri, 05 Jan 2018 06:00:00 +0000 https://staging3.moorinsightsstrategy.com/ten-predictions-for-ai-silicon-in-2018/ 2017 was an exciting year for fans and adopters of AI. As we enter 2018, I wanted to take a look at what lies ahead. One thing is certain: we’ve barely just begun on this journey and there will be great successes and monumental failures in the year to come. Before I dive into the […]

The post Ten Predictions For AI Silicon In 2018 appeared first on Moor Insights & Strategy.

]]>
2017 was an exciting year for fans and adopters of AI. As we enter 2018, I wanted to take a look at what lies ahead. One thing is certain: we’ve barely just begun on this journey and there will be great successes and monumental failures in the year to come. Before I dive into the dangerous waters of predictions, it might be helpful to set the stage with some of the highlights and lowlights of AI of 2017. A lot happened this past year so I will try to keep this brief!

Ten events that shaped the year for AI chips in 2017

  1. NVIDIA continued to blow the doors off of the wildest expectations for its datacenter business, churning out triple-digit growth to reach a ~$1.5B revenue run rate.
  2. NVIDIA surprised the market with the NVIDIA Volta V100 GPU and cloud services for machine learning, capable of achieving 125 trillion operations per second with TensorCores—6X the performance of its one-year-old PASCAL predecessor.
  3. NVIDIA also surprised the market by announcing its own Deep Learning ASIC, to be included in the company’s next generation DrivePX automotive platform. As promised, the company published the specs as open source technology in Q3.
  4. AMD launched its AI GPU and software, the Vega Frontier Edition. The company announced a few big deployment wins, including Baidu for GPUs and Microsoft Azure for its EPYC CPUs.
  5. Google announced its own ASIC for AI deep learning training, the Cloud Tensor Processing Unit (TPU), delivering 45 TeraOps per die and featuring a 4-die, 180-TeraOps card for use in its datacenters and cloud services. This announcement fueled much speculation regarding the threat that ASICs may present to NVIDIA's dominance.
  6. Microsoft announced impressive results for its internal use of Intel Altera FPGAs for Machine Learning and other applications. This heightened the expectations for Xilinx in the datacenter. Speaking of which…
  7. Amazon.com AWS announced AWS Marketplace Solutions for its Xilinx-powered F1 instances for application acceleration (for Video, Genomics, Analytics and Machine Learning). Baidu, Huawei, and others also jumped on the Xilinx FPGA bandwagon.
  8. Intel missed milestones for the production release of the Nervana Engine, which the company acquired in 2016.
  9. Intel canceled the Knights Hill Xeon Phi chip, either because the standard Xeon processor was so good, and/or because the company plans to shift its AI efforts to Nervana. There’s no doubt in my mind that the significant savings in development expenses was the ultimate decision driver.
  10. Finally, the number of ASICs being developed for AI to challenge NVIDIA has grown dramatically, including a half-dozen Chinese startups (presumably with government backing), a half-dozen US VC-funded companies, and several other large companies (including Qualcomm, Huawei, and Toshiba).

Ten 2018 predictions for AI silicon

Now that we’re all caught up, let’s move on to the predictions. I will couch these in terms of High, Medium, and Low probability just to hedge my bets.

  1. Google will announce public availability of its TPU in the Google Compute Cloud, along with new API and tool services to better compete with Microsoft and Amazon for Machine Learning as a Service. (HIGH probability)
  2. Intel will finally bring out the Nervana Engine, probably in Q2 or perhaps Q3. The company simply cannot wait any longer to establish relevancy in this hyper-growth market especially after the cancellation of KNH. However, I doubt Intel will exploit the chip's on-die fabric since it wants to sell as many Xeons as possible—I sincerely hope for Intel's sake I am wrong about this latter point. (HIGH probability)
  3. NVIDIA will pre-announce the chip that follows Volta. Since Volta is so new and remains way in front of any chip out there, look for this to be announced at SC’18 in November, instead of GTC in March. (MEDIUM probability)
  4. Xilinx will win at least one high-profile customer for AI inference, although I do not think it will be Microsoft. (HIGH probability)
  5. While 2017 was the year of AI in the Data Center, 2018 will see a surge of AI at the edge, with IoT and other edge applications building momentum. This will be critical for NVIDIA, as it needs to grow at the edge to maintain its leadership pace. (MEDIUM probability)
  6. Although Dell, HPE, and Lenovo have all brought forward new infrastructure to support AI, the adoption of AI in the enterprise will continue to lag until 2019 or later. (HIGH probability)
  7. Someone will buy at least one ASIC startup, such as Wave Computing, Cerebras, or Groq. Odds are higher that the acquirer will be Dell or Hewlett Packard Enterprise, seeing as the systems business model is more in line with OEMs than with NVIDIA or Intel. (MEDIUM probability)
  8. NVIDIA will bring out a full-fledged ASIC product (not just DLA logic for open source) for Machine Learning. I would rate this as LOW probability for 2018, since I do not believe NVIDIA will feel threatened by ASICs like Google TPU until 2019. That being said, CEO Jensen Huang is not one to wait for threats to materialize before he acts.
  9. At least one of the large Chinese cloud providers (Baidu, Tencent, or Alibaba) will buy one of the many Chinese ASIC startup vendors late in 2018. (MEDIUM Probability)
  10. While AMD's EPYC CPU will gain significant traction in the datacenter, the company will struggle to establish meaningful (double-digit) market share in GPUs for AI. The company's high-end Vega GPU is still a generation behind NVIDIA Volta, and it takes time to establish an ecosystem. AMD will be very focused on getting its APUs to market in 2018. (HIGH Probability)

Well, that wraps it up. Feel free to post your own thoughts, critiques, etc., on this site! Happy New Year!

The post Ten Predictions For AI Silicon In 2018 appeared first on Moor Insights & Strategy.

]]>
RESEARCH PAPER: Synthesis Modeling: The Intersection Of HPC And Machine Learning https://moorinsightsstrategy.com/research-papers/research-paper-synthesis-modeling-the-intersection-of-hpc-and-machine-learning/ Tue, 14 Nov 2017 06:00:00 +0000 https://staging3.moorinsightsstrategy.com/research-paper-synthesis-modeling-the-intersection-of-hpc-and-machine-learning/ Historically, numerical analysis has formed the backbone of supercomputing for decades by applying mathematical models of first-principle physics to simulate the behavior of systems from subatomic to galactic scale. Recently, scientists have begun experimenting with a relatively new approach to understand complex systems using machine learning (ML) predictive models, primarily Deep Neural Networks (DNN), trained […]

The post RESEARCH PAPER: Synthesis Modeling: The Intersection Of HPC And Machine Learning appeared first on Moor Insights & Strategy.

]]>

Numerical analysis has formed the backbone of supercomputing for decades, applying mathematical models of first-principle physics to simulate the behavior of systems from subatomic to galactic scale. Recently, scientists have begun experimenting with a relatively new approach to understanding complex systems: machine learning (ML) predictive models, primarily Deep Neural Networks (DNN), trained by the virtually unlimited data sets produced from traditional analysis and direct observation. Early results indicate that these "synthesis models," combining ML and traditional simulation, can improve accuracy, accelerate time to solution and significantly reduce costs.
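For a concrete picture of the surrogate idea, here is a minimal, self-contained Python sketch: a toy function stands in for an expensive first-principles simulation, and a simple regression model trained on its output answers new queries cheaply. It illustrates the general technique only; it is not the workflow described in the paper.

```python
import numpy as np

# 1) Run the "expensive" simulator at a set of training points.
def simulator(x):
    # Stand-in for a first-principles code: a damped oscillation.
    return np.exp(-0.5 * x) * np.sin(3.0 * x)

x_train = np.linspace(0.0, 5.0, 200)
y_train = simulator(x_train)

# 2) Fit a cheap surrogate: linear regression on Gaussian radial-basis features.
centers = np.linspace(0.0, 5.0, 25)
def features(x):
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * 0.25 ** 2))

weights, *_ = np.linalg.lstsq(features(x_train), y_train, rcond=None)

# 3) Query the surrogate instead of re-running the simulation.
x_new = np.array([0.7, 2.3, 4.1])
print("simulation:", simulator(x_new))
print("surrogate: ", features(x_new) @ weights)
```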

You can download the paper on the NVIDIA website.  Click here.

Table Of Contents

  • Introduction
  • Applying Machine Learning In HPC
  • Three Approaches To Applying Machine Learning In HPC
  • Machine Learning Use Cases In HPC
  • Application Of Model Modulation At ITER
  • Conclusions
  • Figure 1: The Synthesis Of Numerical Analysis And Machine Learning Can Create New Predictive Simulation Models
  • Figure 2: Machine Learning Can Improve Neutrino Detection By Combining Simulation Results From Different Models To Produce A Superior Model
  • Figure 3: Bose-Einstein Condensate Achieved Convergence After Only 10-12 Experiments Using Machine Learning, Compared To 140 Experiments Using the Traditional Approach
  • Figure 4: Example Use Cases For Synthesis Modeling
  • Figure 5: Spinning Black Holes Create Gravitational Waves, Ripples In The Fabric Of Space And Time.  Machine Learning Is Now Enhancing Our Understanding Of These Phenomena

Companies Cited

  • Caltech
  • Deep Neural Networks (DNN)
  • Fermilab
  • Laser Interferometer Gravitational Wave Observatory (LIGO)
  • NVIDIA
  • National Center For Supercomputing Applications (NCSA)
  • University Of Florida
  • University Of North Carolina
  • University Of South Wales

The post RESEARCH PAPER: Synthesis Modeling: The Intersection Of HPC And Machine Learning appeared first on Moor Insights & Strategy.

]]>
Microsoft: FPGA Wins Versus Google TPUs For AI https://moorinsightsstrategy.com/microsoft-fpga-wins-versus-google-tpus-for-ai/ Mon, 28 Aug 2017 05:00:00 +0000 https://staging3.moorinsightsstrategy.com/microsoft-fpga-wins-versus-google-tpus-for-ai/ The Microsoft Brainwave mezzanine card extends each server with an Intel Altera Stratix 10 FPGA accelerator, synthesized to act as a “Soft DNN Processing Unit,” or DPU, and a fabric interconnect that enables datacenter-scale persistent neural networks. At the recent Hot Chips conference, three of the world’s largest datacenter companies detailed projects that exploit Field […]

The post Microsoft: FPGA Wins Versus Google TPUs For AI appeared first on Moor Insights & Strategy.

]]>
At the recent Hot Chips conference, three of the world's largest datacenter companies detailed projects that exploit Field Programmable Gate Arrays (FPGAs) as accelerators for performance-hungry datacenter applications, particularly for Machine Learning. While Xilinx and Intel (Altera) have long talked about the potential for their technologies to change the datacenter landscape, broad adoption has remained elusive, in part due to the challenges of FPGA development. Specifically, Amazon, Baidu, and Microsoft all announced technologies and initiatives that they hope will address those barriers to adoption and enhance their own AI services. Being a bit of a hardware geek, I will focus primarily here on the implications of Microsoft's technology. However, it is worthwhile to also consider the Baidu and Amazon announcements; taken together, I believe they bode well for FPGA adoption in the datacenter, which in turn could pave the way for growth for Intel and especially Xilinx (as a pure FPGA play). "When the industry's brightest spotlight turned to them at HotChips this week, Amazon, Baidu, and Microsoft all chose to talk about innovations in FPGA-based acceleration for their data centers," said Steve Glaser, senior vice president of Corporate Strategy at Xilinx. "It is clear that momentum for FPGA acceleration in hyperscale data centers continues to be on the rise."

First, Baidu announced a new architecture it hopes could broaden the use of FPGAs as an acceleration platform. The new Baidu "XPU" combines a CPU, GPU, and FPGA in a flexible configuration on a Xilinx FPGA, which Baidu hopes will be easier to program than the traditional low-level techniques developers use today for FPGAs. For its part, Amazon Web Services provided an update on its progress with the F1 acceleration platform, which supports an 8-node Xilinx-equipped EC2 instance to enable FPGA-accelerated application development.

Figure 1:  The Microsoft Brainwave mezzanine card extends each server with an Intel Altera Stratix 10 FPGA accelerator, synthesized to act as a “Soft DNN Processing Unit,” or DPU, and a fabric interconnect that enables datacenter-scale persistent neural networks.  Source: Microsoft.

What did Microsoft announce?

While Amazon and Baidu are working to render FPGAs more accessible and easier to program on their clouds, Microsoft is perhaps the largest end user of FPGAs for datacenter applications, accelerating a wide swath of its massive computing infrastructure and applications on Bing and Azure. To demonstrate its resulting prowess, Microsoft unveiled Project Brainwave, a scalable acceleration platform for deep learning, which can provide real-time responses for cloud-based AI services. Microsoft had previously announced some 29 of these AI APIs, lowering the barriers to adoption for enterprises looking to get on board the AI bandwagon. Now Microsoft is sharing details about the hardware infrastructure upon which these MLaaS APIs and Bing internal services are built.

Microsoft’s Project Brainwave consists of three components:

    1. A high-performance systems architecture that pools accelerators for datacenter-wide services and scale. By linking their accelerators across a high bandwidth, low-latency fabric, Microsoft can dynamically allocate these resources to optimize their utilization while keeping latencies very low.
    2. A “soft” DNN processor (DPU) that is programmed, or synthesized, on 14nm class Altera FPGAs. More on this below.
    3. A compiler and run-time environment to support efficient deployment of trained neural network models using CNTK, Microsoft's DNN platform. Similar to the case of Google's TPU and TensorFlow, Microsoft requires a hardware platform that is optimized for its own framework. Interestingly, Microsoft has claimed that CNTK can have significant performance advantages over TensorFlow, especially for recurrent neural networks used for natural language processing. It is not clear to what extent Brainwave further enhances CNTK performance.

As I’ve recently explored, a fully custom chip, or ASIC, can give companies like Google a very fast machine learning accelerator at lower per-unit costs, but the development process can be cost-prohibitive, lengthy, and result in a fixed function chip, impeding one’s ability to quickly adapt silicon implementations as algorithms evolve. Microsoft pointed to this tradeoff in their announcement as a primary driver for their FPGA-based strategy. By using an FPGA instead of an ASIC for their “soft” DPU, Microsoft believes it can better optimize their hardware for their software at lower cost and with greater flexibility over time.

A great example of the advantage of FPGAs in machine learning is the ability to customize the level of precision required for a particular layer in a deep neural network. NVIDIA pioneered the use of reduced-precision calculations in the Pascal and Volta GPUs (both of which support 16-bit floating point and 8-bit integer calculations). However, why stop there? Think about the requirements of a neural network layer that is determining someone's sex from an image. This attribute requires just 2 bits: male, female, or other (unknown). Moreover, with an FPGA, a neural net designer could model each layer in the net with the optimal (minimal) number of bits, which can have a significant impact on performance and efficiency, as the graph below demonstrates. LSTM, or Long Short-Term Memory, is a class of neural network often used for natural language processing, one of Microsoft's fortes. (The astute reader will note that Microsoft did not share FP16 results, which would undoubtedly be lower than the 16-bit integer results. However, these chips are not designed for training a neural network, for which an NVIDIA Volta GPU can deliver up to 120 Tera-operations/second for the 16/32-bit operations needed in training.)
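To make the bit-width point tangible, here is a minimal symmetric-quantization sketch in Python (NumPy). It is a generic illustration of trading precision for storage and bandwidth, not Microsoft's Brainwave quantization scheme.

```python
import numpy as np

def quantize(weights, bits):
    # Symmetric integer quantization: scale so the largest weight maps to the
    # largest representable integer for the chosen bit width.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(weights).max() / qmax
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)

for bits in (16, 8, 4, 2):
    q, scale = quantize(w, bits)
    err = np.abs(w - q * scale).mean()
    print(f"{bits:>2}-bit weights: mean abs error {err:.4f}, "
          f"storage {bits / 32:.0%} of FP32")
```

On an FPGA, each layer could in principle pick its own bit width from a table like this, which is exactly the flexibility the paragraph above describes.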

Figure 2: Microsoft's DPU can be programmed to process calculations for virtually any precision required by the neural network being used, delivering excellent performance. Also, Microsoft can reprogram (synthesize) these chips in a matter of weeks for a different use case. Source: Microsoft.

Conclusions

The field of Machine Learning requires blazingly fast chips for acceleration, and we are seeing just the beginning of innovation in this area, as I outlined in an earlier blog post. While Google has taken the ASIC path, Microsoft has demonstrated that it can achieve comparable and in some cases even better results using an FPGA, which enables it to continually track innovations in software with its custom hardware. Meanwhile, Xilinx, Baidu, and Amazon are working together to lower, if not completely remove, the traditional barriers to FPGA adoption. Taken together, these initiatives point to increased opportunities for other large AI consumers and providers to have their cake and eat it too; they can optimize custom chips for their applications while avoiding the cost and potential technology obsolescence of the custom ASIC approach. However, we have only scratched the surface of this deep well of innovation.

The post Microsoft: FPGA Wins Versus Google TPUs For AI appeared first on Moor Insights & Strategy.

]]>
Microsoft Finds Its AI Voice https://moorinsightsstrategy.com/microsoft-finds-its-ai-voice/ Mon, 17 Jul 2017 05:00:00 +0000 https://staging3.moorinsightsstrategy.com/microsoft-finds-its-ai-voice/ Microsoft held an intimate analyst and press event in London this week, coincident with the 20th anniversary of the founding of the Cambridge Research Lab, the European hub led by Professor Christopher Bishop. Microsoft now employs over 7,000 Artificial Intelligence (AI) research scientists and development engineers around the world under Microsoft Research (MSR) Executive VP […]

The post Microsoft Finds Its AI Voice appeared first on Moor Insights & Strategy.

]]>
Microsoft held an intimate analyst and press event in London this week, coincident with the 20th anniversary of the founding of the Cambridge Research Lab, the European hub led by Professor Christopher Bishop. Microsoft now employs over 7,000 Artificial Intelligence (AI) research scientists and development engineers around the world under Microsoft Research (MSR) Executive VP Harry Shum, who shared the company’s vision and strategy during the keynote address. The event was clearly intended to promote Microsoft as a thought leader in the developing science and technology of AI, as the execs focused on the company’s lofty goals and initiatives, sprinkled with customer and product examples that showcased the company’s innovations.

Head of Microsoft Research and AI Harry Shum kicked off the intimate event in London. (Source: Microsoft)

While companies like Facebook, Google, Baidu and Tesla tend to grab more attention and headlines for their AI efforts, Microsoft has been quietly building an impressive portfolio of scientific advances, products, features and AI cloud services. However, during this and other recent corporate events, Microsoft has been more aggressively positioning itself, and indeed AI, as a force for good, where machines will be used to "amplify human ingenuity," making the world a better place. It appears to me that Microsoft is taking great care to avoid the appearance of exploiting AI at the expense of our collective humanity. In fact, CEO Satya Nadella himself set the company on this virtuous path when he announced a set of ethical principles last year to guide the company's AI strategy. Throughout this event, the company's executives focused on "fostering efforts that lie at the intersection of AI, people and society," taking a feel-good approach that may lessen customers' fears of a HAL 9000 AI nightmare future and positioning the company as a trusted advisor and provider of practical AI tools, products and services.

What Did Microsoft Announce?

As an example of the influence and impact of the massive Microsoft Research organization, the company announced a new initiative called AI for Earth, a program aimed at empowering people and organizations to solve global environmental challenges by improving access to AI tools, education and skills to accelerate innovation. Microsoft also announced the formation of Microsoft Research AI, a new team of scientists within MSR, focused on meeting the computational challenges in AI. Finally, the company also announced a new partnership between MSR Cambridge and the University of Amsterdam to foster the development of machine reading research and development.

On a more practical note, Microsoft’s overarching goal of “democratizing AI” was brought down to earth with product proof points, including a new iOS app (yes, iOS!) called Seeing AI, which narrates the world around you using images captured from your iPhone’s camera, designed for the vision-impaired. Back on the mundane desktop, a new Presentation Translator for PowerPoint, first unveiled at Microsoft Build, enables speakers to engage in simultaneous 2-way translation while giving a presentation, with automatic subtitles, and improving translation accuracy by applying knowledge gleaned from the slides’ content. While the demo was impressive, I suspect it will take some time before a presenter would trust an AI to translate their sales pitch. Another cool tool is the ability of Microsoft’s digital assistant, Cortana, to extract and track commitments made to others in emails and text messages, even if communicated on non-Microsoft platforms. Cortana can then provide timely reminders to set up a meeting or send a promised report to your boss and colleagues. And for AI developers, Microsoft further extended its impressive Cognitive Toolkit with a new Bing Entity Search API, allowing developers to tap into a pre-trained Deep Neural Network built from Bing searches for entities. This is yet another example where Microsoft has made it possible for enterprises to use, or experiment with, trained neural networks for AI tooling, potentially easing adoption of Microsoft AI technologies in risk-averse, and resource challenged, enterprise IT organizations.

Conclusions

If you're familiar with those infamous corporate HR exercises designed to foster employee alignment with a company's Mission, Vision and Values, you have surely heard of a BHAG, a Big Hairy#@! Goal. Well, Nadella's Microsoft has set the mother of all BHAGs for the company's AI mission. While most AI discussions center on training machines to think, Microsoft is more concerned with how we can train and aid humans to augment our understanding and capabilities. This is certainly a lofty and noble goal that few companies would have the courage, or resources, to tackle. But the company is also delivering on the practical technologies needed to be an attractive partner and supplier to enterprises seeking to catch the AI wave. And by lowering the barriers to adoption of AI for its massive installed base of enterprise clients using Outlook, Office 365, Dynamics, Skype and LinkedIn, Microsoft is well positioned to reach a revenue inflection point and gain a competitive advantage as the world increasingly turns to AI.

The post Microsoft Finds Its AI Voice appeared first on Moor Insights & Strategy.

]]>
Google’s TPU For AI Is Really Fast, But Does It Matter? https://moorinsightsstrategy.com/googles-tpu-for-ai-is-really-fast-but-does-it-matter/ Thu, 13 Apr 2017 05:00:00 +0000 https://staging3.moorinsightsstrategy.com/googles-tpu-for-ai-is-really-fast-but-does-it-matter/ After nearly a year since the introduction of the Google TensorFlow Processing Unit, or TPU, Google has finally released detailed performance and power metrics for its in-house AI chip. The chip is impressive on many fronts, however Google understandably has no plans to sell it to its competitors, so its impact on the industry is debatable. […]

The post Google’s TPU For AI Is Really Fast, But Does It Matter? appeared first on Moor Insights & Strategy.

]]>
Nearly a year after the introduction of the Google Tensor Processing Unit, or TPU, Google has finally released detailed performance and power metrics for its in-house AI chip. The chip is impressive on many fronts; however, Google understandably has no plans to sell it to its competitors, so its impact on the industry is debatable. So, who really benefits, and who is potentially exposed to incremental risk, by this ninja chip for AI? I think the answer is everyone, and no one, respectively. Here's why.

What is a TPU and how does it stack up?

The challenge Google was facing a few years ago was that it foresaw a dramatic shift in its computing needs towards supporting Machine Learning workloads. These applications are profoundly compute intensive, and continuing to use (Intel) CPUs was cost prohibitive and would not meet its needs for rapid response times across millions of simultaneous users and queries. Google was using NVIDIA GPUs for training the underlying neural networks that allow machines to recognize patterns in the data and using x86 CPUs to then execute the queries across the neural network, called inferencing. While large GPUs for training are fairly expensive, the larger volume of work would be in these inference engines. So, Google decided to develop a chip that could handle this workload at a lower cost, with higher performance, while consuming far less power.

Google’s TPU sits on a PCIe Card and fits in a standard disk drive bay. You can have multiple TPUs per server.

Google has recently released extensive architectural details and performance data that show the fruits of its labor. Understandably, it compared the TPU with the generation of NVIDIA and Intel chips that it had at its facility at the time; Intel’s Haswell is 3 generations old and the NVIDIA Kepler was architected in 2009, long before anyone was using GPUs for machine learning. Now NVIDIA CEO Jensen Huang has been kind enough to provide updated comparisons to NVIDIA’s latest generation of chips, based on NVIDIA PASCAL. Comparing current generation chips makes a huge difference, as NVIDIA’s deficit of yielding only 1/13th the performance of the TPU turns into a 2X advantage for NVIDIA, albeit at 3x the power consumption.

Comparing NVIDIA GPUs vs. the Google TPU in performance and power consumption. (Source: NVIDIA)

These two approaches produce very different results. The P40 has strong floating point, useful in training, and greater memory bandwidth. The TPU screams at 90 trillion operations per second, nearly twice that of the GPU, and consumes only 1/3rd the power. Keep in mind that the GPU being measured is just one instantiation of the PASCAL architecture; NVIDIA is able to productize a single architecture to address many distinct markets, including gaming, Machine Learning (ML training and inference), automotive and supercomputing. The GPU is a programmable device and as such is a general-purpose accelerator. The TPU, on the other hand, is designed to do one thing extremely well: multiply, in parallel, the tensors (integer matrices) used to represent the (deep) neural networks behind Machine Learning for AI.
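The core operation is simple enough to sketch in a few lines of NumPy: an 8-bit integer matrix multiply with 32-bit accumulation, which is functionally what a TPU-style multiply array performs. This is a behavioral illustration only and says nothing about the actual hardware design.

```python
import numpy as np

def int8_matmul(a_fp, b_fp):
    # Quantize both operands to int8 with simple symmetric scaling.
    sa = np.abs(a_fp).max() / 127.0
    sb = np.abs(b_fp).max() / 127.0
    a_q = np.round(a_fp / sa).astype(np.int8)
    b_q = np.round(b_fp / sb).astype(np.int8)
    # Accumulate in int32 to avoid overflow, then rescale back to float.
    acc = a_q.astype(np.int32) @ b_q.astype(np.int32)
    return acc * (sa * sb)

rng = np.random.default_rng(1)
a = rng.normal(size=(64, 128)).astype(np.float32)
b = rng.normal(size=(128, 32)).astype(np.float32)

exact = a @ b
approx = int8_matmul(a, b)
print("max relative error:", np.abs(exact - approx).max() / np.abs(exact).max())
```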

But the relative performance of these two chips is not really the important point. What I find more important is the proof Google provides that any serious work being done on AI requires serious acceleration, either by a GPU, an FPGA, an ASIC or perhaps a many-core CPU, all of which will need to be at least 2 orders of magnitude faster than a traditional (Intel Xeon) server CPU if AI is to be affordable and responsive. The other important point is that processing Machine Learning is a sufficiently large and vital workload for Google that it is investing in its own custom silicon to optimize its  datacenters for ML. And contrary to many opinions expressed on various blogs, I do not believe this is a one-and-done event for Google.

Winners and losers

So, who benefits from the TPU, and who might be hurt by it? Users of Google Machine Learning services will directly benefit as more services move over to run on TPU; Google has lowered the price of selected services by as much as 6x, directly attributing the savings to the TPU. So, Google wins by having a more competitive platform for internal use and cloud ML services and by saving on its CAPEX and power consumption for its massive datacenters.

Does the TPU represent a risk to silicon vendors such as Intel and NVIDIA? I think not, at least not directly and not immediately. First, most inference work today is done by Intel Xeon CPUs in the datacenter and ARM CPUs at the edge and is deployed at a more modest scale than seen at Google. And Google is still using NVIDIA GPUs for training its neural networks. So it is not like the TPU took a big chunk out of NVIDIA’s business, if any. Intel wouldn’t have been able to deliver the performance Google needed, so this is a case of giving up sleeves out of its vest. (Note that the TPU is still an accelerator hanging off an Intel Xeon server.)

Second, consider that the TPU is only available to Google's internal data scientists and to users of Google's AI cloud services. Google Cloud remains a distant third to Amazon Web Services and Microsoft Azure, both of whom offer NVIDIA GPUs in their cloud services for Machine Learning applications. Looking ahead, I would not be surprised to see Google develop a training chip at some point to realize further cost savings for its growing AI portfolio. But again, that would only impact Google's purchases for its own use, not the purchases by the other six of the world's largest datacenter operators (Amazon, Alibaba, Baidu, Facebook, Microsoft and Tencent). These companies will all continue to purchase GPUs and FPGAs for their acceleration workloads, until and unless a better alternative comes along.

Given the rapid market growth and thirst for more performance, I think it is inevitable that silicon vendors will introduce chips designed exclusively for Machine Learning. Intel, for example, is readying the Nervana Engine technology it acquired last August, most likely for both training and inference. And I know of at least four startups, including Wave Computing, NuCore, GraphCore and Cerebras, that are likely to be developing customized silicon and even systems for Machine Learning acceleration. Certainly, more competition and alternatives in this space will fuel more adoption and innovation, which benefits everyone in the market.

As for the market leader, NVIDIA won’t likely be left in the dust. NVIDIA can also incorporate new techniques in its hardware specifically for Machine Learning, and it can continue to optimize its software ecosystem to keep pace. Just last year, NVIDIA set the new standard for reduced precision matrix operations for 16-bit floating point and 8-bit integer values (for training and inference, respectively). All other silicon vendors, with the notable exception of Xilinx, are at least a year behind NVIDIA in adopting this approach, which can double or quadruple performance and power efficiency. Finally, NVIDIA’s NVLINK interconnect is still the only viable contender to support strong scaling of cooperating accelerators. (IBM OpenCAPI is the sole alternative, and even IBM supports both.)

Conclusions

Google is a world leader in developing and using Machine Learning algorithms and hardware in its vast Internet search operations and cloud service offerings. It uses ML for everything from Google Translate, which supports over 100 languages, to Google Now, to building an AI that beat the world champion at Go. So it makes sense that it would want to invest in customized hardware that can deliver the most performance for its software. The performance and architectural details it has recently shared demonstrate its prowess in designing ASICs to accelerate machine learning, and it is likely that its TPU presages other designs that will further challenge the status quo. I am certain that the other large internet datacenter operators will do the math to evaluate the ROI of similar efforts for their own use, but I suspect they do not currently have the scale required to justify a development investment of perhaps $100M a year. But you can be sure that the machine learning and AI market is still in its infancy, and we will see many innovations in hardware and software in the coming years.

The post Google’s TPU For AI Is Really Fast, But Does It Matter? appeared first on Moor Insights & Strategy.

]]>
Intel Touts Manufacturing & Technology Leadership: Moore’s Law Is Alive And Well https://moorinsightsstrategy.com/intel-touts-manufacturing-technology-leadership-moores-law-is-alive-and-well/ Thu, 30 Mar 2017 05:00:00 +0000 https://staging3.moorinsightsstrategy.com/intel-touts-manufacturing-technology-leadership-moores-law-is-alive-and-well/ For quite some time now, many chip industry watchers have questioned the future sustainability of Moore’s Law. Named after Intel founder Gordon Moore, this industry constant observes that semiconductor densities double every 2 years, and has underpinned the industry for 52 years. Recently, some have also begun to question whether Intel has lost its vaunted […]

The post Intel Touts Manufacturing & Technology Leadership: Moore’s Law Is Alive And Well appeared first on Moor Insights & Strategy.

]]>
For quite some time now, many chip industry watchers have questioned the future sustainability of Moore's Law. Named after Intel founder Gordon Moore, this industry constant observes that semiconductor densities double every 2 years, and has underpinned the industry for 52 years. Recently, some have also begun to question whether Intel has lost its vaunted technology and manufacturing leadership as its fabrication competitors, Taiwan Semiconductor (TSMC) and Samsung Electronics, ready their new 10nm generation products for production in the same time window as Intel, planned for 2H 2017. Intel decided to set the record straight by hosting a 5-hour marathon deep dive on its Technology and Manufacturing Group (TMG), and it left little doubt that both concerns are vastly overstated. In addition to sharing data that equated its 14nm products, which have been shipping for 3 years now, to its competitors' upcoming 10nm products, Intel made several important technology announcements that further demonstrate the company's innovation and leadership.

Intel’s New Math: 14 = 10 (so 10 = 7, right?)

Intel took several hours to articulate its position that the company has not lost its advantage, and that Moore's Law is alive and well. Put simply, it presented evidence that Intel's innovations in its existing 14nm manufacturing process yield chips with densities that rival what it expects from its competitors' upcoming 10nm process. Intel then showed how its 10nm process, available later this year, will maintain this 3-year advantage. Which means, it contends, that its 10nm process will rival its competitors' 7nm process in the subsequent generation. Note that Intel's arguments are based on a density metric measuring the density of transistors in 2 specific logic blocks, a simple NAND gate and a complex flip-flop. Some would argue that this approach does not adequately characterize a chip's performance, power and density characteristics, but it seems to me like a reasonable proxy.
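For the curious, the sketch below shows how a weighted cell-based density metric of this kind can be computed. The 0.6/0.4 weighting and the flip-flop transistor count are my assumptions about how such a blend works, not values taken from Intel's presentation, and the cell areas in the example are made up purely to show the mechanics.

```python
# Blend a small NAND2 cell with a large scan flip-flop cell to approximate a
# realistic logic mix. The weights and the flip-flop transistor count below are
# assumptions; the NAND2 transistor count (4) is standard CMOS.

def logic_density_mtr_per_mm2(nand2_area_um2, sff_area_um2,
                              nand2_transistors=4, sff_transistors=32,
                              nand2_weight=0.6, sff_weight=0.4):
    # Transistors per square micron is numerically equal to millions of
    # transistors per square millimeter (1 mm^2 = 1e6 um^2).
    nand2_density = nand2_transistors / nand2_area_um2
    sff_density = sff_transistors / sff_area_um2
    return nand2_weight * nand2_density + sff_weight * sff_density

# Example with invented cell areas, just to show the units work out:
print(logic_density_mtr_per_mm2(nand2_area_um2=0.04, sff_area_um2=0.30),
      "MTr/mm^2")
```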

So, what makes Intel's 14nm better than TSMC's or Samsung's 16nm FinFET (also used by Globalfoundries)? First, Intel has developed a set of "Hyper Scaling" technologies unique to Intel that account for its superior PPA (Performance, Power, Area) attributes (more on that later). Intel also points to a lithography technology called Self-Aligned Dual Patterning, in part to explain why its 14nm equals others' 10nm, and to the follow-on Self-Aligned Quad Patterning to sustain its ability to better scale its 10nm parts.

While one needs to be cautious that Intel cannot know with certainty what its competitors are or are not doing when it draws these conclusions, its argument is fairly compelling. The figure below shows how Intel stacks up in terms of logic density over time, projecting the future 10nm densities based on what Intel would have achieved without its special sauce described above.

Intel's projection of competitors' 10nm densities in red compared to Intel's existing 14nm products in blue.

Intel went on to describe a portfolio of technologies that it is baking into its 10nm products. The figure below details these hyper scaling innovations, which when combined produce a 2.7 fold improvement in transistor density.

Intel’s 10nm Hyper Scaling technologies. (Source: Intel)

The figure below shows how Intel 10nm compares with its prior process nodes, as well as where Intel believes its competitors’ 10nm parts will land. The punch line here is that Intel projects that it will maintain a 2x leadership in transistor density in its 10nm products vs. the competition. Basically, Intel is saying that the competition does not have comparable technology with its 10nm hyper scaling. But Intel will need to keep up its blistering pace of innovation, or these advantages are likely to be temporary; recall how the competition copied Intel’s 3D FinFET (Fin Field Effect Transistor) advantage in just one generation.

Intel’s projected logic transistor densities in blue would yield a 2x advantage over its unnamed competitors

So, when you net it all out, Moore’s Law seems to be safe, at least for the next couple generations. Once again, there is more to performance and power than just density and resulting die area, so we will all have to await actual 10nm production silicon from the four remaining advanced process fabs—Intel, TSMC, Samsung and Globalfoundries (which uses Samsung’s technology)—to validate these projections.

Intel Announces New Process and Packaging Technologies

Intel announced a couple of major new technologies at the event. The first is an update to its 22nm technology and manufacturing process, called 22FFL, that significantly reduces power consumption. Here, Intel added FinFET technology to produce parts that will offer a 100x reduction in total transistor power leakage, targeting mobile and IoT applications in power-restricted environments. However, note that Globalfoundries took an early lead here two years ago with its 22FDX FD-SOI process for low-power mobile devices, believing that the FD-SOI approach is superior to FinFET for mixed digital and RF applications. In fact, Globalfoundries has already engaged over 50 customers in mobile, IoT and automotive projects.

Intel claims that its new 22FFL process reduces transistor leakage by 100 fold. (Source: Intel)

Finally, Intel talked about its approach to interconnecting chips using silicon embedded in the substrate of the multi-chip package instead of the slower interposer technology typically used today. With a name only a geek could love, the "EMIB" (Embedded Multi-Die Interconnect Bridge) can interconnect chips manufactured on different process nodes, without the through-silicon vias and backside interposers, which could be a boon for lowering costs, increasing performance and lowering latencies (the trifecta for chips). It turns out that an advanced process node such as 10nm FinFET is great for performance-hungry cells such as processor cores but is overkill for things like I/O and communication. Intel is already using EMIB in its upcoming Stratix10, which combines a Xeon processor and a 16nm Altera FPGA on the same package. One could imagine a wide range of applications, including interconnecting CPUs and GPUs, or tying together a Xeon with the future Nervana Engine for AI. This would be a significantly faster solution than the typical PCIe interconnect used today for accelerators, which only delivers 15GB/s. Intel said it is realizing 600GB/s of bandwidth through the Stratix10 EMIB connection, which is roughly 6 times the throughput of other approaches.

Intel’s EMIB embeds silicon interconnects directly into the substrate, eliminating the traditional interposer
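
To put those interconnect figures in perspective, here is a quick back-of-the-envelope sketch in Python; the 15GB/s and 600GB/s numbers are the ones cited above, while the 1GB payload is purely a hypothetical working set chosen for illustration.

```python
# Rough transfer-time comparison for moving data between a CPU and an
# accelerator, using the bandwidth figures cited above. The 1GB payload
# is a hypothetical working set, chosen only for illustration.

PAYLOAD_GB = 1.0     # hypothetical data set moved per transfer
PCIE_GB_S = 15.0     # typical PCIe bandwidth cited above
EMIB_GB_S = 600.0    # Stratix 10 EMIB bandwidth cited above

pcie_ms = PAYLOAD_GB / PCIE_GB_S * 1000.0
emib_ms = PAYLOAD_GB / EMIB_GB_S * 1000.0

print(f"PCIe transfer: {pcie_ms:.1f} ms")   # ~66.7 ms
print(f"EMIB transfer: {emib_ms:.2f} ms")   # ~1.67 ms
print(f"Raw bandwidth ratio: {EMIB_GB_S / PCIE_GB_S:.0f}x")  # 40x
```

On raw bandwidth alone that works out to roughly 40x versus the PCIe figure; Intel's roughly-6x claim presumably compares EMIB against other packaging approaches rather than against PCIe.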

Conclusions

The intent of the Technology and Manufacturing Day was not just to explain why Intel chips would be superior products. The company wanted to make it very clear why Intel should be the preferred foundry for chip designs demanding advanced process node manufacturing, and it told a very convincing story. Intel showed that it has not lost its lead in advanced process technologies, and that it can sustain the progression of Moore’s Law for the foreseeable future. And while EMIB is not new, it is another reason Intel hopes it can win custom foundry business from the industry’s fabless chip companies. As the world continues to digitize everything we interact with, Intel’s competitors increasingly become TSMC and Samsung, not just other chip companies. And based on the information shared at this event, Intel looks to be in an excellent position going forward.

The post Intel Touts Manufacturing & Technology Leadership: Moore’s Law Is Alive And Well appeared first on Moor Insights & Strategy.

]]>
NVIDIA Scores Yet Another GPU Cloud For AI With Tencent https://moorinsightsstrategy.com/nvidia-scores-yet-another-gpu-cloud-for-ai-with-tencent/ Fri, 24 Mar 2017 05:00:00 +0000 https://staging3.moorinsightsstrategy.com/nvidia-scores-yet-another-gpu-cloud-for-ai-with-tencent/ NVIDIA’s speedy GPUs and Machine Learning software have unquestionably become the gold standard for building Artificial Intelligence (AI) applications. And today, NVIDIA added TenCent to their list of cloud service providers that offer access to NVIDIA hardware in their clouds for AI and other compute intensive applications. This marks a significant milestone in the global […]

The post NVIDIA Scores Yet Another GPU Cloud For AI With Tencent appeared first on Moor Insights & Strategy.

]]>
NVIDIA’s speedy GPUs and Machine Learning software have unquestionably become the gold standard for building Artificial Intelligence (AI) applications. And today, NVIDIA added Tencent to their list of cloud service providers that offer access to NVIDIA hardware in their clouds for AI and other compute-intensive applications. This marks a significant milestone in the global accessibility of the hardware needed to build AI applications, from drones to medical devices to automated factories and robots.

Tencent (whose Chinese name roughly translates to “Soaring Information”) is one of China’s largest Internet companies and the world’s largest gaming platform, having recently announced 2016 revenues that grew by 48% to $21.9B. Many companies, perhaps most, opt to access GPUs in the cloud instead of buying and deploying the hardware directly. AI startups alone are a big market; over 2,300 investors have now funded over 1,700 startups, according to data compiled by AngelList, and the vast majority of these cash-conscious firms use cloud-based NVIDIA GPUs to develop their innovative products and services. The exception is the world’s largest datacenters, aka the “Super Seven” (Alibaba, Amazon, Baidu, Google, Facebook, Microsoft and Tencent), whose server farms probably crunch proprietary machine learning algorithms with thousands, or even tens of thousands, of speedy GPUs, 100% of which bear the bright green NVIDIA logo. (Advanced Micro Devices, the “other” GPU provider, intends to enter this market with a Vega GPU later this year, but does not yet offer optimized AI accelerators that can compete with NVIDIA’s Pascal.)

With this announcement, NVIDIA can now claim that every significant cloud service provider is a customer and supplier of GPUs as a service, including Amazon, Google, IBM SoftLayer, Microsoft, Aliyun (Alibaba Cloud) and Nimbix. And using GPUs in the cloud is the easiest and most cost-effective on-ramp to building applications that require acceleration. “There’s been very strong demand for our GPUs in the cloud with consumption models expanding quickly as the AI era takes hold,” said Ian Buck, NVIDIA VP and GM of Accelerated Computing. “Companies everywhere in the world are increasingly turning to the cloud to develop and host their most demanding workloads, particularly for deep learning, inference, high-performance computing and advanced analytics.”

But Tencent didn’t just throw a bunch of GPUs into servers and call it a day. They built custom servers that support up to eight high-end NVIDIA Tesla P100 (Pascal) GPUs interconnected with NVLINK, making Tencent the first major cloud provider to do so. While Tencent did not disclose details of the server design, it is likely that Tencent is the first cloud provider to adopt and deploy one of the two newly announced open chassis designs (Big Basin and HGX-1) from the Open Compute Project (OCP), designs led by Facebook and Microsoft respectively.

The Microsoft/NVIDIA HGX-1 GPU chassis scales up to 8 GPUs, interconnected with NVIDIA’s NVLINK technology.

It is also important to note that Tencent is one of the first cloud providers to offer access to the Pascal-based P100 generation of GPU technology. Other cloud providers have kept Pascal for their own internal development teams, offering only the older Maxwell and Kepler generations as cloud-accessible GPUs. I expect other cloud providers to fall quickly in line, since the Pascal architecture is now stable, available in volume, and offers a massive performance advantage over its predecessors.

The virtuous cycle of innovation in Artificial Intelligence will be built in the cloud, and not just in the USA, but especially in Asia, where Tencent-hosted NVIDIA GPUs stand to help accelerate that cycle globally, adding even more momentum to the AI flywheel. Now the rest of the world needs to catch up, or be left behind.

The post NVIDIA Scores Yet Another GPU Cloud For AI With Tencent appeared first on Moor Insights & Strategy.

]]>
Why Intel Is Buying Mobileye, And What Does It Need To Do To Be Successful? https://moorinsightsstrategy.com/why-intel-is-buying-mobileye-and-what-does-it-need-to-do-to-be-successful/ Fri, 17 Mar 2017 05:00:00 +0000 https://staging3.moorinsightsstrategy.com/why-intel-is-buying-mobileye-and-what-does-it-need-to-do-to-be-successful/ In a move that Intel hopes can propel it to the forefront of The Next Big Thing, Intel announced it would purchase Mobileye, an Israeli company that makes sensors and cameras for driverless vehicles. Buying a leader to become a leader is never cheap: Intel will pay $13.3B for the firm, a 34% premium over […]

The post Why Intel Is Buying Mobileye, And What Does It Need To Do To Be Successful? appeared first on Moor Insights & Strategy.

]]>
In a move that Intel hopes can propel it to the forefront of The Next Big Thing, Intel announced it would purchase Mobileye, an Israeli company that makes sensors and cameras for driverless vehicles. Buying a leader to become a leader is never cheap: Intel will pay $13.3B for the firm, a 34% premium over Mobileye’s closing price last Friday. The two companies have a history of collaboration, having announced a partnership with BMW last year to put driverless vehicles on the road by 2021. But until this acquisition, it was hard to take Intel seriously in the automotive market; Intel only has a partial solution in-house. It takes specialized silicon to keep up with the massive data rates generated by a vehicle or a missile, and Mobileye brings that needed capability to Intel. But there are a few potholes along the road to autonomous vehicles Intel will need to steer around to make this expensive acquisition pay off.

Routing and steering a fast-moving vehicle seems like an easy task to most of us; you just point the car in the right direction, and your eyes and brain figure it out, sending instructions to your hands and feet. But in fact, humans are tragically bad at the task; the US National Safety Council estimates 38,300 people were killed and 4.4 million injured on U.S. roads in 2015 alone. Let’s face it, we need to be replaced with safer technologies, and many companies, from Google to General Motors to Intel, intend to make that transition a reality. Yes, there are hurdles that need to be cleared, from the daunting computational task to the regulatory environment. But the economics are compelling and these barriers will be solved.

Why Did Intel Buy Mobileye, And Why Now?

The market for Advanced Driver Assistance Systems (ADAS) and fully autonomous (Level 5) vehicles is widely expected to explode over the coming decade. In fact, Intel says the market for vehicle systems and data services for autonomous driving will become a $70 billion opportunity by 2030. While the industry is in its infancy, Mobileye is already realizing good growth, with $358M in revenue in 2016, a 49% increase over the previous year. Mobileye’s CTO and co-founder Amnon Shashua said in a conference call that the company is already working with 27 car manufacturers, including 10 production programs with Audi, BMW and others.

Meanwhile, a large portion of Intel’s revenues today come from powering PCs, networking and servers, and these older markets have become fairly stagnant, leading the company to lay off 12,000 employees last year. Having largely missed the smartphone market, the company now needs to find the next big growth engine, and this announcement shows it believes one of these will be self-driving vehicles, as well as other vision-guided systems such as robots, drones and other applications. And while $13B is not cheap for less than a half billion in revenues, one could argue that buying the leader in such a lucrative market will only get more expensive.

Moreover, Intel does not possess all the requisite technologies to become a leader in this market by itself, and a partnering approach can be complex and slow. While Intel has CPUs that can act as the vehicle’s brains, it lacks state-of-the-art vision silicon and software. It also needs to deliver tightly integrated systems (CPU plus accelerators), which can be difficult to design, negotiate and implement across company lines. Intel has already demonstrated this capability by integrating Altera and Xeon chips. Intel’s competitors in this space include NVIDIA, Qualcomm and Xilinx, all of whom are already delivering devices that tightly couple CPUs with vision processing, sensor fusion and machine learning. (See my Machine Learning Application Landscape for more information about this type of hybrid processor.) By acquiring Mobileye, Intel hopes that the combination of vision and brains will make it a leader in this fast-growing market. And that certainly seems like a sound strategy.

So, What Does Intel Need to Do Now?

First, integrating two organizations, cultures and technologies headquartered half a world apart will be critical but will not be easy. Intel has experience here with integrating Altera, which should help. The combined team will need to create a technology and infrastructure roadmap, aligning and integrating hardware and software to compete in this market. Finally, it will need to build an integrated sales strategy to leverage Mobileye’s strong end-user relationships in the automotive industry.

Meanwhile, the combined company won’t have the market to itself and will face stiff competition from several sources. Strategically, I’d point out that this is the third acquisition, after Altera and Nervana, at least partly motivated by Intel’s formerly weak position in accelerators, especially with respect to NVIDIA. While Mobileye had first-mover advantage in advanced vision chips and sensors, focused on camera-based sensing, some see NVIDIA’s GPUs and SoCs as superior solutions for multiple input modalities. In fact, Tesla, perhaps the leader in driverless car automation, has recently opted for the NVIDIA Drive PX 2 for future Tesla vehicles at the expense of Mobileye, its former supply partner. Mobileye technology is very good, but it delivers a custom platform, whereas NVIDIA’s Drive PX 2 and Jetson TX2 platforms are relatively open in the sense that OEMs can program them for solutions tailored to their vehicles. NVIDIA’s platforms are general-purpose solutions that combine sensor fusion with CPUs and GPUs for machine learning. This allows the OEM to customize the platforms to meet specific design requirements, while the Mobileye solution is a fixed vision-only part, not a programmable Machine Learning accelerator. Therefore it will be important to see whether Intel can leverage Mobileye’s technology and IP in other areas of machine learning inference. If it can, it would have a powerful duo: Nervana for ML training and Mobileye derivatives for inference. But here’s an important caveat: since Mobileye is currently a closed vision system, we don’t know if it can deliver the performance for Machine Learning that NVIDIA has already demonstrated.

In addition to NVIDIA, Qualcomm’s purchase of NXP, combined with its AI-enabled Snapdragon SoC, is clearly intended to enable it to compete in the automotive market. And Xilinx has recently announced the Xilinx reVISION platform, which provides an SoC for vision-guided applications, combining ARM cores and FPGAs with a rich software stack to simplify adoption. Xilinx FPGAs are no stranger to the automotive industry and already enjoy wide adoption across scores of models. While Intel theoretically could also compete in this market with its own Altera FPGAs, it would appear that Intel prefers to jump in with a proven leader instead of slowly building out its own FPGA-based solution. (Note that Intel’s Nervana Engine part, due out later this year, will probably target machine learning training in the datacenter, and will not be well suited for the low-power automotive market.)

In conclusion, Intel’s acquisition can be seen as an expensive move to vault itself into the position of a Tier 1 provider of ADAS and fully autonomous vehicle computational solutions, a market that it can’t afford to miss. But it can also be seen as yet another attempt to catch up in a market Intel was not well positioned to win. And a market expected to be as large as this one is already attracting a lot of healthy competition. Let’s hope they all get it right; we are counting on them to arrive safely at our destinations in the near future!

The post Why Intel Is Buying Mobileye, And What Does It Need To Do To Be Successful? appeared first on Moor Insights & Strategy.

]]>
A Machine Learning Landscape: Where AMD, Intel, NVIDIA, Qualcomm And Xilinx AI Engines Live https://moorinsightsstrategy.com/a-machine-learning-landscape-where-amd-intel-nvidia-qualcomm-and-xilinx-ai-engines-live/ Fri, 03 Mar 2017 06:00:00 +0000 https://staging3.moorinsightsstrategy.com/a-machine-learning-landscape-where-amd-intel-nvidia-qualcomm-and-xilinx-ai-engines-live/ Without a doubt, 2016 was an amazing year for Machine Learning (ML) and Artificial Intelligence (AI) awareness in the press. But most people probably can’t name 3 applications for machine learning, other than self-driving cars and perhaps their voice activated assistant hiding in their phone. There’s also a lot of confusion about where the Artificial […]

The post A Machine Learning Landscape: Where AMD, Intel, NVIDIA, Qualcomm And Xilinx AI Engines Live appeared first on Moor Insights & Strategy.

]]>
Without a doubt, 2016 was an amazing year for Machine Learning (ML) and Artificial Intelligence (AI) awareness in the press. But most people probably can’t name three applications for machine learning other than self-driving cars and perhaps the voice-activated assistant hiding in their phone. There’s also a lot of confusion about where the Artificial Intelligence program actually exists. When you ask Siri to play a song or tell you what the weather will be like tomorrow, does “she” live in your phone or in the Apple cloud? And what about Amazon’s Alexa? Where does “she” live? (The answer to both questions is, “In the cloud.”) And while you ponder those obscure questions, many investors and technology recommenders are trying to determine whether Advanced Micro Devices, Intel, NVIDIA, Qualcomm or Xilinx will provide the best underlying hardware chips, for which applications and why. To help sort this out, this article provides a landscape for emerging AI applications, by industry and deployment location (cloud, edge or hybrid), and explores what type of hardware will likely be used in each.

The Landscape: By Industry and Deployment Location

The sheer volume of applications being built using Machine Learning is truly breathtaking, as evidenced by over 2,300 investors funding over 1,700 startups, according to data compiled by AngelList. The graphic below shows a Machine Learning application landscape, using broad categories of applications that may run on simple or specialized edge devices, on servers in the cloud, or in a hybrid configuration using edge devices with tightly coupled cloud resources.

A Machine Learning application landscape (Source: Moor Insights & Strategy)

The Hardware: CPUs, GPUs, ASICs and FPGAs

As I have explored in previous articles, there are two aspects of Machine Learning: training the neural network with massive amounts of sample data and then using the trained network to infer some attribute about a new data sample. The job of training the network to “think” is typically performed in large datacenters on GPUs, almost exclusively provided by NVIDIA. Since that market domination appears to be pretty stable, at least for the time being (see my article about Intel’s acquired Nervana Technology for a potential challenger), I will focus here on the hardware used in inference, where the AI is actually deployed. The graphic below lays out the wide range of hardware targeting Machine Learning from leading vendors.

When it comes to Machine Learning, there is no “One Chip to Rule Them All”. While every vendor claims its architecture (CPU, GPU, ASIC or FPGA) is “the best” for AI and Machine Learning, the fact is that each has advantages for specific types of applications and data, deployed in specific environments. The data complexity and velocity determine how much processing is needed, while the environment typically determines the latency demands and the power budget.

CPUs, like Intel’s Xeon and Xeon Phi in the datacenter and the Qualcomm Snapdragon in mobile devices, do a great job on relatively simple data like text and JPEG images once the neural network is trained, but they may struggle to handle high-velocity, high-resolution data coming from devices like 4K video cameras or radar. To help address this, Intel has pre-announced a new version of its many-core Xeon Phi, code-named Knights Mill, which is expected to be available later this year. However, in many cases the job may require a GPU, an ASIC like Intel’s expected Nervana Engine, or perhaps an FPGA programmed to meet the demands of a low-latency, low-power environment such as a vehicle or an autonomous drone or missile. While the NVIDIA GPU will win most drag races for the fastest solution (throughput), the FPGA (typically from Intel or Xilinx) affords the ability to reconfigure the hardware as acceleration algorithms evolve, as well as very low latencies. In the cloud, we see a similar situation, where GPUs, FPGAs and ASICs like Google’s TPU (Tensor Processing Unit) each offer unique capabilities and cost/benefit advantages for specific data types and throughput requirements.
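
To make the data-velocity point concrete, the short Python sketch below compares the raw data rate of a text-only inference service with that of a single uncompressed 4K camera feed, along with the per-frame latency budget at 30 frames per second. All of the input numbers (request size, request rate, resolution, bytes per pixel) are illustrative assumptions rather than figures from any vendor.

```python
# Back-of-the-envelope data rates for two inference workloads.
# All inputs here are illustrative assumptions.

# Workload 1: a short text query (e.g., a typed search or chat message).
text_bytes = 1_000                 # ~1 KB of text per request
text_requests_per_s = 100          # assumed request rate on one server
text_rate_mb_s = text_bytes * text_requests_per_s / 1e6

# Workload 2: one uncompressed 4K camera stream.
width, height = 3840, 2160         # 4K UHD resolution
bytes_per_pixel = 3                # 8-bit RGB
fps = 30
video_rate_mb_s = width * height * bytes_per_pixel * fps / 1e6

print(f"Text workload: ~{text_rate_mb_s:.1f} MB/s")    # ~0.1 MB/s
print(f"4K video feed: ~{video_rate_mb_s:.0f} MB/s")   # ~746 MB/s

# Latency budget: to keep up with 30 fps, each frame must be fully
# processed (inference plus pre/post work) in under 1/30 of a second.
print(f"Per-frame budget: {1000 / fps:.1f} ms")        # ~33.3 ms
```

Several thousand times more data per second, arriving with a hard real-time deadline, is roughly the gap between workloads a modest CPU can absorb and those that push designers toward a GPU, FPGA or ASIC.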

Some applications such as vision-guided autonomous systems require a hybrid hardware approach to meet the latency and data processing requirements of the application environment. While the accelerators mentioned above do a great job of running the AI inference engine, sensor fusion, data pre-processing and post-scoring policy execution requires a lot of special I/O and fast traditional logic best suited for CPUs. To solve this challenge, NVIDIA offers hybrid hardware platforms with an ARM / GPU combo in NVIDIA’s Jetson and DrivePX2, while Intel and Xilinx offer SoCs that marry ARM and FPGAs into a single, elegant low-power package. All of these products are finding their way into drones, factory robots / cobots and automobiles where the right combination of speed, flexibility and low power demand innovative approaches.

Not to be outdone, Qualcomm has been busy beefing up their Snapdragon processor to include a variety of accelerator technologies to support Machine Learning in mobile and other edge devices that will comprise a smart Internet of Things (IoT). In fact, the most recent Snapdragon 835 includes a CPU with a GPU and a DSP (digital signal processor) to meet a variety of programming and hardware models being used to speed the Machine Learning algorithm.

As you can tell, one size does not fit all in meeting the computational needs of the emerging Machine Learning application landscape. The result will be more choices for engineering / design teams and tailored solutions for the intelligent systems, products and services that are being built. For more detailed analysis, please see the recently published paper by Moor Insights & Strategy on this topic.

The post A Machine Learning Landscape: Where AMD, Intel, NVIDIA, Qualcomm And Xilinx AI Engines Live appeared first on Moor Insights & Strategy.

]]>
What To Expect in 2017 From AMD, INTEL, NVIDIA, XILINX And Others For Machine Learning https://moorinsightsstrategy.com/what-to-expect-in-2017-from-amd-intel-nvidia-xilinx-and-others-for-machine-learning/ Fri, 06 Jan 2017 06:00:00 +0000 https://staging3.moorinsightsstrategy.com/what-to-expect-in-2017-from-amd-intel-nvidia-xilinx-and-others-for-machine-learning/ Without a doubt, 2016 was an amazing year for Machine Learning (ML) and Artificial Intelligence (AI). I have opined on the 5 things to watch in AI for 2017 in another article, however the potential dynamics during 2017 in processor and accelerator semiconductors that enable this market warrant further examination. It is interesting to note […]

The post What To Expect in 2017 From AMD, INTEL, NVIDIA, XILINX And Others For Machine Learning appeared first on Moor Insights & Strategy.

]]>
Without a doubt, 2016 was an amazing year for Machine Learning (ML) and Artificial Intelligence (AI). I have opined on the 5 things to watch in AI for 2017 in another article, however the potential dynamics during 2017 in processor and accelerator semiconductors that enable this market warrant further examination. It is interesting to note that shares of NVIDIA roughly tripled in 2016 due in large part to the company’s technology leadership in this space. While NVIDIA GPUs currently enjoy a dominant position for Machine Learning training today, the company’s latest quarter growth of 197% YoY, in a market now worth over a half billion dollars, has inevitably attracted a crowd of potential competitors, large and small. And semiconductors remain one of the few pure AI plays for public equity investors seeking a position in this fast growing market.

A Machine Learning Chips Primer

First, let’s look at some background on the computational landscape for Machine Learning (ML). As most of you probably know, there are two aspects of Machine Learning: training the neural network with massive amounts of sample data, and then using the trained network to infer some attribute about a new data sample. Both are incredibly computationally intensive, but the training task is mindbogglingly complex; training a deep neural network (DNN) for image, text or voice classification literally requires trillions of billions of calculations to achieve adequate accuracy, typically around 95%. I have included a primer on this topic as an appendix to this article.

And the predictions are…

NVIDIA in 2017

Let’s start with the leader, NVIDIA. In 2017, I expect NVIDIA to ramp Pascal (Tesla P100, P4 and P40) volumes, continue to nurture new market development with its software stack, and then launch its next-generation Volta GPUs (very) late in the year, probably at SC’17 in Denver. I hope to hear a few more details at the annual GTC conference in April. These powerful new NVIDIA chips, with faster GDDR6 and HBM2 memory, will be connected to IBM POWER9 CPUs using NVLINK and/or OpenCAPI, with interconnects from Mellanox, for two large supercomputers in the DOE’s Summit and Sierra procurements. Don’t be surprised if this launch is Deep Learning centric, as these supercomputers will offer unprecedented Machine Learning performance. Based on the NVIDIA roadmap (below), these new GPUs could deliver up to twice the performance of Pascal, or from 21 TFLOPS at half precision to perhaps 35-40 TFLOPS.

NVIDIA’s GPU roadmap would imply that the Volta GPU could be twice the performance per watt compared to PASCAL. (Source: NVIDIA)

AMD in 2017

Advanced Micro Devices (AMD) will begin shipping the highly anticipated Zen processor for servers in 2Q, packaging up to 32 cores in the Naples SoC. For Machine Learning, the company has already disclosed a few details of its next-generation Vega GPU, which will also ship some time in mid-2017. AMD has implied that this chip will deliver 25 TFLOPS at half precision, which would give AMD a slight 20% edge over the NVIDIA Pascal P100. However, note that this advantage could be short-lived if NVIDIA’s Volta keeps to its schedule.

While AMD has been developing this monster chip, they have completely revamped their software stack for GPUs in servers through their Radeon Open Compute Platform (ROCm), an open source alternative to NVIDIA’s CUDA and cuDNN, so they will be ready to ramp their ecosystem as soon as the new silicon is ready for action. They have also announced GPUs for Machine Learning inference under the Radeon Instinct brand. However, I do not believe these will be competitive with the NVIDIA P4 and P40, since the AMD GPUs are based on the Fiji and Polaris architectures, which do not support native 8-bit integer math operations (just packed 8-bit integer operands). So I would expect the company to shore up their offerings for inference GPUs sometime in 2017.

Intel in 2017

I expect Intel to start shipping their newly acquired Nervana Engine in the 2nd half of 2017 for ML training workloads. At Intel’s AI Day in November, the company said this product will be 100 times faster than the “best GPU”, which may refer to Volta but probably refers to Pascal P100. I remain skeptical that they can achieve such a feat, but it will be exciting to watch. Certainly, there remains a degree of execution risk at this stage of product development and acquisition integration. But the idea that a purpose built accelerator could perform significantly better than a General Purpose GPU has merit, since a GPU still has die area dedicated to functions that are only used for graphics, as well as features such as double precision floating point not needed by Machine Learning. It will be interesting to see if and how Intel plans to exploit the Nervana fabric, which allows for a high degree of scaling for ML workloads.

In addition, Intel has previously stated that they will provide a mid-life kicker for the Xeon Phi (KNL) CPU, adding variable precision math (8-bit integer and perhaps 16-bit floating point) to improve their ability to compete with GPUs. For neural networks that demand a lot of memory, this could significantly improve Intel’s standing, especially for the inference side of the AI workload.

Diane Bryant announced the mid-life kicker for Knights Landing, code named Knights Mill, at IDF’16 to address the needs of Machine Learning in 2017. (Source: Intel)

XILINX in 2017

Xilinx announced their “reconfigurable acceleration platform” for Machine Learning last November and has a refreshed product portfolio including 16nm and 20nm technologies, well ahead of their Intel / Altera competition. The Xilinx strategy is to ease the development hurdles of FPGA programming by providing a suite of tools, APIs and IP already tailored to accelerate specific datacenter workloads. Amazon recently selected Xilinx to provide FPGA acceleration in the cloud for a wide variety of workloads, so I would expect a steady stream of new platforms that will enable Xilinx to compete in ML, primarily in inference jobs in datacenters and at the edge of the cloud.

Here come the ASICs in 2017

While CPUs, GPUs and FPGAs get all the headlines and the vast bulk of the revenue in Machine Learning, several companies have been able to attract cash from the venture capital community to fund the development of chips that are designed to do just one thing and do it very well: fast Machine Learning. Even though a GPU farm can now train a neural network in a matter of days instead of weeks or months, researchers yearn for chips that could train a network in minutes or hours, to speed up and improve the development process. Google themselves developed such a chip to accelerate ML inference and announced the product in 2016 as the Google Tensor Processing Unit (TPU).

Now several startups, in addition to Intel / Nervana mentioned above, could potentially launch ASICs for Machine Learning in 2017 and 2018. The companies to watch include Cerebras, KnuPath, GraphCore and Wave Computing. While all but Wave remain in stealth mode, so details are not yet available, all of these companies hope to accelerate machine learning by a factor of over 100x versus a GPU. I suspect all will target both inference and training, but we should know more by the end of 2017. And it appears that many are using some form of dataflow architecture, an unproven approach that has been the subject of academic research for well over a decade, but which holds great promise for Machine Learning if it can be made to work. The first company out of the gates, pun intended, is probably Wave Computing, which publicly disclosed their architecture last October and plans to sample their appliances some time this year. (Several of these companies hope to provide turnkey ML appliances, instead of engaging in the longer slog of becoming a merchant semiconductor provider.) Also, I expect Google to update their TPU device sometime in late 2017 or early 2018, at which time I would not be surprised to see them tackle the training side of the computational spectrum. So, 2017 could possibly become the year of the ML ASIC, although the risks and challenges are quite high.

So, as you can see, 2017 will be a year of increased competition to supply the silicon brains behind the artificial brains of Machine Learning. One thing to keep in mind beyond the chips, is that the ecosystem that NVIDIA has developed and nurtured will remain a significant advantage for the company for years to come, and thus represents a hurdle for all newcomers who envy NVIDIA’s success. But this is a barrier that everyone seems to acknowledge, and intends to clear, as the rewards appear to be worth the investment.

A Machine Learning Chips Primer: CPUs, GPUs, FPGAs and ASICs

Today, the task of training a neural network, which is the basis of most of the recent advancements in AI, is the undisputed domain of the GPU. NVIDIA has incorporated reduced precision floating point (16 bits, or “half-floats”) instead of the standard 32 bit operations to accelerate this task even further; after all, if you can solve the problem with half the precision, you can theoretically apply twice the number of arithmetic units (ALUs) in the same die space (and therefore the same power and cost). Memory bandwidth is another key requirement of training accelerators since that becomes the next bottleneck once you can deliver the increased computational throughput. The NVIDIA Pascal architecture excels in both of these areas, raising the bar for competitors.

The task of inference is not as simple to characterize, as the nature of the data being analyzed determines the type of architecture best suited for the job. If you are just analyzing text and need an answer online in, say, less than 50 milliseconds, a modest CPU can be adequate, from an ARM CPU in a phone to a Xeon server in the cloud. But if you are analyzing 4K video data at 30 frames per second and need to know whether to apply the brakes on the car you are “driving”, then you will need something much faster; perhaps a GPU, an FPGA, or even a custom chip (an application-specific integrated circuit, or ASIC). The vast bulk of inference today is pretty simple and can even be calculated with 8-bit integer math; once again, a reduced-precision approach that can increase performance by perhaps 4x in this case. And, once again, NVIDIA took the lead; the Pascal architecture supports both native 8-bit and 16-bit math in the Tesla P4 and P40 GPUs used for inference.

However, Xilinx’s latest FPGAs now support 8-bit integer operations for Machine Learning and have the added benefit of being reconfigurable, changing the hardware as algorithms evolve. This little bit-nit will play an important role as new products are launched in 2017. For example, Intel will add 8-bit integer operations to the Xeon Phi chip in 2017. Finally, if you need to support a great many instances with simultaneous threads of inference, as you might in a large cloud environment, the high development cost of a custom chip may be worth the investment. Google has done exactly this with the Google Tensor Processing Unit (TPU), enabling them recently to reduce their prices for certain machine learning services by a factor of 8. While it may cost tens of millions of dollars to develop an ASIC of this complexity, the manufacturing cost can be reduced to tens of dollars per chip.
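
To make the reduced-precision arithmetic concrete, here is a minimal NumPy sketch of symmetric 8-bit quantization applied to a single dot product, the core operation of neural network inference. The scaling scheme, rounding and tensor sizes are simplified assumptions for illustration and do not represent any particular vendor's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy layer: FP32 weights and activations (sizes are arbitrary).
w = rng.standard_normal(1024).astype(np.float32)
x = rng.standard_normal(1024).astype(np.float32)

def quantize_int8(t):
    """Symmetric linear quantization: map [-max|t|, +max|t|] onto [-127, 127]."""
    scale = np.abs(t).max() / 127.0
    q = np.clip(np.round(t / scale), -127, 127).astype(np.int8)
    return q, scale

qw, sw = quantize_int8(w)
qx, sx = quantize_int8(x)

# Accumulate in int32 (as int8 inference hardware does), then rescale.
acc = np.dot(qw.astype(np.int32), qx.astype(np.int32))
approx = acc * sw * sx
exact = float(np.dot(w, x))

print(f"FP32 result : {exact:.3f}")
print(f"INT8 result : {approx:.3f}")
print(f"Abs error   : {abs(approx - exact):.3f}")

# Storage shrinks 4x vs FP32 (and 2x vs FP16), which is part of why
# reduced precision also eases the memory-bandwidth bottleneck.
print(w.nbytes, w.astype(np.float16).nbytes, qw.nbytes)  # 4096 2048 1024
```

In practice the accuracy loss depends on the network and on how the scales are calibrated, which is why 8-bit math is largely confined to inference while training still leans on 16-bit or 32-bit floating point.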

The post What To Expect in 2017 From AMD, INTEL, NVIDIA, XILINX And Others For Machine Learning appeared first on Moor Insights & Strategy.

]]>
Could AMD’s New Software Be A Game Changer In The Datacenter? https://moorinsightsstrategy.com/could-amds-new-software-be-a-game-changer-in-the-datacenter/ Mon, 14 Nov 2016 06:00:00 +0000 https://staging3.moorinsightsstrategy.com/could-amds-new-software-be-a-game-changer-in-the-datacenter/ Advanced Micro Devices (AMD) Radeon GPUs have been gaining popularity in gaming and virtual reality, thanks to the performance afforded by their use of 14nm FinFET manufacturing technology and High Bandwidth Memory, coupled with aggressive pricing. However, the company’s High Performance Computing (HPC) products, previously branded as AMD FirePro, have not gained much traction in spite […]

The post Could AMD’s New Software Be A Game Changer In The Datacenter? appeared first on Moor Insights & Strategy.

]]>
Advanced Micro Devices (AMD) Radeon GPUs have been gaining popularity in gaming and virtual reality, thanks to the performance afforded by their use of 14nm FinFET manufacturing technology and High Bandwidth Memory, coupled with aggressive pricing. However, the company’s High Performance Computing (HPC) products, previously branded as AMD FirePro, have not gained much traction in spite of their double precision floating point performance and support for lots of GPU memory. This is in large part due to the fact that AMD’s GPU software (drivers, libraries, etc.) was designed for a Windows environment for workstations and gaming, not for Linux and the server applications that dominate the datacenter acceleration market. But this may be about to change.

Recently AMD began to reposition their newer workstation, and presumably future server, GPUs as “Radeon Pro”, and has launched a completely revamped Linux software stack to enable AMD to compete with NVIDIA in the fast-growing datacenter markets for HPC and Deep Learning. Note that NVIDIA recently generated $151M in these segments during their latest quarter, over twice the level of a year ago. The new AMD software could pave the way for AMD to gain a foothold in this fast-growing market, although the company must address significant challenges to realize this potential.

What has AMD announced?

AMD is delivering a new software stack called the Radeon Open Compute Platform, or “ROCm”, to address the HPC and Deep Learning market. Prior to this software release, the prospective AMD customer typically had to port their code to OpenCL, or use C++ with an entirely different approach to parallelizing their application, to run on an AMD GPU. Now, the programmer simply runs his/her CUDA code through an AMD “HIPify” tool to create Heterogeneous-compute Interface for Portability (HIP) source code, which can then be compiled using AMD’s new HCC (Heterogeneous Compute Compiler) or NVIDIA’s NVCC compiler. The AMD code would then execute on a brand new Linux driver, called ROCk, which supports a handful of AMD’s more recent GPUs. This is a smart approach, as it does not place undue burden on the programmer, and allows him/her to continue to maintain a single source code base for both AMD and NVIDIA execution. In addition, AMD is providing a slew of libraries, applications, benchmarks, tools and HSA runtime extensions to ease the transition to their hardware for HPC and Deep Learning.
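
For a flavor of what the porting step involves, the toy Python sketch below mimics the source-level renaming that a hipify-style tool performs on CUDA host code. It covers only a handful of real CUDA-to-HIP runtime API names and is emphatically not AMD's actual tool, which handles far more of the API surface, including kernel-launch syntax.

```python
import re

# A small subset of real CUDA-to-HIP runtime API renames; the actual
# hipify tools cover far more of the API surface than this toy mapping.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def toy_hipify(cuda_source: str) -> str:
    """Replace known CUDA API names with their HIP equivalents."""
    # Word boundaries prevent partial matches; processing longer names
    # first is an extra safeguard against clobbering.
    for name in sorted(CUDA_TO_HIP, key=len, reverse=True):
        cuda_source = re.sub(rf"\b{name}\b", CUDA_TO_HIP[name], cuda_source)
    return cuda_source

cuda_snippet = """
    float *d_a;
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMemcpy(d_a, h_a, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaDeviceSynchronize();
    cudaFree(d_a);
"""

print(toy_hipify(cuda_snippet))
```

The broader point is that the HIP runtime API deliberately mirrors the CUDA runtime API, which is what allows a largely mechanical translation and a single maintained source base.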

How Well Does ROCm Perform?

AMD has provided data to demonstrate the performance of the new ROCk driver, compared to the prior Catalyst (Windows) driver. As you can see, the new driver delivers dramatically lower latencies to dispatch compute kernels, a key metric for running GPU parallelized codes. AMD also has provided data showing that the use of the HIP abstraction layer for CUDA codes does not significantly impact performance on either AMD or NVIDIA hardware.

AMD says ROCm’s new open source driver dramatically reduces latencies compared to the prior Catalyst driver, which was optimized for Windows, not for computational Linux workloads. (Source: AMD)

Will this help AMD get (back) into the HPC game?

Well, let’s say this software is necessary but insufficient. ROCm provides the needed software for datacenter applications, but AMD’s hardware in this space now lags behind NVIDIA. Specifically, while the FirePro S series of GPUs compared favorably to NVIDIA’s Kepler GPUs in double precision performance and memory size when it was introduced in 2014, those older NVIDIA GPUs are being replaced with the new, more powerful Pascal generation, which also offers High Bandwidth Memory at the high end of the product line (P100 with NVLink). And the newer Radeon Pro products, based on Polaris chips, are designed for professional graphics at the time of this writing, not servers. If AMD introduces a Radeon Pro for servers based on Polaris, it would have solid single precision performance but would likely not provide the double precision math and ECC memory needed for many scientific HPC applications. But that might be OK if AMD wants to focus on codes such as seismic analysis or Machine Learning. For the latter fast-growing segment, AMD would need to augment their current GPUs with devices that natively support half-precision floating point and 8-bit integer arithmetic to compete with NVIDIA’s Pascal. So, while AMD has taken two steps forward with their software, they’ve taken one step back for HPC GPUs from a competitive perspective. We look forward to hearing more about their roadmap beyond Polaris, especially as it relates to support for the specific math operations mentioned above. Even then, the company would need to deploy customer-facing experts and invest in the ecosystem, where NVIDIA already has a strong position.

For more detailed information regarding ROCm, please see the recently published Moor Insights & Strategy research paper on this topic here.

The post Could AMD’s New Software Be A Game Changer In The Datacenter? appeared first on Moor Insights & Strategy.

]]>
Xilinx Seeks To Mainstream FPGAs In The Datacenter https://moorinsightsstrategy.com/xilinx-seeks-to-mainstream-fpgas-in-the-datacenter/ Mon, 14 Nov 2016 06:00:00 +0000 https://staging3.moorinsightsstrategy.com/xilinx-seeks-to-mainstream-fpgas-in-the-datacenter/ Why are so many companies suddenly jumping into the datacenter accelerator game? Major chip companies such as Intel, NVIDIA and Xilinx as well as startups such as Nervana (being acquired by Intel), Wave Computing, GraphCore, KnuPath and others are all vying for a piece of a rapidly growing market. That market consists primarily of just seven customers, […]

The post Xilinx Seeks To Mainstream FPGAs In The Datacenter appeared first on Moor Insights & Strategy.

]]>
Why are so many companies suddenly jumping into the datacenter accelerator game? Major chip companies such as Intel, NVIDIA and Xilinx as well as startups such as Nervana (being acquired by Intel), Wave Computing, GraphCore, KnuPath and others are all vying for a piece of a rapidly growing market. That market consists primarily of just seven customers, the world’s largest datacenters: Alibaba, Amazon.com, Baidu, Facebook, Google, Microsoft and Tencent. These companies are increasingly turning to technologies that can run specific algorithms at least 10 times faster in order to meet the demand for applications such as machine learning, ultra-high-definition video streaming and complex data analytics. While GPUs (Graphics Processing Units) from NVIDIA have been leading much of this trend, Field Programmable Gate Arrays (FPGAs) hope to now contend to become a major player. (Recall that Intel invested in this market through their $16.7B acquisition of Altera last year.) Now Xilinx is aiming to take this market mainstream with a new offering that speeds development of these reprogrammable acceleration chips.

FPGAs are used today for processing automotive sensor data, network acceleration, embedded industrial applications and other tasks where high performance and energy efficiency are required but where the volumes do not cost-justify developing a custom Application-Specific Integrated Circuit (ASIC). Microsoft recently disclosed that they use FPGAs in practically every server in their massive datacenters. So why doesn’t everyone use these esoteric chips in their datacenter? Unfortunately, as it turns out, FPGAs are notoriously difficult to program, requiring hardware (that’s the “Gate” part in FPGA) as well as software expertise–a rare combination of talents. This is in fact the challenge that Xilinx is hoping to address by delivering common building blocks and tools for three key hyperscale workloads that demand more performance and that are still in the relatively early stages of algorithm development.

What has Xilinx announced?

Xilinx believes that the trend described above is in the early stages and, by lowering the programming hurdles and easing development of FPGAs, that these reprogrammable hardware devices will go “mainstream” in hyperscale datacenters. So, Xilinx has pulled together a suite of software, tools and hardware reference designs to accelerate, well, acceleration. Xilinx is targeting three of the fastest growing workloads that demand more performance today: machine learning, video transcoding for live 4K streaming and SQL queries for data analytics.

The Xilinx Reconfigurable Acceleration Stack is targeting the markets for machine learning, video transcoding and SQL queries for data analytics. (Source: Xilinx.)

Why machine learning, video & SQL acceleration?

Xilinx is focusing on workloads that have large and fast-growing footprints in hyperscale datacenters. Machine Learning has been all over the news lately as AI becomes the new big thing, while SQL analytics have become pervasive across hyperscale applications. And live high-definition (4K) video streaming of sporting events and gaming competitions is becoming big business around the globe. All three of these workloads demand far more performance than a CPU-only infrastructure can deliver, and all are well suited to FPGA acceleration. Xilinx has shared benchmarks that demonstrate acceleration from 4x to over 25x the performance of a CPU-only server across the spectrum of these applications. Perhaps most importantly, these three workloads represent significant revenue opportunities, not just IT cost savings, for the Super Seven companies.

Will this drive FPGAs into the mainstream?

FPGAs have long held promise for a wide swath of workload acceleration in large datacenters but have been held back by the fear, real or imagined, of the programming difficulty and lack of available skills. It is too early to tell if the building blocks Xilinx has launched will close this gap, but the timing of the launch on the heels of Microsoft and Baidu’s recent announcements could not have been better. So, I would turn the question around, and state that if FPGAs are going to go mainstream, then it will take something like these acceleration stacks to break down the barriers to volume deployment. And if Microsoft is any indication, we are likely to see more, not less, adoption of FPGAs in the world’s largest datacenters.

For more detailed information regarding Xilinx’s Reconfigurable Acceleration Stack, including benchmarks, please see the recently published Moor Insights & Strategy research paper on this topic here.

The post Xilinx Seeks To Mainstream FPGAs In The Datacenter appeared first on Moor Insights & Strategy.

]]>
Microsoft’s Love For FPGA Accelerators May Be Contagious https://moorinsightsstrategy.com/microsofts-love-for-fpga-accelerators-may-be-contagious/ Wed, 05 Oct 2016 05:00:00 +0000 https://staging3.moorinsightsstrategy.com/microsofts-love-for-fpga-accelerators-may-be-contagious/ Microsoft has announced more details about their use of Field Programmable Gate Arrays (FPGAs) to accelerate servers in their massive datacenters. CEO Satya Nadella made the announcements at their Ignite Conference in Atlanta (which MI&S colleague Patrick Moorhead attended), sharing details about their five-year journey called “Project Catapult.” The surprise was not that they are […]

The post Microsoft’s Love For FPGA Accelerators May Be Contagious appeared first on Moor Insights & Strategy.

]]>
Microsoft has announced more details about their use of Field Programmable Gate Arrays (FPGAs) to accelerate servers in their massive datacenters. CEO Satya Nadella made the announcements at their Ignite Conference in Atlanta (which MI&S colleague Patrick Moorhead attended), sharing details about their five-year journey called “Project Catapult.” The surprise was not that they are using FPGAs; Microsoft had disclosed their adoption of FPGAs to accelerate Bing search ranking over three years ago. What surprised many industry observers was the extent to which they are already deploying this typically esoteric style of chip and their plans for pervasive use of the technology in the future. Mr. Nadella said that the entire fleet of servers for the Azure cloud now has at least one FPGA installed in each server, delivering over one “exa-op” (one billion billion operations per second) of total throughput across datacenters in 15 countries. An “ExaScale” computer in the traditional HPC sense (meaning double precision math) is not expected to appear until early in the next decade.

Microsoft CEO Satya Nadella at the Ignite Conference (Source: Microsoft)

If Microsoft’s penchant for FPGA acceleration spreads to others in the exclusive club of the Super Seven largest datacenter operators (Alibaba, Amazon.com, Baidu, Facebook, Google, Microsoft and Tencent), the impact on the chip industry, notably Intel (Altera), NVIDIA and Xilinx, could be substantial. Certainly, Intel’s $16.7B acquisition of FPGA leader Altera now appears to have been prescient; at the time of the acquisition they predicted that 30% of servers would require FPGA acceleration by 2020. However, aficionados of this technology have cried wolf many times before in predicting that this difficult-to-program technology is about to cross the chasm and become a mainstream force in the industry.

What’s an FPGA and why does Microsoft care?

The real key to Microsoft’s heart is not just performance or power consumption. Microsoft points to the flexibility that FPGAs afford due to their inherent programmability. FPGAs, like GPUs, can be used to accelerate specific codes that lend themselves well to being executed in parallel, a critical and common feature of applications such as Deep (Machine) Learning. But the “P” in FPGA means programmable, and therein may lie their most important value to Microsoft and to the datacenter in general. Once programmed, the FPGA hardware itself can be changed (reprogrammed) in the field (hence the “F”) to enable it to evolve with changes in the company’s business, science and underlying logic. Microsoft says that they update the programming frequently, as often as every month.

As a result, Microsoft now sees FPGAs as an essential extension to nearly every server, accelerating a wide variety of demanding workloads in a world dominated by voice and image data. While many datacenters are now beginning to use GPUs to accelerate Machine Learning and other applications, helping to drive 110% growth in NVIDIA’s datacenter segment last quarter, the use of FPGAs has been predominantly confined to developing other chips and accelerating niche workloads such as networking, deep packet inspection, video transcoding, image processing and data compression. But these once-rare workloads are rapidly becoming mainstream as we all begin using voice and images instead of keyboards and mice. And the use of neural networks on these data types is exploding.

Note that using FPGAs is not without its challenges. Namely, the difficult task of programming these chips is often likened to rocket science, mastered by very few people who possess both hardware and software skills. But Microsoft says this investment can be justified by the impressive performance gains and the ability to adapt to changing business needs.

However, when a specific use case requires a very large number (on the order of a million) of these specialized chips to be deployed, developers typically burn the FPGA logic into an Application-Specific Integrated Circuit, or ASIC, turning the programmable chip into an even faster and lower-cost piece of fixed-function hardware. Note that this process can cost many tens of millions of dollars and take months or even years to perfect and produce. Google’s use of the Tensor Processing Unit (TPU) for the same Deep Learning inference job is a prime example. As a result, FPGAs have tended to remain a niche technology used by the brave and for small (in terms of chip unit volume) workloads. At least that has been the case until now.
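
The economics behind that FPGA-versus-ASIC decision reduce to a simple break-even calculation. The Python sketch below uses purely hypothetical cost figures, chosen only to match the rough orders of magnitude mentioned above (tens of millions in ASIC development cost, tens of dollars per ASIC, and an assumed FPGA unit price), to show why hardening a design only pays off at very high volume.

```python
# Break-even volume for hardening an FPGA design into an ASIC.
# All cost figures are hypothetical, chosen only to match the rough
# orders of magnitude discussed above.

asic_nre = 50_000_000      # one-time design/mask cost ("tens of millions")
asic_unit_cost = 30        # per-chip manufacturing cost ("tens of dollars")
fpga_unit_cost = 500       # assumed street price of a comparable FPGA

def total_cost(units, nre, unit_cost):
    return nre + units * unit_cost

# Volume at which the ASIC's lower unit cost pays back its NRE.
break_even = asic_nre / (fpga_unit_cost - asic_unit_cost)
print(f"Break-even volume: ~{break_even:,.0f} units")   # ~106,000 units

for units in (10_000, 100_000, 1_000_000):
    fpga = total_cost(units, 0, fpga_unit_cost)
    asic = total_cost(units, asic_nre, asic_unit_cost)
    cheaper = "ASIC" if asic < fpga else "FPGA"
    print(f"{units:>9,} units: FPGA ${fpga/1e6:6.1f}M vs ASIC ${asic/1e6:6.1f}M -> {cheaper}")
```

Different price assumptions move the crossover point, but the shape of the curve is the point: below some volume the one-time engineering cost dominates and the reprogrammable part wins, which is why FPGAs persist for lower-volume or still-evolving workloads.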

So, what’s next?

Just because Microsoft can afford to hire an army of expensive computer scientists to program FPGAs doesn’t mean that FPGAs will take over the world. As I mentioned, when a chip is needed in sufficient volume, the cost of developing an ASIC can be justified, producing a much lower cost platform which can then be programmed in a higher level language. Intel’s recent acquisition of Nervana is another case in point. And of course, GPUs made by Advanced Micro Devices (AMD) and NVIDIA are excellent examples of this. Since these ASICs are “hardened” into silicon, they can be more affordable and even more efficient than an FPGA.

By being open about their innovative use of FPGAs, Microsoft undoubtedly hopes to broaden the appeal of FPGAs and increase the pool of talented engineers as well as the optimized libraries and software for the adoption of FPGAs in the datacenter. And Microsoft’s plans for FPGAs extend far and wide: beyond Deep Learning acceleration, Microsoft is using FPGAs to accelerate networking and the complex software required to implement software-defined networks.

Meanwhile, Intel (Altera) and Xilinx, the largest FPGA suppliers, will be cheering them on. For its part, Intel has married the cost and performance benefits of fixed hardware (Broadwell CPUs) with the flexibility of Altera FPGAs in a combined hybrid package. Xilinx, on the other hand, offers the flexibility of working with all CPU architectures (ARM, POWER and possibly AMD in the future) and interconnect technologies, and has been active in partnering with IBM’s OpenPOWER and the OpenCAPI effort. This story will undoubtedly continue to evolve as more use cases and software become widely deployed in this new world of heterogeneous computing platforms.

The post Microsoft’s Love For FPGA Accelerators May Be Contagious appeared first on Moor Insights & Strategy.

]]>
Intel And NVIDIA Vie For Attention For ‘The Next Big Thing’ https://moorinsightsstrategy.com/intel-and-nvidia-vie-for-attention-for-the-next-big-thing/ Tue, 13 Sep 2016 05:00:00 +0000 https://staging3.moorinsightsstrategy.com/intel-and-nvidia-vie-for-attention-for-the-next-big-thing/ In a battle that has become somewhat predictable, but fun to watch, NVIDIA and Intel have recently announced new technologies and acquisitions, respectively, to compete in the fast growing market for Deep Learning in Artificial Intelligence (AI). These announcements demonstrate that both companies are doubling down on their strategies: NVIDIA intends to win with a portfolio of […]

The post Intel And NVIDIA Vie For Attention For ‘The Next Big Thing’ appeared first on Moor Insights & Strategy.

]]>
In a battle that has become somewhat predictable, but fun to watch, NVIDIA and Intel have recently announced new technologies and acquisitions, respectively, to compete in the fast growing market for Deep Learning in Artificial Intelligence (AI). These announcements demonstrate that both companies are doubling down on their strategies: NVIDIA intends to win with a portfolio of hardware and software based on a common GPU architecture, while Intel intends to compete with CPUs (Xeon and Xeon Phi) beefed up with application-specific integrated circuits (ASICs).

Over the last few months, practically every major technology CEO has declared that Artificial Intelligence will be “The Next Big Thing”. The technologies that will power everything from “precision agriculture” to self-driving vehicles all have one thing in common: they demand an outrageous amount of compute power (literally billions of trillions of operations) to “train” the neural networks that create them. So it is not surprising that NVIDIA and Intel have taken the gloves off in the battle for this lucrative and fast growing market. To illustrate that growth, NVIDIA recently announced that their Datacenter business grew at an eye-popping 110% Y-o-Y in the latest quarter to $151M, accounting for roughly 10% of NVIDIA’s total revenues.

Intel brings out some big guns

Clearly, after Intel missed the transition to mobile, or “The Last Big Thing”, they do not intend to miss out on “The Next Big Thing”. Intel entered the fray just last June when they launched the Knights Landing many-core Xeon Phi, where they tried to pin NVIDIA to the mat with impressive-sounding benchmarks for AI workloads. However, NVIDIA subsequently responded with a litany of corrections, claiming that NVIDIA, not Intel, wins these contests if the benchmarks are properly configured.

Then, in what appeared to some as an abrupt about-face, Intel announced that they are acquiring Nervana Systems, an AI startup that is developing an ASIC that accelerates the training of neural networks. While still in development, the company claims that the planned Nervana Engine accelerator will outperform a GPU (read NVIDIA) by 10x. Nervana also brings some impressive software to the table that delivers a 3x performance boost over the equivalent NVIDIA software. It remains to be seen how Intel will integrate this technology into their business, but the potential for disruption is certainly there.

Intel’s Diane Bryant introduces Slater Victoroff, CEO of Indico. Sporting some well-worn sandals, Mr. Victoroff delighted Ms. Bryant by noting that his Deep Learning applications are not well suited to GPUs, and explained why his company prefers Intel Xeon Phi. (Source: Intel)

The following week, Diane Bryant’s IDF keynote featured two AI pioneers on stage to tout the benefits of using Xeon Phi for AI. First up was startup CEO Slater Victoroff of Indico, who specializes in making text and image analysis easier for enterprise applications. The second AI speaker was Jing Wang, senior vice president of engineering at Baidu, a leader in voice processing and natural language translation technology. Wang was enthusiastic about using Xeon Phi for Deep Learning in his shop, which is a really big deal since Baidu also works very closely with NVIDIA.

Ms. Bryant then announced that the next generation Xeon Phi, called Knights Mill, would target AI and would support variable precision math, a key feature that enables NVIDIA’s new Pascal chips to essentially double their AI performance for free.

Finally, Intel announced last week that they will acquire Movidius, a Silicon Valley company that has been delivering computer-vision silicon accelerators for many years. Movidius is well positioned in vision markets such as drones, and could help Intel further their ambitions in automated driving and the Internet of Things (IoT). So, in the span of just three months, Intel has gone from zero AI products to an impressive portfolio that can accelerate both training and vision inference applications.

But this is still NVIDIA’s house

Not to be outdone, NVIDIA has continued to roll out new products based on their 16nm Pascal architecture, which now spans five data center products for Deep Learning: the Pascal P100 with NVLink and two Tesla P100 PCI-e cards targeting deep neural network training, plus the newly announced Tesla P4 and P40 PCI-e accelerators for cost-effective AI inference, especially in cloud applications where the trained network is used to make decisions. These newest chips support 8-bit integer math that delivers an astounding 22 and 47 trillion operations per second, respectively, and they are optimized with the new software NVIDIA announced for inference (TensorRT) and real-time video analytics (DeepStream). Analyzing streaming video to identify content attributes is an example of a computationally demanding inference job that is well beyond what an Intel Xeon can tackle in real time. This is where the new Teslas shine: the company claims that a single Tesla P4 card running DeepStream can perform as well as 15 dual-socket Intel Xeon E5 servers, and it should be popular with providers of public cloud infrastructure.
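
To make the 8-bit integer point concrete, here is a minimal sketch of how a trained FP32 weight tensor can be mapped to INT8 for inference. It illustrates the general quantization technique only; it is not TensorRT’s actual calibration code, and the symmetric per-tensor scaling is a simplifying assumption.

```python
import numpy as np

def quantize_int8(weights):
    """Map FP32 weights onto signed 8-bit integers with a single scale factor."""
    scale = np.abs(weights).max() / 127.0               # symmetric, per-tensor scale
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.randn(256, 256).astype(np.float32)        # stand-in for trained weights
q, scale = quantize_int8(w)

# Inference then runs the matrix math in 8-bit integers and rescales the output;
# the approximation error is small relative to the 4x reduction in data size.
print("max abs error:", np.abs(q.astype(np.float32) * scale - w).max())
```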

NVIDIA’s new Tesla P4 accelerator targets the inference side of Deep Learning, where the volume in cloud computing infrastructure is likely to be large. (Source: NVIDIA)

In spite of Intel’s attempts to steal the limelight, NVIDIA is riding high on the momentum they have built in AI, and the future looks bright. Recent research published by Narrative Science suggests that 62% of enterprises, a group that has been slow to adopt AI to date, plan to deploy AI applications in their businesses by 2018. If that adoption rate takes hold, and I believe it can, we have barely begun to see how these technologies will transform businesses and the world around us, and both Intel and NVIDIA are positioning themselves to take advantage.

The post Intel And NVIDIA Vie For Attention For ‘The Next Big Thing’ appeared first on Moor Insights & Strategy.

]]>
Intel Acquires Nervana Systems Which Could Significantly Enhance Future Machine Learning Capabilities https://moorinsightsstrategy.com/intel-acquires-nervana-systems-which-could-significantly-enhance-future-machine-learning-capabilities/ Tue, 09 Aug 2016 05:00:00 +0000 https://staging3.moorinsightsstrategy.com/intel-acquires-nervana-systems-which-could-significantly-enhance-future-machine-learning-capabilities/ Intel has announced that it will acquire Nervana Systems, a Deep Learning startup based in San Diego and Silicon Valley, to extend their capabilities in the fast-moving market for training deep neural networks used in artificial intelligence (AI) applications. I recently wrote about Nervana here. Training neural networks is a hot market where companies typically use GPUs […]

The post Intel Acquires Nervana Systems Which Could Significantly Enhance Future Machine Learning Capabilities appeared first on Moor Insights & Strategy.

]]>
Intel has announced that it will acquire Nervana Systems, a Deep Learning startup based in San Diego and Silicon Valley, to extend their capabilities in the fast-moving market for training deep neural networks used in artificial intelligence (AI) applications. I recently wrote about Nervana here. Training neural networks is a hot market where companies typically use GPUs to teach a machine how to process text, image, voice and other data types. Nervana is developing an accelerator and software that is tailored to this task instead of using a more general purpose GPU to do the heavy lifting. This acquisition provides Intel with a specific product and IP for Deep Learning, which can be used in standalone accelerators and can be integrated with future Intel technology to deliver more competitive and innovative products.

Why Does Intel Need Yet Another Architecture?

A GPU does a great job with machine learning because it has thousands of floating point units that can be used in parallel for the matrix (tensor) operations that make up the bulk of the processing in training a deep neural network (DNN). But most GPUs carry a lot of other capabilities as well, tailored to processing graphics images and producing graphics output. In addition, GPUs provide the higher-precision floating point used by High Performance Computing (HPC) applications like financial analysis, simulation and modeling, which is not required for Deep Learning algorithms. All of this functionality takes up valuable space and power on the GPU die. In theory, therefore, the Nervana approach could deliver higher performance and/or lower costs for these computationally intensive workloads; however, the company has not yet provided any performance projections for their chip.
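
The point about matrix operations is easy to see in code. A minimal NumPy sketch of one fully connected layer (my own illustration, with arbitrary sizes) shows that both the forward and backward passes reduce to large matrix multiplies, which is exactly the work thousands of parallel floating point units are built for.

```python
import numpy as np

x = np.random.randn(128, 4096).astype(np.float32)   # a batch of 128 activation vectors
w = np.random.randn(4096, 4096).astype(np.float32)  # one layer's weights

y = x @ w                  # forward pass: one large matrix multiply
grad_y = np.ones_like(y)   # stand-in for the gradient arriving from the next layer
grad_w = x.T @ grad_y      # backward pass, part 1: gradient w.r.t. the weights
grad_x = grad_y @ w.T      # backward pass, part 2: gradient w.r.t. the inputs
# Training repeats these three multiplies millions of times across many layers.
```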

Nervana has also not disclosed many details about their processor, focusing for now on their NEON software for accelerating GPUs in the Nervana Cloud as they work to finish the chip for a 2017 debut. But they have previously shared that the Nervana Engine will include an on-die fabric switch that interconnects multiple devices in a 3D torus topology. This feature would enable the engines to scale to a large number of cooperating accelerators, a capability needed to train more complex DNNs such as convolutional and recurrent neural networks. Exploiting this functionality will require additional engineering by system vendors or Intel, so it may take some time to materialize. We will know more sometime next year about how well the chip performs and how well it supports popular AI frameworks such as Caffe, Torch and TensorFlow.
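
For readers unfamiliar with the topology, a 3D torus arranges accelerators in a wrap-around X-Y-Z grid so that every device has six direct neighbors. The sketch below shows the neighbor arithmetic for a hypothetical 4x4x4 machine; it illustrates the general topology only, since Nervana has not published its interconnect details.

```python
def torus_neighbors(x, y, z, dims=(4, 4, 4)):
    """Return the six direct neighbors of node (x, y, z) in a wrap-around 3D grid."""
    dx, dy, dz = dims
    return [
        ((x + 1) % dx, y, z), ((x - 1) % dx, y, z),
        (x, (y + 1) % dy, z), (x, (y - 1) % dy, z),
        (x, y, (z + 1) % dz), (x, y, (z - 1) % dz),
    ]

# Node (0, 0, 0) wraps around to (3, 0, 0) and so on, so exchanging gradients
# with neighbors never requires traffic to cross the entire machine.
print(torus_neighbors(0, 0, 0))
```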

Does Intel Still Need a Big GPU?

When it comes to processors, Intel has one or more of every architecture flavor except big GPUs. They have desktop processors with integrated (little) GPUs, Xeon CPUs for servers, the many-core Xeon Phi (“Knights Landing”) for HPC and supercomputers, and Altera FPGAs for function-specific accelerators, including inference engines for Deep Learning. But I am often asked if Intel still needs a heavy-duty GPU. With this acquisition, I think the answer now is “no”; they can cover much of the GPU acceleration space with Xeon Phi, Altera FPGAs and now the Nervana Engine IP for AI. And Intel’s recent push into autonomous driving systems could benefit from the kind of low-power DNN engine Nervana appears to be developing.

What Will Intel Do With Nervana’s Technology?

Since the Nervana team is building a standalone accelerator today, Intel will likely continue down that path, at least for the initial release. But Intel excels at integrating technology, be it on-die or in multi-chip packages. Adding the Nervana Engine IP to a Xeon CPU could deliver a low-cost approach to onboard acceleration, but scaling that configuration would not be straightforward, as the CPU-to-accelerator ratio would be fixed at 1:1. Therefore, I think Intel may eventually productize the Nervana IP in several form factors: perhaps standalone products for the strong scaling used in training, and one or more integrated solutions for running those trained neural networks in inference workloads.

In any event, it appears Intel has now closed one of the few gaps in their datacenter product line and stands to better participate in the incredible growth of the AI market. I must note that there remains a lot of work to do, and NVIDIA certainly isn’t standing still. NVIDIA sets a high bar by which all contenders will be measured, and it has nurtured a rich ecosystem of software and research institutions around the world that will take significant time and resources to replicate.

The post Intel Acquires Nervana Systems Which Could Significantly Enhance Future Machine Learning Capabilities appeared first on Moor Insights & Strategy.

]]>