It makes sense that established chipmakers like Intel are trying to carve out a slice of the AI chip market, which is anticipated to be worth over $90 billion by 2025.
Intel’s engineers have been working to meet the demands of deep learning workloads by adapting the chipmaker’s current Xeon server processor lineup, but to compete properly with other players they also found they needed to develop dedicated AI accelerators.
Machine learning (ML) essentially works in two steps: training and inference. Each step calls for a different computational approach and, ideally, a different chip architecture. That’s why Intel is developing two different chips, each dedicated to one of these aspects of deep learning.
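As a rough illustration of why the two phases stress hardware differently, the following PyTorch sketch (not Intel-specific; the model and data are placeholders) contrasts a training step, which needs gradients and weight updates, with an inference pass, which only needs a forward computation:

```python
import torch
import torch.nn as nn

# Placeholder model and data, purely for illustration.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(32, 64)            # a batch of 32 samples
labels = torch.randint(0, 10, (32,))    # random class labels

# Training step: forward pass, backward pass, and a weight update.
# This is the compute- and memory-hungry phase a training chip targets.
optimizer.zero_grad()
loss = loss_fn(model(inputs), labels)
loss.backward()
optimizer.step()

# Inference: a single forward pass with gradients disabled.
# Lower precision and lower power suffice here, which is the inference chip's niche.
model.eval()
with torch.no_grad():
    predictions = model(inputs).argmax(dim=1)
```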
A statue of Intel’s logo. Image courtesy of Flickr.
Training
The training part of the ML process is handled by Intel’s Nervana NNP-T chip (the ‘T’ stands for ‘training’). This system on chip (SoC) packs 24 tensor processing clusters, 60MB of on-die SRAM, built-in networking, and 4x8GB of high-bandwidth memory (HBM2), which together deliver up to 119 tera operations per second (TOPS).
The chip isn’t actually being produced by Intel itself: manufacturing is outsourced to TSMC. It’s built on a 16nm process, packs 27 billion transistors onto a 690 mm² die, and can reach core frequencies of up to 1.1GHz. The SoC connects to its host over an x16 PCIe 4.0 link and to other NNP-T chips over dedicated inter-chip links with an aggregate bandwidth of around 3.6 terabits per second, and it is expected to consume between 150W and 250W of power.
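For context, a back-of-the-envelope calculation with standard PCIe 4.0 figures (my arithmetic, not Intel-published numbers) shows why the roughly 3.6Tbps aggregate has to come from the inter-chip links rather than the host link:

```python
# Rough bandwidth arithmetic using generic PCIe 4.0 figures (assumed, not Intel specs).
lanes = 16
rate_gt_per_s = 16               # PCIe 4.0: 16 GT/s per lane
encoding = 128 / 130             # 128b/130b line encoding overhead

pcie_gb_per_s = lanes * rate_gt_per_s * encoding / 8   # gigabits -> gigabytes
print(f"PCIe 4.0 x16, one direction: ~{pcie_gb_per_s:.1f} GB/s")   # ~31.5 GB/s

# ~3.6 Tbps across the inter-chip links works out to roughly 450 GB/s,
# i.e. more than an order of magnitude above what the host link can move.
icl_gb_per_s = 3600 / 8
print(f"Inter-chip links: ~{icl_gb_per_s:.0f} GB/s aggregate")
```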
The product is designed from the ground up for AI-specific tasks, where it should greatly outperform current systems, but it won’t replace traditional processors for the majority of computing tasks. Instead, both the NNP-T and NNP-I chips are meant to work in tandem with existing CPUs.
Inference
To deal with the inference side of the equation, Intel is developing its Nervana NNP-I SoC (the ‘I’ stands for ‘inference’). This chip is being produced in-house on Intel’s own 10-nanometre process. It supports up to four 64GB LPDDR4X memory modules and features a shared 24MB L3 cache, two Ice Lake CPU cores, and 12 inference compute engine (ICE) cores that can work independently of one another.
Each ICE core is capable of delivering up to 4.8 TOPS. The SoC has a DRAM bandwidth of up to 68GB/s, can be plugged into an M.2 slot, and draws between 10 and 50W of power.
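Taking those per-core figures at face value, a quick calculation (my arithmetic, not an Intel-published chip-level number) gives a sense of the theoretical peak and efficiency:

```python
# Simple peak-throughput arithmetic from the figures quoted above
# (12 ICE cores, up to 4.8 TOPS each); an upper bound, not a measured result.
ice_cores = 12
tops_per_core = 4.8

peak_tops = ice_cores * tops_per_core
print(f"Theoretical peak: {peak_tops:.1f} TOPS")          # 57.6 TOPS

# At the quoted 10-50W envelope, that is roughly 1-6 TOPS per watt.
for watts in (10, 50):
    print(f"At {watts}W: ~{peak_tops / watts:.1f} TOPS/W")
```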
Developer Tools
Despite all of the above, even a powerful and efficient AI chip is worth little without good software support. That’s why Intel is working on optimising existing open-source libraries, developing its own deep learning library for Apache Spark and Hadoop clusters (called BigDL), and offering its own distribution of the OpenVINO toolkit, which helps optimise pre-trained ML models for deployment.
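To give a flavour of the OpenVINO side of this, the sketch below loads a pre-trained model with the toolkit’s Python Inference Engine API and runs a single inference. The model file names and input are placeholders, and the exact API surface can vary between OpenVINO releases:

```python
import numpy as np
from openvino.inference_engine import IECore  # OpenVINO's Python Inference Engine API

# Hypothetical paths to a model already converted to OpenVINO's IR format
# with the Model Optimizer; substitute your own files.
MODEL_XML = "model.xml"
MODEL_BIN = "model.bin"

ie = IECore()
net = ie.read_network(model=MODEL_XML, weights=MODEL_BIN)
exec_net = ie.load_network(network=net, device_name="CPU")  # or another supported device

# Feed a dummy input matching the network's expected shape and run inference.
input_name = next(iter(net.input_info))
input_shape = net.input_info[input_name].input_data.shape
dummy_input = np.random.rand(*input_shape).astype(np.float32)

result = exec_net.infer(inputs={input_name: dummy_input})
print({name: out.shape for name, out in result.items()})
```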
Intel’s goal is to make the transition to its hardware easy for developers by ensuring that its NNP chips integrate seamlessly with existing tools and libraries.
Intel is already supplying its chips to partners like Baidu and Facebook and plans to make its chipsets broadly available in 2020.
Even though Intel is one of the last major players to develop dedicated AI-processing hardware, its newest offerings might just be able to close the gap with its early-bird competitors. After all, by developing both training and inference chips, Intel is offering a complete AI processing solution, one that, not least thanks to its decades’ worth of experience, could sway a large portion of the market into opting for its chips.