How we are processing fast enough for AI

In the future, when we look back on this point in time, we will most likely remember it as a crucial phase in the development of artificial intelligence (AI) – the tipping point when theoretically possible applications went mainstream. Artificial intelligence is certainly building momentum: there’s a tremendous amount of software development underway in this field and a great deal of excitement about its potential.

The key to unlocking the potential of machine learning systems is the ability to feed them the massive amount of data they need to learn. Thanks partly to the proliferation of IoT sensors and the vast amounts of information they capture, and partly to the huge volume of text and image data available online, we finally have enough data to train these AI systems to perform almost any task we want. This is important because the training phase is the most challenging aspect of neural networks, particularly in terms of the processing power required.

Let us first take a quick look at the structure of a neural network to understand why this is

Neural networks can be implemented by taking a “divide and conquer” approach – that is, splitting a complex problem into many smaller tasks that run simultaneously. Modelled loosely on a human brain, the network is made up of processing elements, or nodes, and the signals between its layers can either flow one way or can be bidirectional. These highly interconnected processing elements (neurons) work in unison to learn.

The nodes receive inputs and provide outputs through their connections. It is this interaction that defines the behavior of the whole network. Much of a network’s training phase involves fine-tuning the signal strengths of these virtual neurons (the weight given to each connection) and determining their sensitivity (the threshold at which they fire on input). Training consists of adjusting and re-adjusting these weights and sensitivities until the network delivers the best answers.
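To make the idea of adjusting weights and sensitivity concrete, here is a minimal sketch of a single artificial neuron trained with the classic perceptron rule. The function names, the logical-AND training data and the learning rate are assumptions chosen for illustration; real networks use many such nodes and more sophisticated update rules.

```python
def fire(weights, bias, inputs):
    """Return 1 if the weighted input sum crosses the firing threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train(samples, epochs=20, rate=1.0):
    """Perceptron rule: nudge the weights and the bias after every wrong answer."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - fire(weights, bias, inputs)  # -1, 0 or +1
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error  # shifting the bias changes the node's sensitivity
    return weights, bias


# Hypothetical training set: the node should fire only when both inputs are on.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train(AND_SAMPLES)
predictions = [fire(weights, bias, inputs) for inputs, _ in AND_SAMPLES]
```

Here the weights play the role of signal strengths and the bias acts as the firing threshold. Every pass over the data re-adjusts both, which is why training cost grows with the number of nodes and examples.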

Unlike most traditional computing structures, today’s neural networks are not implemented on massive, complex central processors. Rather they comprise many simple processors that respond collectively to the pattern of inputs they receive. Learning involves presenting the network with new input patterns, which are then processed accordingly in a distributed, parallel fashion.

In a simple model, the signal enters on one side and is then passed from layer to layer in the network. Training typically relies on back propagation, which means that once a signal has passed forward through the layers, the error in the result is sent back to the previous layers so that their weights can be corrected. Each cycle increases the computational demands on the system.
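The forward and backward flow of signals can be sketched in a few lines. This is a deliberately tiny illustration under stated assumptions (two inputs, two hidden nodes, one output, sigmoid activations, squared error, no bias terms, made-up starting weights), not a description of any production system.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Pass the signal from the inputs through the hidden layer to the output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def backprop_step(x, target, w_hidden, w_out, rate=0.5):
    """One training cycle: a forward pass, then the error is sent back layer by layer."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output node (squared error through the sigmoid).
    delta_out = (output - target) * output * (1.0 - output)
    # Error signal handed back to each hidden node via its outgoing weight.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    new_w_out = [w - rate * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [[w - rate * d * xi for w, xi in zip(row, x)]
                    for row, d in zip(w_hidden, delta_hidden)]
    return new_w_hidden, new_w_out


# Hypothetical sample and starting weights, chosen only for illustration.
x, target = [1.0, 0.0], 1.0
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
w_out = [0.2, -0.1]

loss_before = 0.5 * (forward(x, w_hidden, w_out)[1] - target) ** 2
for _ in range(100):
    w_hidden, w_out = backprop_step(x, target, w_hidden, w_out)
loss_after = 0.5 * (forward(x, w_hidden, w_out)[1] - target) ** 2
```

Every cycle touches every weight twice, once on the way forward and once on the way back, so the work per cycle grows with the number of connections.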

This explains why the task of training networks is so computationally intensive, particularly as we increase the number of nodes, connections and layers. It also helps us appreciate the value of pre-trained networks, since re-training is both much easier and much faster than the initial training process. But we’ve also uncovered one of the key technology challenges for AI and neural networks: how to keep pace with the voracious processing requirements of ever more complex networks.
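A rough way to see how quickly the workload grows is to count the connections in a fully connected network. The sketch below assumes every node in one layer connects to every node in the next and ignores bias terms; the layer sizes are hypothetical.

```python
def connection_count(layer_sizes):
    """Number of weights when each layer is fully connected to the next one."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


small = connection_count([784, 100, 10])        # 784*100 + 100*10
wider = connection_count([784, 200, 10])        # twice as many hidden nodes
deeper = connection_count([784, 100, 100, 10])  # one extra hidden layer
```

Doubling the hidden layer in this example doubles the number of weights to adjust, and each of those weights is revisited in every training cycle.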

Of course, a fast computer processor helps, but pure power isn’t always enough. We have to build a system around the processors that ensures they can be fed fast enough and that their output can be combined rapidly. Essentially, the foundation for effective processing is excellent systems design for distributed and parallel processing.
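The pattern of feeding many workers and combining their output can be illustrated with Python’s standard library. This is only a structural sketch: the function names and the sum-of-squares workload are assumptions, and threads in CPython will not actually speed up this kind of arithmetic. The point is the split, process and combine shape of the design.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_of_squares(chunk):
    """Work done independently by one worker on its share of the data."""
    return sum(v * v for v in chunk)


def parallel_sum_of_squares(values, workers=4):
    """Split the data, feed each worker a chunk, then combine the partial results."""
    size = max(1, -(-len(values) // workers))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)  # the combine step


total = parallel_sum_of_squares(list(range(1000)))
```

In a real system the chunking and the final combine are exactly where feeding and merging costs appear, which is why the surrounding design matters as much as the processors themselves.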

Fujitsu has an unrivalled pedigree in this field

In 2011, the company built a supercomputer called “K”, a homonym for a Japanese word meaning 10 to the power of sixteen. K is based on a distributed memory architecture with over 80,000 compute nodes, and has been deployed to solve problems in many fields, including climate research, disaster prevention and medical research. When it was built, K was the world’s fastest computer. While it has since been overtaken by a select few, it is still the leader in multiple processing benchmarks thanks to all the elements that Fujitsu has been able to fine-tune and optimize in the context of its original system design.

When it comes to the processors, an approach built on traditional central processing units (CPUs) doesn’t cut it in the field of neural networks. Rather, these networks require the parallel processing capabilities of graphics processing units (GPUs), which are manufactured by companies such as NVIDIA. Parallel processing accelerates a neural network’s algorithms. Unfortunately, beyond a certain point, adding more parallel capability no longer improves performance, since the system is limited by the speed at which information flows between the separate chips or parts of chips.
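A toy model shows why adding processors eventually stops helping. It assumes the compute work divides perfectly across processors, while each training step also pays a fixed cost for exchanging results between chips. The timings are invented for illustration and are not measurements of any real system.

```python
def speedup(processors, compute_seconds=1.0, exchange_seconds=0.05):
    """Speedup over one processor when every step also pays a fixed exchange cost."""
    if processors == 1:
        return 1.0  # a single processor has nothing to exchange
    step_time = compute_seconds / processors + exchange_seconds
    return compute_seconds / step_time


speedup_8 = speedup(8)
speedup_64 = speedup(64)
speedup_1024 = speedup(1024)
ceiling = 1.0 / 0.05  # the limit set by the exchange cost alone
```

Under these assumptions the speedup keeps rising but can never exceed the ceiling, no matter how many processors are added. Reducing the exchange cost, or needing fewer exchanges, is the only way to raise that ceiling.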

Fujitsu’s innovative approach to working around this challenge involves enabling larger neural networks to fit on each individual processing element. This has been made possible first by streamlining the memory efficiency of each GPU, and then by deploying algorithms that make the calculations performed by the neural networks more efficient. This approach has doubled the speed of learning for neural networks based on the widely used AlexNet and VGGNet research networks.

However, for some types of neural networks, particularly very large ones, optimizing the hardware and the software still isn’t enough. An alternative is to execute the processing directly in hardware, without software. No translation is required: the hardware simply handles the signal processing extremely quickly. The challenge is that, to be usable, the hardware must be reconfigurable. This is where field-programmable gate arrays (FPGAs) come into play. Fujitsu also has significant expertise in making these chips and optimizing their capabilities. For example, one recent implementation of Fujitsu FPGAs was shown to be 10,000 times faster than conventional computers when addressing certain types of problems.

To ensure that we continue to process at speeds fast enough for AI, at Fujitsu we are approaching the challenge from several directions. While each enhancement makes a difference, in combination and in the context of a well-designed system, they can be nothing less than transformative.

AI is at an exciting place right now – and Fujitsu is actively contributing to the cause in various ways – from the experience with the K supercomputer, through our unrivalled parallel computing algorithms, to special hardware design. However, it is fundamentally our expertise in building efficient systems that is the key to the next generation of artificial intelligence. Now that we have optimized all aspects of systems made for the AI challenge, the next logical step could be creating dedicated hardware and processor architectures specifically for deep learning.

Watch this space.
