
Building the Ultimate Supercomputer with 8,000+ General-Purpose Servers! Large-Scale PC Cluster Configuration Technology (Part 2)

[The Power to Create the Future Vol. 12]


(Continued from Part 1.)
In the Oakforest-PACS project, Nakashima's research and development group gained confidence in their large-scale PC cluster configuration technology when their system recorded better performance than those built by well-funded rival teams from other countries. In Nakashima's view, what makes a team that creates innovations?

People Inherit Skills to Keep Going

Nakashima: "Fujitsu's Business Division has always been extremely strict about stability and defects, and they pursue problems thoroughly. In the Oakforest-PACS project, we identified and fixed, one by one, defects that surfaced only when the number of nodes was scaled up to some 8,000. I think this was a major driving force behind the good results in the end.

Accumulated skills are great assets. In my own research, I have enthusiastically applied and developed these skills while sharing the passion of senior colleagues who had been doing research and development since before I joined Fujitsu. The research team is gradually passing to a younger generation that is about to take over.

A large volume of patent documents relates to PC cluster development. These printed collections show how much research has been conducted.

Research and development is a process of accumulating small skills. It is sometimes impossible to fully document how to use tools or how to interpret results, or to turn that know-how into programs, so people often have to pass such skills down to the next generation by discussing how to handle specific circumstances. It would probably be disastrous if we failed to pass down the skills that today's society relies on. Although passing down skills to the next generation is a heavy responsibility, I think this kind of pressure was one reason we completed the Oakforest-PACS project instead of giving up."

Do Not Flinch at High Goals and Never Cease Making Efforts

"What a team needs in order to create innovations is the willingness not to flinch at high goals. People tend to think it is enough to achieve a mediocre goal that satisfies everyone. Once they convince themselves that a high goal is unattainable, they will never reach it. The key to proceeding to the next step is to set remarkable, significant goals and to be enthusiastic enough to achieve them. You should never stop making efforts toward those goals. Even home-run sluggers work hard behind the scenes; the audience sees that effort only when it results in a home run."

The Joy of Learning from Predecessors and Thinking Things Through

"My superior gave me plenty of time to consider the big theme of 'how to make a large-scale network function in a sophisticated way.'
If we compare network data transfer to water flow, when two pipes feed into one, only one pipe's worth of water can get through, so the flow from each incoming pipe slows down. We must also prevent deadlocks, in which the water is caught in a loop and cannot flow anywhere else. This is a classic problem in computing, but it was fun to search for better solutions, and for the ideal solution for each PC cluster, by drawing many diagrams (photo below) while studying the solutions our predecessors had described in textbooks.

Although finding something new is important in research, what I emphasize most is making sure the cluster works and performs as expected, and finding out how to achieve that. If a well-known solution solves the problem, we can simply use it. However, well-known solutions may not work in practice, and that is where the research challenge lies. The group I belong to has been tackling such challenges."
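The deadlock half of the water-pipe analogy can be made concrete with a standard textbook check. This is an illustration only, not necessarily the approach Nakashima's group used. For deterministic routing, build a channel dependency graph with an edge from link A to link B whenever some route uses A immediately before B; if that graph has no cycle, the network cannot deadlock, while a cycle signals that deadlock may be possible. The minimal sketch below assumes routes are given as lists of node IDs; the function names `channel_dependencies` and `has_cycle` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A channel is a directed link between two nodes: (src_node, dst_node).
Channel = Tuple[int, int]


def channel_dependencies(routes: List[List[int]]) -> Dict[Channel, Set[Channel]]:
    """Edge a -> b means some route uses channel a immediately before channel b,
    so a packet holding a may have to wait for b to become free."""
    deps: Dict[Channel, Set[Channel]] = {}
    for path in routes:
        channels = list(zip(path, path[1:]))
        for a, b in zip(channels, channels[1:]):
            deps.setdefault(a, set()).add(b)
            deps.setdefault(b, set())
    return deps


def has_cycle(graph: Dict[Channel, Set[Channel]]) -> bool:
    """Recursive depth-first search with three colors; reaching a channel that is
    still on the current search path (GRAY) means the graph has a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def visit(v: Channel) -> bool:
        color[v] = GRAY
        for w in graph[v]:
            if color[w] == GRAY:
                return True
            if color[w] == WHITE and visit(w):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in graph)


# Hypothetical example: four nodes in a ring, each sending two hops clockwise.
ring = [[0, 1, 2], [1, 2, 3], [2, 3, 0], [3, 0, 1]]
print(has_cycle(channel_dependencies(ring)))  # True: deadlock is possible

# Four nodes in a line (no wraparound link), with traffic in both directions.
line = [[0, 1, 2], [1, 2, 3], [3, 2, 1], [2, 1, 0]]
print(has_cycle(channel_dependencies(line)))  # False
```

In the ring, every route keeps turning in the same direction, so the waits close into a loop, the network equivalent of water circulating with nowhere to go. The line has no wraparound link, so its dependencies can never form one.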

Missions of Cluster Technology and the Post-K Computer

"We have mobilized all our resources to develop the K computer, which is not a cluster, and the Post-K computer, a supercomputer now under development, entirely from scratch, including the processors, using the best technologies available at the time. The role of these machines is to pursue exceptionally high performance and to develop humanity's next technologies, whereas the role of reasonably priced PC clusters is to spread the acquired technologies throughout the world.

I am often asked why high performance matters. The answer is that the higher the performance, the wider the scope of application. My role is to support research and innovation, for example by visiting customers and actually running simulations that lead to new scientific discoveries, and to make the unachievable achievable."

Applying PC Clusters to AI Processing for New Deep Learning

"Since Moore's Law no longer holds and CPU performance is unlikely to keep improving as it once did, domain-specific computing, which enhances performance only for specialized processes, is increasingly important. Fujitsu is developing an AI supercomputer specialized for specific purposes by removing unnecessary functions. The challenge, therefore, is to maximize performance with limited functions.

I am now studying how to speed up AI processing by applying the PC cluster configuration technologies I have developed to bring out the full performance of AI-oriented processors. I aim for higher performance by combining the latest hardware being developed by various vendors, particularly in large-scale environments.

Once people no longer have to abandon an idea for deep learning because the required scale is too large, someone will probably do something that no one has ever imagined before. I think supporting this is a way of contributing to society."

Making Constant Efforts in the Age of Rapid Changes

I want to send this message to those who are considering creating innovations or trying new things now.

"Based on my experience, I want to emphasize the importance of trying even seemingly insignificant things to gain experience. Try even things that are cumbersome, time-consuming, and laborious. That way, you will see things you could never have anticipated through logic alone and encounter new challenges. These experiences accumulate and eventually lead to a major undertaking. This should not be forgotten in today's rapidly and drastically changing society, in which it is becoming harder and harder to make such efforts."

"It takes patience to examine programs that someone else wrote. I like searching for and identifying defects, but anyone who hears that will probably think I am a weirdo!"

Global Engineers as the Basis of Innovation

"My goal, or dream, is to spread computing technologies, especially AI and performance improvement technologies, around the world. Partly because of market factors, Fujitsu's technologies have not yet spread globally.
If engineers around the world, not only in Japan, adopt our technologies, they will become a more efficient foundation for creating the next innovations. Moreover, when I travel abroad for an academic conference, I will be able to pass through immigration smoothly just by saying 'Fujitsu,' without explaining the purpose of my visit, because the immigration officer will already know about Fujitsu!" (laughs)

Kohta Nakashima

Project Director, Advanced Computer Systems Project, Computer Systems Laboratory, Fujitsu Laboratories Ltd.

Kohta Nakashima graduated from the Department of Electrical Engineering and Computer Science of Kyushu University's School of Engineering in 2000 and earned his master's degree there in 2002.
That same year, he joined Fujitsu Laboratories Ltd.
He has since engaged in research and development of technologies to build faster PC clusters and to control high-speed networks.
He obtained his Ph.D. in engineering from Okayama University. Together with four other researchers, he received a prize for "Development of Configuration Technology for Large-Scale Clusters" in the 2018 Commendation for Science and Technology from Japan's Ministry of Education, Culture, Sports, Science, and Technology.