Combating COVID-19 with supercomputer Fugaku (part 2)

(Continued from part 1)

Supercomputer Fugaku available ahead of schedule

The novel coronavirus (COVID-19) continues to threaten public health and business. In April 2020, RIKEN teamed up with the Ministry of Education, Culture, Sports, Science, and Technology (MEXT) to make the supercomputer Fugaku available for use in COVID-19 research and development.

RIKEN will provide technical support to the research and development projects selected by RIKEN and MEXT. These research projects will focus on: identifying the characteristics of COVID-19; finding compounds that can be used in treatment; and analyzing the spread of infection and its socio-economic impact.

Fugaku is a supercomputer jointly developed by RIKEN and Fujitsu (*1). Fugaku serves as the successor to the K computer, which was decommissioned in August 2019. The system is immense in scale, featuring over 150,000 high-performance A64FX CPUs developed by Fujitsu, all connected via the Tofu Interconnect D high-speed network.

Manufacturing work for Fugaku began in April 2019 at Fujitsu's plant in Ishikawa prefecture. Starting in December 2019, Fujitsu delivered hundreds of computer racks to the RIKEN Center for Computational Science in Kobe, Hyogo prefecture to build the system. Development and preparations were underway to commence shared use of the system in FY2021. However, in response to the immediate need created by the COVID-19 pandemic, the schedule was moved forward to provide as much of Fugaku's computational resources as possible under an advance, priority arrangement. Now, the computational capabilities of this world-class supercomputer are being enlisted to fight COVID-19.

Eight times the computational capacity

Working with RIKEN, Fujitsu was installing equipment as well as building and preparing the operational environment for Fugaku. When plans changed to provide computational resources ahead of schedule, the team was uncertain whether the system would be stable, because there were still more computer racks to bring in and configure. However, given the enormity of the crisis, the team members were motivated by a sense of purpose to help fight COVID-19. They worked to prepare the system so that it could serve the needs of research and development projects even with an early release.

Supercomputers are complex systems made up of many elements. First is the hardware layer, which includes CPU, memory, storage, and network devices. The operating system (OS) and middleware run on the hardware, and serve as an intermediary for applications used in analysis and other activities. Machines also need ancillary equipment such as cooling systems and power supplies. This makes supercomputers large-scale machines with many different functional parts.

To achieve system-wide stability, every single element must run well. Functions to monitor operational status play a crucial role. It is vital to build an environment that constantly monitors the system's status so that it keeps running without downtime.

Fujitsu has implemented monitoring features using operational management software, and worked to link the system with a 24-hour monitoring service that lets operators check the status of all CPUs. Fujitsu also established key operational procedures and automated tasks such as restoring the system in the event of a failure.
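
As a rough illustration of the monitor-and-restore pattern described above, the following Python sketch checks a set of nodes once and triggers an automated recovery step for any node that reports a fault. Everything in it is hypothetical: the `NodeStatus` type, the `check_node` and `restore_node` functions, and the node names are placeholders for illustration, not Fujitsu's operational management software or its interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class NodeStatus:
    """Hypothetical health record for one compute node."""
    node_id: str
    healthy: bool


def monitor_once(
    node_ids: Iterable[str],
    check_node: Callable[[str], NodeStatus],
    restore_node: Callable[[str], None],
) -> List[str]:
    """Check every node once and restore any that report a fault.

    Returns the IDs of the nodes for which recovery was triggered.
    """
    restored = []
    for node_id in node_ids:
        status = check_node(node_id)
        if not status.healthy:
            restore_node(node_id)
            restored.append(node_id)
    return restored


# Simulated cluster state (assumption: real status would come from hardware).
simulated_health = {"node-001": True, "node-002": False, "node-003": True}


def check_node(node_id: str) -> NodeStatus:
    return NodeStatus(node_id, simulated_health[node_id])


def restore_node(node_id: str) -> None:
    # Stand-in for an automated recovery procedure.
    simulated_health[node_id] = True


restored = monitor_once(list(simulated_health), check_node, restore_node)
print("Restored nodes:", restored)
```

In a real deployment, a pass like this would run continuously and at a far larger scale; the sketch only shows the basic idea of pairing status checks with an automated recovery step.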

Through these efforts, trial usage commenced in April 2020, with research and development initiatives using a portion of Fugaku's nodes (with computational capabilities equivalent to about four to eight times those of the K computer).

Preventing COVID-19 infection

While the project to bring Fugaku online would normally pose no issues, the need for social distancing and infection control added complications. If anyone on site had contracted COVID-19, the project could be brought to a standstill.

A large number of on-site staff members from many different departments worked in close cooperation while striving to prevent infection. Fujitsu personnel were involved in logistics, installation work, testing, and adjustments. Meanwhile, system engineers worked to develop and fine-tune the operating system, middleware, and other software, as well as build and prepare the operational environment.

To prevent infection, staff members avoided physical contact with one another. For example, workers were assigned to separate rooms, and multiple routes were created by dividing the computer building into east and west blocks. In addition, personnel adopted staggered working hours, and some worked remotely to help prevent crowding on site. Fujitsu consulted with RIKEN to ensure full cooperation in diligently carrying out these and other infection-prevention activities. Additional personnel were on call to replace original staff members should they become infected.

Photo of Fugaku during installation (installation was completed in May)

The Fujitsu Group has promoted teleworking initiatives since 2015, so almost all employees were familiar with this work method. As a result, staff members could conduct their work tasks without issue despite the exceptional circumstances.

Everyone involved in the project came together to provide assistance throughout the supply chain. Personnel at the plant that manufactured the equipment, parts suppliers, and logistics and procurement divisions all played key roles.

ICT helps battle COVID-19

Fujitsu’s ICT strengths span a wide range, from manufacturing CPUs and other types of hardware to software development and system integration. Fujitsu aims to help solve social issues using ICT, and this mindset propelled our efforts to take on the challenge of providing Fugaku's computational assets ahead of schedule.

Fujitsu places the utmost priority on preventing the spread of infection and maintaining the safety of customers, suppliers, employees, and their families. At the same time, we continue to offer products and services to customers and pursue initiatives that help resolve the wide range of social issues that have arisen due to the pandemic.

(*1) About the supercomputer Fugaku:

A supercomputer is a machine that can perform a vast number of calculations at ultra-high speed. Fugaku is the successor to the K computer. The name K is derived from the Japanese word kei, which means 10 quadrillion (10 to the 16th power), as the machine has the power to execute that many calculations per second. The K computer ranked first in supercomputer performance in the TOP500 in June and November 2011. It also held first place in the Graph500 on nine consecutive occasions since 2014. With the goal of an application execution performance 100 times that of the K computer, Fugaku integrates Fujitsu’s knowledge and experience in computer development, which started with the company’s FACOM series of computers.

Fugaku is another name for Mt. Fuji, the tallest mountain in Japan. The name symbolizes the supercomputer’s high performance and indicates the broad domains that the new supercomputer will help its users reach, as suggested by Mt. Fuji’s wide conical shape. Fugaku will help solve social issues such as those in healthcare, disaster prevention, energy, and sustainability. The supercomputer will also play a role in the development of manufacturing methods based on entirely new concepts, in unraveling the mysteries of the universe and the origin of life, and in AI and robotics research projects that concern these topics. As the culmination of world-leading technology, Fugaku will continue to assist in many more efforts to come.

On the K computer, it was not easy to use conventionally distributed software as-is. The A64FX CPUs in Fugaku, however, implement an architecture using the ARM instruction set with extensions for supercomputers. These CPUs are compatible with the widely used Linux OS, which helps Fugaku overcome this limitation of the K computer. Fujitsu aims to significantly boost application power efficiency by delivering up to 100 times the application execution performance of the K computer while consuming only about three times the amount of electricity.
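
The efficiency target in the preceding paragraph can be checked with simple arithmetic. The short Python sketch below divides the performance ratio by the power ratio to estimate the gain in performance per unit of electricity. The two input figures are the stated targets (up to 100 times the performance, about three times the electricity), not measured results, and the function and variable names are illustrative.

```python
# Illustrative arithmetic only: both inputs are the stated targets, not measurements.
performance_ratio = 100.0  # target: up to 100x the application performance of K
power_ratio = 3.0          # target: about 3x the electricity consumption of K


def efficiency_gain(performance_ratio: float, power_ratio: float) -> float:
    """Return performance per unit of power relative to the baseline system."""
    if power_ratio <= 0:
        raise ValueError("power_ratio must be positive")
    return performance_ratio / power_ratio


gain = efficiency_gain(performance_ratio, power_ratio)
print(f"Performance per unit of power relative to K: about {gain:.0f}x")
```

Under these targets, the power efficiency works out to roughly 33 times that of the K computer.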