In our blogs about the Fujitsu M10 server, we have covered a lot of ground. But one thing we have not yet talked about is how the SPARC64 X+ processor is designed to handle more simultaneous workloads than other processors. It is also designed to speed up single-threaded processing, where it is typically faster than other processors.
Processor design is a pretty deep topic
Entire books have been written on this topic, so we will just skim the surface and explain, in simple terms, the high points of the SPARC64 X+ processor design and why it delivers such strong single-thread and multi-thread performance.
First, let’s define a thread. A software thread is the smallest sequence of instructions that can be managed independently by the operating system scheduler. Every running program is a process, and each process contains one or more threads. Threads within a process share its memory but can be scheduled and executed independently of each other, so a program whose work can be split into threads can run those parts in parallel. However, some work cannot be parallelized because each step depends on the result of the previous one, which is why single-thread performance is also important.
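To make the distinction concrete, here is a minimal Python sketch that is not tied to SPARC64 or any particular operating system. The hypothetical `parallel_sum_of_squares` function splits independent work across threads, while the hypothetical `dependent_chain` function shows work in which every step needs the previous result and therefore cannot be split. Note that in CPython the global interpreter lock keeps pure-Python threads from executing bytecode truly in parallel, so this illustrates the structure of threaded work rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    # Each chunk is independent of the others, so it can run on its own thread.
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_threads=4):
    # Split the input into n_threads interleaved slices of independent work.
    chunks = [data[i::n_threads] for i in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(partial_sum, chunks))


def dependent_chain(seed, steps):
    # Each step needs the previous result, so this loop cannot be split
    # across threads; only faster single-thread execution helps here.
    x = seed
    for _ in range(steps):
        x = (x * 1103515245 + 12345) % 2**31
    return x


print(parallel_sum_of_squares(list(range(1000))))  # prints 332833500
print(dependent_chain(1, 10))
```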
The SPARC64 X+ processor is based on a 4-way superscalar architecture and has up to 16 cores per socket and two threads per core. A superscalar architecture means that a processor core can issue and execute multiple instructions in the same clock cycle. In the case of the SPARC64 X+, each processor core can process four individual instructions on four Arithmetic and Logic Units (ALUs) during each cycle. There are also four Floating Point Units (FPUs) per core that independently execute floating-point arithmetic. Both the ALUs and the FPUs are discrete pieces of hardware within each core, which means they can execute tasks in parallel without time sharing or context switching (another form of time sharing within a core).
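As a rough illustration of why issue width matters, here is a Python sketch of an idealized throughput model. The function `min_cycles` and its parameters are hypothetical: it assumes every instruction is independent and completes in one cycle, and it ignores memory, branches, and dependencies, so it only shows the upper limit that the number of execution units places on throughput.

```python
import math


def min_cycles(n_int, n_fp, issue_width, n_alu, n_fpu):
    """Lower bound on cycles to run independent, single-cycle instructions.

    Hypothetical model: the core issues at most `issue_width` instructions
    per cycle in total, of which at most `n_alu` are integer ops and at
    most `n_fpu` are floating-point ops.
    """
    return max(
        math.ceil((n_int + n_fp) / issue_width),  # limited by total issue slots
        math.ceil(n_int / n_alu),                 # limited by integer units
        math.ceil(n_fp / n_fpu),                  # limited by floating-point units
    )


# 600 integer + 200 floating-point instructions, all independent.
print(min_cycles(600, 200, issue_width=4, n_alu=4, n_fpu=4))  # prints 200
print(min_cycles(600, 200, issue_width=2, n_alu=2, n_fpu=2))  # prints 400
```

In this model the 4-way core needs half as many cycles as the 2-way core for the same instruction mix; real code rarely has this much independent work, which is where out-of-order execution comes in.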
Each SPARC64 X+ processor core also provides out-of-order execution, which simply means that it executes instructions as soon as the data they need is available, rather than strictly in program order. (The results are still committed in program order, so the program behaves as if it ran sequentially.) This is a big time saver: instead of sitting idle while one instruction waits for data, for example from memory, the core uses those cycles to perform useful work, such as fetching data for upcoming instructions or executing later instructions that do not depend on the stalled one.
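The effect can be sketched with a toy cycle simulator in Python. Everything here is a hypothetical simplification, not a model of the actual SPARC64 X+ pipeline: `simulate` issues up to `width` instructions per cycle, an instruction may issue once all of its inputs are ready, an in-order core stops at the first instruction that is not ready, and an out-of-order core skips past it. In-order commit and other real-world details are ignored.

```python
def simulate(program, width, out_of_order):
    """Return the cycle in which the last instruction's result is ready.

    program: list of (name, deps, latency) in program order, where deps
    names earlier instructions whose results this one needs.
    """
    seen = set()
    for name, deps, _ in program:
        if not set(deps) <= seen:
            raise ValueError(f"{name} depends on a later or unknown instruction")
        seen.add(name)

    done_at = {}  # name -> cycle when its result becomes available
    pending = list(program)
    cycle = 0
    while pending:
        issued, blocked, still_pending = 0, False, []
        for name, deps, latency in pending:
            ready = all(d in done_at and done_at[d] <= cycle for d in deps)
            if ready and issued < width and not blocked:
                done_at[name] = cycle + latency
                issued += 1
            else:
                still_pending.append((name, deps, latency))
                # An in-order core cannot issue past an instruction it left behind.
                blocked = blocked or not out_of_order
        pending = still_pending
        cycle += 1
    return max(done_at.values(), default=0)


PROGRAM = [
    ("load_a", [], 10),         # long-latency load, e.g. a cache miss
    ("add_a", ["load_a"], 1),   # needs the loaded value, so it must wait
    ("mul_b", [], 1),           # an independent chain of work follows
    ("add_c", ["mul_b"], 1),
    ("add_d", ["add_c"], 1),
    ("add_e", ["add_d"], 1),
]
print("in-order:    ", simulate(PROGRAM, width=4, out_of_order=False))  # 14
print("out-of-order:", simulate(PROGRAM, width=4, out_of_order=True))   # 11
```

The out-of-order run finishes the independent chain while the load is outstanding, so only the load's latency remains on the critical path.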
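Combined with two hardware threads per core, these attributes enable Simultaneous Multithreading (SMT), in which instructions from both threads on a core can be issued to its execution units in the same clock cycle. This Fujitsu SPARC64 architecture is described in further detail in a Fujitsu whitepaper.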
With SMT, a 16-core SPARC64 X+ processor can execute 32 program threads (16 cores × 2 threads per core) concurrently, with both threads on each core able to issue instructions in the same clock cycle. This is an advantage over 2-way superscalar processors, which have only two ALUs per core. On a 2-way core, there may not be enough out-of-order hardware resources to handle instructions from more than one thread in a given clock cycle, so threads must compete for the core. This often hurts single-thread performance, because the execution resources are not continuously available to each thread.
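Here is a deliberately simplified Python sketch of that competition, which reduces "hardware resources" to issue width alone. The function `smt_cycles` and its `ilp` parameter (how many independent instructions a thread can offer per cycle) are hypothetical assumptions, not SPARC64 specifications; the model also assumes each thread can always supply that many instructions and never stalls.

```python
def smt_cycles(work, ilp, width):
    """Cycles until every thread sharing one core has issued all its work.

    work:  instructions each thread must issue (one entry per thread)
    ilp:   hypothetical cap on independent instructions a thread can
           offer per cycle
    width: the core's issue width (instructions per cycle, all threads)
    """
    if ilp < 1 or width < 1:
        raise ValueError("ilp and width must be at least 1")
    remaining = list(work)
    n = len(remaining)
    cycles, first = 0, 0
    while any(remaining):
        slots = width
        # Rotate which thread picks first each cycle so sharing is fair.
        for k in range(n):
            t = (first + k) % n
            take = min(ilp, remaining[t], slots)
            remaining[t] -= take
            slots -= take
        first = (first + 1) % n
        cycles += 1
    return cycles


CORES, THREADS_PER_CORE = 16, 2
print("hardware threads:", CORES * THREADS_PER_CORE)  # prints 32

# One thread alone vs. two threads sharing a core, each able to offer
# two independent instructions per cycle.
for width in (2, 4):
    alone = smt_cycles([100], ilp=2, width=width)
    shared = smt_cycles([100, 100], ilp=2, width=width)
    print(f"{width}-way core: alone {alone} cycles, shared {shared} cycles")
```

Under these assumptions, sharing a 2-way core doubles the time to finish both threads (100 cycles instead of 50), while a 4-way core runs both threads in the same 50 cycles one thread needs alone.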
The Fujitsu SPARC64 SMT design greatly reduces the time code spends waiting for execution resources by ensuring that each core has enough hardware to handle both single-threaded and multi-threaded workloads efficiently and in a balanced way. This is a fundamental strength of the SPARC64 X+ processor and one of the biggest factors behind the performance of Fujitsu M10 systems.