When I started my career in IT, SPARC / Solaris was the platform of choice for system development, from small-scale systems to mission critical. Most open source software (OSS) was also first developed on Solaris and then migrated to other platforms.
In recent years, the performance and reliability of the Intel Architecture (IA) platform have improved, and most OSS is now developed on IA / Linux, but SPARC / Solaris continues to evolve. Integrated development of the OS and hardware creates strong advantages for the platform. For example, on Fujitsu M10 / Fujitsu SPARC M12, as CPU functions such as SWoC (Software on Chip) are enhanced, extensions that automatically use the SWoC functions are implemented in the Solaris OS. This raises the reliability of Fujitsu M10 / Fujitsu SPARC M12 hardware and ensures stable operation of the IT system. In addition, many OSS packages have been optimized for the latest version of Solaris, which makes stable OSS operation possible.
Oftentimes, I receive inquiries about the combination of SPARC / Solaris and OSS from customers who understand the advantages of SPARC / Solaris and continue to use it. In this blog, I explore the use of SPARC / Solaris in a wider range of fields.
Big Data Processing & Solaris
Open source software such as Hadoop and Spark is often used for big data processing, but many people may not know that Hadoop and Spark run not only on Linux but also on Solaris.
In this blog, I will explain how to configure a Spark cluster environment using Hadoop YARN, focusing on the advantages of Solaris. This blog series consists of the following sections.
- Configure Hadoop single node environment
- Configure Hadoop cluster environment on Oracle VM for SPARC
- Spark cluster environment using Hadoop cluster
I won't go into a detailed explanation of Hadoop and Spark here. Please refer to the official Hadoop and Spark documentation as appropriate.
Advantages of Running Hadoop on SPARC / Solaris: Configuring Hadoop Single Node Environment
Information on Hadoop often states that "reliability of individual nodes is unnecessary since the data is distributed and stored".
While it's correct that the data is duplicated and stored on multiple nodes, the NameNode, which keeps the directory tree of all files in the file system, is limited to at most two instances. Thus, the NameNode becomes a single point of failure (SPOF) of Hadoop. In a development environment, operations may not be seriously affected if developers have to rebuild the NameNode, should it fail. But in a business environment, any downtime, even a short interruption, could have a negative impact. One option is to use highly reliable hardware such as Fujitsu M10 / Fujitsu SPARC M12 servers for the nodes that run the NameNode and ResourceManager.
The data volume for Hadoop should be configured as a dedicated ZFS storage pool separately from the normal system volume (rpool). This makes it possible to perform operations such as adding disks independently from the system volume. You can create high-speed, high-capacity storage in Fujitsu M10 / Fujitsu SPARC M12 by using flashcards instead of internal disks.
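Because the Hadoop data lives in its own pool, capacity can later be grown without touching the system volume. The following is a sketch only; "<new device>" is a placeholder for an actual disk or flash device in your system:

```shell
# Grow the dedicated Hadoop pool online, independently of rpool.
# <new device> is a placeholder; substitute a real device path.
# zpool add hdpool <new device>
# Confirm the pool's health and new capacity.
# zpool list hdpool
# zpool status hdpool
```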
Testing configuration is as follows:
- Server: Fujitsu M10-1
- OS: Solaris 11.3 SRU 10.7
- Java: JDK 1.7.0_111
- Hadoop: 2.7.3
Preparing Hadoop Installation
The OS is installed in the primary domain of a Fujitsu M10 server. Please refer to the Solaris manual for OS installation and initial settings.
http://docs.oracle.com/cd/E53394_01/
The hostname of this system is set to "m10spark".
Next, run the following commands as “root” user.
Configure ZFS storage pool to store Hadoop data. The pool name is “hdpool”.
# zpool create hdpool <devices for hadoop data>
By default, Java 8 is installed on Solaris 11, but since the latest version of Java on which Hadoop has been tested is Java 7, the JDK 7 package should be installed. Associated packages will also be installed. Please note that the JDK, not just the JRE, is needed in order to use the "jps" command to check the status of Hadoop processes. Run "pkg install" with the "--accept" option to agree to the license.
# pkg install --accept developer/java/jdk-7
In general, just after installing JDK 7, the default Java version remains Java 8. Switch Java versions with the following procedure.
At first, check the current version of Java.
# java -version
java version "1.8.0_102"
Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)
Confirm that two types of Java are installed.
# pkg mediator -a java
MEDIATOR  VER. SRC.  VERSION  IMPL. SRC.  IMPLEMENTATION
java      system     1.8      system
java      system     1.7      system
Switch Java version to Java7.
# pkg set-mediator -V 1.7 java
Confirm that Java version is Java 7.
# java -version
java version "1.7.0_111"
Java(TM) SE Runtime Environment (build 1.7.0_111-b13)
Java HotSpot(TM) Server VM (build 24.111-b13, mixed mode)
Edit the "/etc/hosts" file to set the IP address of the local host.
::1 localhost
127.0.0.1 localhost loghost
xxx.xxx.xxx.xxx m10spark m10spark.local
Add the Hadoop group as "hadoop". The group ID of all additional Hadoop users should be "hadoop".
# groupadd -g 200 hadoop
Add the user ID for running “NameNode” and “DataNode” as “hdfs”, and set the password.
# useradd -u 200 -m -g hadoop hdfs
# passwd hdfs
Add the user ID for running “ResourceManager” and “NodeManager” as “yarn”, and set the password.
# useradd -u 201 -m -g hadoop yarn
# passwd yarn
Add the user ID for running “History Server” as “mapred”, and set the password.
# useradd -u 202 -m -g hadoop mapred
# passwd mapred
Add the user ID for running user programs as "spark", and set the password.
# useradd -u 101 -m -g hadoop spark
# passwd spark
Download Hadoop and transfer it to the node where it will be installed.
# cd /opt
# <Extracting Hadoop archive>
# ln -s hadoop-2.7.3 hadoop
Change the owner of the Hadoop files to root, and change the group to hadoop. Set the permissions of all files to "755".
# chown -R root:hadoop /opt/hadoop-2.7.3
# chmod -R 755 /opt/hadoop-2.7.3
Hadoop uses SSH to connect to each Hadoop process, even on a single node. So each user should set up a public/private key pair for passphraseless SSH authentication.
# su - hdfs
# ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# logout
# su - yarn
# ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# logout
# su - mapred
# ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# logout
Confirm that each user can establish SSH communication to localhost.
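One quick way to do this check (a sketch; run as root) is to open a login shell for each user and connect to localhost. If the key pairs above are set up correctly, no password prompt should appear; the first connection may ask to confirm the host key.

```shell
# Each command should print the hostname without asking for a password.
su - hdfs -c 'ssh localhost hostname'
su - yarn -c 'ssh localhost hostname'
su - mapred -c 'ssh localhost hostname'
```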
Environment variables for running Hadoop should be set for each user.
Set the following environment variables in “$HOME/.profile” of “hdfs” user.
export JAVA_HOME=/usr/java
export PATH=$PATH:/opt/hadoop/bin:/opt/hadoop/sbin
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_PID_DIR=/hdpool/run/hdfs
export HADOOP_GROUP=hadoop
Set the following environment variables in “$HOME/.profile” of “yarn” user.
export JAVA_HOME=/usr/java
export PATH=$PATH:/opt/hadoop/bin:/opt/hadoop/sbin
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_PID_DIR=/hdpool/run/yarn
Set the following environment variables in “$HOME/.profile” of “mapred” user.
export JAVA_HOME=/usr/java
export PATH=$PATH:/opt/hadoop/bin:/opt/hadoop/sbin
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export MAPRED_PID_DIR=/hdpool/run/mapred
Set the following environment variables in “$HOME/.profile” of “spark” user.
export JAVA_HOME=/usr/java
export PATH=$PATH:/opt/hadoop/bin:/opt/hadoop/sbin
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
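After editing the profiles, a quick sanity check (a sketch; run as root) confirms that each account picks up its variables. A login shell started with "su -" reads "$HOME/.profile", so these should reflect the settings above:

```shell
# Print the Java home seen by the hdfs user; /usr/java is expected
# if the profile was applied.
su - hdfs -c 'echo $JAVA_HOME'
# The hadoop command should resolve via the PATH set in the profile
# and report version 2.7.3.
su - hdfs -c 'hadoop version'
```

Repeat the same check for the "yarn", "mapred", and "spark" users.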
At this point, we have completed the initial configurations. In the second part of this blog, we will configure a Hadoop single node environment.
The information contained in this blog is for general information purposes only. While we endeavor to keep the information up to date and correct through testing on a practical system, we make no warranties of any kind about its completeness, accuracy, reliability, suitability, or availability. Any reliance you place on such information is strictly at your own risk.
The information in this blog is subject to change without notice.