
Using Hadoop and Spark on SPARC Servers / Solaris Platform – Configuring Hadoop Single Node Environment Part 1

When I started my career in IT, SPARC / Solaris was the platform of choice for system development, from small-scale systems to mission-critical ones. Most open source software (OSS) was also first developed on Solaris and then ported to other platforms.

In recent years, the performance and reliability of the Intel Architecture (IA) platform have improved, and most OSS is now developed on IA / Linux, but SPARC / Solaris continues to evolve. The integrated development of the OS and the hardware creates strong advantages for the platform. For example, in Fujitsu M10 / Fujitsu SPARC M12, as CPU functions such as SWoC (Software on Chip) are enhanced, extensions that automatically use the SWoC functions are implemented in the Solaris OS. Together with the high reliability of the Fujitsu M10 / Fujitsu SPARC M12 hardware, this ensures stable operation of the IT system. In addition, much OSS has been optimized for the latest version of Solaris, which makes stable OSS operation possible.

Oftentimes, I receive inquiries about the combination of SPARC / Solaris and OSS from customers who understand the advantages of SPARC / Solaris and continue to use it. In this blog, I explore the use of SPARC / Solaris in a wider range of fields.

Big Data Processing & Solaris

Open source software such as Hadoop and Spark is often used for big data processing, but many people may not know that Hadoop and Spark run not only on Linux but also on Solaris.

In this blog, I will explain how to configure a Spark cluster environment on Hadoop YARN, focusing on the advantages of Solaris. This blog series consists of the following sections.

  1. Configure Hadoop single node environment
  2. Configure Hadoop cluster environment on Oracle VM for SPARC
  3. Spark cluster environment using Hadoop cluster

I will not go into a detailed explanation of Hadoop and Spark themselves; please refer to the official Apache Hadoop and Spark documentation as appropriate.

Advantages of Running Hadoop on SPARC / Solaris: Configuring Hadoop Single Node Environment

Information on Hadoop often states that “the reliability of individual nodes is unnecessary, since the data is distributed and stored”.

While it is correct that the data is replicated and stored on multiple nodes, the NameNode, which keeps the directory tree of all files in the file system, is limited to at most two instances. Thus, the NameNode can become a single point of failure (SPOF) of Hadoop. In a development environment, operations may not be seriously affected if developers have to rebuild the NameNode should it fail. In a business environment, however, any downtime, even a short interruption, could have a negative impact. One option is to use highly reliable hardware such as Fujitsu M10 / Fujitsu SPARC M12 servers for the nodes that run the NameNode and ResourceManager.

System Configuration

The data volume for Hadoop should be configured as a dedicated ZFS storage pool separately from the normal system volume (rpool). This makes it possible to perform operations such as adding disks independently from the system volume. You can create high-speed, high-capacity storage in Fujitsu M10 / Fujitsu SPARC M12 by using flashcards instead of internal disks.

The test configuration is as follows:

  • Server: Fujitsu M10-1
  • OS: Solaris 11.3 SRU 10.7
  • Java: JDK 1.7.0_111
  • Hadoop: 2.7.3

Preparing for Hadoop Installation

OS Installation

The OS is installed in the primary domain of a Fujitsu M10 server. Please refer to the Solaris manuals for OS installation and initial settings:
http://docs.oracle.com/cd/E53394_01/
The hostname of this system is set to “m10spark”.
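For reference, if the hostname still needs to be set or changed after installation, one possible way on Solaris 11 is through the SMF identity service (a sketch; verify the property names against your Solaris release):

# svccfg -s svc:/system/identity:node setprop config/nodename = "m10spark"
# svcadm refresh svc:/system/identity:node
# svcadm restart svc:/system/identity:node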

Next, run the following commands as the “root” user.

Configuring ZFS Storage Pool

Configure a ZFS storage pool to store the Hadoop data. The pool name is “hdpool”.

# zpool create hdpool <devices for hadoop data>
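Optionally, the new pool can be checked with the standard ZFS commands:

# zpool list hdpool
# zpool status hdpool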

Required Packages Installation

By default, Java 8 is installed on Solaris 11, but since the latest version of Java on which Hadoop was tested is Java 7, the JDK 7 package should be installed. Associated packages will also be installed. Note that the JDK, not just the JRE, is required in order to use the “jps” command to check the status of the Hadoop processes. Run “pkg install” with the “--accept” option to accept the license agreement.

# pkg install --accept developer/java/jdk-7

Immediately after installing JDK 7, the default Java version generally remains Java 8. Switch the Java version with the following procedure.

First, check the current version of Java.

# java -version
java version "1.8.0_102"
Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)

Confirm that both versions of Java are installed.

# pkg mediator -a java
MEDIATOR     VER. SRC. VERSION IMPL. SRC. IMPLEMENTATION
java         system    1.8     system
java         system    1.7     system

Switch the Java version to Java 7.

# pkg set-mediator -V 1.7 java

Confirm that the Java version is now Java 7.

# java -version
java version "1.7.0_111"
Java(TM) SE Runtime Environment (build 1.7.0_111-b13)
Java HotSpot(TM) Server VM (build 24.111-b13, mixed mode)

Editing /etc/hosts

Edit the “/etc/hosts” file to set the IP address of the local host.

::1             localhost
127.0.0.1       localhost loghost
xxx.xxx.xxx.xxx    m10spark m10spark.local
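To confirm that name resolution works as expected, you can run, for example:

# getent hosts m10spark
# ping m10spark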

Adding Hadoop Users and Group

Add the Hadoop group as “hadoop”. The primary group of all the Hadoop users added below should be “hadoop”.

# groupadd -g 200 hadoop

Add the user ID for running “NameNode” and “DataNode” as “hdfs”, and set the password.

# useradd -u 200 -m -g hadoop hdfs
# passwd hdfs

Add the user ID for running “ResourceManager” and “NodeManager” as “yarn”, and set the password.

# useradd -u 201 -m -g hadoop yarn
# passwd yarn

Add the user ID for running “History Server” as “mapred”, and set the password.

# useradd -u 202 -m -g hadoop mapred
# passwd mapred

Add the user ID for running user programs as “spark”, and set the password.

# useradd -u 101 -m -g hadoop spark
# passwd spark
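As an optional check, the users and their group membership can be confirmed with:

# getent group hadoop
# id hdfs
# id spark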

Hadoop Installation

Download Hadoop and transfer it to the node where it will be installed.
http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
Install Hadoop.

# cd /opt
# <Extracting Hadoop archive>
# ln -s hadoop-2.7.3 hadoop
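The extraction step above is left generic; one possible way on Solaris, assuming the downloaded archive has been copied to /opt, is:

# gzcat /opt/hadoop-2.7.3.tar.gz | tar xf -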

Change the owner of the Hadoop files to root and the group to hadoop, and set the permissions of all files to 755.

# chown -R root:hadoop /opt/hadoop-2.7.3
# chmod -R 755 /opt/hadoop-2.7.3
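A quick way to verify the ownership and the symbolic link:

# ls -ld /opt/hadoop /opt/hadoop-2.7.3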

Setting Up SSH

Hadoop uses SSH to connect to each Hadoop process, even on a single node. Therefore, the “hdfs”, “yarn” and “mapred” users should each set up a public/private key pair for passphraseless SSH authentication.

# su - hdfs
# ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# logout
# su - yarn
# ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# logout
# su - mapred
# ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# logout

Confirm that each user can establish SSH communication to localhost.
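For example, a check for the “hdfs” user (repeat the same check for “yarn” and “mapred”); the first connection may prompt you to accept the host key:

# su - hdfs
# ssh localhost hostname
m10spark
# logout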

Setting Up Environment Variables

Environment variables for running Hadoop should be set for each user.
Set the following environment variables in the “$HOME/.profile” of the “hdfs” user.

export JAVA_HOME=/usr/java
export PATH=$PATH:/opt/hadoop/bin:/opt/hadoop/sbin
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_PID_DIR=/hdpool/run/hdfs
export HADOOP_GROUP=hadoop

Set the following environment variables in the “$HOME/.profile” of the “yarn” user.

export JAVA_HOME=/usr/java
export PATH=$PATH:/opt/hadoop/bin:/opt/hadoop/sbin
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_PID_DIR=/hdpool/run/yarn

Set the following environment variables in the “$HOME/.profile” of the “mapred” user.

export JAVA_HOME=/usr/java
export PATH=$PATH:/opt/hadoop/bin:/opt/hadoop/sbin
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export MAPRED_PID_DIR=/hdpool/run/mapred
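The PID directories referenced above (/hdpool/run/hdfs, /hdpool/run/yarn and /hdpool/run/mapred) are assumed to live on the “hdpool” pool created earlier. If they do not exist yet, one way to prepare them as root so that each daemon user can write its PID files is:

# mkdir -p /hdpool/run/hdfs /hdpool/run/yarn /hdpool/run/mapred
# chown hdfs:hadoop /hdpool/run/hdfs
# chown yarn:hadoop /hdpool/run/yarn
# chown mapred:hadoop /hdpool/run/mapred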

Set the following environment variables in the “$HOME/.profile” of the “spark” user.

export JAVA_HOME=/usr/java
export PATH=$PATH:/opt/hadoop/bin:/opt/hadoop/sbin
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
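After logging in again (or re-reading the profile), a simple way to verify that the environment is picked up, shown here for the “spark” user as an example:

# su - spark
# echo $JAVA_HOME $HADOOP_HOME
# hadoop version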

At this point, the initial configuration is complete. In the second part of this blog, we will configure the Hadoop single node environment.


The information contained in this blog is for general information purposes only. While we endeavor to keep the information up-to-date and correct through testing on a practical system, we make no warranties of any kind about the completeness, accuracy, reliability, suitability or availability. Any reliance you place on such information is strictly at your own risk.

The information in this blog is subject to change without notice.
