Using Hadoop and Spark on SPARC Servers / Solaris Platform – Configuring Hadoop Single Node Environment, Part 2

In Part One of this blog series, I provided instructions on configuring a Spark cluster environment using Hadoop YARN running on highly reliable Fujitsu M10/Fujitsu SPARC M12 servers. By taking advantage of enterprise-grade, highly reliable hardware, customers can avoid a Single Point of Failure (SPOF) in a Hadoop environment.

In Part Two of this blog series, I provide step-by-step instructions on how to configure a Hadoop single node environment. The series continues as follows:

  • Part Two – Configure Hadoop single node environment
  • Part Three – Configure Hadoop cluster environment on Oracle VM for SPARC
  • Part Four – Spark cluster environment using Hadoop cluster

Creating directories for Hadoop

Create directories for storing Hadoop data. Each directory should be created as a separate ZFS file system with future production operation in mind.

Create a directory for the log files of the “hdfs” user.

# zfs create -p hdpool/log/hdfs
# chown hdfs:hadoop /hdpool/log/hdfs

Create a directory for the log files of the “yarn” user.

# zfs create -p hdpool/log/yarn
# chown yarn:hadoop /hdpool/log/yarn

Create a directory for the log files of the “mapred” user.

# zfs create -p hdpool/log/mapred
# chown mapred:hadoop /hdpool/log/mapred

Create a directory for the HDFS metadata.

# zfs create -p hdpool/data/1/dfs/nn
# chmod 700 /hdpool/data/1/dfs/nn
# chown -R hdfs:hadoop /hdpool/data/1/dfs/nn

Create a directory for the HDFS data blocks.

# zfs create -p hdpool/data/1/dfs/dn
# chown -R hdfs:hadoop /hdpool/data/1/dfs/dn

Create directories for the “yarn” user.

# zfs create -p hdpool/data/1/yarn/local
# zfs create -p hdpool/data/1/yarn/logs
# chown -R yarn:hadoop /hdpool/data/1/yarn/local
# chown -R yarn:hadoop /hdpool/data/1/yarn/logs

Create runtime directories for the “yarn”, “hdfs”, and “mapred” users.

# zfs create -p hdpool/run/yarn
# chown yarn:hadoop /hdpool/run/yarn
# zfs create -p hdpool/run/hdfs
# chown hdfs:hadoop /hdpool/run/hdfs
# zfs create -p hdpool/run/mapred
# chown mapred:hadoop /hdpool/run/mapred

Create a directory for temporary data.

# zfs create -p hdpool/tmp

Confirm that all the directories have been created, as follows:

# zfs list -r hdpool

If the output is as shown below, the file systems have been created successfully.

NAME                       USED  AVAIL  REFER  MOUNTPOINT
hdpool                    5.33M  11.7G   352K  /hdpool
hdpool/data               2.36M  11.7G   304K  /hdpool/data
hdpool/data/1             2.06M  11.7G   320K  /hdpool/data/1
hdpool/data/1/dfs          896K  11.7G   320K  /hdpool/data/1/dfs
hdpool/data/1/dfs/dn       288K  11.7G   288K  /hdpool/data/1/dfs/dn
hdpool/data/1/dfs/nn       288K  11.7G   288K  /hdpool/data/1/dfs/nn
hdpool/data/1/yarn         896K  11.7G   320K  /hdpool/data/1/yarn
hdpool/data/1/yarn/local   288K  11.7G   288K  /hdpool/data/1/yarn/local
hdpool/data/1/yarn/logs    288K  11.7G   288K  /hdpool/data/1/yarn/logs
hdpool/log                1.17M  11.7G   336K  /hdpool/log
hdpool/log/hdfs            288K  11.7G   288K  /hdpool/log/hdfs
hdpool/log/mapred          288K  11.7G   288K  /hdpool/log/mapred
hdpool/log/yarn            288K  11.7G   288K  /hdpool/log/yarn
hdpool/run                1.17M  11.7G   336K  /hdpool/run
hdpool/run/hdfs            288K  11.7G   288K  /hdpool/run/hdfs
hdpool/run/mapred          288K  11.7G   288K  /hdpool/run/mapred
hdpool/run/yarn            288K  11.7G   288K  /hdpool/run/yarn
hdpool/tmp                 288K  11.7G   288K  /hdpool/tmp

In this blog, the log file directories are compressed using the ZFS compression feature. As log files grow, they can exhaust the available disk space; to prevent processes from terminating abnormally because they can no longer write to disk, compress the log file directories as follows:

# zfs set compression=lz4 hdpool/log
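
The compression property set on “hdpool/log” is inherited by its child file systems (“hdpool/log/hdfs”, “hdpool/log/yarn”, and “hdpool/log/mapred”). As a quick check, the setting can be confirmed as follows:

# zfs get -r compression hdpool/log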

Setting Hadoop Configuration Files

Change the current directory to the Hadoop configuration file directory (/opt/hadoop/etc/hadoop).
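
For example:

# cd /opt/hadoop/etc/hadoop
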
Add the following to “hadoop-env.sh”.

export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
export JAVA_HOME=/usr/java
export HADOOP_LOG_DIR=/hdpool/log/hdfs

Add the following to “yarn-env.sh”.

export JAVA_HOME=/usr/java
export YARN_LOG_DIR=/hdpool/log/yarn
export HADOOP_HOME=/opt/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

Add the following to “mapred-env.sh”.

export JAVA_HOME=/usr/java
export HADOOP_MAPRED_LOG_DIR=/hdpool/log/mapred
export HADOOP_MAPRED_IDENT_STRING=mapred

Edit “slaves” and write the hostname of the node that runs the DataNode.

m10spark
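
The hostname written in “slaves” must be resolvable on this node. Assuming it is registered in /etc/hosts or DNS, this can be checked, for example, with:

# getent hosts m10spark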

Edit “core-site.xml” as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
 <property>
  <name>fs.defaultFS</name>
  <value>hdfs://m10spark</value>
</property>
</configuration>

Edit “mapred-site.xml” as follows:

<?xml version="1.0"?>
<configuration>
 <property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
</property>
<property>
 <name>mapreduce.jobhistory.address</name>
 <value>m10spark:10020</value>
</property>
<property>
 <name>mapreduce.jobhistory.webapp.address</name>
 <value>m10spark:19888</value>
</property>
<property>
 <name>yarn.app.mapreduce.am.staging-dir</name>
 <value>/user</value>
</property>
</configuration>

Edit “yarn-site.xml” as follows:

<?xml version="1.0"?>
<configuration>
<!-- Site specific YARN configuration properties -->
 <property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>m10spark</value>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>file:///hdpool/data/1/yarn/local</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>file:///hdpool/data/1/yarn/logs</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <description>Where to aggregate logs</description>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>hdfs://m10spark/var/log/hadoop-yarn/apps</value>
</property>
</configuration>

Edit “hdfs-site.xml” as follows:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
 <property>
  <name>dfs.datanode.data.dir</name>
  <value>/hdpool/data/1/dfs/dn</value>
 </property>
 <property>
  <name>dfs.namenode.name.dir</name>
  <value>/hdpool/data/1/dfs/nn</value>
 </property>
 <property>
  <name>dfs.replication</name>
  <value>1</value>
 </property>
 <property>
  <name>dfs.permissions.supergroup</name>
  <value>hadoop</value>
 </property>
</configuration>
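
A typo in any of these XML files will prevent the corresponding daemon from starting. As an optional check, and assuming the xmllint utility is available on the system, the files can be validated for well-formedness before proceeding:

# xmllint --noout core-site.xml mapred-site.xml yarn-site.xml hdfs-site.xml

No output means that all four files are well-formed.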

Starting Hadoop Processes

WARNING: The message below may appear during the following steps; it can be safely ignored because Hadoop processes are not affected.

xx/xx/xx xx:xx:xx WARN util.NativeCodeLoader: Unable to load native-hadoop library for 
your platform... using builtin-java classes where applicable

First, format the Hadoop file system.

# su - hdfs -c 'hdfs namenode -format'

The message below shows that it is formatted successfully.

16/10/24 08:17:36 INFO common.Storage: Storage directory /hdpool/data/1/dfs/nn has been successfully formatted.

Start “NameNode”.

# su - hdfs -c 'hadoop-daemon.sh start namenode'

Check “NameNode” status.

# /usr/java/bin/jps | grep NameNode

The message below indicates that it started successfully.

25852 NameNode

Start “ResourceManager”.

# su - yarn -c 'yarn-daemon.sh start resourcemanager'

Check “ResourceManager” status.

# /usr/java/bin/jps | grep ResourceManager

The message below indicates that it started successfully.

25982 ResourceManager

Start “DataNode” and “NodeManager”.

# su - hdfs -c 'hadoop-daemon.sh start datanode'
# su - yarn -c 'yarn-daemon.sh start nodemanager'

Check “DataNode” status.

# /usr/java/bin/jps | grep DataNode

The message below indicates that it started successfully.

26240 DataNode

Check “NodeManager” status.

# /usr/java/bin/jps | grep NodeManager

The message below indicates that it started successfully.

26299 NodeManager

Start “JobHistoryServer”.

# su - mapred -c 'mr-jobhistory-daemon.sh start historyserver'

Check “JobHistoryServer” status.

# /usr/java/bin/jps | grep JobHistoryServer

The message below indicates that it started successfully.

26573 JobHistoryServer

Create HDFS directories.

# (Log in to the NameNode host as the "hdfs" user)
# hadoop fs -mkdir /tmp
# hadoop fs -chmod -R 1777 /tmp
# hadoop fs -mkdir /data
# hadoop fs -mkdir /data/history
# hadoop fs -chmod -R 1777 /data/history
# hadoop fs -chown yarn /data/history
# hadoop fs -mkdir /var
# hadoop fs -mkdir /var/log
# hadoop fs -mkdir /var/log/hadoop-yarn
# hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
# hadoop fs -mkdir /data/spark
# hadoop fs -chown spark /data/spark

Confirm that the HDFS directories can be accessed from all the Hadoop nodes, as follows:

# su - hdfs -c 'hadoop fs -ls -R /'

Confirm that the directories below are displayed on all nodes.

drwxr-xr-x   - hdfs  hadoop          0 2016-10-25 07:32 /data
drwxrwxrwt   - yarn  hadoop          0 2016-10-25 07:26 /data/history
drwxr-xr-x   - spark hadoop          0 2016-10-25 07:32 /data/spark
drwxrwxrwt   - hdfs  hadoop          0 2016-10-24 16:08 /tmp
drwxr-xr-x   - hdfs  hadoop          0 2016-10-25 07:29 /var
drwxr-xr-x   - hdfs  hadoop          0 2016-10-25 07:29 /var/log
drwxr-xr-x   - yarn  mapred          0 2016-10-25 07:29 /var/log/hadoop-yarn

Testing

Connect to http://<target IP address>:50070/ with a web browser. If the page shown below is displayed, Hadoop is working successfully.

[Screenshot: Hadoop NameNode web UI “Overview” page]
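
If a web browser is not readily available, a similar check can be made from the command line. A minimal sketch using the standard HDFS dfsadmin report, which should show one live DataNode in this single node configuration:

# su - hdfs -c 'hdfs dfsadmin -report'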

Run the Hadoop sample application. When it ends successfully, all the procedures are completed.

# su - spark
$ hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 10 20
Number of Maps  = 10
Samples per Map = 20
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9

Starting Job

16/11/16 13:47:08 INFO client.RMProxy: Connecting to ResourceManager at m10spark/10.20.98.122:8032
16/11/16 13:47:10 INFO input.FileInputFormat: Total input paths to process : 10
16/11/16 13:47:10 INFO mapreduce.JobSubmitter: number of splits:10
16/11/16 13:47:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1479271455092_0001
16/11/16 13:47:11 INFO impl.YarnClientImpl: Submitted application application_1479271455092_0001
16/11/16 13:47:11 INFO mapreduce.Job: The url to track the job: http://m10spark:8088/proxy/application_1479271455092_0001/
16/11/16 13:47:11 INFO mapreduce.Job: Running job: job_1479271455092_0001
16/11/16 13:47:25 INFO mapreduce.Job: Job job_1479271455092_0001 running in uber mode : false
16/11/16 13:47:25 INFO mapreduce.Job:  map 0% reduce 0%
16/11/16 13:47:51 INFO mapreduce.Job:  map 10% reduce 0%
16/11/16 13:47:53 INFO mapreduce.Job:  map 20% reduce 0%
16/11/16 13:47:54 INFO mapreduce.Job:  map 60% reduce 0%
16/11/16 13:48:14 INFO mapreduce.Job:  map 80% reduce 0%
16/11/16 13:48:15 INFO mapreduce.Job:  map 100% reduce 0%
16/11/16 13:48:17 INFO mapreduce.Job:  map 100% reduce 100%
16/11/16 13:48:18 INFO mapreduce.Job: Job job_1479271455092_0001 completed successfully
16/11/16 13:48:18 INFO mapreduce.Job: Counters: 49
  File System Counters
    FILE: Number of bytes read=226
    FILE: Number of bytes written=1311574
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=2590
    HDFS: Number of bytes written=215
    HDFS: Number of read operations=43
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=3
  Job Counters
    Launched map tasks=10
    Launched reduce tasks=1
    Data-local map tasks=10
    Total time spent by all maps in occupied slots (ms)=241698
    Total time spent by all reduces in occupied slots (ms)=20138
    Total time spent by all map tasks (ms)=241698
    Total time spent by all reduce tasks (ms)=20138
    Total vcore-milliseconds taken by all map tasks=241698
    Total vcore-milliseconds taken by all reduce tasks=20138
    Total megabyte-milliseconds taken by all map tasks=247498752
    Total megabyte-milliseconds taken by all reduce tasks=20621312
  Map-Reduce Framework
    Map input records=10
    Map output records=20
    Map output bytes=180
    Map output materialized bytes=280
    Input split bytes=1410
    Combine input records=0
    Combine output records=0
    Reduce input groups=2
    Reduce shuffle bytes=280
    Reduce input records=20
    Reduce output records=0
    Spilled Records=40
    Shuffled Maps =10
    Failed Shuffles=0
    Merged Map outputs=10
    GC time elapsed (ms)=2760
    CPU time spent (ms)=0
    Physical memory (bytes) snapshot=0
    Virtual memory (bytes) snapshot=0
    Total committed heap usage (bytes)=2024275968
  Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
  File Input Format Counters
    Bytes Read=1180
  File Output Format Counters
    Bytes Written=97
Job Finished in 69.642 seconds
Estimated value of Pi is 3.12000000000000000000

Stay tuned for the next blog, which will cover how to configure a Hadoop cluster environment on Oracle VM for SPARC. I welcome any comments and experiences you may have.


The information contained in this blog is for general information purposes only. While we endeavor to keep the information up-to-date and correct through testing on a practical system, we make no warranties of any kind about its completeness, accuracy, reliability, suitability, or availability. Any reliance you place on such information is strictly at your own risk.

The information in this blog is subject to change without notice.
