Setup Hadoop

  • Version used here: 3.2.1 (some examples below still reference 3.1.1)

Download Hadoop

We first download Hadoop from the official website to temp_files/

fab download-hadoop

If you get ERROR 404: Not Found, check the mirror address to see what the latest version is

HADOOP_MIRROR = 'http://ftp.mirror.tw/pub/apache/hadoop/common' # Taiwan mirror
HADOOP_MIRROR = 'http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common' # China Tsinghua mirror

Configure

(if you download a different version, you should modify some settings in fabfile.py)

  • HADOOP_VERSION

Install Hadoop

(use the -v flag to see more messages)

fab install-hadoop

Setup hadoop user name and group

  1. Add hadoop group
  2. Add hadoop user to hadoop group
  3. Add hadoop user to sudo group
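
Roughly, this amounts to the following commands on each node (a sketch, assuming Debian/Ubuntu's adduser tools):

sudo addgroup hadoop                  # 1. add hadoop group
sudo adduser --ingroup hadoop hadoop  # 2. create the hadoop user in the hadoop group
sudo adduser hadoop sudo              # 3. add the hadoop user to the sudo group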

Generate ssh key for hadoop user

  1. Generate an ssh key and the corresponding authorized_keys locally (./temp_files/hadoopSSH)
  2. Upload the keys to the remotes as the hadoop user
    • it first removes any existing files to make sure the keys are the newest version
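
The local key-generation step is roughly this (a sketch; paths follow the temp_files/hadoopSSH layout above):

ssh-keygen -t rsa -f temp_files/hadoopSSH/id_rsa -N ''                    # no passphrase
cp temp_files/hadoopSSH/id_rsa.pub temp_files/hadoopSSH/authorized_keys   # key authorizes itself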

Upload, unpack and change owner

  1. Upload hadoop-3.1.1.tar.gz to each node
  2. Extract it
  3. Move to /opt
  4. Change owner to hadoop group and user
# SerialGroup.put() is still pending
# https://github.com/fabric/fabric/issues/1800
# https://github.com/fabric/fabric/issues/1810
# (but it is in the tutorial... http://docs.fabfile.org/en/2.4/getting-started.html#bringing-it-all-together)
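
Per node, steps 2-4 amount to something like this (a sketch; /opt matches the install path used throughout this guide):

tar -xzvf hadoop-3.1.1.tar.gz                   # extract the uploaded archive
sudo mv hadoop-3.1.1 /opt                       # move it to /opt
sudo chown -R hadoop:hadoop /opt/hadoop-3.1.1   # hand it to the hadoop user/group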

Setup environment variable

  1. Append the settings to /etc/bash.bashrc
  2. Source it to apply the settings
  3. Also configure the following in $HADOOP_HOME/etc/hadoop/hadoop-env.sh
    • JAVA_HOME
    • HADOOP_HEAPSIZE_MAX
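
The appended settings look roughly like this (a sketch; the heap size value is illustrative, and the JAVA_HOME path is the one used later in this guide):

# appended to /etc/bash.bashrc
export HADOOP_HOME=/opt/hadoop-3.1.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# in $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt
export HADOOP_HEAPSIZE_MAX=256m   # example value; size it to your nodes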

Setup slaves (worker)

  1. Add the master's hostname to $HADOOP_HOME/etc/hadoop/master
  2. Add the slaves' (workers') hostnames to $HADOOP_HOME/etc/hadoop/workers

P.S. In Hadoop 2.x the workers file is $HADOOP_HOME/etc/hadoop/slaves
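
For example, with one master and two workers the files contain plain hostnames, one per line (the hostnames here are hypothetical):

$HADOOP_HOME/etc/hadoop/master:

master

$HADOOP_HOME/etc/hadoop/workers:

slave1
slave2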

Update hadoop configuration

Use fab update-hadoop-conf (see the "Update hadoop configuration files" section below)

Setup HDFS

  1. Make directories and change their owner
  2. Format HDFS namenode
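
On the nodes this boils down to something like the following (a sketch; the HDFS directory paths are hypothetical placeholders and should match dfs.namenode.name.dir / dfs.datanode.data.dir in hdfs-site.xml):

sudo mkdir -p /opt/hdfs/namenode /opt/hdfs/datanode
sudo chown -R hadoop:hadoop /opt/hdfs
hdfs namenode -format   # run once on the master, as the hadoop user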

Update hadoop configuration files

Upload these files to the remotes (this will overwrite the original content):

  • core-site.xml
  • mapred-site.xml
  • hdfs-site.xml
  • yarn-site.xml
fab update-hadoop-conf
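
As an illustration, the heart of core-site.xml is the default filesystem address (the hostname and port here are hypothetical; adapt them to your cluster):

<!-- core-site.xml -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>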

Fix native library error

(use the -c flag to clean up the tar and build files when it's finished)

fab fix-hadoop-lib

THIS METHOD IS STILL IN THE TEST PHASE

TODO

  • fix IP (maybe)
  • make another .md about how to configure the Hadoop configuration files
    • node number dynamic config (hostname or something else)

Trouble Shooting

Fix library (Cannot allocate memory)

Preface: I tested my Hadoop setup with the official examples. The calculate-PI example never finishes, even after it reaches 100% map and 100% reduce, yet wordcount runs successfully, which is weird. So I tried to solve the library problem to see if that would fix it.

Java HotSpot(TM) Client VM warning: You have loaded library /opt/hadoop-3.1.1/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
Java HotSpot(TM) Client VM warning: INFO: os::commit_memory(0x52600000, 104861696, 0) failed; error='Cannot allocate memory' (errno=12)
# I tried to fix it with execstack, but it didn't work
sudo apt-get install execstack
sudo execstack -c /opt/hadoop-3.1.1/lib/native/libhadoop.so.1.0.0
# Others said they fixed it by adding these two environment variables in hadoop-env.sh, but that didn't work either
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/"
export HADOOP_COMMON_LIB_NATIVE_DIR="$HADOOP_HOME/lib/native/"
# Get hadoop build tools
sudo apt-get install maven libssl-dev build-essential pkgconf cmake
# Get protobuf build tools
sudo apt-get install -y autoconf automake libtool curl make g++ unzip

I tried to install libprotobuf10 (the latest version) and protobuf-compiler via apt-get, but I got the error below... (libprotobuf8 could not be found)

[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.1.1:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: protoc version is 'libprotoc 3.0.0', expected version is '2.5.0'

Build Protocol Buffers: It has to be v2.5.0

Follow the official C++ installation guide for Protocol Buffers. (Building the latest release won't work, because its version is too new.)

Instead, download the binary release of Protocol Buffers v2.5.0 from the GitHub releases page.

  1. Method 1

This hasn't succeeded yet

# run in the protobuf-2.5.0 source directory (see above)
./configure
make
sudo make install
# now hadoop should build without the protoc version error
mvn package -Pdist,native -DskipTests -Dtar
  2. Method 2

This hasn't succeeded yet

# build protobuf 2.5.0 in its source directory, without installing it system-wide
./configure
make
# then build the hadoop-common native code against that protoc
cd hadoop-src-3.1.1/hadoop-common-project/hadoop-common
export HADOOP_PROTOC_PATH=/path/to/protobuf-2.5.0/src/protoc
export JAVA_HOME=/usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt # without /jre
mvn compile -Pnative

When it's finished, copy the lib folder to wherever you like and remember to add the environment variables in hadoop-env.sh.

THE FINAL SOLUTION:

Before this, I was trying to use minimal settings in the various *-site.xml files so I could observe the functionality and only add settings when I needed them.

I think the OS killed the map task because it exceeded the maximum memory usage; that's why I got the 'Cannot allocate memory' error.

And then I realized: if Hadoop says it can't use the stack guard to protect memory usage, maybe I can limit the memory usage myself.

So I added some memory limit settings in mapred-site.xml and yarn-site.xml, and it works perfectly!!

Thus I'm not going to bother getting rid of the warning for now. :P
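
For reference, the kind of limits involved look like this (a sketch: the property names are standard YARN/MapReduce settings, but the values below are illustrative for a small-memory node, not necessarily the ones I used):

<!-- yarn-site.xml -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>   <!-- RAM YARN may use per node -->
  <value>768</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>  <!-- max per container -->
  <value>768</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>  <!-- min per container -->
  <value>128</value>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapreduce.map.memory.mb</name>               <!-- per map task -->
  <value>256</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>            <!-- per reduce task -->
  <value>256</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>     <!-- ApplicationMaster container -->
  <value>256</value>
</property>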

References:

  • Configure Memory Allocation
  • The YARN Memory Allocation Properties
  • Determine HDP Memory Configuration Settings

Hostname Problem

I finally figured this out: the ApplicationMaster (or some such component) could not communicate with the other nodes. (I read the logs and found that only the map tasks on one node succeeded; the ones on the other nodes failed.)

So the conclusion is: you have to comment out the self-referencing entry (i.e. 127.0.1.1 hostname) and add all the nodes' hostnames, including the node's own, to /etc/hosts.

I haven't found out why I can't set them to hostname.local, because I really don't like hard-coding things...
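
For example, /etc/hosts on every node would look something like this (the IP addresses and hostnames are hypothetical):

# 127.0.1.1    master       <- comment out the self-referencing entry
127.0.0.1      localhost
192.168.1.100  master
192.168.1.101  slave1
192.168.1.102  slave2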

Multicast DNS (mDNS)

This is a zero-configuration protocol that works on LAN subnets. No server is required. It uses the .local TLD.

Links

  • Configuration Files Documents
  • Similar Project