Setup Hadoop

  • Version used here: 3.2.1 (some examples below still reference 3.1.1)

Download Hadoop

We first download Hadoop from the official website to temp_files/

fab download-hadoop

If you get ERROR 404: Not Found, check the mirror address to see what the latest version is

HADOOP_MIRROR = 'http://ftp.mirror.tw/pub/apache/hadoop/common' # Taiwan mirror
HADOOP_MIRROR = 'http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common' # China Tsinghua mirror

Configure

(if you download a different version, you should modify some settings in fabfile.py)

  • HADOOP_VERSION

Install Hadoop

(use the -v flag to see more messages)

fab install-hadoop

Setup hadoop user name and group

  1. Add hadoop group
  2. Add hadoop user to hadoop group
  3. Add hadoop user to sudo group
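
Roughly, this amounts to the following commands on each node (a sketch, assuming Debian/Ubuntu's adduser tools):

sudo addgroup hadoop                  # 1. add hadoop group
sudo adduser --ingroup hadoop hadoop  # 2. create the hadoop user in the hadoop group
sudo adduser hadoop sudo              # 3. add the hadoop user to the sudo group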

Generate ssh key for hadoop user

  1. Generate an ssh key and the corresponding authorized_keys locally (./temp_files/hadoopSSH)
  2. Upload the keys to the remotes as the hadoop user
    • it first removes any existing files to make sure the keys are the newest version
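
The local key-generation step is roughly this (a sketch; paths follow the temp_files/hadoopSSH layout above):

ssh-keygen -t rsa -f temp_files/hadoopSSH/id_rsa -N ''                    # no passphrase
cp temp_files/hadoopSSH/id_rsa.pub temp_files/hadoopSSH/authorized_keys   # key authorizes itself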

Upload, unpack and change owner

  1. Upload hadoop-3.1.1.tar.gz to each node
  2. Extract it
  3. Move to /opt
  4. Change owner to hadoop group and user
# SerialGroup.put() is still pending
# https://github.com/fabric/fabric/issues/1800
# https://github.com/fabric/fabric/issues/1810
# (but it is in the tutorial... http://docs.fabfile.org/en/2.4/getting-started.html#bringing-it-all-together)
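
Per node, steps 2-4 amount to something like this (a sketch; /opt matches the install path used throughout this guide):

tar -xzvf hadoop-3.1.1.tar.gz                   # extract the uploaded archive
sudo mv hadoop-3.1.1 /opt                       # move it to /opt
sudo chown -R hadoop:hadoop /opt/hadoop-3.1.1   # hand it to the hadoop user/group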

Setup environment variable

  1. Append the settings to /etc/bash.bashrc
  2. Source it to apply the settings
  3. Also configure the following in $HADOOP_HOME/etc/hadoop/hadoop-env.sh
    • JAVA_HOME
    • HADOOP_HEAPSIZE_MAX
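
The appended settings look roughly like this (a sketch; the heap size value is illustrative, and the JAVA_HOME path is the one used later in this guide):

# appended to /etc/bash.bashrc
export HADOOP_HOME=/opt/hadoop-3.1.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# in $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt
export HADOOP_HEAPSIZE_MAX=256m   # example value; size it to your nodes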

Setup slaves (worker)

  1. Add the master's hostname to $HADOOP_HOME/etc/hadoop/master
  2. Add the slaves' (workers') hostnames to $HADOOP_HOME/etc/hadoop/workers

P.S. In Hadoop 2.x the workers file is $HADOOP_HOME/etc/hadoop/slaves
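
For example, with one master and two workers the files contain plain hostnames, one per line (the hostnames here are hypothetical):

$HADOOP_HOME/etc/hadoop/master:

master

$HADOOP_HOME/etc/hadoop/workers:

slave1
slave2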

Update hadoop configuration

Use fab update-hadoop-conf (see the "Update hadoop configuration files" section below)

Setup HDFS

  1. Make directories and change their owner
  2. Format HDFS namenode
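
On the nodes this boils down to something like the following (a sketch; the HDFS directory paths are hypothetical placeholders and should match dfs.namenode.name.dir / dfs.datanode.data.dir in hdfs-site.xml):

sudo mkdir -p /opt/hdfs/namenode /opt/hdfs/datanode
sudo chown -R hadoop:hadoop /opt/hdfs
hdfs namenode -format   # run once on the master, as the hadoop user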

Update hadoop configuration files

Upload these files to the remotes (this will overwrite the original content):

  • core-site.xml
  • mapred-site.xml
  • hdfs-site.xml
  • yarn-site.xml
fab update-hadoop-conf
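
As an illustration, the heart of core-site.xml is the default filesystem address (the hostname and port here are hypothetical; adapt them to your cluster):

<!-- core-site.xml -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>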

Fix native library error

(use the -c flag to clean up the tar and build files when it's finished)

fab fix-hadoop-lib

THIS METHOD IS STILL IN THE TEST PHASE

TODO

  • fix IP (maybe)
  • make another .md about how to configure the Hadoop configuration files
    • node number dynamic config (hostname or something else)

Trouble Shooting

Fix library (Cannot allocate memory)

Preface: I tested my Hadoop setup with the official examples. The calculate-PI example never finishes, even after it reaches 100% map and 100% reduce, yet wordcount runs successfully, which is weird. So I tried to solve the library problem to see if that would fix it.

Java HotSpot(TM) Client VM warning: You have loaded library /opt/hadoop-3.1.1/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
Java HotSpot(TM) Client VM warning: INFO: os::commit_memory(0x52600000, 104861696, 0) failed; error='Cannot allocate memory' (errno=12)
# I tried to fix it with execstack, but it didn't work
sudo apt-get install execstack
sudo execstack -c /opt/hadoop-3.1.1/lib/native/libhadoop.so.1.0.0
# Others said they fixed it by adding these two environment variables in hadoop-env.sh, but that didn't work either
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/"
export HADOOP_COMMON_LIB_NATIVE_DIR="$HADOOP_HOME/lib/native/"
# Get hadoop build tools
sudo apt-get install maven libssl-dev build-essential pkgconf cmake
# Get protobuf build tools
sudo apt-get install -y autoconf automake libtool curl make g++ unzip

I tried to install libprotobuf10 (the latest version) and protobuf-compiler via apt-get, but I got the error below... (libprotobuf8 could not be found)

[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.1.1:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: protoc version is 'libprotoc 3.0.0', expected version is '2.5.0'

Build Protocol Buffers: It has to be v2.5.0

Follow the official C++ installation guide for Protocol Buffers. (Building the latest release won't work, because its version is too new.)

Instead, download the binary release of Protocol Buffers v2.5.0 from the GitHub releases page.

  1. Method 1

This hasn't succeeded yet

# run in the protobuf-2.5.0 source directory (see above)
./configure
make
sudo make install
# now hadoop should build without the protoc version error
mvn package -Pdist,native -DskipTests -Dtar
  2. Method 2

This hasn't succeeded yet

# build protobuf 2.5.0 in its source directory, without installing it system-wide
./configure
make
# then build the hadoop-common native code against that protoc
cd hadoop-src-3.1.1/hadoop-common-project/hadoop-common
export HADOOP_PROTOC_PATH=/path/to/protobuf-2.5.0/src/protoc
export JAVA_HOME=/usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt # without /jre
mvn compile -Pnative

When it's finished, copy the lib folder to wherever you like and remember to add the environment variables in hadoop-env.sh.

THE FINAL SOLUTION:

Before this, I was trying to use minimal settings in the various *-site.xml files so I could observe the functionality and only add settings when I needed them.

I think the OS killed the map task because it exceeded the maximum memory usage; that's why I got the 'Cannot allocate memory' error.

And then I realized: if Hadoop says it can't use the stack guard to protect memory usage, maybe I can limit the memory usage myself.

So I added some memory limit settings in mapred-site.xml and yarn-site.xml, and it works perfectly!!

Thus I'm not going to bother getting rid of the warning for now. :P
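
For reference, the kind of limits involved look like this (a sketch: the property names are standard YARN/MapReduce settings, but the values below are illustrative for a small-memory node, not necessarily the ones I used):

<!-- yarn-site.xml -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>   <!-- RAM YARN may use per node -->
  <value>768</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>  <!-- max per container -->
  <value>768</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>  <!-- min per container -->
  <value>128</value>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapreduce.map.memory.mb</name>               <!-- per map task -->
  <value>256</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>            <!-- per reduce task -->
  <value>256</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>     <!-- ApplicationMaster container -->
  <value>256</value>
</property>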

References:

  • Configure Memory Allocation
  • The YARN Memory Allocation Properties
  • Determine HDP Memory Configuration Settings

Hostname Problem

I finally figured this out: the ApplicationMaster (or some such component) could not communicate with the other nodes. (I read the logs and found that only the map tasks on one node succeeded; the ones on the other nodes failed.)

So the conclusion is: you have to comment out the self-referencing entry (i.e. 127.0.1.1 hostname) and add all the nodes' hostnames, including the node's own, to /etc/hosts.

I haven't found out why I can't set them to hostname.local, because I really don't like hard-coding things...
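
For example, /etc/hosts on every node would look something like this (the IP addresses and hostnames are hypothetical):

# 127.0.1.1    master       <- comment out the self-referencing entry
127.0.0.1      localhost
192.168.1.100  master
192.168.1.101  slave1
192.168.1.102  slave2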

Multicast DNS (mDNS)

This is a zero-configuration protocol that works on LAN subnets. No server is required. It uses the .local TLD.

Links

  • Configuration Files Documents
  • Similar Project