- Version here: 3.2.1 (the examples below still reference 3.1.1)
We first download Hadoop from the official website to `temp_files/`:
fab download-hadoop
If you get `ERROR 404: Not Found`, go to a mirror address and check what the latest version is:
HADOOP_MIRROR = 'http://ftp.mirror.tw/pub/apache/hadoop/common' # Taiwan mirror
HADOOP_MIRROR = 'http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common' # China Tsinghua mirror
(If you download a different version, you should modify the `HADOOP_VERSION` setting in fabfile.py.)
(To see more messages, use the `-v` flag.)
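
A rough sketch of what such a download task might look like (the task name, constants, and layout here are assumptions for illustration, not necessarily what the real fabfile.py uses):

```python
# Hypothetical sketch of a download task; names and paths are assumptions.
import os
import urllib.request

from fabric import task

HADOOP_VERSION = '3.1.1'
HADOOP_MIRROR = 'http://ftp.mirror.tw/pub/apache/hadoop/common'

@task
def download_hadoop(c):
    """Fetch the Hadoop release tarball into temp_files/ unless it is already there."""
    tarball = f'hadoop-{HADOOP_VERSION}.tar.gz'
    url = f'{HADOOP_MIRROR}/hadoop-{HADOOP_VERSION}/{tarball}'
    os.makedirs('temp_files', exist_ok=True)
    dest = os.path.join('temp_files', tarball)
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)  # a removed release shows up here as HTTP 404
```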
fab install-hadoop
- Add hadoop group
- Add hadoop user to hadoop group
- Add hadoop user to sudo group
- Generate an SSH key and the corresponding authorized_keys locally (in `./temp_files/hadoopSSH`)
- Upload the keys to the remote nodes as the hadoop user
- It will first remove the existing files to make sure they are the newest version
- Upload hadoop-3.1.1.tar.gz to each node
- Extract it
- Move to /opt
- Change owner to hadoop group and user
# `SerialGroup.put()` is still pending, so the upload is done per connection (see the per-host sketch after this list)
# https://github.com/fabric/fabric/issues/1800
# https://github.com/fabric/fabric/issues/1810
# (but it is in the tutorial... http://docs.fabfile.org/en/2.4/getting-started.html#bringing-it-all-together)
- Append settings to /etc/bash.bashrc
- Source it to apply the settings
- Also configure the following in $HADOOP_HOME/etc/hadoop/hadoop-env.sh:
- JAVA_HOME
- HADOOP_HEAPSIZE_MAX
- Add the master's hostname in $HADOOP_HOME/etc/hadoop/master
- Add the workers' hostnames in $HADOOP_HOME/etc/hadoop/workers
P.S. In Hadoop 2.x this file is $HADOOP_HOME/etc/hadoop/slaves
- Make directories and change their owner
- Format HDFS namenode
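
The per-host upload workaround mentioned above could look roughly like this (a minimal sketch covering only the tarball step; the hostnames and task body are assumptions, not the actual fabfile):

```python
# Minimal sketch of the per-host upload/extract step. SerialGroup.put() is not
# available yet, so put() is called one Connection at a time.
from fabric import Connection, task

HOSTS = ['rpi-master', 'rpi-worker1', 'rpi-worker2']  # assumption: your node hostnames
TARBALL = 'temp_files/hadoop-3.1.1.tar.gz'

@task
def install_hadoop(c):
    for host in HOSTS:
        conn = Connection(host, user='hadoop')
        conn.put(TARBALL, remote='/tmp/hadoop-3.1.1.tar.gz')
        conn.sudo('tar -xzf /tmp/hadoop-3.1.1.tar.gz -C /opt')
        conn.sudo('chown -R hadoop:hadoop /opt/hadoop-3.1.1')
```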
Update these files on the remotes (this will overwrite the original content; see the sketch below):
- core-site.xml
- mapred-site.xml
- hdfs-site.xml
- yarn-site.xml
fab update-hadoop-conf
(Use the `-c` flag to clean up the tar and build files when it's finished.)
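
A minimal sketch of what the conf-update task might do (the hostnames, local `configs/` directory, and remote paths are assumptions, not the actual fabfile):

```python
# Sketch: overwrite the four Hadoop XML configs on every node (paths are assumptions).
from fabric import Connection, task

HOSTS = ['rpi-master', 'rpi-worker1', 'rpi-worker2']
REMOTE_CONF_DIR = '/opt/hadoop-3.1.1/etc/hadoop'
CONF_FILES = ['core-site.xml', 'mapred-site.xml', 'hdfs-site.xml', 'yarn-site.xml']

@task
def update_hadoop_conf(c):
    for host in HOSTS:
        conn = Connection(host, user='hadoop')
        for name in CONF_FILES:
            conn.put(f'configs/{name}', remote=f'{REMOTE_CONF_DIR}/{name}')
```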
fab fix-hadoop-lib
THIS METHOD IS STILL IN TEST PHASE
- Fix IP handling (maybe)
- Write another .md about how to configure the Hadoop configuration files
- Make the number of nodes dynamically configurable (by hostname or something else)
Preface: I've tested my Hadoop setup with the official examples. I can't finish the calculate-PI example even though it reaches 100% map and 100% reduce, but I can run wordcount successfully. It's weird, so I tried to solve the native library problem to see if that would fix it.
Java HotSpot(TM) Client VM warning: You have loaded library /opt/hadoop-3.1.1/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
Java HotSpot(TM) Client VM warning: INFO: os::commit_memory(0x52600000, 104861696, 0) failed; error='Cannot allocate memory' (errno=12)
# I've tried to fix it with execstack, but it doesn't work
sudo apt-get install execstack
sudo execstack -c /opt/hadoop-3.1.1/lib/native/libhadoop.so.1.0.0
# Others said they could fix it by adding these two environment variables in hadoop-env.sh, but it still doesn't work
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/"
export HADOOP_COMMON_LIB_NATIVE_DIR="$HADOOP_HOME/lib/native/"
- Compile Apache Hadoop on Linux (fix warning: Unable to load native-hadoop library) - Steps
- Hadoop “Unable to load native-hadoop library for your platform” warning - Reason explained
- By default, the native library in the Hadoop binary release is built for 32-bit.
- Check if the *.so file is readable
ldd libhadoop.so.1.0.0
- Building Native Hadoop Libraries to Fix VM Stack Guard error on 64 bit machine - Steps but a little outdated
# Get hadoop build tools
sudo apt-get install maven libssl-dev build-essential pkgconf cmake
# Get protobuf build tools
sudo apt-get install -y autoconf automake libtool curl make g++ unzip
I've tried to install libprotobuf10 (the latest version) and protobuf-compiler via apt-get, but I get the error below (libprotobuf8 could not be found)...
[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.1.1:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: protoc version is 'libprotoc 3.0.0', expected version is '2.5.0'
Build Protocol Buffers: it has to be v2.5.0.
Following the official C++ installation instructions for Protocol Buffers and building the current release won't work, because that version is too new.
Download the v2.5.0 release of Protocol Buffers from the GitHub releases page instead.
- Method 1
Haven't succeeded with this yet.
./configure
make
sudo make install
# Now it should be able to build hadoop without error
mvn package -Pdist,native -DskipTests -Dtar
- Method 2
Haven't succeeded with this yet.
./configure
make
cd hadoop-src-3.1.1/hadoop-common-project/hadoop-common
export HADOOP_PROTOC_PATH=/path/to/protobuf-2.5.0/src/protoc
export JAVA_HOME=/usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt # without /jre
mvn compile -Pnative
When it's finished, copy the `lib` folder to wherever you like and remember to add the environment variable in `hadoop-env.sh`.
THE FINAL SOLUTION:
Before this, I was trying to use minimal settings in the various *.xml config files, in order to observe the functionality and only add settings when I needed them.
I think the OS killed the map tasks because they exceeded the maximum memory usage; that's why I got the 'Cannot allocate memory' error.
And now I realize: if Hadoop says it can't use the "stack guard" to protect memory usage, maybe I can limit the memory myself.
So I added some memory-limit configuration to mapred-site.xml and yarn-site.xml (see the example below), and it works perfectly!!
Thus I'm not going to bother getting rid of the warning now. :P
(Determine HDP Memory Configuration Settings)
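
For reference, the memory limits could look something like this (the values below are illustrative guesses for a small-memory Raspberry Pi, not the exact settings used here; tune them for your boards):

```xml
<!-- yarn-site.xml: cap how much memory YARN may hand out on each node (example values) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>768</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>768</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>128</value>
</property>

<!-- mapred-site.xml: per-task container sizes and matching JVM heap limits (example values) -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>256</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>256</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx204m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx204m</value>
</property>
```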
I finally found it: maybe the ApplicationMaster (or some such component) can't communicate with the other nodes. (I read the logs and found that only the map tasks on one node succeed; the others fail.)
So the conclusion is: you have to comment out the self-referencing entry (i.e. `127.0.1.1 hostname`) and add every node's hostname, including the node's own, to /etc/hosts.
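
For example, each node's /etc/hosts might end up looking like this (the hostnames and addresses below are made up for illustration):

```
127.0.0.1       localhost
# 127.0.1.1     rpi-master      <- comment out the self-referencing line

192.168.1.100   rpi-master
192.168.1.101   rpi-worker1
192.168.1.102   rpi-worker2
```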
I haven't found out why I can't set it to `hostname.local`, because I really don't like hard-coding things...
This is a zero-configuration protocol that works on LAN subnets with no server required; it uses the `.local` domain.
TL;DR.
- gregbaker/raspberry-pi-cluster - uses Fabric 1.x
- How to build a 7 node Raspberry Pi Hadoop Cluster
- A Hadoop data lab project on Raspberry Pi
- Raspberry PI Hadoop Cluster
- Medium - How to Hadoop at home with Raspberry Pi
- Build a Hadoop 3 cluster with Raspberry Pi 3
- Building a Hadoop cluster with Raspberry Pi
- RPiCluster