 
Hadoop single node cluster setup in Ubuntu


Hi, this blog will help you set up a single-node Hadoop environment on your Linux machine.

To learn more about Hadoop, follow these links:
    https://en.wikipedia.org/wiki/Apache_Hadoop 
    http://www.tutorialspoint.com/hadoop/
    https://www.mapr.com/products/apache-hadoop

    http://findnerd.com/list/view/What-is-Hadoop/14171/

You must have Java 6 (Java 7 or later recommended), ssh, and rsync installed in order to install and use Hadoop.

The link below has detailed information about which Java versions can be used with Hadoop.

    https://wiki.apache.org/hadoop/HadoopJavaVersions

Confirm that the correct Java version is properly installed on your system by executing the following command in a terminal:

    java -version

If Java is installed, the command prints the installed version; make sure it meets the requirement above, otherwise you need a fresh installation of Java.
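For reference, the output of java -version looks roughly like the following; the exact version and build numbers here are only an example and will differ on your machine.

java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)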

     

To install the Java version of your choice, please follow the link below.

http://findnerd.com/list/view/Install-Oracle-JDK-with-apt-get/2944/

     

If ssh is not installed on your machine, install it with the following command:

    sudo apt-get install ssh
    

Install rsync using the following command:

    sudo apt-get install rsync

To allow SSH public-key authentication, we first have to generate an SSH key:

    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

The above command creates an RSA key pair with an empty passphrase. We use an empty passphrase because we don't want to enter it every time Hadoop interacts with its nodes.

     

After this, you have to enable SSH access to your local machine with the newly created key:

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
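Optionally, you can verify that passwordless login works before moving on (this check is not part of the original steps; type exit to close the session afterwards):

ssh localhost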

     

Download your desired version of the Hadoop binary tarball from the Apache Hadoop website.

    http://hadoop.apache.org/releases.html

Now, in the terminal, go to the directory where the tarball was downloaded.

In my case I downloaded hadoop-2.6.3.tar.gz to ~/Downloads/

    cd ~/Downloads/

You need to extract the Hadoop package and move the extracted content to a location of your choice. In my case I chose /usr/local/hadoop.

    sudo tar -zxvf hadoop-2.6.3.tar.gz
    sudo mv hadoop-2.6.3 /usr/local/hadoop
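A quick way to confirm the move worked is to list the new directory; you should see the tarball's contents, such as bin, sbin, etc and share:

ls /usr/local/hadoop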

     

Copy the current Java path; to find it, you can run the following command:

    update-alternatives --config java
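For reference, on a machine set up with the Oracle JDK used in this guide, the listed alternative might be a path like the following (yours will differ):

/usr/lib/jvm/java-7-oracle/jre/bin/java

The Java home to copy is this path minus the trailing /jre/bin/java, i.e. /usr/lib/jvm/java-7-oracle.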

     

To edit your .bashrc, run the following command:

gedit ~/.bashrc

Append the following lines at the end of your .bashrc file.

    #Hadoop Variables
    export JAVA_HOME=/usr/lib/jvm/java-7-oracle 
    export HADOOP_HOME=/usr/local/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin
    export PATH=$PATH:$HADOOP_HOME/sbin
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

     

Your .bashrc should now end with the block above. Put your own Java path (without quotes, the one you copied earlier) in the JAVA_HOME line.

Then execute the contents of the .bashrc file in the terminal so the changes take effect:

    source ~/.bashrc
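As a quick sanity check (not part of the original steps), the hadoop binary should now be on your PATH:

hadoop version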

     

Now comes the Hadoop configuration part.


    Go to Hadoop's configuration directory

    cd /usr/local/hadoop/etc/hadoop

    Now update your hadoop-env.sh

sudo gedit hadoop-env.sh

You have to set JAVA_HOME to the path of your current Java home, which you copied in an earlier step, inside double quotes.
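For example, with the path used in this guide (substitute the Java home you copied), the line in hadoop-env.sh would read:

export JAVA_HOME="/usr/lib/jvm/java-7-oracle"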

    Update core-site.xml

    sudo gedit core-site.xml

Replace the empty <configuration></configuration> element with the updated tags given below and save the file:

    <configuration>
    	<property>
    		<name>fs.defaultFS</name>
    		<value>hdfs://localhost:9000</value>
    	</property>
    </configuration>

The content of core-site.xml should now match the block above.

     

    Update yarn-site.xml

    sudo gedit yarn-site.xml

Replace the empty <configuration></configuration> element with the updated tags given below and save the file:

    <configuration>
    	<property>
    		<name>yarn.nodemanager.aux-services</name>
    		<value>mapreduce_shuffle</value>
    	</property>
    	<property>
    		<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
		<value>org.apache.hadoop.mapred.ShuffleHandler</value>
    	</property>
    </configuration>
    

Make a copy of mapred-site.xml.template named mapred-site.xml:

    sudo cp mapred-site.xml.template mapred-site.xml

    Now edit your mapred-site.xml 

    sudo gedit mapred-site.xml

Replace the empty <configuration></configuration> element with the updated tags given below and save the file:

    <configuration>
    	<property>
    		<name>mapreduce.framework.name</name>
    		<value>yarn</value>
    	</property>
    </configuration>
    

    Edit hdfs-site.xml 

    sudo gedit hdfs-site.xml

Replace the empty <configuration></configuration> element with the updated tags given below and save the file:

    <configuration>
    	<property>
    		<name>dfs.replication</name>
    		<value>1</value>
    	</property>
    	<property>
    		<name>dfs.namenode.name.dir</name>
    		<value>file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
    	</property>
    	<property>
    		<name>dfs.datanode.data.dir</name>
    		<value>file:/usr/local/hadoop/hadoop_data/hdfs/datanode</value>
    	</property>
    </configuration>

    Go back to your home directory

    cd

You need to create the namenode and datanode directories, i.e. the directories specified as dfs.namenode.name.dir and dfs.datanode.data.dir in hdfs-site.xml above:

    sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
    sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode

Change the ownership of the directory to your own user (replace abhishek:abhishek with your username and group):

    sudo chown abhishek:abhishek -R /usr/local/hadoop

Follow this link to learn more about changing ownership: http://www.techonthenet.com/linux/commands/chown.php

The Hadoop file system needs to be formatted before we can start using it. The format command must be run by a user with write permission, since it creates a current directory under /usr/local/hadoop/hadoop_data/hdfs/namenode.

    hdfs namenode -format
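If the format succeeds, you should see a line near the end of the output similar to the following (the path reflects the directory configured in hdfs-site.xml):

INFO common.Storage: Storage directory /usr/local/hadoop/hadoop_data/hdfs/namenode has been successfully formatted.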

Now we can start the Hadoop services. To start them, run this command:

start-all.sh
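Note that in Hadoop 2.x start-all.sh is deprecated; it still works, but it simply wraps the following two scripts, which you can run instead:

start-dfs.sh
start-yarn.sh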

To list the processes running after executing the above command, run jps in your terminal:

    jps

SecondaryNameNode, NodeManager, ResourceManager, NameNode, and DataNode must all be running to ensure that the installation is fine and will work for our further tasks.
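For example, the jps output might look like the following; the process IDs are just placeholders and will differ on your machine.

4866 NameNode
4977 DataNode
5149 SecondaryNameNode
5318 ResourceManager
5444 NodeManager
5764 Jps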

Now go to the following URLs to access the Hadoop web interfaces:

    http://localhost:8088/
    http://localhost:50070/
    http://localhost:50090/
    http://localhost:50075/


Port 8088 is for All Applications on your Hadoop system (the ResourceManager web UI).
Port 50070 is for NameNode information.
Port 50075 is for DataNode information.
Port 50090 is for Secondary NameNode information.
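You can also check from the terminal that a web UI is up, for example (assuming curl is installed):

curl -s http://localhost:50070/ | head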


To stop Hadoop, execute the command:

    stop-all.sh
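As with starting, stop-all.sh is deprecated in Hadoop 2.x; it wraps these two scripts, which you can run instead:

stop-dfs.sh
stop-yarn.sh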

We can add a passphrase to the SSH key that was left blank in an earlier step by following these links:
    https://www.sophos.com/en-us/support/knowledgebase/115708.aspx
    http://www.cyberciti.biz/faq/ssh-password-less-login-with-dsa-publickey-authentication/

To run a word-count program on your single-node cluster, search for the Word Count program on Hadoop on FindNerd.
