Word Count Program on Hadoop


    This post shows how to run an application on a Hadoop system, using the wordcount example program on a text file.
    First we need the data that wordcount will process. For testing, you can create it in gedit or any editor of your choice. I took my text from http://www.lipsum.com/

    Put the file in a location of your choice, for example a directory named hadoopTestData on the Desktop.

    cd ~/Desktop

    Create a directory named hadoopTestData:

    mkdir hadoopTestData

    Then put your file in the hadoopTestData directory. My file is named loremIpsum.
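If you would rather generate a tiny test file than paste text by hand, the step above can be sketched as a shell function. The function name make_sample_data and the sample sentences are hypothetical, chosen only so the expected counts are easy to check by eye (lorem 3, ipsum 2, dolor 2, sit 1, amet 1).

```shell
# Hypothetical helper: writes a small test file named loremIpsum into the
# directory given as the first argument (the directory is created if missing).
make_sample_data() {
  dir=$1
  mkdir -p "$dir"
  # Repeated words make the resulting counts easy to verify.
  printf '%s\n' \
    'lorem ipsum dolor sit amet' \
    'lorem ipsum dolor' \
    'lorem' > "$dir/loremIpsum"
  # Report how many whitespace-separated words were written.
  wc -w < "$dir/loremIpsum"
}
```

Calling make_sample_data ~/Desktop/hadoopTestData would create the file in the directory used in this post.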

    The content of my file loremIpsum will be the input for the wordcount application.

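    Now execute start-all.sh to start the Hadoop services. (In Hadoop 2.x this script is deprecated in favour of start-dfs.sh followed by start-yarn.sh, but it still works.)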

    start-all.sh

    Go to Hadoop's installation directory

    cd /usr/local/hadoop

    Now create a directory in HDFS to hold the data:

    bin/hdfs dfs -mkdir /avish

    Now put the data into HDFS (Hadoop Distributed File System):

    bin/hdfs dfs -put ~/Desktop/hadoopTestData/loremIpsum /avish/input

    The general form takes the local source(s) first and the HDFS destination last:

    bin/hdfs dfs -put <local src> ... <destination>
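The upload step can be wrapped in a small function that refuses to run when the local file is missing, which catches mistakes such as a wrong file name early. This is only a sketch: run, upload_input and the DRY_RUN variable are hypothetical names, and the relative bin/hdfs path assumes you are in the Hadoop installation directory. By default the commands are only printed; set DRY_RUN=0 to execute them.

```shell
# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: upload_input <local file> <HDFS destination>
upload_input() {
  src=$1
  dest=$2
  # Fail early if the local source does not exist.
  if [ ! -f "$src" ]; then
    echo "missing local file: $src" >&2
    return 1
  fi
  # Create the parent directory in HDFS (-p: no error if it already exists).
  run bin/hdfs dfs -mkdir -p "$(dirname "$dest")"
  # Copy the local file into HDFS.
  run bin/hdfs dfs -put "$src" "$dest"
}
```

For the file in this post the call would be upload_input ~/Desktop/hadoopTestData/loremIpsum /avish/input.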

    Go to localhost:50070 in your browser.

    Click on Utilities and then click Browse the file system

    You will find your data in HDFS under the name input.

     

    Now you need to run wordcount on your data.

    The Hadoop package ships with some example jars. We are going to use one of them.

    Enter this command in your terminal to run the word count operation on your data:

    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.3.jar wordcount /avish/input /avish/output
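The jar file name contains the Hadoop version (2.6.3 here), so it will differ on other installations. A minimal sketch for locating it, assuming the examples jar sits under share/hadoop/mapreduce as in the command above; find_examples_jar is a hypothetical helper name.

```shell
# Hypothetical helper: print the path of the MapReduce examples jar found
# under the given Hadoop installation directory (default: /usr/local/hadoop).
find_examples_jar() {
  hadoop_home=${1:-/usr/local/hadoop}
  for jar in "$hadoop_home"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar; do
    # If the glob matched nothing, $jar is the literal pattern and -f fails.
    if [ -f "$jar" ]; then
      echo "$jar"
      return 0
    fi
  done
  echo "no examples jar found under $hadoop_home" >&2
  return 1
}
```

The job could then be started with bin/hadoop jar "$(find_examples_jar /usr/local/hadoop)" wordcount /avish/input /avish/output. Note that the job fails if the output directory already exists, so remove it with bin/hdfs dfs -rm -r /avish/output before running the job a second time.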

    Now, on the All Applications page of the ResourceManager web UI (localhost:8088 by default), you will see your word count application running.

    After the job completes, you can view the output with:

    bin/hdfs dfs -cat /avish/output/*

    The word counts are printed to the terminal.

    You can also download the output file from the file system browser in the web UI and open it in any text editor of your choice.

    The output lists each word present in your input file along with its frequency.
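To sanity-check the result on a small file, the same counts can be computed locally with standard Unix tools. This is a sketch, not what Hadoop runs internally: it assumes words are split on whitespace only and compared case-sensitively, and it prints one word and its count per line, separated by a tab and sorted by word. The function name local_wordcount is hypothetical.

```shell
# Hypothetical helper: local_wordcount <file>
# Prints "word<TAB>count" for every whitespace-separated word in the file.
local_wordcount() {
  # 1. Put every word on its own line (runs of whitespace collapse to one newline).
  # 2. Drop empty lines, sort in byte order, and count identical neighbours.
  # 3. Reorder uniq's "count word" output into "word<TAB>count".
  tr -s '[:space:]' '\n' < "$1" \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ print $2 "\t" $1 }'
}
```

Running local_wordcount ~/Desktop/hadoopTestData/loremIpsum and comparing it with the output of bin/hdfs dfs -cat /avish/output/* is a quick way to confirm the job processed the file you expected.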
