Join the social network of Tech Nerds, increase skill rank, get work, manage projects...
  • What is Hadoop?

    • 0
    • 1
    • 0
    • 0
    • 1
    • 0
    • 0
    • 0
    • 290
    Comment on it

    Hadoop is a free, Java-based programming framework that is used for the processing of large data sets in a distributed computing environment.

    Apache Hadoop at the core consists of a storage part, known as Hadoop Distributed File System (HDFS), and MapReduce as the processing part. Hadoop divides files into large blocks and distributes them across nodes in a cluster. To process data, Hadoop transfers packaged code for nodes to process in parallel based on the data that needs to be processed.

    Hadoop was created by Doug Cutting and Mike Cafarella in 2005. Doug Cutting, who was an Yahoo! employee.

    Hadoop enables applications to run on systems with thousands of nodes involving thousands of terabytes. Its distributed file system facilitates fast data transfer rates among nodes and allows the system to continue running in case of a node failure. It lowers the risk of massive system failure, even if a significant number of nodes goes down.

    Applications of Hadoop:

    • Analytics for marketting
    • Machine learning and data mining
    • Image processing
    • XML Message processing

    Hadoop framework is used by several companies including Google, Yahoo and IBM.

 1 Comment(s)

  • Hadoop is an open-source tool from the ASF – Apache Software Foundation. Open source project means it is freely available and we can even change its source code as per the requirements. If certain functionality does not fulfill your need then you can change it according to your need. Most of Hadoop code is written by Yahoo, IBM, Facebook, Cloudera.

    It provides an efficient framework for running jobs on multiple nodes of clusters. Cluster means a group of systems connected via LAN. Apache Hadoop provides parallel processing of data as it works on multiple machines simultaneously.

    By getting inspiration from Google, which has written a paper about the technologies. It is using technologies like Map-Reduce programming model as well as its file system (GFS). As Hadoop was originally written for the Nutch search engine project. When Doug Cutting and his team were working on it, very soon Hadoop became a top-level project due to its huge popularity.

    Apache Hadoop is an open source framework written in Java. The basic Hadoop programming language is Java, but this does not mean you can code only in Java. You can code in C, C++, Perl, Python, ruby etc. You can code the Hadoop framework in any language but it will be more good to code in java as you will have lower level control of the code.

    Big Data and Hadoop efficiently processes large volumes of data on a cluster of commodity hardware. Hadoop is for processing huge volume of data. Commodity hardware is the low-end hardware, they are cheap devices which are very economical. Hence, Hadoop is very economic.

    Hadoop can be setup on a single machine (pseudo-distributed mode, but it shows its real power with a cluster of machines. We can scale it to thousand nodes on the fly ie, without any downtime. Therefore, we need not make any system down to add more systems in the cluster. Follow this guide to learn Hadoop installation on a multi-node cluster.

    Hadoop consists of three key parts –

    • Hadoop Distributed File System (HDFS) – It is the storage layer of Hadoop.
    • Map-Reduce – It is the data processing layer of Hadoop.
    • YARN – It is the resource management layer of Hadoop.
    For more learning Hadoop
Sign In

Sign up using

Forgot Password
Fill out the form below and instructions to reset your password will be emailed to you:
Reset Password
Fill out the form below and reset your password: