Hadoop is a free, Java-based programming framework used to process large data sets in a distributed computing environment.
At its core, Apache Hadoop consists of a storage part, the Hadoop Distributed File System (HDFS), and a processing part, MapReduce. Hadoop splits files into large blocks and distributes them across the nodes of a cluster. To process the data, Hadoop ships packaged code to the nodes, which then work in parallel on the data they already hold locally.
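The canonical illustration of this model is word count, sketched below along the lines of the official Hadoop MapReduce tutorial: the map function runs in parallel wherever the input blocks live and emits (word, 1) pairs, and the reduce function sums the counts for each word. The input and output paths are taken from the command line; the snippet uses the org.apache.hadoop.mapreduce API.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs in parallel on each input block, emitting (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: receives every count emitted for a given word and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

A job like this is typically compiled against the hadoop-mapreduce-client libraries and launched with hadoop jar wordcount.jar WordCount <input> <output>, where the input and output paths are placeholders for HDFS locations.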
Hadoop was created in 2005 by Doug Cutting, then a Yahoo! employee, and Mike Cafarella.
Hadoop enables applications to run on systems with thousands of nodes and petabytes of data. Its distributed file system provides fast data transfer rates among nodes and allows the system to keep running when a node fails. Because blocks are replicated across nodes, the risk of massive system failure stays low even if a significant number of nodes go down.
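As a rough sketch of how an application interacts with this file system, the snippet below uses the org.apache.hadoop.fs.FileSystem client API to copy a file into HDFS and inspect its replication factor. The paths and the replication value of 5 are hypothetical, and the cluster address is assumed to come from a core-site.xml on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReplicationDemo {
  public static void main(String[] args) throws Exception {
    // Reads cluster settings (e.g. fs.defaultFS) from core-site.xml on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical paths, for illustration only.
    Path local = new Path("data/input.txt");
    Path remote = new Path("/user/demo/input.txt");

    // Copy a local file into HDFS; it is split into blocks and replicated across nodes.
    fs.copyFromLocalFile(local, remote);

    // Each block of the file is stored on this many nodes.
    FileStatus status = fs.getFileStatus(remote);
    System.out.println("Replication factor: " + status.getReplication());

    // Raise replication for a critical file, to tolerate more node failures.
    fs.setReplication(remote, (short) 5);

    fs.close();
  }
}
```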
Applications of Hadoop:
Marketing analytics
Machine learning and data mining
XML message processing
The Hadoop framework is used by many companies, including Yahoo! and IBM; its design was inspired by papers Google published on MapReduce and the Google File System.