Hadoop Confiuration

Hadoop Configuration
 I have to do in the following layers.

  • HDFS Layer
    • NameNode-Master
    • DataNode-Store Data(Actual Storage)
  • MapReduce Layer
    • JobTracker
    • TaskTracker
  • Secondary Namenode– storing backup of NameNode it will not work as an alternate namenode, it just stored namenode metadata

Types of Hadoop Configurations

  • Standalone Mode
    • All processes runs as single process
    • Preferred in development
  • Pseudo Cluster Mode
    • All processes run in different process but on a single machine
    • Simulate cluster
  • Fully Cluster Mode
    • All processes running on different boxes
    • Preferred in production Mode

What are important files to be configure

  • hadoop-env.sh (set java environment and logging file)
  • core-site.xml (configure namenode)
  • hdfs-site.xml (configure datanode)
  • mapred-site.xml (map reduce here taking responsibility of configuring jobTracker and taskTracker)
  • yarn-site.xml
  • master (file configured on each datanodes telling about its namenode)
  • slave (file configured on namenode telling what all slave of datanode it has to manage)