Configuring Hadoop for Pseudo-Distributed Mode
Last modified: July 11, 2017
Hadoop runs in three different modes. They are:
local/standalone mode
This is the default configuration for Hadoop out of the box. In standalone mode, Hadoop runs as a single process on your machine.
pseudo-distributed mode
In this mode, Hadoop runs each daemon as a separate Java process. This mimics a distributed implementation while running on a single machine.
fully distributed mode
This is a production-level implementation that runs across two or more machines.
For this tutorial, we will be implementing Hadoop in pseudo-distributed mode. This will allow you to practice a distributed implementation without the physical hardware needed to run a fully distributed cluster.
Configuring Hadoop
If you followed Hadoop Environment Setup, you should already have Java and Hadoop installed. To configure Hadoop for pseudo-distributed mode, you'll need to edit the following files located in /usr/local/hadoop/etc/hadoop:
core-site.xml
This file defines the port number, memory allocation and limits, and the size of the read/write buffers used by Hadoop. Find this file in the etc/hadoop directory and give it the following contents:
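Here's a minimal working example, assuming the standard single-node convention of running HDFS on localhost port 9000 (adjust the host and port to match your setup):

    <configuration>
        <!-- Assumed default: route all filesystem requests to HDFS on localhost:9000 -->
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>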
This sets the URI for all filesystem requests in Hadoop.
hdfs-site.xml
This is the main configuration file for HDFS. It defines the namenode and datanode paths as well as the replication factor. Find this file in the etc/hadoop directory and replace it with the following:
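Here's a representative configuration, assuming a replication factor of 1 (appropriate for a single node) and namenode/datanode directories placed under the hadoop user's home directory (the exact paths are assumptions; any directories owned by the hadoop user will work):

    <configuration>
        <!-- One copy of each block, since there's only one node -->
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <!-- Assumed paths under the hadoop user's home directory -->
        <property>
            <name>dfs.name.dir</name>
            <value>file:///home/hadoop/hdfs/namenode</value>
        </property>
        <property>
            <name>dfs.data.dir</name>
            <value>file:///home/hadoop/hdfs/datanode</value>
        </property>
    </configuration>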
Notice how we set the replication factor via the dfs.replication property. We define the namenode path dfs.name.dir to point to an hdfs directory under the hadoop user's home folder, and point the datanode path dfs.data.dir to a similar destination.
It's important to remember that the paths we define for the namenode and datanode should be under the user we created for hadoop. This keeps hdfs isolated within the context of the hadoop user and also ensures the hadoop user will have read/write access to the file paths it needs to create.
yarn-site.xml
YARN is a resource management platform for Hadoop. To configure YARN, find the yarn-site.xml file in the same etc/hadoop directory and replace it with the following:
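For a single-node cluster, a minimal configuration just enables the shuffle auxiliary service that MapReduce jobs need when running on YARN:

    <configuration>
        <!-- Enable the MapReduce shuffle service on the node manager -->
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    </configuration>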
mapred-site.xml
This file defines the MapReduce framework for Hadoop. Hadoop provides a mapred-site.xml.template file out of the box, so first copy this into a new mapred-site.xml file via:
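From the etc/hadoop directory:

    cp mapred-site.xml.template mapred-site.xml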
Now replace the contents of the mapred-site.xml with the following:
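A minimal configuration just tells MapReduce to run on top of YARN:

    <configuration>
        <!-- Run MapReduce jobs on the YARN framework configured above -->
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>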
Configuring the Hadoop User Environment
Now that you've configured your Hadoop instance for pseudo-distributed mode, it's time to configure the hadoop user environment.
Log in as the hadoop user you created in Hadoop Environment Setup via:
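    # switch to the hadoop user created during environment setup
    su - hadoop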
As the Hadoop user, add the following to your ~/.bashrc profile:
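The exact values depend on your install location; assuming Hadoop lives at /usr/local/hadoop as above, something like this works:

    # Hadoop environment (assumes Hadoop is installed at /usr/local/hadoop)
    export HADOOP_HOME=/usr/local/hadoop
    export HADOOP_INSTALL=$HADOOP_HOME
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    # put the hadoop/hdfs binaries and start/stop scripts on the PATH
    export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin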
This will add all of the required path variables to your profile so you can execute Hadoop commands and scripts. To register the changes to your profile, run:
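    source ~/.bashrc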
Configuring Java for Hadoop
To use Java with Hadoop, you must set the JAVA_HOME environment variable in hadoop-env.sh. Find the hadoop-env.sh file in the same etc/hadoop directory and add the following:
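The exact path depends on where Java is installed; with OpenJDK 8 on Ubuntu, for example, it would look something like:

    # point Hadoop at your JDK (example path; use your own installation's)
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64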
This points Hadoop to your Java installation from Hadoop Environment Setup. You don't need to run the source command here; just update and save the file.
Verify Hadoop Configuration
You should be all set to start working with HDFS. To make sure everything is configured properly, navigate to the home directory for the hadoop user and run:
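    # one-time initialization of the namenode storage directory
    hdfs namenode -format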
This formats the namenode for HDFS. If everything is configured correctly, the command will run without errors and report that the namenode storage directory has been successfully formatted.
Verify YARN
To start YARN, run the following:
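    # starts the ResourceManager and NodeManager daemons
    start-yarn.sh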
If YARN is configured properly, the ResourceManager and NodeManager daemons should start without errors.
Verify HDFS
To ensure HDFS is working properly, run the following command to start it:
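    # starts the namenode, datanode, and secondary namenode daemons
    start-dfs.sh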
If HDFS starts successfully, you won't see any stack-trace errors; the namenode, datanode, and secondary namenode should all report that they've started.
Conclusion
Hadoop should now be properly configured for pseudo-distributed mode. You can verify things are working through the browser as well: visit http://localhost:50070/ to see the currently running Hadoop services and http://localhost:8088/ to see a list of all applications running on the cluster.
Next we'll look at HDFS, including basic architecture and commands.