Install Hadoop on AWS Ubuntu Instance

October 15, 2015


Step 1: Create an Ubuntu 14.04 LTS instance on AWS

Step 2: Connect to the instance

chmod 400 yourKey.pem

ssh -i yourKey.pem ubuntu@your_instance_ip

Step 3: Install Java

sudo add-apt-repository ppa:webupd8team/java

sudo apt-get update

sudo apt-get install oracle-java6-installer

sudo update-java-alternatives -s java-6-oracle

sudo apt-get install oracle-java6-set-default

Step 4: Add a Hadoop user

sudo addgroup hadoop

sudo adduser --ingroup hadoop hduser

Step 5: Create SSH key for password-free login

su - hduser

ssh-keygen -t rsa -P ""

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Step 6: Try connection

ssh localhost


Step 7: Download and Install Hadoop

cd /usr/local

sudo wget

sudo tar -xzvf hadoop-1.2.1.tar.gz

sudo mv hadoop-1.2.1 hadoop

sudo chown -R hduser:hadoop hadoop

sudo rm hadoop-1.2.1.tar.gz

Step 8: Update .bashrc

su - hduser

vim $HOME/.bashrc

Add the following content to the end of the file:

export HADOOP_PREFIX=/usr/local/hadoop

export JAVA_HOME=/usr/lib/jvm/java-6-oracle

unalias fs &> /dev/null

alias fs="hadoop fs"

unalias hls &> /dev/null

alias hls="fs -ls"


Then save it with :wq and reload the file:

source ~/.bashrc

Step 9: Configure Hadoop (logged in as hduser)

cd /usr/local/hadoop/conf


vim hadoop-env.sh

Add the following lines to the file:

export JAVA_HOME=/usr/lib/jvm/java-6-oracle
export HADOOP_CLASSPATH=/usr/local/hadoop

Save and Exit :wq

Step 10: Create a temporary directory for Hadoop


sudo mkdir -p /app/hadoop/tmp

sudo chown hduser:hadoop /app/hadoop/tmp

sudo chmod 750 /app/hadoop/tmp
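Mode 750 gives hduser full access, the hadoop group read and execute, and everyone else nothing. A quick way to confirm what those bits look like, sketched on a throwaway directory (on the instance the real target is /app/hadoop/tmp):

```shell
# Sketch: verify mode 750 on a scratch directory instead of /app/hadoop/tmp.
mkdir -p ./tmp-demo
chmod 750 ./tmp-demo

# Print the octal mode: owner=rwx (7), group=r-x (5), other=--- (0)
stat -c '%a' ./tmp-demo   # prints: 750
```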

Step 11: Add snippets

su - hduser

cd /usr/local/hadoop/conf

vim core-site.xml

Put the following content in between the <configuration> … </configuration> tags:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>

Save and exit :wq

Also edit file: vim mapred-site.xml

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>

Save and exit :wq

And edit this file: vim hdfs-site.xml

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>

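The three configuration edits above can also be scripted with here-documents instead of editing each file in vim. A minimal sketch that writes the same snippets to a scratch directory (on the instance you would point CONF_DIR at /usr/local/hadoop/conf; the description elements are omitted for brevity):

```shell
# Sketch: write the three Hadoop 1.x config files to $CONF_DIR.
# Defaults to a scratch directory; set CONF_DIR=/usr/local/hadoop/conf
# on the instance itself.
CONF_DIR=${CONF_DIR:-./conf-sketch}
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
EOF

cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>
EOF

cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
```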
Step 12: Format the HDFS

/usr/local/hadoop/bin/hadoop namenode -format

Step 13: Start Hadoop

/usr/local/hadoop/bin/start-all.sh

Step 14: Check that all the processes are up and running

jps

Step 15: Stop Hadoop with the following command:

/usr/local/hadoop/bin/stop-all.sh

Step 16: Start Hadoop again

/usr/local/hadoop/bin/start-all.sh


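The process check above can be scripted too. A small helper, sketched here, that reads `jps`-style output and reports any missing daemons (the daemon names assume the pseudo-distributed Hadoop 1.x setup in this guide):

```shell
# check_daemons: read `jps` output on stdin and report whether all
# expected Hadoop 1.x daemons are present.
check_daemons() {
  local expected="NameNode DataNode SecondaryNameNode JobTracker TaskTracker"
  local input missing=""
  input=$(cat)
  for d in $expected; do
    # -w matches the daemon name as a whole word in the jps listing
    echo "$input" | grep -qw "$d" || missing="$missing $d"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all daemons running"
}
```

With everything up, `jps | check_daemons` would print `all daemons running`; otherwise it lists the daemons that failed to start and returns a non-zero status.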
Now you are ready to rock. Have fun :)

Originally published on October 15, 2015.

Written by Victor Leung, a keen traveller who aims to see every country in the world and is passionate about cutting-edge technologies. Follow me on Twitter.