Install Hadoop on AWS Ubuntu Instance

15 October 2015

Step 1: Create an Ubuntu 14.04 LTS instance on AWS

Step 2: Connect to the instance

chmod 400 yourKey.pem

ssh -i yourKey.pem ubuntu@your_instance_ip

Step 3: Install Java

sudo add-apt-repository ppa:webupd8team/java

sudo apt-get update

sudo apt-get install oracle-java6-installer

sudo update-java-alternatives -s java-6-oracle

sudo apt-get install oracle-java6-set-default
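Later steps point JAVA_HOME at a JDK directory, so it can help to confirm that the directory exists before going further. The sketch below assumes the Oracle installer placed the JDK at /usr/lib/jvm/java-6-oracle, which is the name passed to update-java-alternatives above. check_java_home is a hypothetical helper, not part of any package.

```shell
# check_java_home DIR -> exit status 0 if DIR/bin/java is executable.
# Assumption: the JDK from the Oracle Java 6 installer lives under
# /usr/lib/jvm/java-6-oracle; pass that path (or whatever ls /usr/lib/jvm shows).
check_java_home() {
  local dir="$1"
  if [ -x "$dir/bin/java" ]; then
    echo "OK: $dir/bin/java found"
    return 0
  fi
  echo "MISSING: no executable java under $dir" >&2
  return 1
}
```

For example: check_java_home /usr/lib/jvm/java-6-oracle && java -version. If the check fails, ls /usr/lib/jvm shows the directory name the installer actually used.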

Step 4: Add a hadoop user

sudo addgroup hadoop

sudo adduser --ingroup hadoop hduser
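To confirm that adduser put hduser into the hadoop group, you can run a small check like the one below. user_in_group is a hypothetical helper that relies only on id, tr, and grep.

```shell
# user_in_group USER GROUP -> exit status 0 if USER belongs to GROUP.
user_in_group() {
  local user="$1" group="$2"
  # id -nG prints every group name of USER, space-separated;
  # split them one per line and look for an exact match.
  id -nG "$user" | tr ' ' '\n' | grep -qxF "$group"
}
```

For example: user_in_group hduser hadoop && echo "hduser is in the hadoop group"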

Step 5: Create SSH key for password-free login

su - hduser

ssh-keygen -t rsa -P ""

cat $HOME/.ssh/ >> $HOME/.ssh/authorized_keys
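The public-key file name after $HOME/.ssh/ is missing from the command above. By default, ssh-keygen -t rsa writes the public key to $HOME/.ssh/id_rsa.pub, so this sketch assumes that file. The hypothetical authorize_key helper appends the key only once, so rerunning the step does not create duplicate entries. It also sets conservative permissions (700 on the directory, 600 on the file), because sshd with StrictModes enabled ignores an authorized_keys file that other users can write to.

```shell
# authorize_key PUBKEY SSH_DIR -> add PUBKEY to SSH_DIR/authorized_keys once.
# Assumption: PUBKEY is the default ssh-keygen output, e.g. $HOME/.ssh/id_rsa.pub.
authorize_key() {
  local pubkey="$1" ssh_dir="$2"
  local auth="$ssh_dir/authorized_keys"
  [ -r "$pubkey" ] || { echo "cannot read $pubkey" >&2; return 1; }
  mkdir -p "$ssh_dir" || return 1
  touch "$auth" || return 1
  # Append only if this exact key line is not already present
  if ! grep -qxF "$(cat "$pubkey")" "$auth"; then
    cat "$pubkey" >> "$auth"
  fi
  # Keep the directory and key file private to the owner
  chmod 700 "$ssh_dir"
  chmod 600 "$auth"
}
```

As hduser: authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh"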

Step 6: Try connection

ssh localhost


Step 7: Download and Install Hadoop

cd /usr/local

sudo wget

sudo tar -xzvf hadoop-1.2.1.tar.gz

sudo mv hadoop-1.2.1 hadoop

sudo chown -R hduser:hadoop hadoop

sudo rm hadoop-1.2.1.tar.gz
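The download URL in the wget command did not survive. The Apache release archive at archive.apache.org keeps old releases such as 1.2.1. Assuming the tarball is already downloaded, you can combine the extract, rename, and chown steps into a hypothetical install_hadoop helper that refuses to overwrite an existing install.

```shell
# install_hadoop TARBALL PARENT_DIR [OWNER]
# Extracts TARBALL (e.g. hadoop-1.2.1.tar.gz) into PARENT_DIR and renames the
# versioned directory to PARENT_DIR/hadoop. OWNER (e.g. hduser:hadoop) is
# optional because chown normally needs root.
install_hadoop() {
  local tarball="$1" parent="$2" owner="${3:-}"
  local name
  # Top-level directory inside the archive, e.g. "hadoop-1.2.1"
  name=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
  [ -n "$name" ] || { echo "empty or unreadable archive: $tarball" >&2; return 1; }
  if [ -e "$parent/hadoop" ]; then
    echo "$parent/hadoop already exists; refusing to overwrite" >&2
    return 1
  fi
  tar -xzf "$tarball" -C "$parent" || return 1
  mv "$parent/$name" "$parent/hadoop" || return 1
  if [ -n "$owner" ]; then
    chown -R "$owner" "$parent/hadoop" || return 1
  fi
}
```

Because /usr/local is owned by root, run this from a root shell (for example after sudo -i): install_hadoop hadoop-1.2.1.tar.gz /usr/local hduser:hadoop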

Step 8: Update .bashrc

su - hduser

vim $HOME/.bashrc

Add the following content to the end of the file:

export HADOOP_PREFIX=/usr/local/hadoop

export JAVA_HOME=/usr/lib/jvm/java-6-oracle

export PATH=$PATH:$HADOOP_PREFIX/bin

unalias fs &> /dev/null

alias fs="hadoop fs"

unalias hls &> /dev/null

alias hls="fs -ls"


Then save it with :wq and reload .bashrc:

source ~/.bashrc
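Appending to .bashrc by hand is easy to repeat by accident. This sketch wraps the same settings between marker comments and skips the append if the markers are already there. It makes two assumptions. First, JAVA_HOME uses /usr/lib/jvm/java-6-oracle, the directory selected with update-java-alternatives in Step 3. Second, it adds a PATH line so the shell can find the hadoop command used by the fs alias. The function name and marker text are hypothetical.

```shell
# add_hadoop_env RCFILE -> append the Hadoop environment block to RCFILE once.
add_hadoop_env() {
  local rc="$1"
  local marker='# >>> hadoop environment >>>'
  touch "$rc" || return 1
  # Skip the append if an earlier run already added the block
  if grep -qxF "$marker" "$rc"; then
    return 0
  fi
  # Quoted 'EOF' keeps $PATH and $HADOOP_PREFIX literal in the file
  cat >> "$rc" <<'EOF'
# >>> hadoop environment >>>
export HADOOP_PREFIX=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-6-oracle
export PATH="$PATH:$HADOOP_PREFIX/bin"
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
# <<< hadoop environment <<<
EOF
}
```

As hduser: add_hadoop_env "$HOME/.bashrc" && source "$HOME/.bashrc"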

Step 9: Configure Hadoop while logged in as hduser

cd /usr/local/hadoop/conf


Add the following lines to the file:

export JAVA_HOME=/usr/lib/jvm/java-6-oracle

export HADOOP_CLASSPATH=/usr/local/hadoop

Save and exit :wq
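The file to edit in Step 9 is not named above. In a Hadoop 1.x tarball, conf/hadoop-env.sh is the script where JAVA_HOME is normally set, and this sketch assumes that file. Because a later export overrides an earlier one, appending the two lines at the end of the file is enough. The hypothetical ensure_line helper skips any line that is already present.

```shell
# ensure_line FILE LINE -> append LINE to FILE unless an identical line exists.
ensure_line() {
  local file="$1" line="$2"
  touch "$file" || return 1
  grep -qxF "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# configure_hadoop_env FILE -> add this step's two exports to FILE once.
# Assumption: FILE is conf/hadoop-env.sh under the Hadoop install.
configure_hadoop_env() {
  local file="$1"
  ensure_line "$file" 'export JAVA_HOME=/usr/lib/jvm/java-6-oracle' || return 1
  ensure_line "$file" 'export HADOOP_CLASSPATH=/usr/local/hadoop'
}
```

As hduser: configure_hadoop_env /usr/local/hadoop/conf/hadoop-env.sh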

Step 10: Create a temporary directory for Hadoop


sudo mkdir -p /app/hadoop/tmp

sudo chown hduser:hadoop /app/hadoop/tmp

sudo chmod 750 /app/hadoop/tmp
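If hduser cannot write to /app/hadoop/tmp, formatting HDFS later tends to fail. A quick ownership and mode check can save some debugging. check_dir is a hypothetical helper built on GNU stat, which ships with Ubuntu.

```shell
# check_dir DIR OWNER:GROUP MODE -> exit 0 if DIR has that owner, group and octal mode.
check_dir() {
  local dir="$1" want_owner="$2" want_mode="$3"
  local got
  [ -d "$dir" ] || { echo "$dir does not exist" >&2; return 1; }
  # GNU stat: %U = owner name, %G = group name, %a = octal permissions
  got=$(stat -c '%U:%G %a' "$dir")
  if [ "$got" = "$want_owner $want_mode" ]; then
    echo "OK: $dir is $got"
  else
    echo "MISMATCH: $dir is $got, expected $want_owner $want_mode" >&2
    return 1
  fi
}
```

For example: check_dir /app/hadoop/tmp hduser:hadoop 750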

Step 11: Add configuration snippets

su - hduser

cd /usr/local/hadoop/conf

vim core-site.xml

Put the following content between the <configuration> and </configuration> tags:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name></name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>

Save and exit :wq

Also edit mapred-site.xml:

vim mapred-site.xml

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>

Save and exit :wq

And edit hdfs-site.xml:

vim hdfs-site.xml

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>

Save and exit :wq
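Each snippet above has to sit inside its file's <configuration> element. The hypothetical write_site_xml helper below writes a complete file from name/value pairs, overwriting the target. The name of the second core-site.xml property is missing above. In Hadoop 1.x, the property that holds the default filesystem URI is fs.default.name, and this sketch assumes it. The description elements are left out because they only document the settings.

```shell
# write_site_xml FILE NAME VALUE [NAME VALUE ...]
# Overwrites FILE with a <configuration> element holding one <property> per pair.
# Values are written verbatim (not XML-escaped), which is fine for these settings.
write_site_xml() {
  local file="$1"; shift
  if [ $(( $# % 2 )) -ne 0 ]; then
    echo "write_site_xml: expected NAME VALUE pairs" >&2
    return 1
  fi
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    while [ $# -gt 0 ]; do
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
      shift 2
    done
    echo '</configuration>'
  } > "$file"
}

# configure_pseudo_distributed CONF_DIR -> write the three files from this step.
configure_pseudo_distributed() {
  local conf="$1"
  # Assumption: fs.default.name is the (missing) name of the hdfs:// property
  write_site_xml "$conf/core-site.xml" \
    hadoop.tmp.dir /app/hadoop/tmp \
    fs.default.name hdfs://localhost:54310 || return 1
  write_site_xml "$conf/mapred-site.xml" \
    mapred.job.tracker localhost:54311 || return 1
  write_site_xml "$conf/hdfs-site.xml" \
    dfs.replication 1
}
```

As hduser, after backing up the original files: configure_pseudo_distributed /usr/local/hadoop/conf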

Step 12: Format the HDFS

/usr/local/hadoop/bin/hadoop namenode -format

Step 13: Start Hadoop


Step 14: Check that all the processes are up and running


Step 15: Stop Hadoop with the following command:


Step 16: Start Hadoop again

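The commands for the start, check, and stop steps above are missing. In Hadoop 1.x, bin/start-all.sh and bin/stop-all.sh start and stop all daemons. jps, which ships with the JDK, lists running Java processes. The sketch below assumes those scripts live under HADOOP_PREFIX. It checks jps output for the five daemons a single-node Hadoop 1.x setup normally runs. The function names are hypothetical.

```shell
# Assumption: HADOOP_PREFIX points at the install from Step 7.
HADOOP_PREFIX="${HADOOP_PREFIX:-/usr/local/hadoop}"

start_hadoop() { "$HADOOP_PREFIX/bin/start-all.sh"; }
stop_hadoop()  { "$HADOOP_PREFIX/bin/stop-all.sh"; }

# daemons_running: read `jps` output ("PID ClassName" per line) on stdin,
# report any missing Hadoop 1.x daemon, and exit 0 only if all five are present.
daemons_running() {
  local procs d missing=0
  procs=$(awk '{print $2}')
  for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # Exact line match, so "SecondaryNameNode" does not count as "NameNode"
    if ! printf '%s\n' "$procs" | grep -qxF "$d"; then
      echo "not running: $d" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

As hduser: start_hadoop; sleep 15; jps | daemons_running. The sleep is a rough allowance for the daemons to come up. Running stop_hadoop followed by start_hadoop restarts them.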

Now you are ready to rock! Have fun :)

Originally published at on October 15, 2015.

By Victor Leung

Experience in software development, consulting services and technical product management. Understanding of business and technology with an MBA in Finance and a Master's degree in Computer Science. AWS Certified Solution Architect with experience in building products from scratch and serving as a charismatic leader.
