15 October 2015
Step 1: Create an Ubuntu 14.04 LTS instance on AWS
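If you prefer the AWS CLI to the web console, a launch command along the following lines should work. This is only a sketch: the AMI ID is region-specific and the key pair and security group names are placeholders you need to replace with your own.
_aws ec2 run-instances --image-id ami-xxxxxxxx --count 1 --instance-type t2.medium --key-name yourKey --security-group-ids sg-xxxxxxxx_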
Step 2: Connect to the instance
_chmod 400 yourKey.pem_
_ssh -i yourKey.pem ubuntu@your_instance_ip_
Step 3: Install Java
_sudo add-apt-repository ppa:webupd8team/java_
_sudo apt-get update_
_sudo apt-get install oracle-java6-installer_
_sudo update-java-alternatives -s java-6-oracle_
_sudo apt-get install oracle-java6-set-default_
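To confirm the JDK installed correctly, check the reported version:
_java -version_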
Step 4: Add a hadoop user
_sudo addgroup hadoop_
_sudo adduser --ingroup hadoop hduser_
Step 5: Create SSH key for password-free login
_su - hduser_
_ssh-keygen -t rsa -P ""_
_cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys_
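If the login test in the next step still prompts for a password, it is usually a permissions issue on the key files; tightening them is a safe extra step:
_chmod 600 $HOME/.ssh/authorized_keys_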
Step 6: Try connection
_ssh localhost_
_exit_
Step 7: Download and Install Hadoop
_cd /usr/local_
_sudo wget http://apache.01link.hk/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz_
_sudo tar -xzvf hadoop-1.2.1.tar.gz_
_sudo mv hadoop-1.2.1 hadoop_
_sudo chown -R hduser:hadoop hadoop_
_sudo rm hadoop-1.2.1.tar.gz_
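As a quick check that Hadoop unpacked correctly, ask the binary for its version:
_/usr/local/hadoop/bin/hadoop version_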
Step 8: Update .bashrc
_su - hduser_
_vim $HOME/.bashrc_
Add the following content to the end of the file:
> _export HADOOP_PREFIX=/usr/local/hadoop_
> _export JAVA_HOME=/usr/lib/jvm/java-6-oracle_
> _unalias fs &> /dev/null_
> _alias fs="hadoop fs"_
> _unalias hls &> /dev/null_
> _alias hls="fs -ls"_
> _export PATH=$PATH:$HADOOP_PREFIX/bin_
Then save with :wq and source .bashrc so the changes take effect:
_source ~/.bashrc_
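A quick sanity check (not part of the original steps) that the new variables and PATH are in effect:
_echo $HADOOP_PREFIX_
_which hadoop_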
Step 9: Configure Hadoop while logged in as hduser
_cd /usr/local/hadoop/conf_
_vim hadoop-env.sh_
Add the following lines to the file:
> _export JAVA_HOME=/usr/lib/jvm/java-6-oracle_
> _export HADOOP_CLASSPATH=/usr/local/hadoop_
Save and Exit :wq
Step 10: Create a temporary directory for Hadoop
_exit_
_sudo mkdir -p /app/hadoop/tmp_
_sudo chown hduser:hadoop /app/hadoop/tmp_
_sudo chmod 750 /app/hadoop/tmp_
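You can verify the ownership and permissions with:
_ls -ld /app/hadoop/tmp_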
Step 11: Add configuration snippets
_su - hduser_
_cd /usr/local/hadoop/conf_
_vim core-site.xml_
Put the following content between the <configuration> … </configuration> tags:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
Save and exit :wq
Also edit this file:
_vim mapred-site.xml_
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>
Save and exit :wq
And edit this file:
_vim hdfs-site.xml_
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>
Save and exit :wq
Step 12: Format the HDFS
_/usr/local/hadoop/bin/hadoop namenode -format_
Step 13: Start Hadoop
_/usr/local/hadoop/bin/start-all.sh_
Step 14: Check that all the processes are up and running
_jps_
For this single-node setup, the jps output should include NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker.
Step 15: To stop Hadoop, type the following command:
_/usr/local/hadoop/bin/stop-all.sh_
Step 16: Start Hadoop again
_/usr/local/hadoop/bin/start-all.sh_
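With the daemons running, you can try one of the example jobs bundled with Hadoop 1.2.1 as a smoke test (run as hduser; the jar path assumes the default tarball layout):
_/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/hadoop-examples-1.2.1.jar pi 2 5_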
Now ready to rock! Have fun :)