15 October 2015
Step 1: Create an Ubuntu 14.04 LTS instance on AWS
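If you prefer the AWS CLI to the web console, a launch command along the following lines should work. This is only a sketch: the AMI ID is region-specific and the key pair and security group names are placeholders you need to replace with your own.
_aws ec2 run-instances --image-id ami-xxxxxxxx --count 1 --instance-type t2.medium --key-name yourKey --security-group-ids sg-xxxxxxxx_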
Step 2: Connect to the instance
_chmod 400 yourKey.pem_
_ssh -i yourKey.pem ubuntu@your_instance_ip_
Step 3: Install Java
_sudo add-apt-repository ppa:webupd8team/java_
_sudo apt-get update_
_sudo apt-get install oracle-java6-installer_
_sudo update-java-alternatives -s java-6-oracle_
_sudo apt-get install oracle-java6-set-default_
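To confirm the JDK installed correctly, check the reported version:
_java -version_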
Step 4: Add a hadoop user
_sudo addgroup hadoop_
_sudo adduser --ingroup hadoop hduser_
Step 5: Create SSH key for password-free login
_su - hduser_
_ssh-keygen -t rsa -P ""_
_cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys_
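If the login test in the next step still prompts for a password, it is usually a permissions issue on the key files; tightening them is a safe extra step:
_chmod 600 $HOME/.ssh/authorized_keys_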
Step 6: Try connection
_ssh localhost_
_exit_
Step 7: Download and Install Hadoop
_cd /usr/local_
_sudo wget http://apache.01link.hk/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz_
_sudo tar -xzvf hadoop-1.2.1.tar.gz_
_sudo mv hadoop-1.2.1 hadoop_
_sudo chown -R hduser:hadoop hadoop_
_sudo rm hadoop-1.2.1.tar.gz_
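As a quick check that Hadoop unpacked correctly, ask the binary for its version:
_/usr/local/hadoop/bin/hadoop version_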
Step 8: Update .bashrc
_su - hduser_
_vim $HOME/.bashrc_
Add the following content to the end of the file:
> _export HADOOP_PREFIX=/usr/local/hadoop_
> _export JAVA_HOME=/usr/lib/jvm/java-6-oracle_
> _unalias fs &> /dev/null_
> _alias fs="hadoop fs"_
> _unalias hls &> /dev/null_
> _alias hls="fs -ls"_
> _export PATH=$PATH:$HADOOP_PREFIX/bin_
Then save with :wq and source .bashrc so the changes take effect:
_source ~/.bashrc_
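A quick sanity check (not part of the original steps) that the new variables and PATH are in effect:
_echo $HADOOP_PREFIX_
_which hadoop_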
Step 9: Configure Hadoop while logged in as hduser
_cd /usr/local/hadoop/conf_
_vim hadoop-env.sh_
Add the following lines to the file:
> _export JAVA_HOME=/usr/lib/jvm/java-6-oracle_
> _export HADOOP_CLASSPATH=/usr/local/hadoop_
Save and Exit :wq
Step 10: Create a temporary directory for Hadoop
_exit_
_sudo mkdir -p /app/hadoop/tmp_
_sudo chown hduser:hadoop /app/hadoop/tmp_
_sudo chmod 750 /app/hadoop/tmp_
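You can verify the ownership and permissions with:
_ls -ld /app/hadoop/tmp_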
Step 11: Add configuration snippets
_su - hduser_
_cd /usr/local/hadoop/conf_
_vim core-site.xml_
Put the following content between the <configuration> … </configuration> tags:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
Save and exit :wq
Also edit this file:
_vim mapred-site.xml_
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>
Save and exit :wq
And edit this file:
_vim hdfs-site.xml_
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>
Save and exit :wq
Step 12: Format the HDFS
_/usr/local/hadoop/bin/hadoop namenode -format_
Step 13: Start Hadoop
_/usr/local/hadoop/bin/start-all.sh_
Step 14: Check that all the processes are up and running
_jps_
For this single-node setup, the jps output should include NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker.
Step 15: To stop Hadoop, type the following command:
_/usr/local/hadoop/bin/stop-all.sh_
Step 16: Start Hadoop again
_/usr/local/hadoop/bin/start-all.sh_
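With the daemons running, you can try one of the example jobs bundled with Hadoop 1.2.1 as a smoke test (run as hduser; the jar path assumes the default tarball layout):
_/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/hadoop-examples-1.2.1.jar pi 2 5_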
Now ready to rock! Have fun :)