
Install Hadoop on AWS Ubuntu Instance

October 15, 2015

Step 1: Create an Ubuntu 14.04 LTS instance on AWS

Step 2: Connect to the instance

_chmod 400 yourKey.pem_

_ssh -i yourKey.pem ubuntu@your_instance_ip_

Step 3: Install Java

_sudo add-apt-repository ppa:webupd8team/java_

_sudo apt-get update_

_sudo apt-get install oracle-java6-installer_

_sudo update-java-alternatives -s java-6-oracle_

_sudo apt-get install oracle-java6-set-default_
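As a quick sanity check (not part of the original steps), you can confirm that the Oracle JDK is now the default:

    # Should report a 1.6.x Oracle/HotSpot JVM
    java -version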

Step 4: Add a hadoop user

_sudo addgroup hadoop_

_sudo adduser --ingroup hadoop hduser_
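Optionally, verify that the new user was placed in the hadoop group:

    # hduser's primary group should be hadoop
    id hduser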

Step 5: Create SSH key for password-free login

_su - hduser_

_ssh-keygen -t rsa -P ""_

_cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys_
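If the passwordless login in the next step is rejected, it is usually because sshd considers the key files too permissive; tightening them (an extra precaution, not in the original steps) does no harm:

    # sshd ignores authorized_keys that are group- or world-writable
    chmod 700 $HOME/.ssh
    chmod 600 $HOME/.ssh/authorized_keys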

Step 6: Test the connection

_ssh localhost_

_exit_

Step 7: Download and Install Hadoop

_cd /usr/local_

_sudo wget http://apache.01link.hk/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz_

_sudo tar -xzvf hadoop-1.2.1.tar.gz_

_sudo mv hadoop-1.2.1 hadoop_

_sudo chown -R hduser:hadoop hadoop_

_sudo rm hadoop-1.2.1.tar.gz_

Step 8: Update .bashrc

_su - hduser_

_vim $HOME/.bashrc_

Add the following content to the end of the file:

> _export HADOOP_PREFIX=/usr/local/hadoop_

> _export JAVA_HOME=/usr/lib/jvm/java-6-oracle_

> _unalias fs &> /dev/null_

> _alias fs="hadoop fs"_

> _unalias hls &> /dev/null_

> _alias hls="fs -ls"_

> _export PATH=$PATH:$HADOOP_PREFIX/bin_

Then save with :wq and reload .bashrc:

_source ~/.bashrc_
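To confirm the environment took effect (an optional check), the hadoop binary should now be on the PATH:

    # Should print /usr/local/hadoop and the Hadoop 1.2.1 version banner
    echo $HADOOP_PREFIX
    hadoop version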

Step 9: Configure Hadoop while logged in as hduser

_cd /usr/local/hadoop/conf_

_vim hadoop-env.sh_

Add the following lines to the file:

> _export JAVA_HOME=/usr/lib/jvm/java-6-oracle_

> _export HADOOP_CLASSPATH=/usr/local/hadoop_

Save and Exit :wq

Step 10: Create a temporary directory for Hadoop

_exit_

_sudo mkdir -p /app/hadoop/tmp_

_sudo chown hduser:hadoop /app/hadoop/tmp_

_sudo chmod 750 /app/hadoop/tmp_
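You can verify the ownership and permissions before moving on:

    # Should show drwxr-x--- owned by hduser:hadoop
    ls -ld /app/hadoop/tmp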

Step 11: Add configuration snippets

_su - hduser_

_cd /usr/local/hadoop/conf_

_vim core-site.xml_

Put the following content between the <configuration> … </configuration> tags:

    <property>
      <name>hadoop.tmp.dir</name>
      <value>/app/hadoop/tmp</value>
      <description>A base for other temporary directories.</description>
    </property>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:54310</value>
      <description>The name of the default file system. A URI whose scheme and authority
      determine the FileSystem implementation. The uri's scheme determines the config
      property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's
      authority is used to determine the host, port, etc. for a filesystem.</description>
    </property>

Save and exit :wq

Also edit file: vim mapred-site.xml

    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:54311</value>
      <description>The host and port that the MapReduce job tracker runs at. If "local",
      then jobs are run in-process as a single map and reduce task.</description>
    </property>

Save and exit :wq

And edit this file: vim hdfs-site.xml

    <property>
      <name>dfs.replication</name>
      <value>1</value>
      <description>Default block replication. The actual number of replications can be
      specified when the file is created. The default is used if replication is not
      specified in create time.</description>
    </property>

Step 12: Format the HDFS

_/usr/local/hadoop/bin/hadoop namenode -format_

Step 13: Start Hadoop

_/usr/local/hadoop/bin/start-all.sh_

Step 14: Check that all the processes are up and running

_jps_
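On a healthy single-node setup, jps should list the five Hadoop 1.x daemons along with Jps itself (the process IDs below are only placeholders):

    2345 NameNode
    2456 DataNode
    2567 SecondaryNameNode
    2678 JobTracker
    2789 TaskTracker
    2890 Jps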

Step 15: Stop Hadoop with the following command:

_/usr/local/hadoop/bin/stop-all.sh_

Step 16: Start Hadoop again

_/usr/local/hadoop/bin/start-all.sh_
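As an optional smoke test (not part of the original steps), you can run one of the example jobs bundled with the 1.2.1 tarball, for instance the pi estimator:

    # 2 map tasks, 10 samples each; prints an estimate of pi when the job finishes
    /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/hadoop-examples-1.2.1.jar pi 2 10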

Now you're ready to rock! Have fun :)

