
Installing Hadoop 3.3.1 on Ubuntu 20.04

The detailed steps for installing Hadoop 3.3.1 on Ubuntu 20.04 are as follows:

1. Download hadoop-3.3.1.tar.gz
https://hadoop.apache.org/
https://hadoop.apache.org/docs/stable/
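Before unpacking, it can be worth checking the tarball against the SHA-512 value published next to it. The following is a minimal sketch, not part of the original steps: the helper name verify_sha512 is hypothetical, and the archive.apache.org download URL in the comment is an assumption about where the 3.3.1 release is hosted.

```shell
# verify_sha512 FILE EXPECTED_HEX
# Succeeds only if the SHA-512 digest of FILE equals EXPECTED_HEX (case-insensitive).
# (Hypothetical helper for illustration; relies on sha512sum from GNU coreutils.)
verify_sha512() {
  vs_actual="$(sha512sum "$1" | awk '{print $1}')" || return 1
  vs_expected="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
  [ "$vs_actual" = "$vs_expected" ]
}

# Usage (commented out; the URL is an assumption, the digest must be copied
# from the .sha512 file published alongside the tarball):
# wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
# verify_sha512 hadoop-3.3.1.tar.gz "<published digest>" && echo "checksum OK"
```

The comparison is done on the bare hex digest so that it does not depend on the layout of the published checksum file.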

2. Install Hadoop

tar xvf hadoop-3.3.1.tar.gz
sudo mv hadoop-3.3.1 /usr/local
# Rename in place so that Hadoop ends up at /usr/local/hadoop
sudo mv /usr/local/hadoop-3.3.1 /usr/local/hadoop

3. Install JDK 8 (Hadoop 3.3.x supports Java 8, and Java 11 at runtime only; Java 9 is not a supported version)

tar xvf jdk-8u291-linux-x64.tar.gz
sudo mv jdk1.8.0_291 /usr/local
sudo mv /usr/local/jdk1.8.0_291 /usr/local/jdk
-- Using the following command instead is not recommended
sudo apt install openjdk-8-jdk-headless

4. Adjust the configuration files
1) sudo vi /etc/profile
Add

export JAVA_HOME=/usr/local/jdk
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
alias hadoopdir="cd /usr/local/hadoop"

Then run the following command

source /etc/profile
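After sourcing the profile, a quick check that the variables took effect can save debugging later. This is a small sketch only; the function name require_vars is hypothetical and not part of Hadoop.

```shell
# require_vars NAME...
# Prints "missing: NAME" for every variable that is unset or empty and
# returns non-zero if any is missing. (Hypothetical helper for illustration.)
require_vars() {
  rv_missing=0
  for rv_name in "$@"; do
    # Indirect lookup of the variable whose name is stored in rv_name.
    eval "rv_value=\${$rv_name:-}"
    if [ -z "$rv_value" ]; then
      echo "missing: $rv_name"
      rv_missing=1
    fi
  done
  return "$rv_missing"
}

# Usage (commented out; needs the JDK and Hadoop from steps 2 and 3):
# require_vars JAVA_HOME HADOOP_HOME && java -version && hadoop version
```

If both variables are set, `java -version` and `hadoop version` should then run from any directory, because their bin directories were appended to PATH.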

2) vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Add

export JAVA_HOME=/usr/local/jdk

3) vi /usr/local/hadoop/etc/hadoop/core-site.xml
Add (ubuntu-VirtualBox is the hostname of this machine; substitute your own)

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ubuntu-VirtualBox:9000</value>
<description>NameNode_URI</description>
</property>
</configuration>
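The host named in fs.defaultFS has to resolve on the machine, otherwise the daemons cannot reach the NameNode. A minimal sketch for checking this follows; uri_host is a hypothetical helper, while `hdfs getconf -confKey` and `getent hosts` are existing commands.

```shell
# uri_host URI
# Prints the host part of a scheme://host[:port][/path] URI.
# (Hypothetical helper for illustration.)
uri_host() {
  printf '%s\n' "$1" | sed -E 's#^[a-zA-Z][a-zA-Z0-9+.-]*://([^:/]+).*#\1#'
}

# Usage (commented out; needs Hadoop on PATH):
# nn_host="$(uri_host "$(hdfs getconf -confKey fs.defaultFS)")"
# getent hosts "$nn_host" || echo "cannot resolve $nn_host; check /etc/hosts"
```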

4) vi /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Add

<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/hadoop/data/datanode</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop/data/namenode</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>ubuntu-VirtualBox:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>ubuntu-VirtualBox:50090</value>
</property> 
</configuration>

5) vi /usr/local/hadoop/etc/hadoop/yarn-site.xml
Add

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>ubuntu-VirtualBox:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>ubuntu-VirtualBox:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>ubuntu-VirtualBox:8050</value>
</property>
</configuration>

5. Format the Hadoop NameNode
Run as root (hdfs namenode -format replaces the deprecated hadoop namenode -format)

hdfs namenode -format

If formatting fails, clean up and run it again as follows

stop-all.sh
cd /usr/local/hadoop
rm -rf data/ logs/
hdfs namenode -format

6. Start Hadoop
start-all.sh

The following errors appear

root@ubuntu-VirtualBox:/home/ubuntu# start-all.sh
Starting namenodes on [ubuntu-VirtualBox]
ubuntu-VirtualBox: Warning: Permanently added 'ubuntu-virtualbox,10.0.2.15' (ECDSA) to the list of known hosts.
ubuntu-VirtualBox: root@ubuntu-virtualbox: Permission denied (publickey,password).
Starting datanodes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
localhost: root@localhost: Permission denied (publickey,password).
Starting secondary namenodes [ubuntu-VirtualBox]
ubuntu-VirtualBox: root@ubuntu-virtualbox: Permission denied (publickey,password).
Starting resourcemanager
Starting nodemanagers
localhost: root@localhost: Permission denied (publickey,password).

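Solution: set up passwordless SSH login for root

# Warning: this deletes every existing key and known_hosts entry under /root/.ssh
mkdir -p /root/.ssh && cd /root/.ssh && rm -rf ./*
ssh-keygen -t rsa
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys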
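Before running start-all.sh again, the key setup can be verified with a non-interactive login to the hosts that appeared in the error log. This is a sketch under assumptions: check_passwordless_ssh is a hypothetical helper, and the ssh options are chosen for this check rather than taken from the original post.

```shell
# check_passwordless_ssh HOST...
# Tries a non-interactive login to each host and reports the result.
# Returns non-zero if any host fails. (Hypothetical helper for illustration.)
check_passwordless_ssh() {
  cps_failed=0
  for cps_host in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    # StrictHostKeyChecking=no accepts new host keys automatically (assumption:
    # acceptable on a single-node test VM, since known_hosts was just deleted).
    if ssh -o BatchMode=yes -o StrictHostKeyChecking=no -o ConnectTimeout=5 \
        "$cps_host" true; then
      echo "ok: $cps_host"
    else
      echo "FAILED: $cps_host"
      cps_failed=1
    fi
  done
  return "$cps_failed"
}

# Usage (commented out; run as root, the user that starts the daemons):
# check_passwordless_ssh localhost ubuntu-VirtualBox
```

If both hosts report ok, the "Permission denied (publickey,password)" lines should no longer appear.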

References:
https://codertw.com/%E5%89%8D%E7%AB%AF%E9%96%8B%E7%99%BC/393790/
https://blog.csdn.net/qq_44166946/article/details/109808363

7. Disable the firewall
systemctl stop ufw
systemctl disable ufw

8. Stop Hadoop
stop-all.sh

9. Check the Hadoop processes
jps
While Hadoop is running, the following Java processes should be listed

29218 DataNode
29475 SecondaryNameNode
29687 ResourceManager
33707 Jps
29900 NodeManager
29020 NameNode
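Comparing the jps listing against the expected daemons by eye is easy to get wrong, so the check can be scripted. A minimal sketch follows; missing_daemons is a hypothetical helper, and the list of names is taken from the output shown above.

```shell
# missing_daemons
# Reads `jps` output on stdin and prints each expected daemon that is absent.
# Returns non-zero if any daemon is missing. (Hypothetical helper for illustration.)
missing_daemons() {
  # Second column of jps output is the main class name.
  md_seen="$(awk '{print $2}')"
  md_rc=0
  for md_name in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -x matches whole lines only, so SecondaryNameNode does not count as NameNode.
    if ! printf '%s\n' "$md_seen" | grep -qx "$md_name"; then
      echo "$md_name"
      md_rc=1
    fi
  done
  return "$md_rc"
}

# Usage (commented out; needs a running cluster):
# jps | missing_daemons && echo "all daemons are up"
```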

10. List the files in HDFS
hadoop fs -ls /

11. For HDFS (distributed file system) commands, see
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html
URI: hdfs://namenode:namenodePort/parent/child, or simply /parent/child when working on this machine
(assuming the configuration file points to namenode:namenodePort)
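To illustrate the two path forms, here is a short sketch. hdfs_uri is a hypothetical helper that only builds the fully qualified string; the host and port in the usage lines are taken from the fs.defaultFS value configured in step 4, and the `hadoop fs` subcommands used are documented in the FileSystem Shell guide linked above.

```shell
# hdfs_uri NAMENODE PORT PATH
# Prints the fully qualified form of an absolute HDFS path.
# (Hypothetical helper for illustration.)
hdfs_uri() {
  printf 'hdfs://%s:%s%s\n' "$1" "$2" "$3"
}

# Usage (commented out; needs the daemons from step 6 to be running):
# hadoop fs -mkdir -p /parent/child
# hadoop fs -ls /parent
# hadoop fs -ls "$(hdfs_uri ubuntu-VirtualBox 9000 /parent)"
```

With fs.defaultFS set as in core-site.xml, the last two listings should refer to the same directory.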
