Installing Hadoop 3.3.1 on Ubuntu 20.04. The detailed steps are as follows:
1. Download hadoop-3.3.1.tar.gz
https://hadoop.apache.org/
https://hadoop.apache.org/docs/stable/
2. Install Hadoop
tar xvf hadoop-3.3.1.tar.gz
sudo mv hadoop-3.3.1 /usr/local
sudo mv /usr/local/hadoop-3.3.1 /usr/local/hadoop
3. Install JDK 8 or 9
tar xvf jdk-8u291-linux-x64.tar.gz
sudo mv jdk1.8.0_291 /usr/local
sudo mv /usr/local/jdk1.8.0_291 /usr/local/jdk
Installing the JDK with the following command is not recommended:
sudo apt install openjdk-8-jdk-headless
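Because the JDK is unpacked by hand in this step, it can be worth confirming which Java major version is actually picked up afterwards. The following is a minimal sketch; java_major_version is a hypothetical helper name, not part of the JDK or Hadoop, and it only handles plain version strings such as "1.8.0_291" or "11.0.2".

```shell
# Map a Java version string to its major version.
# "1.8.0_291" uses the legacy "1.x" scheme, "11.0.2" uses the newer scheme.
java_major_version() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;   # legacy scheme: the major version is the second field
    *)   echo "$1" | cut -d. -f1 ;;   # newer scheme: the major version is the first field
  esac
}
```

To apply it to the installed JDK, the version string can be extracted from the output of java -version, for example: java_major_version "$(/usr/local/jdk/bin/java -version 2>&1 | awk -F'"' '/version/ {print $2}')". For the JDK used in this post that should print 8.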
4. Edit the configuration files
1) sudo vi /etc/profile
Add:
export JAVA_HOME=/usr/local/jdk
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
alias hadoopdir="cd /usr/local/hadoop"
Then run the following command:
source /etc/profile
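A quick way to confirm that the profile was loaded is to check that each variable is set in the current shell. This is only a sketch: missing_env_vars is a hypothetical helper, and the variable list is copied from the /etc/profile entries in this step.

```shell
# Print the name of every variable from the /etc/profile step that is unset or empty.
missing_env_vars() {
  for name in JAVA_HOME HADOOP_HOME HDFS_NAMENODE_USER HDFS_DATANODE_USER \
              HDFS_SECONDARYNAMENODE_USER YARN_RESOURCEMANAGER_USER YARN_NODEMANAGER_USER; do
    # Indirect lookup: read the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
    fi
  done
}
```

Running missing_env_vars after source /etc/profile should print nothing. Any name it prints points at a line that is missing or mistyped in the profile.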
2) vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Add:
export JAVA_HOME=/usr/local/jdk
3) vi /usr/local/hadoop/etc/hadoop/core-site.xml
Add:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ubuntu-VirtualBox:9000</value>
    <description>NameNode_URI</description>
  </property>
</configuration>
4) vi /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Add:
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop/data/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/hadoop/data/namenode</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>ubuntu-VirtualBox:50070</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>ubuntu-VirtualBox:50090</value>
  </property>
</configuration>
5) vi /usr/local/hadoop/etc/hadoop/yarn-site.xml
Add:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>ubuntu-VirtualBox:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>ubuntu-VirtualBox:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>ubuntu-VirtualBox:8050</value>
  </property>
</configuration>
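The three XML files above all use ubuntu-VirtualBox, which appears to be the hostname of the machine used in this post. On a machine with a different hostname those values need to be changed. The following sketch does the substitution in one pass; set_config_hostname is a hypothetical helper, and it assumes the files were copied as-is from this post and that the new hostname contains only letters, digits, dots, and hyphens.

```shell
# Replace the hostname used in this post with another hostname in the three
# site files under a Hadoop configuration directory.
set_config_hostname() {
  conf_dir=$1
  new_host=$2
  for name in core-site.xml hdfs-site.xml yarn-site.xml; do
    file="$conf_dir/$name"
    # Skip files that do not exist instead of failing.
    [ -f "$file" ] || continue
    # Write to a temporary file first so a failed sed cannot truncate the original.
    sed "s/ubuntu-VirtualBox/${new_host}/g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  done
}
```

A typical call would be set_config_hostname /usr/local/hadoop/etc/hadoop "$(hostname)". The hostname also has to resolve on the machine, for example through an entry in /etc/hosts.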
5. Format the Hadoop NameNode
Run as root:
hdfs namenode -format
If it fails, re-run it as follows:
stop-all.sh
cd /usr/local/hadoop
rm -rf data/ logs/
hdfs namenode -format
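The cleanup above deletes relative paths, so if the cd fails the rm runs in whatever directory the shell happens to be in. A slightly more defensive variant is sketched below; reset_hadoop_data is a hypothetical helper, and the check for etc/hadoop is just one assumed way of recognizing a Hadoop install directory.

```shell
# Remove the data/ and logs/ directories under a Hadoop install directory.
# Refuses to run unless the directory contains etc/hadoop, to avoid deleting
# from the wrong place.
reset_hadoop_data() {
  hadoop_home=$1
  if [ ! -d "$hadoop_home/etc/hadoop" ]; then
    echo "not a Hadoop directory: $hadoop_home" >&2
    return 1
  fi
  rm -rf "$hadoop_home/data" "$hadoop_home/logs"
}
```

It would be called between stopping the daemons and reformatting, for example reset_hadoop_data /usr/local/hadoop. Note that this wipes all HDFS data stored under that directory, just like the original commands.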
6. Start Hadoop
start-all.sh
The following error appears:
root@ubuntu-VirtualBox:/home/ubuntu# start-all.sh
Starting namenodes on [ubuntu-VirtualBox]
ubuntu-VirtualBox: Warning: Permanently added 'ubuntu-virtualbox,10.0.2.15' (ECDSA) to the list of known hosts.
ubuntu-VirtualBox: root@ubuntu-virtualbox: Permission denied (publickey,password).
Starting datanodes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
localhost: root@localhost: Permission denied (publickey,password).
Starting secondary namenodes [ubuntu-VirtualBox]
ubuntu-VirtualBox: root@ubuntu-virtualbox: Permission denied (publickey,password).
Starting resourcemanager
Starting nodemanagers
localhost: root@localhost: Permission denied (publickey,password).
Solution: set up passwordless SSH login
cd /root/.ssh
rm -rf *
ssh-keygen -t rsa
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
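Running rm -rf * in /root/.ssh also deletes any existing keys and the known_hosts file, and repeating the cat command appends the same key again each time. The sketch below is a gentler alternative for the last command; authorize_key is a hypothetical helper, and the chmod reflects the usual sshd expectation that authorized_keys is not writable by others.

```shell
# Append a public key to an authorized_keys file unless that exact line is
# already present, then restrict the file's permissions.
authorize_key() {
  pub_file=$1
  auth_file=$2
  touch "$auth_file"
  # -F: fixed strings, -x: match whole lines, -f: read the pattern from the key file
  if ! grep -qxF -f "$pub_file" "$auth_file"; then
    cat "$pub_file" >> "$auth_file"
  fi
  chmod 600 "$auth_file"
}
```

With this helper, the key only needs to be generated when /root/.ssh/id_rsa does not exist yet (ssh-keygen -t rsa), followed by authorize_key /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys. Calling it twice leaves a single entry in the file.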
References:
https://codertw.com/%E5%89%8D%E7%AB%AF%E9%96%8B%E7%99%BC/393790/
https://blog.csdn.net/qq_44166946/article/details/109808363
7. Disable the firewall
systemctl stop ufw
systemctl disable ufw
8. Stop Hadoop
stop-all.sh
9. Check the Hadoop processes
jps
The following Java processes should appear:
29218 DataNode
29475 SecondaryNameNode
29687 ResourceManager
33707 Jps
29900 NodeManager
29020 NameNode
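Comparing the jps output against this list by eye is easy to get wrong, so the check can be scripted. This is a minimal sketch; missing_daemons is a hypothetical helper, and the list of expected daemons is taken from the output shown above.

```shell
# Print each expected Hadoop daemon that is missing from jps output read on stdin.
missing_daemons() {
  jps_output=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Compare the second column exactly, so "NameNode" does not match "SecondaryNameNode".
    if ! printf '%s\n' "$jps_output" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$daemon"
    fi
  done
}
```

It is used as jps | missing_daemons. An empty result means all five daemons are running; otherwise the names it prints are the daemons to look for in the files under /usr/local/hadoop/logs.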
10. List the files in HDFS
hadoop fs -ls /
11. For HDFS (distributed file system) commands, see:
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html
URI: hdfs://namenode:namenodePort/parent/child, or on the local machine simply /parent/child
(assuming the configuration file specifies namenode:namenodePort)
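To make the relationship between the two forms concrete, the sketch below builds the full URI from the host:port part of fs.defaultFS and an absolute path. hdfs_uri is a hypothetical helper written for illustration, not a Hadoop command.

```shell
# Build a full HDFS URI from the host:port part of fs.defaultFS and an absolute path.
hdfs_uri() {
  authority=$1   # for example ubuntu-VirtualBox:9000
  path=$2        # absolute HDFS path such as /parent/child
  case "$path" in
    /*) printf 'hdfs://%s%s\n' "$authority" "$path" ;;
    *)  echo "path must be absolute: $path" >&2; return 1 ;;
  esac
}
```

With the core-site.xml from step 4, hadoop fs -ls "$(hdfs_uri ubuntu-VirtualBox:9000 /parent/child)" and hadoop fs -ls /parent/child should refer to the same directory, because the short form is resolved against fs.defaultFS.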
