
Introduction to Microservices

A microservice is a small, independently deployable, independently scalable software service. Its purpose is to encapsulate a specific capability within a larger application, or to support that application as it evolves. By decomposing an application into lightweight, decoupled services, each satisfying a specific business need, development teams can deploy more frequently and scale more effectively.

In a language- and framework-agnostic way, this lets developers focus on their core value: writing differentiated software that delivers better business value, while cross-cutting concerns such as metrics collection are handled elsewhere. Any developer can build cloud-native applications on a microservice architecture without worrying about how the network affects application resilience, metrics, and similar concerns.

Java microservice frameworks
1. Hystrix: circuit breaking (a minimal sketch follows this list)
2. Ribbon: client-side load balancing
3. Eureka: service registration and discovery
4. Zuul: dynamic proxying
Drawback: these can only be used by services written in Java.
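
As a minimal illustration of the circuit-breaker pattern these frameworks provide, the following Java sketch wraps a call in a Hystrix command; the class and group names are illustrative, and the run() body stands in for a real remote call:

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

// Wraps a call so that repeated failures trip the circuit breaker and the
// fallback result is returned instead of the error propagating to the caller.
public class GreetingCommand extends HystrixCommand<String> {
    private final String name;

    public GreetingCommand(String name) {
        super(HystrixCommandGroupKey.Factory.asKey("GreetingGroup"));
        this.name = name;
    }

    @Override
    protected String run() {
        // In a real service this would be an HTTP/RPC call that can fail or time out.
        return "Hello " + name;
    }

    @Override
    protected String getFallback() {
        return "Hello (fallback)";
    }

    public static void main(String[] args) {
        System.out.println(new GreetingCommand("world").execute());
    }
}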

Linux containers simplify application packaging and deployment

Containers are the foundation of cloud-native applications: containerizing an application makes development and deployment more agile, migration more flexible, and packaging standardized. Container orchestration schedules and utilizes resources efficiently, and orchestrating containerized services with Kubernetes has become standard practice. Kubernetes is designed to be extended through its APIs, with the expectation that any higher-level application service can be built as a plugin.

Using a proxy is one way to move these concerns out of the application and into the infrastructure. A Layer 7 service proxy in the application architecture typically provides:
1. Retry (illustrated, together with item 2, by the sketch after this list)
2. Timeout
3. Circuit breaking
4. Client-side load balancing
5. Service discovery
6. Security
7. Metrics collection
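
To make concrete what items 1 and 2 look like when they live in application code rather than in a proxy, here is a small Java sketch of a retry loop with timeouts around an HTTP call (the URL, limits, and status-code handling are illustrative assumptions). A sidecar proxy moves exactly this kind of boilerplate out of every service and into the infrastructure layer.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class RetryingClient {
    // Sends a GET request, retrying up to maxAttempts times, with connect and request timeouts.
    public static String getWithRetry(String url, int maxAttempts) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))              // connect timeout
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(3))                     // per-request timeout
                .GET()
                .build();
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {  // retry loop
            try {
                HttpResponse<String> resp = client.send(request, HttpResponse.BodyHandlers.ofString());
                if (resp.statusCode() < 500) {                      // treat 5xx as retryable
                    return resp.body();
                }
            } catch (Exception e) {
                last = e;
            }
        }
        throw new IllegalStateException("all " + maxAttempts + " attempts failed", last);
    }

    public static void main(String[] args) {
        System.out.println(getWithRetry("http://example.com/", 3));
    }
}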

 


Three Major GitOps Solutions

The three major GitOps solutions are described below.

I. Flux (https://fluxcd.io)
1. Designed to be Kubernetes-native and auditable
2. Out-of-the-box integrations and extensible, so it supports configuration formats beyond plain YAML
3. Use v2; v1 is no longer supported
4. No GUI; it is CLI-based (operated from the command line)
5. Notable feature: when a container image changes, it can roll the change out to Kubernetes and also write the change back to the repository as a commit

II. Rancher Fleet (https://rancher.com)
1. Designed for GitOps at scale, across large numbers of clusters
2. Uses the Rancher UI and inherits Rancher's strengths, including SSO (LDAP) and authorization (RBAC) controls
3. Multi-cluster: the Fleet controller cluster can manage multiple clusters out of the box

III. Argo CD (https://argoproj.github.io/argo-cd/)
1. Widely known and adopted
2. Can manage multiple clusters, but not out of the box; it must be granted control of each Kubernetes cluster
3. SSO: supports OIDC, OAuth2, LDAP, SAML 2.0, GitHub, GitLab, Microsoft, and LinkedIn
4. Authorization: supports multiple permission sets and RBAC policies
5. The best web UI of the three
6. Good webhook (push) integration: GitHub, BitBucket, and GitLab can actively notify Argo CD when there is a change
7. Applications can be synced to their desired state automatically or manually
8. Automatic configuration drift detection, with visualization
9. Three main components: the API server, the repository server, and the application controller


Installing Spark on Ubuntu

1. Download the file
https://downloads.apache.org/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz

2. Upload it to /usr/local/ on the Ubuntu machine
3. Extract it
tar xvf spark-3.1.2-bin-hadoop3.2.tgz
mv spark-3.1.2-bin-hadoop3.2 spark

4. Create the file /usr/local/spark/djt.log with the following content:

hadoop hadoop hadoop
spark spark spark

5. Start spark-shell and run a word count
/usr/local/spark/bin/spark-shell

val line = sc.textFile("/usr/local/spark/djt.log")
line.flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).collect().foreach(println)
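
The same word count, written against the Spark Java API, would look roughly like the sketch below (a local-mode example; the file path matches djt.log above, and spark-core must be on the classpath):

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class WordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile("/usr/local/spark/djt.log");
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split(" ")).iterator())  // split into words
                    .mapToPair(word -> new Tuple2<>(word, 1))                    // (word, 1)
                    .reduceByKey(Integer::sum);                                  // sum counts per word
            counts.collect().forEach(t -> System.out.println(t._1() + ": " + t._2()));
        }
    }
}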

6. Standalone installation
1) spark-env.sh

cd /usr/local/spark/conf
cp spark-env.sh.template spark-env.sh
mkdir /usr/local/spark/my-data

vi spark-env.sh

export JAVA_HOME=/usr/local/jdk
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export HADOOP_HOME=/usr/local/hadoop
SPARK_MASTER_WEBUI_PORT=8888
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=ubuntu-VirtualBox:2181 -Dspark.deploy.zookeeper.dir=/usr/local/spark/my-data"

2) slaves
cd /usr/local/spark/conf
vi slaves

ubuntu-VirtualBox

3) Start Spark
The Spark cluster depends on a ZooKeeper cluster (for master recovery), so start ZooKeeper first:
/usr/local/zookeeper/bin/zkServer.sh start
/usr/local/spark/sbin/start-all.sh

jps
The following processes should be running:

Worker
Jps
QuorumPeerMain
Master

4) Open the web UI in a browser
http://ubuntu-virtualbox:8888/


Installing Hive on Ubuntu

MySQL must be installed first.
1. Install MySQL

apt install mysql-server
systemctl status mysql

2. Change the MySQL root password (there is no password by default)
1) First check the MySQL version

mysql -V

2) Before MySQL 8
mysql -u root -p

use mysql;

UPDATE user SET Password=PASSWORD("8888") WHERE User='root';
or
UPDATE user SET authentication_string=password('8888') WHERE User='root';

FLUSH PRIVILEGES;

Optionally, also allow root to connect from any host (note that this is insecure):

CREATE USER 'root'@'%' IDENTIFIED BY '8888';
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;

3) MySQL 8 and later
mysql -u root -p

use mysql;
ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY '8888';
flush privileges;

3. Create the MySQL account for Hive
mysql -u root -p

CREATE user 'hive' IDENTIFIED BY '8888';
GRANT ALL PRIVILEGES ON *.* to 'hive'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;

4. Download Hive
https://ftp.tsukuba.wide.ad.jp/software/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz

5. Upload it to /usr/local/ on the Ubuntu machine
6. Extract it

cd /usr/local
tar xvf apache-hive-3.1.2-bin.tar.gz
mv apache-hive-3.1.2-bin hive

7. Create the configuration file /usr/local/hive/conf/hive-site.xml
cd /usr/local/hive/conf/
cp hive-default.xml.template hive-site.xml

vi hive-site.xml

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
...
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://ubuntu-VirtualBox:3306/hive?createDatabaseIfNotExist=true</value>
<description>
JDBC connect string for a JDBC metastore.
</description>
</property>
...
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>
...
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>8888</value>
<description>password to use against metastore database</description>
</property>

8. Configure environment variables
vi /etc/profile

export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin

source /etc/profile

9. Download the MySQL driver and copy it to /usr/local/hive/lib
https://downloads.mysql.com/archives/get/p/3/file/mysql-connector-java-8.0.25.tar.gz
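
Before starting Hive, it can be worth checking that the hive account and the JDBC URL from hive-site.xml actually work. A minimal sketch, assuming the Connector/J jar from this step is on the classpath and the hostname/credentials configured above:

import java.sql.Connection;
import java.sql.DriverManager;

public class MetastoreConnectionTest {
    public static void main(String[] args) throws Exception {
        // Hostname, database, user, and password as configured in hive-site.xml above.
        String url = "jdbc:mysql://ubuntu-VirtualBox:3306/hive?createDatabaseIfNotExist=true";
        try (Connection conn = DriverManager.getConnection(url, "hive", "8888")) {
            System.out.println("Metastore database reachable: " + conn.isValid(5));
        }
    }
}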

10. Change the Hive data directories
cd /usr/local/hive/conf/
vi hive-site.xml

<property>
<name>hive.querylog.location</name>
<value>/usr/local/hive/iotmp</value>
<description>Location of Hive run time structured log file</description>
</property>
...
<property>
<name>hive.exec.local.scratchdir</name>
<value>/usr/local/hive/iotmp</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/usr/local/hive/iotmp</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>

11. Start Hive
/usr/local/hive/bin/hive
Note: Hadoop and MySQL must be running before Hive can start.


Installing the Flume log collection system on Ubuntu

1. Download
http://www.apache.org/dyn/closer.lua/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz

2. Upload it to /usr/local/ on the Ubuntu machine
3. Extract it
cd /usr/local
tar xvf apache-flume-1.9.0-bin.tar.gz
mv apache-flume-1.9.0-bin flume

4. Edit the configuration file under /usr/local/flume/conf
mv flume-conf.properties.template flume-conf.properties
vi flume-conf.properties

agent.sources = seqGenSrc
agent.channels = memoryChannel
agent.sinks = loggerSink
agent.sources.seqGenSrc.type = seq
agent.sources.seqGenSrc.channels = memoryChannel
agent.sinks.loggerSink.type = logger
agent.sinks.loggerSink.channel = memoryChannel
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 100

5. Start Flume
/usr/local/flume/bin/flume-ng agent -n agent -c conf -f /usr/local/flume/conf/flume-conf.properties -Dflume.root.logger=INFO,console
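
The configuration above uses the built-in seq source, so the agent generates its own events. If you instead want to push events from Java, Flume's RPC client API can be used, but only against an agent whose source is avro (not the seq source configured above); the host and port below are assumptions:

import java.nio.charset.StandardCharsets;
import org.apache.flume.Event;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FlumeClientExample {
    public static void main(String[] args) throws Exception {
        // Assumes an agent with an avro source listening on this host/port,
        // which is NOT what the seq-source configuration above sets up.
        RpcClient client = RpcClientFactory.getDefaultInstance("ubuntu-VirtualBox", 41414);
        try {
            Event event = EventBuilder.withBody("hello flume", StandardCharsets.UTF_8);
            client.append(event);   // send one event to the agent
        } finally {
            client.close();
        }
    }
}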


Installing Kafka on Ubuntu

1. Download
https://downloads.apache.org/kafka/2.8.0/kafka_2.13-2.8.0.tgz

2. Upload it to /usr/local/ on the Ubuntu machine
3. Extract it
cd /usr/local
tar xvf kafka_2.13-2.8.0.tgz
mv kafka_2.13-2.8.0 kafka

4. Edit the configuration files under /usr/local/kafka/config/
1) zookeeper.properties

dataDir=/usr/local/hadoop/data/zookeeper/zkdata
clientPort=2181

2) consumer.properties

zookeeper.connect=ubuntu-VirtualBox:2181

3) producer.properties

metadata.broker.list=ubuntu-VirtualBox:9092

4) server.properties

broker.id=1
zookeeper.connect=ubuntu-VirtualBox:2181

5. Start Kafka
/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
Note: the ZooKeeper cluster must be running before starting the Kafka cluster.
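
Once the broker is up, a quick way to verify it from Java is a minimal producer like the sketch below (requires the kafka-clients library; the topic name "test" is an assumption and must exist or be auto-created):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker address from server.properties above; the "test" topic is an assumption.
        props.put("bootstrap.servers", "ubuntu-VirtualBox:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test", "key1", "hello kafka"));
            producer.flush();   // make sure the record is actually sent before closing
        }
    }
}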


Installing ZooKeeper (a distributed coordination service) on Ubuntu

1. Download ZooKeeper
https://www.apache.org/dyn/closer.lua/zookeeper/zookeeper-3.7.0/apache-zookeeper-3.7.0-bin.tar.gz

2. Upload it to /usr/local/ on the Ubuntu machine
3. Extract it
cd /usr/local
tar xvf apache-zookeeper-3.7.0-bin.tar.gz
mv apache-zookeeper-3.7.0-bin zookeeper

4. Create the following directories:
/usr/local/hadoop/data/zookeeper/zkdata
/usr/local/hadoop/data/zookeeper/zkdatalog

Then create the file /usr/local/hadoop/data/zookeeper/zkdata/myid
containing the value 1.

5. Create /usr/local/zookeeper/conf/zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/hadoop/data/zookeeper/zkdata
dataLogDir=/usr/local/hadoop/data/zookeeper/zkdatalog
clientPort=2181
server.1=ubuntu-VirtualBox:2888:3888

6. Start ZooKeeper
/usr/local/zookeeper/bin/zkServer.sh start
/usr/local/zookeeper/bin/zkServer.sh status

jps
The following process should be running:

QuorumPeerMain

7. Connect with the CLI and check
/usr/local/zookeeper/bin/zkCli.sh -server ubuntu-VirtualBox:2181

68.208.3:2181, session id = 0x100002e0cfa0000, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: ubuntu-VirtualBox:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: ubuntu-VirtualBox:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: ubuntu-VirtualBox:2181(CONNECTED) 1] create /test helloworld
Created /test
[zk: ubuntu-VirtualBox:2181(CONNECTED) 2] get /test
helloworld
[zk: ubuntu-VirtualBox:2181(CONNECTED) 3] get /test
helloworld
[zk: ubuntu-VirtualBox:2181(CONNECTED) 4] set /test zookeeper
[zk: ubuntu-VirtualBox:2181(CONNECTED) 5] get /test
zookeeper
[zk: ubuntu-VirtualBox:2181(CONNECTED) 6] delete /test
[zk: ubuntu-VirtualBox:2181(CONNECTED) 7]
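
The same operations can also be performed from Java with the ZooKeeper client API; a minimal sketch using the connect string from the zkCli.sh session above (requires the zookeeper client jar):

import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkExample {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        // Same connect string as in the zkCli.sh example; the watcher releases the
        // latch on the first (connection) event so we only proceed once connected.
        ZooKeeper zk = new ZooKeeper("ubuntu-VirtualBox:2181", 30000,
                event -> connected.countDown());
        connected.await();

        String path = "/test";
        zk.create(path, "helloworld".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        System.out.println(new String(zk.getData(path, false, null)));  // helloworld
        zk.setData(path, "zookeeper".getBytes(), -1);                   // -1 = any version
        System.out.println(new String(zk.getData(path, false, null)));  // zookeeper
        zk.delete(path, -1);
        zk.close();
    }
}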

8. Start Hadoop
start-all.sh
jps
The following processes should be running:

DataNode
NodeManager
SecondaryNameNode
ResourceManager
NameNode

9. Start HBase
start-hbase.sh
jps
The following processes should be running:

HMaster
HRegionServer

10. Check the HBase web UI
http://ubuntu-virtualbox:16010/master-status

Note: the startup order is ZooKeeper -> Hadoop (HDFS) -> HBase.


Installing HBase on Ubuntu

1. Download the file
https://www.apache.org/dyn/closer.lua/hbase/2.3.6/hbase-2.3.6-bin.tar.gz

2. Upload it to /usr/local/ on the Ubuntu machine
3. Extract it
cd /usr/local
tar xvf hbase-2.3.6-bin.tar.gz
mv hbase-2.3.6 hbase

4. Configure environment variables
vi /etc/profile

export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin

source /etc/profile

5. Adjust the configuration files
cd /usr/local/hbase/conf
1) vi hbase-env.sh

export JAVA_HOME=/usr/local/jdk
export HBASE_MANAGES_ZK=true

2) vi hbase-site.xml

<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://ubuntu-VirtualBox:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>ubuntu-VirtualBox</value>
</property>
</configuration>

6. Start HBase
start-hbase.sh
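
After HBase is running, a basic put/get from Java looks roughly like this (requires the hbase-client library; it assumes a table named t1 with a column family cf has already been created, for example via the hbase shell):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBasePutGet {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Same ZooKeeper quorum as in hbase-site.xml above.
        conf.set("hbase.zookeeper.quorum", "ubuntu-VirtualBox");

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("t1"))) {
            // Write one cell: row1, column family cf, qualifier q1.
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q1"), Bytes.toBytes("value1"));
            table.put(put);

            // Read it back.
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q1"))));
        }
    }
}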


Developing with the Hadoop HDFS Java API

The steps for developing a program against the Hadoop HDFS Java API are as follows.

I. Setting up the Hadoop Eclipse plugin

Reference: https://www.programmersought.com/article/26674946880/
1. Download hadoop-3.3.1.tar.gz
https://hadoop.apache.org/
https://hadoop.apache.org/docs/stable/

2. Download apache-ant-1.10.11-bin.tar.gz
https://ant.apache.org/bindownload.cgi

3. Download eclipse-jee-indigo-SR2-win32-x86_64.zip
https://www.eclipse.org/downloads/packages/release/indigo/sr2

4. Extract the archives and set environment variables
1) Extract the following files:

hadoop-3.3.1.tar.gz 及 apache-ant-1.10.11-bin.tar.gz、eclipse-jee-indigo-SR2-win32-x86_64.zip

2) Set the environment variables:

HADOOP_HOME=D:\04_Source\tool\hadoop-3.3.1
ANT_HOME=D:\04_Source\tool\apache-ant-1.10.11
Add %ANT_HOME%\bin and %HADOOP_HOME%\bin to PATH

5. Download the eclipse-hadoop3x project and adjust its settings
1) Download the following GitHub repository (eclipse-hadoop3x):

https://github.com/Woooosz/eclipse-hadoop3x

2) Adjust ivy/libraries.properties

hadoop.version=2.6.0
commons-lang.version=2.6
slf4j-api.version=1.7.25
slf4j-log4j12.version=1.7.25
guava.version=11.0.2
netty.version=3.10.5.Final

Change these to match the Hadoop version:

hadoop.version=3.3.1
commons-lang.version=3.7
slf4j-api.version=1.7.30
slf4j-log4j12.version=1.7.30
guava.version=27.0-jre
netty.version=3.10.6.Final

3) Adjust src\contrib\eclipse-plugin\build.xml
a. Change

<target name="compile" depends="init, ivy-retrieve-common" unless="skip.contrib">

to

<target name="compile" unless="skip.contrib">

b. Below this line:

<copy file="${hadoop.home}/libexec/share/hadoop/common/lib/htrace-core4-${htrace.version}.jar" todir="${build.dir}/lib" verbose="true"/>

add:

<copy file="${hadoop.home}/libexec/share/hadoop/common/lib/woodstox-core-5.0.3.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/libexec/share/hadoop/common/lib/stax2-api-3.1.4.jar" todir="${build.dir}/lib" verbose="true"/>

c. Change

<fileset dir="${hadoop.home}/libexec/share/hadoop/mapreduce">
<fileset dir="${hadoop.home}/libexec/share/hadoop/hdfs">
<fileset dir="${hadoop.home}/libexec/share/hadoop/common">
...
<fileset dir="${hadoop.home}/libexec/share/hadoop/mapreduce">
<fileset dir="${hadoop.home}/libexec/share/hadoop/common">
<fileset dir="${hadoop.home}/libexec/share/hadoop/hdfs">
<fileset dir="${hadoop.home}/libexec/share/hadoop/yarn">
...
<copy file="${hadoop.home}/libexec/share/hadoop/common/lib/protobuf-java-${protobuf.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/libexec/share/hadoop/common/lib/log4j-${log4j.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/libexec/share/hadoop/common/lib/commons-cli-${commons-cli.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/libexec/share/hadoop/common/lib/commons-configuration2-${commons-configuration.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/libexec/share/hadoop/common/lib/commons-lang-${commons-lang.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/libexec/share/hadoop/common/lib/commons-collections-${commons-collections.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/libexec/share/hadoop/common/lib/jackson-core-asl-${jackson.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/libexec/share/hadoop/common/lib/jackson-mapper-asl-${jackson.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/libexec/share/hadoop/common/lib/slf4j-log4j12-${slf4j-log4j12.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/libexec/share/hadoop/common/lib/slf4j-api-${slf4j-api.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/libexec/share/hadoop/common/lib/guava-${guava.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/libexec/share/hadoop/common/lib/hadoop-auth-${hadoop.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/libexec/share/hadoop/common/lib/commons-cli-${commons-cli.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/libexec/share/hadoop/common/lib/netty-${netty.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/libexec/share/hadoop/common/lib/htrace-core4-${htrace.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/libexec/share/hadoop/common/lib/woodstox-core-5.0.3.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/libexec/share/hadoop/common/lib/stax2-api-3.1.4.jar" todir="${build.dir}/lib" verbose="true"/>
...
lib/woodstox-core-5.0.3.jar,
lib/stax2-api-3.1.4.jar,

to the paths that actually exist in the Hadoop distribution:

<fileset dir="${hadoop.home}/share/hadoop/mapreduce">
<fileset dir="${hadoop.home}/share/hadoop/hdfs">
<fileset dir="${hadoop.home}/share/hadoop/common">
...
<fileset dir="${hadoop.home}/share/hadoop/mapreduce">
<fileset dir="${hadoop.home}/share/hadoop/common">
<fileset dir="${hadoop.home}/share/hadoop/hdfs">
<fileset dir="${hadoop.home}/share/hadoop/yarn">
...
<copy file="${hadoop.home}/share/hadoop/common/lib/protobuf-java-${protobuf.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/log4j-${log4j.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/commons-cli-${commons-cli.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/commons-configuration2-${commons-configuration.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/commons-lang3-${commons-lang.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/commons-collections-${commons-collections.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/jackson-core-asl-${jackson.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/jackson-mapper-asl-${jackson.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/slf4j-log4j12-${slf4j-log4j12.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/slf4j-api-${slf4j-api.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/guava-${guava.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/hadoop-auth-${hadoop.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/commons-cli-${commons-cli.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/netty-${netty.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/htrace-core4-${htrace.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/woodstox-core-5.3.0.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/stax2-api-4.2.1.jar" todir="${build.dir}/lib" verbose="true"/>
...
lib/woodstox-core-5.3.0.jar,
lib/stax2-api-4.2.1.jar,

4) Also in src\contrib\eclipse-plugin\build.xml, change

<javac
encoding="${build.encoding}"
srcdir="${src.dir}"
includes="**/*.java"
destdir="${build.classes}"
debug="${javac.debug}"
deprecation="${javac.deprecation}">

to

<javac
encoding="${build.encoding}"
srcdir="${src.dir}"
includes="**/*.java"
destdir="${build.classes}"
debug="${javac.debug}"
deprecation="${javac.deprecation}"
includeantruntime="false"
>

6. Create the directory

eclipse-hadoop3x\build\contrib\eclipse-plugin\classes

7. Build the eclipse-hadoop3x project
Change to the eclipse-hadoop3x\src\contrib\eclipse-plugin directory
and run the following command:

ant jar -Dversion=3.3.1 -Declipse.home=D:\Tool\eclipse\eclipse-indigo -Dhadoop.home=D:\04_Source\tool\hadoop-3.3.1

8. Use the Hadoop plugin
1) Put hadoop-eclipse-plugin-3.3.1.jar into the Eclipse dropins directory
2) Restart Eclipse
3) In Eclipse:
-> Window -> Open Perspective -> Other… -> Map/Reduce
-> New Hadoop location…. -> in theory a dialog for configuring the Hadoop connection should open, but it failed in my test; this part still needs further investigation

II. Developing against the HDFS Java API

1. First complete the Hadoop installation; see the post:

Installing Hadoop 3.3.1 on Ubuntu 20.04

2. Make sure the Hadoop hostname from the previous post resolves to an IP that the development machine can reach.
Map it in /etc/hosts to a reachable IP, and test with: telnet ubuntu-VirtualBox 9000

192.168.208.3 ubuntu-VirtualBox

3. Create the Eclipse project; the program is as follows:

package com.tssco.hadoop;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.io.IOUtils;

public class HDFSAPITest {
 public static void main(String[] args) throws IOException, InterruptedException, URISyntaxException {
   Configuration conf = new Configuration();
   FileSystem fs = FileSystem.get(new URI("hdfs://ubuntu-VirtualBox:9000"), conf, "root");
   // Create the directory if it does not exist
   Path path = new Path("/data");
   boolean exists = fs.exists(path);
   System.out.println("1. Does path /data exist: " + exists);
   if (!exists) {
      fs.mkdirs(path);
   }

   // List the files under the directory
   System.out.println("2. File listing");
   RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(path, false);
   while (listFiles.hasNext()) {
      LocatedFileStatus next = listFiles.next();
      System.out.println(next.getPath());
      System.out.println(next.getReplication());
      BlockLocation[] blockLocations = next.getBlockLocations();
      for (BlockLocation bl : blockLocations) {
         System.out.println("\tblock: " + bl + ", size: " + bl.getLength());
      }
   }

   // Show the status of entries under the root directory
   System.out.println("3. File status");
   FileStatus[] listStatus = fs.listStatus(new Path("/"));
   for (FileStatus fst : listStatus) {
      System.out.println("****** " + fst + " ******");
      System.out.println("\t\tis directory: " + fst.isDirectory());
      System.out.println("\t\tis file: " + fst.isFile());
      System.out.println("\t\tblock size: " + fst.getBlockSize());
   }

   // Upload a local file to /data/test2.txt
   System.out.println("4. Upload file");
   FileInputStream in = new FileInputStream(new File("D:\\04_Source\\test2.txt"));
   FSDataOutputStream out = fs.create(new Path("/data/test2.txt"));
   IOUtils.copyBytes(in, out, 4096, true);   // the last argument closes both streams

   // Download the file back to the local disk
   System.out.println("5. Download file");
   FSDataInputStream fsin = fs.open(new Path("/data/test2.txt"));
   FileOutputStream fsout = new FileOutputStream(new File("D:\\test2.txt"));
   IOUtils.copyBytes(fsin, fsout, 4096, true);
 }
}

4. Copy the jar files from the common, hdfs, mapreduce, and yarn subdirectories of /usr/local/hadoop/share/hadoop into the project's jar directory.

5. Execution results


Installing Hadoop 3.3.1 on Ubuntu 20.04

The detailed steps for installing Hadoop 3.3.1 on Ubuntu 20.04 are as follows:

1. Download hadoop-3.3.1.tar.gz
https://hadoop.apache.org/
https://hadoop.apache.org/docs/stable/

2. Install Hadoop

tar xvf hadoop-3.3.1.tar.gz
sudo mv hadoop-3.3.1 /usr/local
sudo mv /usr/local/hadoop-3.3.1 /usr/local/hadoop

3. Install a JDK (JDK 8 is used here)

tar xvf jdk-8u291-linux-x64.tar.gz
sudo mv jdk1.8.0_291 /usr/local
sudo mv /usr/local/jdk1.8.0_291 /usr/local/jdk
# Not recommended; the tarball installation above is preferred:
sudo apt install openjdk-8-jdk-headless

4. Adjust the configuration files
1) sudo vi /etc/profile
Add:

export JAVA_HOME=/usr/local/jdk
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
alias hadoopdir="cd /usr/local/hadoop"

Then run:

source /etc/profile

2) vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Add:

export JAVA_HOME=/usr/local/jdk

3) vi /usr/local/hadoop/etc/hadoop/core-site.xml
Add:

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ubuntu-VirtualBox:9000</value>
<description>NameNode_URI</description>
</property>
</configuration>

4) vi /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Add:

<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/hadoop/data/datanode</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop/data/namenode</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>ubuntu-VirtualBox:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>ubuntu-VirtualBox:50090</value>
</property> 
</configuration>

5) vi /usr/local/hadoop/etc/hadoop/yarn-site.xml
Add:

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>ubuntu-VirtualBox:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>ubuntu-VirtualBox:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>ubuntu-VirtualBox:8050</value>
</property>
</configuration>

5. Format the HDFS NameNode
Run as root:

hadoop namenode -format

If formatting fails, clean up and re-run as follows:

stop-all.sh
cd /usr/local/hadoop
rm -rf data/ logs/
hadoop namenode -format

6. Start Hadoop
start-all.sh

The following errors appeared:

root@ubuntu-VirtualBox:/home/ubuntu# start-all.sh
Starting namenodes on [ubuntu-VirtualBox]
ubuntu-VirtualBox: Warning: Permanently added 'ubuntu-virtualbox,10.0.2.15' (ECDSA) to the list of known hosts.
ubuntu-VirtualBox: root@ubuntu-virtualbox: Permission denied (publickey,password).
Starting datanodes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
localhost: root@localhost: Permission denied (publickey,password).
Starting secondary namenodes [ubuntu-VirtualBox]
ubuntu-VirtualBox: root@ubuntu-virtualbox: Permission denied (publickey,password).
Starting resourcemanager
Starting nodemanagers
localhost: root@localhost: Permission denied (publickey,password).

Fix: set up passwordless SSH login

cd /root/.ssh
rm -rf *
ssh-keygen -t rsa
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys

References:
https://codertw.com/%E5%89%8D%E7%AB%AF%E9%96%8B%E7%99%BC/393790/
https://blog.csdn.net/qq_44166946/article/details/109808363

7. Disable the firewall
systemctl stop ufw
systemctl disable ufw

8. Stop Hadoop
stop-all.sh

9. Check the Hadoop processes
jps
The following Java processes should appear:

29218 DataNode
29475 SecondaryNameNode
29687 ResourceManager
33707 Jps
29900 NodeManager
29020 NameNode

10. List files in HDFS
hadoop fs -ls /

11. For HDFS (distributed file system) shell commands, see:
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html
URI format: hdfs://namenode:namenodePort/parent/child, or simply /parent/child when using the default file system on the cluster itself
(assuming the configuration specifies namenode:namenodePort)