
Installing Spark on Ubuntu

1. Download the archive
https://downloads.apache.org/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz

2. Upload it to /usr/local/ on the Ubuntu host
3. Extract it
cd /usr/local
tar xvf spark-3.1.2-bin-hadoop3.2.tgz
mv spark-3.1.2-bin-hadoop3.2 spark

4. Create the file /usr/local/spark/djt.log with the following content:

hadoop hadoop hadoop
spark spark spark
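
For example, the file can be created with a shell heredoc:

cat > /usr/local/spark/djt.log << 'EOF'
hadoop hadoop hadoop
spark spark spark
EOF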

5. Launch spark-shell and run a word count
/usr/local/spark/bin/spark-shell

// read the log file into an RDD of lines
val line = sc.textFile("/usr/local/spark/djt.log")
// split lines into words, pair each word with 1, sum per word, and print the counts
line.flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).collect().foreach(println)
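
With the djt.log created above, the printed pairs should be (ordering may vary):

(spark,3)
(hadoop,3)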

6. Standalone-mode installation
1) spark-env.sh

cd /usr/local/spark/conf
cp spark-env.sh.template spark-env.sh
mkdir /usr/local/spark/my-data

vi spark-env.sh

export JAVA_HOME=/usr/local/jdk
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export HADOOP_HOME=/usr/local/hadoop
# port for the standalone master's web UI (default 8080)
SPARK_MASTER_WEBUI_PORT=8888
# enable ZooKeeper-based recovery for the standalone master
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=ubuntu-VirtualBox:2181 -Dspark.deploy.zookeeper.dir=/usr/local/spark/my-data"

2) slaves (one worker hostname per line)
cd /usr/local/spark/conf
vi slaves

ubuntu-VirtualBox

3) Start Spark
Because the Spark cluster depends on the ZooKeeper cluster, start ZooKeeper first:
/usr/local/zookeeper/bin/zkServer.sh start
/usr/local/spark/sbin/start-all.sh

jps
The following processes should be listed:

Worker
Jps
QuorumPeerMain
Master

4) Check the web UI in a browser
http://ubuntu-virtualbox:8888/
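
To verify the cluster, you can submit the SparkPi example bundled with the distribution. A minimal sketch, assuming the master listens on the default port 7077 and the examples jar for this build is spark-examples_2.12-3.1.2.jar:

/usr/local/spark/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://ubuntu-VirtualBox:7077 \
  /usr/local/spark/examples/jars/spark-examples_2.12-3.1.2.jar 10

A successful run prints a line like "Pi is roughly 3.14...".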


Installing Hive on Ubuntu

MySQL must be installed first.
1. Install MySQL

apt install mysql-server
systemctl status mysql

2. Change the MySQL root password (there is no password by default)
1) First check the MySQL version

mysql -V

2) MySQL before 8.0
mysql -u root -p

use mysql;

UPDATE user SET Password=PASSWORD("8888") WHERE User='root';
or, on MySQL 5.7 (where the Password column was removed):
UPDATE user SET authentication_string=PASSWORD('8888') WHERE User='root';

FLUSH PRIVILEGES;

Additionally, to let root log in from any host (this is insecure):

CREATE USER 'root'@'%' IDENTIFIED BY '8888';
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;

3) MySQL 8.0 and later
mysql -u root -p

use mysql;
ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY '8888';
FLUSH PRIVILEGES;

3. Create a MySQL account for Hive
mysql -u root -p

CREATE USER 'hive' IDENTIFIED BY '8888';
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
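
If you prefer not to rely on the createDatabaseIfNotExist option in the JDBC URL (step 7 below), the metastore database can also be created up front; the name hive matches the connection URL configured later:

mysql -u root -p -e "CREATE DATABASE hive;"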

4. Download Hive
https://ftp.tsukuba.wide.ad.jp/software/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz

5. Upload it to /usr/local/ on the Ubuntu host
6. Extract it

cd /usr/local
tar xvf apache-hive-3.1.2-bin.tar.gz
mv apache-hive-3.1.2-bin hive

7. Create the configuration file /usr/local/hive/conf/hive-site.xml
cd /usr/local/hive/conf/
cp hive-default.xml.template hive-site.xml

vi hive-site.xml

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
...
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://ubuntu-VirtualBox:3306/hive?createDatabaseIfNotExist=true</value>
<description>
JDBC connect string for a JDBC metastore.
</description>
</property>
...
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>
...
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>8888</value>
<description>password to use against metastore database</description>
</property>

8. Configure environment variables
vi /etc/profile

export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin

source /etc/profile

9. Download the MySQL JDBC driver and copy the jar into /usr/local/hive/lib
https://downloads.mysql.com/archives/get/p/3/file/mysql-connector-java-8.0.25.tar.gz
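
A sketch of the unpack-and-copy step, assuming the tarball extracts to a directory that contains mysql-connector-java-8.0.25.jar:

cd /usr/local
tar xvf mysql-connector-java-8.0.25.tar.gz
cp mysql-connector-java-8.0.25/mysql-connector-java-8.0.25.jar /usr/local/hive/lib/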

10. Change Hive's local data directories
cd /usr/local/hive/conf/
vi hive-site.xml

<property>
<name>hive.querylog.location</name>
<value>/usr/local/hive/iotmp</value>
<description>Location of Hive run time structured log file</description>
</property>
...
<property>
<name>hive.exec.local.scratchdir</name>
<value>/usr/local/hive/iotmp</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/usr/local/hive/iotmp</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>

11. Start Hive
/usr/local/hive/bin/hive
Note: Hadoop and MySQL must be running before Hive can start.
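
On Hive 3.x the metastore schema usually has to be initialized once before the first start. A minimal sketch, using the MySQL connection settings from step 7:

/usr/local/hive/bin/schematool -dbType mysql -initSchema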


Installing the Flume log-collection system on Ubuntu

1. Download
http://www.apache.org/dyn/closer.lua/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz

2. Upload it to /usr/local/ on the Ubuntu host
3. Extract it
cd /usr/local
tar xvf apache-flume-1.9.0-bin.tar.gz
mv apache-flume-1.9.0-bin flume
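
To confirm the installation, the version can be printed:

/usr/local/flume/bin/flume-ng version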

4. Edit the configuration file under /usr/local/flume/conf
mv flume-conf.properties.template flume-conf.properties
vi flume-conf.properties

# one agent with a single source, channel, and sink
agent.sources = seqGenSrc
agent.channels = memoryChannel
agent.sinks = loggerSink
# seq source: emits an incrementing sequence of events (handy for testing)
agent.sources.seqGenSrc.type = seq
agent.sources.seqGenSrc.channels = memoryChannel
# logger sink: writes each event to the log/console
agent.sinks.loggerSink.type = logger
agent.sinks.loggerSink.channel = memoryChannel
# in-memory channel buffering at most 100 events
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 100

5. Start Flume
/usr/local/flume/bin/flume-ng agent -n agent -c /usr/local/flume/conf -f /usr/local/flume/conf/flume-conf.properties -Dflume.root.logger=INFO,console
With the seq source above, the agent starts logging a stream of numbered events to the console as soon as it comes up; stop it with Ctrl-C.


Installing Kafka on Ubuntu

1. Download
https://downloads.apache.org/kafka/2.8.0/kafka_2.13-2.8.0.tgz

2. Upload it to /usr/local/ on the Ubuntu host
3. Extract it
cd /usr/local
tar xvf kafka_2.13-2.8.0.tgz
mv kafka_2.13-2.8.0 kafka

4. Edit the configuration files under /usr/local/kafka/config/
1) zookeeper.properties

dataDir=/usr/local/hadoop/data/zookeeper/zkdata
clientPort=2181

2) consumer.properties

bootstrap.servers=ubuntu-VirtualBox:9092

3) producer.properties

bootstrap.servers=ubuntu-VirtualBox:9092

4) server.properties

broker.id=1
zookeeper.connect=ubuntu-VirtualBox:2181

5. Start Kafka
/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
Note: the ZooKeeper cluster must be running before the Kafka cluster can start.
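
As a smoke test, you can create a topic and pass a few messages through it. A sketch, assuming the broker is reachable at ubuntu-VirtualBox:9092; the topic name test is arbitrary:

/usr/local/kafka/bin/kafka-topics.sh --create --topic test --partitions 1 --replication-factor 1 --bootstrap-server ubuntu-VirtualBox:9092
/usr/local/kafka/bin/kafka-console-producer.sh --topic test --bootstrap-server ubuntu-VirtualBox:9092
/usr/local/kafka/bin/kafka-console-consumer.sh --topic test --from-beginning --bootstrap-server ubuntu-VirtualBox:9092

Lines typed into the producer should be echoed back by the consumer running in a second terminal.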