Apache Hive 2.3.9 Cluster Deployment

Apache Hive is a distributed, fault-tolerant data warehouse system that enables analytics at scale. This article documents the installation and deployment process for this older release.

Hive

Configuring the backend database works the same way across all Hive versions.

Configure the Backend Store

Hive stores the index information it creates for data, along with other metadata, in a backend database. Several types of relational databases are supported, either external or embedded. Because MySQL's SQL is relatively simple and it has a large user base, this article uses MySQL as the example.

Install the MySQL 5.7 server (other versions work as well; this article uses MySQL 5.7 as the example, and installation packages can be downloaded from the official website).

sudo dnf install mysql-server

Start the database

sudo systemctl start mysqld
sudo systemctl enable mysqld
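
To confirm the service is running before proceeding, a quick status check (a minimal sketch; on some distributions the unit is named mysql instead of mysqld):

## Check the service state
sudo systemctl status mysqld --no-pager
## Or confirm the default port 3306 is listening
ss -lnt | grep 3306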

Log in to the database (if no password is set by default, just press Enter twice).

mysql -uroot -p

Tip: in later MySQL versions, a random password is generated when the service starts; it can be retrieved with the following command.

sudo grep -i 'password' /var/log/mysqld.log

Run the following SQL statements to create the dedicated user for Hive's backing store; configuring and using the root account is not recommended. (The hive database itself is created automatically later via createDatabaseIfNotExist in the JDBC URL.)

> CREATE USER 'hive'@'%' IDENTIFIED BY 'Uos@2022';
> GRANT ALL ON *.* TO 'hive'@'%' IDENTIFIED BY 'Uos@2022';
> FLUSH PRIVILEGES;
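
To confirm the new account works, a quick sanity check logging in as hive (the password follows the value used above):

## Log in as the hive user and run a trivial query
mysql -uhive -p'Uos@2022' -e 'SELECT VERSION();'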

Common Issues

a) If you see the following error

ERROR 1820 (HY000): You must reset your password using ALTER USER statement before executing this statement.

This happens because MySQL installed from the official repository requires the default password to be reset before any other statement can be executed:

SET PASSWORD = PASSWORD('Uosuos123!');

b) If you see the following error

ERROR 1819 (HY000): Your password does not satisfy the current policy requirements

This is because the default password policy is fairly strict: passwords must contain upper- and lowercase letters and special characters. Change the password to a high-strength one.
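
If needed, the current policy can be inspected or, for a test environment only, relaxed (a sketch using the MySQL 5.7 validate_password plugin variables; not recommended in production):

> SHOW VARIABLES LIKE 'validate_password%';
> SET GLOBAL validate_password_policy = LOW;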

Download

Download the Hive 2.3.9 stable release

wget https://dlcdn.apache.org/hive/hive-2.3.9/apache-hive-2.3.9-bin.tar.gz

Extract

sudo tar xf apache-hive-2.3.9-bin.tar.gz -C /opt/

Change ownership

sudo chown -R $USER:$USER /opt/apache-hive-2.3.9-bin/

Download the MySQL Connector/J JDBC driver (this step assumes HIVE_HOME is already set; it is configured in the next step, so either set it first or use the absolute path /opt/apache-hive-2.3.9-bin/lib)

cd $HIVE_HOME/lib
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.18/mysql-connector-java-8.0.18.jar
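
As a quick sanity check, the driver class referenced later in hive-site.xml (com.mysql.cj.jdbc.Driver) should be present inside the jar:

## The listing should include com/mysql/cj/jdbc/Driver.class
unzip -l mysql-connector-java-8.0.18.jar | grep 'jdbc/Driver'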

Configure environment variables

sudo vim /etc/profile.d/hadoop.sh
## Add the following lines
export HIVE_HOME=/opt/apache-hive-2.3.9-bin
export PATH=$HIVE_HOME/bin:$PATH

Apply the environment variables, or log out of the terminal and back in

## Reload the environment variables
source /etc/profile
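
A quick check that the variables took effect (hive --version assumes Hadoop is already installed and on the PATH, since Hive depends on it):

## Both commands should resolve against /opt/apache-hive-2.3.9-bin
echo $HIVE_HOME
hive --version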

Configure Hive

Create the Hive configuration file

cd $HIVE_HOME
vim conf/hive-site.xml

Write the following content

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop1:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false&amp;serverTimezone=GMT</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>Uos@2022</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.server2.webui.host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>hive.server2.webui.port</name>
    <value>10002</value>
  </property>
  <property>
    <name>hive.server2.webui.max.threads</name>
    <value>50</value>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.schema.verification.record.version</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>hive.metastore.port</name>
    <value>9083</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop1:9083</value>
  </property>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
</configuration>
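
Before initializing, it is worth confirming that the metastore database configured in the JDBC URL is reachable from this node (a minimal sketch; hadoop1 and the credentials follow the values above):

## Verify the MySQL server on hadoop1 accepts remote connections from this host
mysql -h hadoop1 -P 3306 -uhive -p'Uos@2022' -e 'SELECT 1;'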

Initialization

Create the Hive temporary and warehouse directories on HDFS and set their permissions

hdfs dfs -mkdir /tmp
hdfs dfs -chmod g+w /tmp
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -chmod g+w /user/hive/warehouse

Verify that the HDFS paths for Hive were created successfully

$ hdfs dfs -ls /
Found 2 items
drwxrwxr-x   - uos supergroup          0 2022-12-02 12:48 /tmp
drwxr-xr-x   - uos supergroup          0 2022-12-02 12:48 /user

Initialize the Hive metastore schema

schematool -initSchema -dbType mysql

A large amount of output will scroll by; after a short wait, the following lines indicate that the schema was initialized successfully.

Initialization script completed
schemaTool completed
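
Optionally, confirm that schematool created the metastore tables in the hive database (DBS, TBLS, and similar tables belong to the standard Hive schema):

## List the freshly created metastore tables
mysql -h hadoop1 -uhive -p'Uos@2022' -e 'USE hive; SHOW TABLES;'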

Start the Services

Start the services (while debugging, run each in its own terminal window)

hive --service metastore
hive --service hiveserver2

Once everything works, the services can be run in the background:

nohup hive --service metastore > /dev/null 2>&1 &
nohup hive --service hiveserver2 > /dev/null 2>&1 &
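
To confirm both services came up, check that the metastore (9083) and HiveServer2 (10000) ports from hive-site.xml are listening; jps should also show two RunJar processes:

## Both ports should appear in LISTEN state
ss -lnt | grep -E ':(9083|10000)'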

Test the Services

Test that the hive CLI connects correctly (the double-hash comments mark the hive commands actually being executed; distinguish them from the surrounding output).

## Enter the CLI
$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-2.3.9-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.9.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/apache-hive-2.3.9-bin/lib/hive-common-2.3.9.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
## List the databases
hive (default)> show databases;
OK
database_name
default
Time taken: 0.473 seconds, Fetched: 1 row(s)

Test that the beeline connection works

$ beeline
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-2.3.9-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.9.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Beeline version 2.3.9 by Apache Hive
## Connect to the database (replace the address with the actual address of hadoop1 in your cluster)
beeline> !connect jdbc:hive2://172.16.16.231:10000
Connecting to jdbc:hive2://172.16.16.231:10000
Enter username for jdbc:hive2://172.16.16.231:10000: hive
Enter password for jdbc:hive2://172.16.16.231:10000: **********
Connected to: Apache Hive (version 2.3.9)
Driver: Hive JDBC (version 2.3.9)
Transaction isolation: TRANSACTION_REPEATABLE_READ
## List the databases
0: jdbc:hive2://172.16.16.231:10000> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (0.6 seconds)
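
For scripted checks, beeline can also run a query non-interactively with its standard -u, -n, and -e options:

## One-shot query; replace the address with the actual address of hadoop1
beeline -u jdbc:hive2://172.16.16.231:10000 -n hive -e 'show databases;'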

Open http://hadoop1:10002 in a browser to view the HiveServer2 web UI.
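
The web UI can also be checked from the command line (a minimal probe; a 200 status code indicates the page is being served):

## Print only the HTTP status code of the HiveServer2 web UI
curl -s -o /dev/null -w '%{http_code}\n' http://hadoop1:10002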

