Deploying the Apache Superset Dashboard on Ubuntu
后知后觉

Deploying the current Apache Superset v3.0 release on the older Ubuntu 18.04.

Base environment

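Note that recent Superset releases, including the v3.0 deployed here, require Python 3.9 or later. Ubuntu 18.04 is old: its official repositories only provide Python 3.6, 3.7, and 3.8, the release has reached the end of standard support, and third-party repositories are gradually dropping builds for it. Python 3.9 therefore has to be built and installed manually.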

Python 3.9

Download the official source tarball (v3.9.16 or later is recommended)

wget https://www.python.org/ftp/python/3.9.18/Python-3.9.18.tgz

Install the build dependencies

sudo apt install build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libreadline-dev libffi-dev libsqlite3-dev libbz2-dev

Extract the source

tar xf Python-3.9.18.tgz

Build and install alongside the system Python (altinstall leaves the default python3 untouched)

cd Python-3.9.18/
./configure --enable-optimizations
sudo make altinstall

Check that the version matches

python3.9 --version
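
A source build of Python quietly skips optional modules whose development headers were missing at compile time, so a matching version number alone does not prove the build is complete. The following is a minimal sketch of such a check, meant to be run with python3.9. The module list and the helper name `missing_modules` are assumptions chosen to mirror the packages installed above, not part of any official procedure.

```python
import importlib
import sys

# Optional stdlib modules that depend on the -dev packages installed earlier
# (illustrative assumption: adjust the list to your own needs).
REQUIRED = ["ssl", "sqlite3", "zlib", "bz2", "readline", "ctypes", "curses"]


def missing_modules(names):
    """Return the names from `names` that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_modules(REQUIRED)
    print("missing modules:", ", ".join(missing) if missing else "none")
```

If any module is reported missing, installing the corresponding -dev package and rebuilding is the usual remedy.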

Installing Superset from scratch

Basic installation

First create an installation directory for Superset (/opt/superset in this example)

cd /opt
sudo mkdir /opt/superset
sudo chown -R $USER:$USER /opt/superset

Then create a venv virtual environment. The steps below are taken from the official Installing From Scratch guide.

cd /opt/superset
python3.9 -m venv venv

Install the compiler toolchain needed by the following steps

sudo apt install gcc g++
## On Red Hat-based systems, install gcc-c++ instead
## On Ubuntu 20.04 or later, python3.9-dev is also required

Activate the virtual environment and install the main program

source venv/bin/activate
## Upgrade the pip package manager first
pip3 install --upgrade pip
## PyPI is often slow or unreachable from mainland China, so the Tsinghua (TUNA) mirror is used for the remaining installs
pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple apache-superset
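
Several later steps assume the virtual environment is still active. As a small sketch (the helper name `in_virtualenv` is made up for illustration), the interpreter itself can report whether it is running inside a venv:

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtual environment active:", in_virtualenv())
```

Run with the `python` found on the current PATH, the interpreter path should point under /opt/superset/venv when the environment is active.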

Preparing the database

MySQL

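Prepare a metadata database for Superset. By default Superset uses the built-in SQLite, which is small and has few dependencies but is not recommended for production, so MySQL 5.7 is used here instead. Download the MySQL 5.7 package from the official site and install it; the details are omitted here.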

Log in to the database, then create the database and user and grant privileges

CREATE DATABASE superset;
CREATE USER 'superset'@'%' IDENTIFIED BY 'Superset123!';
GRANT ALL PRIVILEGES ON superset.* TO 'superset'@'%';
FLUSH PRIVILEGES;

PostgreSQL

If you use PostgreSQL, install it following the official site, then create the user and database:

CREATE USER superset WITH CREATEDB CREATEROLE PASSWORD 'Superset123!';
CREATE DATABASE superset OWNER superset;

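Then create the configuration file /opt/superset/superset_config.py. The content below can be copied as-is for now; if the installation directory differs, change the paths to the new installation root.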

## superset_config.py
# Superset specific config
ROW_LIMIT = 5000

# Flask App Builder configuration
# Your App secret key will be used for securely signing the session cookie
# and encrypting sensitive information on the database
# Make sure you are changing this key for your deployment with a strong key.
# Alternatively you can set it with `SUPERSET_SECRET_KEY` environment variable.
# You MUST set this for production environments or the server will refuse
# to start and you will see an error in the logs accordingly.
## Replace this value with a freshly generated secret, for example from: openssl rand -base64 42
SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'

# The SQLAlchemy connection string to your database backend
# This connection defines the path to the database that stores your
# superset metadata (slices, connections, tables, dashboards, ...).
# Note that the connection information to connect to the datasources
# you want to explore are managed directly in the web UI
## The line below is the default SQLite URI; leave it commented out
#SQLALCHEMY_DATABASE_URI = 'sqlite:////path/to/superset.db'
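## Format: "protocol://user:password@host/database". A literal @ in the password breaks the URI: change the password or percent-encode the character as %40
## MySQL and PostgreSQL variants follow; enable the one you need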
SQLALCHEMY_DATABASE_URI = 'mysql://superset:Superset123!@localhost/superset'
#SQLALCHEMY_DATABASE_URI = 'postgresql://superset:Superset123!@localhost/superset'

# Flask-WTF flag for CSRF
WTF_CSRF_ENABLED = True
# Add endpoints that need to be exempt from CSRF protection
WTF_CSRF_EXEMPT_LIST = []
# A CSRF token that expires in 1 year
WTF_CSRF_TIME_LIMIT = 60 * 60 * 24 * 365

# Set this API key to enable Mapbox visualizations
MAPBOX_API_KEY = ''

## Cache configuration; left commented out for now because of an unresolved compatibility issue
#CACHE_CONFIG = {
#    "CACHE_TYPE": "RedisCache",
#    "CACHE_DEFAULT_TIMEOUT": 300,
#    "CACHE_KEY_PREFIX": "superset_",
#    "CACHE_REDIS_URL": "redis://127.0.0.1:6379/0"
#}
#DATA_CACHE_CONFIG = CACHE_CONFIG

Replace the placeholder with a newly generated secret key

sed -i "s#YOUR_OWN_RANDOM_GENERATED_SECRET_KEY#$(openssl rand -base64 42)#" superset_config.py
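Tip: do not use + or / as the sed delimiter here. The base64 alphabet consists of the letters A-Z and a-z, the digits 0-9, and the symbols + and /, 64 characters in total, so either symbol may appear in the generated key.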
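
The same kind of key can be produced without openssl. The sketch below base64-encodes 42 random bytes, which is what `openssl rand -base64 42` does, and checks that the result stays within the base64 alphabet, so `#` is a safe delimiter. The function name `generate_secret_key` is an illustrative assumption.

```python
import base64
import secrets
import string

# The 64 characters of the standard base64 alphabet, plus "=" used for padding.
BASE64_CHARS = set(string.ascii_letters + string.digits + "+/=")


def generate_secret_key(num_bytes=42):
    """Base64-encode `num_bytes` random bytes, like `openssl rand -base64 42`."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    key = generate_secret_key()
    # "#" never occurs in base64 output, which is why sed can use it as delimiter.
    assert set(key) <= BASE64_CHARS and "#" not in key
    print(key)
```

With 42 input bytes the output is always 56 characters long and needs no padding.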

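Lines starting with ## in the configuration above are annotations added for this article; they can be removed in one pass after the file is written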

sed -i '/^##/d' superset_config.py
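
If the database password does contain characters such as @, / or :, the connection URI can still be built safely by percent-encoding the credentials. This is a minimal sketch using only the standard library; the helper name `build_database_uri` and the sample password are hypothetical.

```python
from urllib.parse import quote_plus


def build_database_uri(scheme, user, password, host, database):
    """Assemble a SQLAlchemy-style URI, percent-encoding the user and password."""
    return f"{scheme}://{quote_plus(user)}:{quote_plus(password)}@{host}/{database}"


if __name__ == "__main__":
    # Hypothetical password containing characters that would break a naive URI.
    print(build_database_uri("mysql", "superset", "p@ss/w:rd", "localhost", "superset"))
    # prints mysql://superset:p%40ss%2Fw%3Ard@localhost/superset
```

The resulting string can be assigned to SQLALCHEMY_DATABASE_URI in superset_config.py.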

Initializing the data

Because the custom configuration uses MySQL as the metadata store, install the matching database driver first

pip3 install wheel
# MySQL (building mysqlclient needs the MySQL client headers, e.g. libmysqlclient-dev)
pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple mysqlclient
# PostgreSQL (requires the libpq-dev package)
pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple psycopg2

Then install the PIL library (packaged as Pillow)

pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple pillow

Without it, the following message appears in the log:
2023-10-09 13:53:47,669:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
/opt/superset/venv/lib/python3.9/site-packages/flask_limiter/extension.py:336: UserWarning: Using the in-memory storage for tracking rate limits as no storage was explicitly specified. This is not recommended for production use. See: https://flask-limiter.readthedocs.io#configuring-a-storage-backend for documentation about configuring the storage backend.
  warnings.warn(
No PIL installation found

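Point Superset at the configuration file with an environment variable; otherwise the initializer falls back to the default configuration and fails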

export SUPERSET_CONFIG_PATH=/opt/superset/superset_config.py
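
Before initializing, it can help to confirm that the file named by SUPERSET_CONFIG_PATH actually loads and no longer contains the placeholder key. The following is a rough sketch of such a pre-flight check; the function names and the two checks it performs are assumptions for illustration, not something Superset itself provides.

```python
import importlib.util
import os

PLACEHOLDER = "YOUR_OWN_RANDOM_GENERATED_SECRET_KEY"


def load_config(path):
    """Execute the Python file at `path` and return it as a module object."""
    spec = importlib.util.spec_from_file_location("superset_config_check", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def config_problems(path):
    """Return a list of human-readable problems found in the config file."""
    if not os.path.isfile(path):
        return [f"config file not found: {path}"]
    config = load_config(path)
    problems = []
    if getattr(config, "SECRET_KEY", PLACEHOLDER) == PLACEHOLDER:
        problems.append("SECRET_KEY is missing or still the placeholder")
    if not getattr(config, "SQLALCHEMY_DATABASE_URI", ""):
        problems.append("SQLALCHEMY_DATABASE_URI is not set")
    return problems


if __name__ == "__main__":
    path = os.environ.get("SUPERSET_CONFIG_PATH", "/opt/superset/superset_config.py")
    for problem in config_problems(path) or ["no problems found"]:
        print(problem)
```

The path must end in .py for the loader to recognize it, which is the case for superset_config.py.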

Then initialize the database. If you left the virtual environment during the steps above, activate it again first

export FLASK_APP=superset
superset db upgrade

Create the admin user

superset fab create-admin
## Output of the command:
Loaded your LOCAL configuration at [/opt/superset/superset_config.py]
logging was configured successfully
2023-10-09 15:53:47,412:INFO:superset.utils.logging_configurator:logging was configured successfully
2023-10-09 15:53:47,417:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
/opt/superset/venv/lib/python3.9/site-packages/flask_limiter/extension.py:336: UserWarning: Using the in-memory storage for tracking rate limits as no storage was explicitly specified. This is not recommended for production use. See: https://flask-limiter.readthedocs.io#configuring-a-storage-backend for documentation about configuring the storage backend.
  warnings.warn(
Username [admin]: 
User first name [admin]: 
User last name [user]: 
Email [admin@fab.org]: 
Password:                ## enter the password
Repeat for confirmation: ## repeat the password
Recognized Database Authentications.
Admin User admin created.

Create the default roles and permissions

superset init

Load the example data (optional; the data set is large and takes a long time)

superset load_examples
Tip: the sample data is hosted on GitHub, and downloads from mainland China may fail. Some example charts will then show errors, which is expected.

Startup and process supervision

First run in development mode to check that everything works (not recommended for production)

superset run -h 0.0.0.0 -p 8088 --with-threads --reload --debugger --debug
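You should see the interface below, with a red debug badge in the upper-right corner. If port 8088 is already in use, change the -p argument to another free port.
[Figure: development mode preview]
Enter the username and password to log in; the example data is available for exploring.
[Figure: development mode debugging]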

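Once clicking through the features in development mode reveals no exceptions, stop the server. Serving users directly from the Flask development server is not recommended in production: it handles concurrency poorly, is less stable, and makes logs awkward to inspect. Use gunicorn as the WSGI server instead, with gevent workers: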

pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple gevent

Start the production service

gunicorn -w 10 -k gevent --worker-connections 1000 --timeout 120 -b 0.0.0.0:6666 --forwarded-allow-ips 127.0.0.1 --limit-request-line 0 --limit-request-field_size 0 "superset.app:create_app()"
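
The long command line can also be kept in a gunicorn configuration file, which is itself plain Python. The sketch below expresses the same options using gunicorn's setting names; the file name gunicorn.conf.py, its location, and the choice of trusting only loopback for forwarded headers (assuming the reverse proxy runs on the same host) are assumptions.

```python
# gunicorn.conf.py (hypothetical location: /opt/superset/gunicorn.conf.py)
# Same options as the command-line flags, expressed as gunicorn settings.

bind = "0.0.0.0:6666"
workers = 10
worker_class = "gevent"
worker_connections = 1000
timeout = 120
# 0 lifts the size limit on the request line and on individual header fields.
limit_request_line = 0
limit_request_field_size = 0
# Assumption: the reverse proxy runs on this host, so only loopback is
# trusted to set the X-Forwarded-* headers.
forwarded_allow_ips = "127.0.0.1"
```

It would then be started with: gunicorn -c gunicorn.conf.py "superset.app:create_app()"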

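Note that gunicorn serves ordinary HTTP here, but port 6666 is on the restricted-port list of mainstream browsers (Chrome, for example, rejects it with ERR_UNSAFE_PORT), so the service cannot be opened directly in a browser on this port. Put a front-end reverse proxy such as NGINX in front of it, configured as follows: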

...

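If a certificate is configured, see the HTTPS Configuration section of the official documentation. If TLS is terminated at the load-balancing layer, the proxy can be configured along these lines: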

upstream superset_app {
    server localhost:6666;
    keepalive 100;
}

upstream superset_websocket {
    server localhost:6666;
    keepalive 100;
}

server {
    listen        80;
    server_name   superset.domain.com;
    rewrite ^(.*) https://$server_name$1 permanent;
}

server {
    listen        443 ssl http2;
    server_name   superset.domain.com;

    ssl_certificate     cert.d/domain.com/fullchain.cer;
    ssl_certificate_key cert.d/domain.com/domain.com.key;
    ssl_session_timeout 5m;
    ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE:ECDH:AES:HIGH:!NULL:!aNULL:!MD5:!ADH:!RC4;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers on;

    client_max_body_size 10m;
    output_buffers 20 10m;

    keepalive_timeout  30;
    keepalive_requests 2;

    location /ws {
        proxy_pass http://superset_websocket;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_set_header Host $host;
    }

    location / {
        proxy_pass http://superset_app;
        proxy_set_header    Host                $host;
        proxy_set_header    X-Real-IP           $remote_addr;
        proxy_set_header    X-Forwarded-For     $remote_addr;
        proxy_set_header    X-Forwarded-Host    $host;
        proxy_set_header    X-Forwarded-Proto   $scheme;
        proxy_http_version 1.1;
        port_in_redirect off;
        proxy_connect_timeout 300;
    }
}

Once configured, open the site in a browser to see the home page

[Figure: production environment preview]

Additional notes

Supervising the process with systemd

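Running from the command line is inconvenient, and tools such as screen make the logs hard to inspect, so systemd is recommended for supervising the process. The configuration is adapted from the official documentation. First create the service unit: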

sudo vim /usr/lib/systemd/system/gunicorn.service
## Write the following content
[Unit]
Description=gunicorn daemon
After=network.target

[Service]
User=hadoop
Group=hadoop
WorkingDirectory=/opt/superset/
Environment="PATH=/opt/superset/venv/bin"
ExecStart=/opt/superset/venv/bin/gunicorn -w 10 -k gevent --worker-connections 1000 --timeout 120 -b 0.0.0.0:6666 --limit-request-line 0 --limit-request-field_size 0 "superset.app:create_app()"
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=mixed
TimeoutStopSec=5
PrivateTmp=true

[Install]
WantedBy=multi-user.target
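Tip: change User and Group to the account that runs the service. A regular user is recommended rather than root; when run as root, the logs fill with useless permission notices.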

Reload the unit files and start the service

sudo systemctl daemon-reload
sudo systemctl start gunicorn.service
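
After the unit starts, a quick probe confirms that gunicorn is answering. Superset exposes a /health endpoint for this purpose; the sketch below polls it with the standard library only. The function name `is_healthy` is illustrative, and the port is the 6666 used in this article.

```python
import http.client
import urllib.request


def is_healthy(url, timeout=5.0):
    """Return True if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, timeouts and refused connections are all OSError.
        return False


if __name__ == "__main__":
    print("healthy:", is_healthy("http://127.0.0.1:6666/health"))
```

Unlike a browser, urllib does not block port 6666, so the check can talk to gunicorn directly on the server.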

Common issues

a) The CSRF token is missing

After startup, the log contains many errors like this

flask_wtf.csrf.CSRFError: 400 Bad Request: The CSRF token is missing.
2023-10-10 15:54:18,115:WARNING:superset.views.base:Refresh CSRF token error

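Solution:

Find the WTF_CSRF_ENABLED key in the configuration file and disable it, adding it if it is absent. Note that this turns off CSRF protection entirely, so only do it in a trusted environment: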

WTF_CSRF_ENABLED = False

b) Class werkzeug.local.LocalProxy is not mapped

After startup, the log contains many warnings like this

2023-10-10 14:31:46,342:WARNING:root:Class 'werkzeug.local.LocalProxy' is not mapped

Solution:

To be added.

c) Falling back to the built-in cache

After startup, the log contains many warnings like this

Falling back to the built-in cache, that stores data in the metadata database, for the following cache: `FILTER_STATE_CACHE_CONFIG`. It is recommended to use `RedisCache`, `MemcachedCache` or another dedicated caching backend for production deployments
2023-10-10 14:31:46,070:WARNING:superset.utils.cache_manager:Falling back to the built-in cache, that stores data in the metadata database, for the following cache: `FILTER_STATE_CACHE_CONFIG`. It is recommended to use `RedisCache`, `MemcachedCache` or another dedicated caching backend for production deployments
Falling back to the built-in cache, that stores data in the metadata database, for the following cache: `EXPLORE_FORM_DATA_CACHE_CONFIG`. It is recommended to use `RedisCache`, `MemcachedCache` or another dedicated caching backend for production deployments
2023-10-10 14:31:46,072:WARNING:superset.utils.cache_manager:Falling back to the built-in cache, that stores data in the metadata database, for the following cache: `EXPLORE_FORM_DATA_CACHE_CONFIG`. It is recommended to use `RedisCache`, `MemcachedCache` or another dedicated caching backend for production deployments

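Solution:

This happens because no external cache is configured, so the cache falls back to the built-in one. It can be ignored in a test environment; in production, use a configuration like the following (the keys can be looked up in the default configuration, and the RedisCache backend also needs the redis Python package installed in the virtual environment):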

from datetime import timedelta
from superset.superset_typing import CacheConfig

CACHE_CONFIG: CacheConfig = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_KEY_PREFIX": "superset_cache_",
    "CACHE_REDIS_URL": "redis://127.0.0.1:6379/2"
}

DATA_CACHE_CONFIG: CacheConfig = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_KEY_PREFIX": "superset_data_",
    "CACHE_REDIS_URL": "redis://127.0.0.1:6379/2"
}

FILTER_STATE_CACHE_CONFIG: CacheConfig = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_KEY_PREFIX": "superset_filter_",
    "CACHE_REDIS_URL": "redis://127.0.0.1:6379/2"
}

EXPLORE_FORM_DATA_CACHE_CONFIG: CacheConfig = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_KEY_PREFIX": "superset_explore_",
    "CACHE_REDIS_URL": "redis://127.0.0.1:6379/2"
}
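
The four dictionaries above differ only in their key prefix, so they can also be generated by a small helper. This is just one possible way to cut the repetition; the helper name `redis_cache` and the optional timeout parameter are assumptions, and the type annotations are dropped so that the sketch runs without Superset installed.

```python
REDIS_URL = "redis://127.0.0.1:6379/2"  # same URL as in the configuration above


def redis_cache(prefix, timeout=None):
    """Build one Flask-Caching style config dict for the given key prefix."""
    config = {
        "CACHE_TYPE": "RedisCache",
        "CACHE_KEY_PREFIX": prefix,
        "CACHE_REDIS_URL": REDIS_URL,
    }
    if timeout is not None:
        # Optional default expiry in seconds.
        config["CACHE_DEFAULT_TIMEOUT"] = timeout
    return config


CACHE_CONFIG = redis_cache("superset_cache_")
DATA_CACHE_CONFIG = redis_cache("superset_data_")
FILTER_STATE_CACHE_CONFIG = redis_cache("superset_filter_")
EXPLORE_FORM_DATA_CACHE_CONFIG = redis_cache("superset_explore_")
```

Each call returns a fresh dictionary, so changing one cache later does not affect the others.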

d) cannot import name 'url_quote' from 'werkzeug.urls'

At startup, the log contains an error like this

Traceback (most recent call last):
  ...
    from werkzeug.urls import url_quote
ImportError: cannot import name 'url_quote' from 'werkzeug.urls' (/opt/superset/venv/lib/python3.9/site-packages/werkzeug/urls.py)

Solution:

This error generally only occurs with Superset 2.x and is resolved by downgrading the affected component.

pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple werkzeug==2.2.3

e) module 'sqlparse.keywords' has no attribute 'FLAGS'

At startup, the log contains an error like this

Traceback (most recent call last):
  ...
    re.compile(r"'(''|\\\\|\\|[^'])*'", sqlparse.keywords.FLAGS).match,
AttributeError: module 'sqlparse.keywords' has no attribute 'FLAGS'

Solution:

This error generally only occurs with Superset 2.x and is resolved by downgrading the affected component.

pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple sqlparse==0.4.3

f) No module named 'marshmallow_enum'

At startup, the log contains an error like this

Traceback (most recent call last):
  ...
    from marshmallow_enum import EnumField
ModuleNotFoundError: No module named 'marshmallow_enum'

Solution:

The module is missing and has to be installed.

pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple marshmallow-enum==1.5.1
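
To see which versions of the packages from entries d) through f) are currently installed in the virtual environment, the standard library's importlib.metadata can be queried. This is a small sketch; the `PINNED` table simply repeats the versions suggested above, and the helper name is illustrative.

```python
from importlib import metadata

# Versions suggested in the troubleshooting entries above.
PINNED = {"werkzeug": "2.2.3", "sqlparse": "0.4.3", "marshmallow-enum": "1.5.1"}


def installed_version(package):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name, wanted in PINNED.items():
        print(f"{name}: installed={installed_version(name)} pinned={wanted}")
```

A value of None means the package is not installed in the environment the script runs in.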

g) type "emaildeliverytype" already exists

Running superset db upgrade fails with:

INFO  [alembic.runtime.migration] Running upgrade a61b40f9f57f -> 6c7537a6004a, models for email reports
Traceback (most recent call last):
  ...
    cursor.execute(statement, parameters)
psycopg2.errors.DuplicateObject: type "emaildeliverytype" already exists


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  ...
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.DuplicateObject) type "emaildeliverytype" already exists

[SQL: CREATE TYPE emaildeliverytype AS ENUM ('attachment', 'inline')]
(Background on this error at: https://sqlalche.me/e/14/f405)

This is caused by leftover data in the database. If the service is newly created, simply drop the database and create it again.

