Superset¶
Apache Superset is a modern, intuitive and highly scalable platform to explore and visualize data, from simple line charts to highly detailed geospatial charts.
Superset can connect to any SQL based data source through SQLAlchemy, including modern cloud native databases and engines at petabyte scale.
For more information, check here.
Log in to Superset¶
By default it is possible to access superset as Administrator with username and password: admin admin. A different password can be choosen through the app optional parameters. The username can be changed from the Superset interface.
Admins have all possible rights, including granting or revoking rights from other users and altering other people’s slices and dashboards. New users as well as customized role and permissions can be created through the superset interface.
Mandatory Parameters¶
As mandatory parameter one needs to specify the path to the Superset home directory where the superset_config.py
file is located. This directory is associated with the $SUPERSET_HOME
environment variable. By default, the application will save/access the necessary databases in this directory.
If the provided directory is empty, a superset_config.py
file will be created with default settings.
The user can create his/her own file to configure the application:
All the parameters defined in the config.py file can be altered in superset_config.py
.
Connect to Databases¶
The application start PostgreSQL. The PostgreSQL data directory is by default located in the $SUPERSET_HOME
folder. A pre-existing Postgres directory can be selected by specifying the path, through the corresponding optional parameter.
By default the Postgres user is superset, the default password is superset, and the name of the accessible database is also superset.
In order to create a new Postgres user and password and/or choose the name for a new Postgres database on which the Postgres user has been granted privileges, the user can use the app terminal interface, e.g.:
$ psql -c "create user $psuser with encrypted password '$pspass';"
$ psql -c "grant all privileges on database $postgres_db to $psuser;"
In addition to PostgreSQL it is possible to install the required packages for the different database choices (Presto, MySQL, Oracle, Elasticsearch, Snowflake, ...) as well as the packages needed to connect to the databases through the Additional Dependencies parameter.
An example of the local PostgresSQL database connection string is as follows:
postgresql+psycopg2://{username}:{password}@0.0.0.0:5432/{database}
It can be added either through the Superset GUI or through the configuration file superset_config.py
(see the Superset documentation for different databases).
The following video shows how to connect to the postgres database from the superset interface and how to enable the upload of a csv file.
Caching¶
By default the application starts a Redis server for caching service.
Superset uses Flask-Caching for caching purpose which supports various caching backends, including Redis, Memcached, SimpleCache (in-memory), or the local filesystem. The necessary libraries can be installed through the Initialization parameter.
It is possible to customize the superset cache configuration specifying the related variables in the superset_config.py
file, as described here.
Gunicorn Settings¶
Several parameters can be tuned through the app optional parameters in order to optimize the Gunicorn config, including the worker class, the number of workers and threads, see here for details.
Async Queries via Celery¶
To enable support for long running queries that execute beyond the typical web request’s timeout (30-60 seconds), it is possible to configure Celery by modifying the superset_config.py
as described here.
In addition it is necessary to start the related services. The option -c , namely Start Celery worker, starts a Celery worker to leverage the configuration, through the command
celery --app=superset.tasks.celery_app:app worker --pool=prefork -O fair -c 4
The option -d, namely Start Celery beat, starts a job which schedules periodic background jobs, through the command:
celery --app=superset.tasks.celery_app:app beat
Contents