
Starburst Galaxy destination user guide

Overview

The Starburst Galaxy destination syncs data to Starburst Galaxy Great Lakes catalogs
in Apache Iceberg table format. Each stream is written to its own Iceberg table.

Features

| Feature | Supported | Notes |
| --- | --- | --- |
| Overwrite Sync | | Warning: this mode deletes all previously synced data in the destination table. |
| Append Sync | | |
| Deduped History | | |
| Namespaces | | |
| SSL | | SSL is enabled. |

Data storage

Starburst Galaxy supports various object stores; however, this connector supports only Amazon S3.

Configuration

| Category | Parameter | Type | Notes |
| --- | --- | --- | --- |
| Starburst Galaxy | Hostname | string | Required. Found in the Connection info section of the View clusters pane in Starburst Galaxy. |
| | Port | string | Optional. Found in the Connection info section of the View clusters pane in Starburst Galaxy. Defaults to 443. |
| | User | string | Required. Galaxy user, found in the Connection info section of the View clusters pane in Starburst Galaxy. |
| | Password | string | Required. Password for the specified Galaxy user. |
| | Amazon S3 catalog | string | Required. Name of the Amazon S3 catalog created in the Galaxy domain. |
| | Amazon S3 catalog schema | string | Optional. Default schema in the Starburst Galaxy Amazon S3 catalog, used when the source does not specify a namespace. Each data stream is written to a table in this schema. Defaults to public. |
| Staging Object Store - Amazon S3 | Bucket name | string | Required. Name of the bucket where staging data is stored. |
| | Bucket path | string | Required. Subdirectory of the specified S3 bucket used to store staging data. |
| | Bucket region | string | Required. Region of the specified S3 bucket. |
| | Access key | string | Required. AWS or MinIO access key credential. |
| | Secret key | string | Required. AWS or MinIO secret key credential. |
| General | Purge staging Iceberg table | boolean | Optional. Whether the staging Iceberg table is purged after a sync completes. Enabled by default; disable it only for debugging. |
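The settings above split into required values and optional values with documented defaults (port 443, schema public, purging enabled). The following minimal sketch shows how those defaults could be applied and missing required values rejected. The dictionary key names are hypothetical and chosen for illustration only; they are not the connector's actual configuration keys.

```python
from typing import Any

# Hypothetical key names for illustration only; the connector's real
# configuration keys may differ.
REQUIRED_KEYS = (
    "hostname",
    "user",
    "password",
    "s3_catalog",
    "bucket_name",
    "bucket_path",
    "bucket_region",
    "access_key",
    "secret_key",
)

# Defaults as listed in the Configuration table.
DEFAULTS: dict[str, Any] = {
    "port": "443",
    "s3_catalog_schema": "public",
    "purge_staging_table": True,
}


def resolve_config(user_config: dict[str, Any]) -> dict[str, Any]:
    """Apply documented defaults and reject configs missing required values."""
    missing = [key for key in REQUIRED_KEYS if not user_config.get(key)]
    if missing:
        raise ValueError(f"missing required settings: {', '.join(missing)}")
    # User-supplied values override the defaults.
    return {**DEFAULTS, **user_config}
```

Because user values are merged last, a setting such as `purge_staging_table=False` (for debugging) overrides the default while every unset optional value keeps its documented default.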

Staging files

S3

Data streams are first written to a temporary Iceberg table and then loaded into the Starburst Galaxy Amazon S3 catalog in Iceberg table format. If Purge staging Iceberg table is enabled, the staging table is deleted after the sync completes. The full path of a staging file has the following form:

s3://<bucket-name>/<bucket-path>/<namespace/schema>/_airbyte_tmp_<three random characters>_<stream-name>

For example:

s3://galaxy_bucket/data_output_path/test_schema/_airbyte_tmp_qey_user
     ↑             ↑                ↑           ↑
     |             |                |           temporary Iceberg table holding data
     |             |                source namespace or provided schema name
     |             |
     |             bucket path
     bucket name
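The path pattern above can be sketched as two small helpers: one that builds a temporary table name following the `_airbyte_tmp_<three random characters>_<stream-name>` pattern, and one that joins the components into an `s3://` URI. This is an illustration under stated assumptions, not the connector's code; in particular, treating the three random characters as lowercase ASCII letters is an assumption based only on the `qey` example, and the function names are hypothetical.

```python
import random
import string
from typing import Optional


def temp_table_name(stream_name: str, rng: Optional[random.Random] = None) -> str:
    """Build a staging table name of the form _airbyte_tmp_<3 chars>_<stream>."""
    if rng is None:
        rng = random.Random()
    # Assumption: the three random characters are lowercase ASCII letters.
    suffix = "".join(rng.choice(string.ascii_lowercase) for _ in range(3))
    return f"_airbyte_tmp_{suffix}_{stream_name}"


def staging_path(bucket: str, bucket_path: str, schema: str, table: str) -> str:
    """Join the path components into an s3:// URI, trimming stray slashes."""
    parts = [part.strip("/") for part in (bucket, bucket_path, schema, table)]
    return "s3://" + "/".join(parts)


# Reproduces the example path from this section.
print(staging_path("galaxy_bucket", "data_output_path", "test_schema", "_airbyte_tmp_qey_user"))
# prints s3://galaxy_bucket/data_output_path/test_schema/_airbyte_tmp_qey_user
```

Trimming slashes on each component means a bucket path entered as `/data_output_path/` still yields a single separator between components.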

Target Iceberg SQL table

Streams are synced to the Starburst Galaxy Amazon S3 catalog in Iceberg table format.
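To illustrate how staged rows could reach the target table under the Overwrite and Append sync modes listed in Features, the following sketch generates Trino-style SQL statements. The statement sequence, the function names, and the assumption that the staging table is addressable in the same catalog and schema as the target are all hypothetical; this guide does not document the SQL the connector actually runs.

```python
def quote(identifier: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def load_statements(
    catalog: str,
    schema: str,
    stream: str,
    staging_table: str,
    overwrite: bool,
    purge_staging: bool = True,
) -> list[str]:
    """Return hypothetical SQL that moves staged rows into the target Iceberg table."""
    target = ".".join(quote(p) for p in (catalog, schema, stream))
    # Assumption: the staging table is registered in the same catalog and schema.
    staging = ".".join(quote(p) for p in (catalog, schema, staging_table))
    statements: list[str] = []
    if overwrite:
        # Overwrite sync: remove all previously synced rows before loading.
        statements.append(f"DELETE FROM {target}")
    # Overwrite and Append both copy the staged rows into the target table.
    # Assumes the staging and target tables have identical column order.
    statements.append(f"INSERT INTO {target} SELECT * FROM {staging}")
    if purge_staging:
        # Mirrors the "Purge staging Iceberg table" option.
        statements.append(f"DROP TABLE {staging}")
    return statements


for stmt in load_statements("s3_catalog", "public", "user", "_airbyte_tmp_qey_user", overwrite=False):
    print(stmt)
```

Quoting every identifier keeps stream names that contain special characters valid in the generated SQL.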

Output schema

Each table in the output schema has the following columns:

| Column | Type | Description |
| --- | --- | --- |
| _airbyte_ab_id | varchar | UUID assigned to each record. |
| _airbyte_emitted_at | timestamp(6) | Timestamp at which the record was emitted. |
| Data fields from the source stream | various | Each field of the source stream is written to its own column in the target table. |
| _airbyte_additional_properties | map(varchar, varchar) | Fields present in a record but not declared in the stream's JSON schema. |
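As a rough illustration of the output schema, the following sketch splits one source record into the Airbyte metadata columns, one column per declared field, and a string-to-string map for undeclared fields. It is a simplified model under stated assumptions, not the connector's implementation: the function name is hypothetical, and serializing non-string extra values as JSON text is an assumption made to fit the `map(varchar, varchar)` type.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any


def to_output_row(
    record: dict[str, Any],
    schema_fields: set[str],
    emitted_at: datetime,
) -> dict[str, Any]:
    """Split a source record into declared columns plus an additional-properties map."""
    row: dict[str, Any] = {
        "_airbyte_ab_id": str(uuid.uuid4()),
        "_airbyte_emitted_at": emitted_at,
    }
    extras: dict[str, str] = {}
    for key, value in record.items():
        if key in schema_fields:
            row[key] = value
        else:
            # map(varchar, varchar): non-string values are serialized to JSON text.
            extras[key] = value if isinstance(value, str) else json.dumps(value)
    # Declared fields missing from this record become NULL columns.
    for field in schema_fields:
        row.setdefault(field, None)
    row["_airbyte_additional_properties"] = extras
    return row


row = to_output_row(
    {"id": 1, "name": "a", "extra": {"k": 2}},
    {"id", "name", "email"},
    datetime.now(timezone.utc),
)
```

In this example, `email` is declared but absent, so its column is NULL, while the undeclared `extra` field lands in `_airbyte_additional_properties`.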

The JSON schema of each Airbyte data stream is converted to an Avro schema. Each JSON record is then converted to an Avro record and written to a staging Iceberg table. Because a data stream can come from any source, the JSON-to-Avro conversion follows its own rules and has limitations. Learn more about how source data is converted to Avro.
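The conversion step described above can be pictured with a heavily simplified sketch that maps a stream's top-level JSON schema properties to a nullable Avro record schema. The type table, the fallback to `string` for arrays, objects, and multi-type fields, and the function names are assumptions for illustration only; the actual conversion rules are more detailed and are documented separately.

```python
from typing import Any

# Simplified mapping for JSON schema primitive types (an assumption for illustration).
PRIMITIVES = {
    "string": "string",
    "integer": "long",
    "number": "double",
    "boolean": "boolean",
}


def field_to_avro(name: str, json_schema: dict[str, Any]) -> dict[str, Any]:
    """Convert one JSON schema property into a nullable Avro field."""
    json_type = json_schema.get("type", "string")
    # JSON schema allows a list of types such as ["null", "integer"].
    types = json_type if isinstance(json_type, list) else [json_type]
    non_null = [t for t in types if t != "null"]
    # Unsupported or multi-type fields fall back to string in this sketch.
    avro_type = PRIMITIVES.get(non_null[0], "string") if len(non_null) == 1 else "string"
    return {"name": name, "type": ["null", avro_type], "default": None}


def stream_to_avro(stream_name: str, json_schema: dict[str, Any]) -> dict[str, Any]:
    """Build an Avro record schema from a stream's top-level JSON schema properties.

    Assumes stream_name is already a valid Avro name ([A-Za-z_][A-Za-z0-9_]*).
    """
    properties = json_schema.get("properties", {})
    return {
        "type": "record",
        "name": stream_name,
        "fields": [field_to_avro(n, s) for n, s in properties.items()],
    }
```

Each field becomes a union with `null` and defaults to `null`, so records that omit a field still conform to the Avro schema.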

Datatype support

Learn more about Starburst Galaxy Iceberg type mapping.

Getting started

Requirements

Changelog

| Version | Date | Pull Request | Subject |
| --- | --- | --- | --- |
| 0.0.1 | 2023-03-28 | #24620 | Initial public release. |