(Deprecated) Deploy Airbyte on Kubernetes using Kustomize
This deployment method uses Kustomize and is only supported up to Airbyte version 0.40.32. For existing deployments, check out commit 21a7e102183e20d2d4998ea70c2a8fe4eac8921b to continue deploying using Kustomize. For new deployments, deploy Airbyte on Kubernetes via Helm.
This page guides you through deploying Airbyte Open Source on Kubernetes.
Requirements
To test locally, you can use one of the following:
- Docker Desktop with Kubernetes enabled
- Minikube with at least 4GB RAM (see the start-up sketch after this list)
- Kind
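As a minimal sketch, assuming Minikube is installed, the following start command satisfies the memory requirement above (the CPU count is illustrative):
# Start a local Minikube cluster sized for Airbyte.
minikube start --memory=4096 --cpus=4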
To test on Google Kubernetes Engine (GKE), create a standard zonal cluster.
To test on Amazon Elastic Kubernetes Service (Amazon EKS), install eksctl and create a cluster.
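A minimal eksctl sketch; the cluster name, region, and node count below are illustrative:
# Create a small EKS cluster for testing.
eksctl create cluster --name airbyte-test --region us-east-2 --nodes 3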
Airbyte deployment is tested on GKE and EKS with Kubernetes version 1.19 and above. If you run into problems, reach out on the #airbyte-help channel in our Slack or create an issue on GitHub.
Install and configure kubectl
Install kubectl and run the following command to configure it and connect to your cluster:
kubectl config use-context <my-cluster-name>
To configure kubectl in GKE:
- Initialize the gcloud CLI.
- To view cluster details, go to the cluster page in the Google Cloud Console and click Connect. Run the following command to fetch the cluster credentials (a filled-in example follows this list): gcloud container clusters get-credentials <CLUSTER_NAME> --zone <ZONE_NAME> --project <PROJECT_NAME>
- To view contexts, run: kubectl config get-contexts
- To access the cluster from kubectl, run: kubectl config use-context <gke context>
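For example, with illustrative values substituted for the cluster, zone, and project placeholders:
# Fetch credentials for a hypothetical GKE cluster.
gcloud container clusters get-credentials airbyte-cluster --zone us-central1-a --project my-gcp-project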
To configure kubectl in EKS:
- Configure the AWS CLI to connect to your project.
- Install eksctl.
- To make contexts available to kubectl, run: eksctl utils write-kubeconfig --cluster=<CLUSTER NAME>
- To view available contexts, run: kubectl config get-contexts
- To access the cluster, run: kubectl config use-context <eks context>
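Whichever provider you use, you can confirm that kubectl is pointed at the intended cluster by listing its nodes:
# Verify the connection by listing the cluster's nodes.
kubectl get nodes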
Configure Logs
Default configuration
Airbyte comes with a self-contained Kubernetes deployment and uses a stand-alone Minio deployment in both the dev and stable versions. Logs are published to the Minio deployment by default.
To send the logs to the local Minio deployment, make sure the specified credentials have both read and write permissions.
Custom configuration
Airbyte supports logging to the Minio layer, an S3 bucket, or a GCS bucket.
Customize the Minio log location
To write to a custom location, update the following .env variables in the kube/overlays/stable directory (you will find this directory at the location where you launched Airbyte):
S3_LOG_BUCKET=<your_minio_bucket_to_write_logs_in>
AWS_ACCESS_KEY_ID=<your_minio_access_key>
AWS_SECRET_ACCESS_KEY=<your_minio_secret_key>
S3_MINIO_ENDPOINT=<endpoint_where_minio_is_deployed_at>
S3_LOG_BUCKET_REGION=
Set the S3_PATH_STYLE_ACCESS variable to true and leave the S3_LOG_BUCKET_REGION variable empty.
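Assuming the AWS CLI is available, you can sanity-check the credentials against the Minio endpoint before restarting Airbyte (placeholders as above):
# List the Minio log bucket through its S3-compatible API.
aws --endpoint-url <endpoint_where_minio_is_deployed_at> s3 ls s3://<your_minio_bucket_to_write_logs_in>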
Configure the Custom S3 Log Location
For the S3 log location, create an S3 bucket with your AWS credentials.
To write to a custom location, update the following .env variables in the kube/overlays/stable directory (you can find this directory at the location where you launched Airbyte):
S3_LOG_BUCKET=<your_s3_bucket_to_write_logs_in>
S3_LOG_BUCKET_REGION=<your_s3_bucket_region>
# Set this to empty.
S3_MINIO_ENDPOINT=
# Set this to empty.
S3_PATH_STYLE_ACCESS=
Replace the following variables in the .secrets file in the kube/overlays/stable directory:
AWS_ACCESS_KEY_ID=<your_aws_access_key_id>
AWS_SECRET_ACCESS_KEY=<your_aws_secret_access_key>
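As a quick check, assuming the AWS CLI is configured with the same credentials, you can confirm the bucket is reachable (placeholders as above):
# Confirm the S3 log bucket exists and is readable.
aws s3 ls s3://<your_s3_bucket_to_write_logs_in> --region <your_s3_bucket_region>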
Configure the Custom GCS Log Location
Create a GCS bucket and GCP credentials if you haven’t already. Make sure your GCS log bucket has read/write permission.
To configure the custom log location:
Base64-encode the GCP JSON secret with the following command:
# The output of this command will be a Base64 string.
$ cat gcp.json | base64
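Note that GNU base64 wraps its output at 76 characters by default; on Linux, disable wrapping so the secret stays on a single line:
# Disable line wrapping (GNU coreutils).
$ base64 -w 0 gcp.json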
To populate the gcs-log-creds secret with the Base64-encoded credential, take the encoded GCP JSON secret from the previous step and add it to the secret-gcs-log-creds.yaml file as the value for the gcp.json key.
apiVersion: v1
kind: Secret
metadata:
  name: gcs-log-creds
  namespace: default
data:
  gcp.json: <base64-encoded-string>
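Once the overlay is applied, one way to verify that the stored secret decodes back to your original JSON (the secret name comes from the manifest above):
# Decode the stored credential for inspection.
kubectl get secret gcs-log-creds -o jsonpath='{.data.gcp\.json}' | base64 --decode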
In the kube/overlays/stable directory, update the GCS_LOG_BUCKET variable with the name of your GCS log bucket:
GCS_LOG_BUCKET=<your_GCS_bucket_to_write_logs_in>
In the .secrets file in the kube/overlays/stable directory, set GOOGLE_APPLICATION_CREDENTIALS to the path of gcp.json:
# The path the GCS creds are written to. Unless you know what you are doing, use the below default value.
GOOGLE_APPLICATION_CREDENTIALS=/secrets/gcs-log-creds/gcp.json
Launch Airbyte
The following commands will help you launch Airbyte:
git clone https://github.com/airbytehq/airbyte.git
cd airbyte
kubectl apply -k kube/overlays/stable
To check the pod status, run kubectl get pods | grep airbyte. If you are on Windows, run kubectl get pods to list the pods.
Run kubectl port-forward svc/airbyte-webapp-svc 8000:80 to allow access to the UI/API.
Navigate to http://localhost:8000 in your browser to verify the deployment.
Deploy Airbyte on Kubernetes in production
Set resource limits
Core container pods
To provide different resource requirements for core pods, set resource limits in the kube/overlays/stable-with-resource-limits/set-resource-limits.yaml file. To launch Airbyte with the new resource limits, run kubectl apply -k kube/overlays/stable-with-resource-limits.
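As a hedged sketch of the shape such a patch takes; the deployment name, container name, and limits below are illustrative, not the file's actual contents:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airbyte-server
spec:
  template:
    spec:
      containers:
        - name: airbyte-server-container
          resources:
            limits:
              cpu: "2"
              memory: 2Gi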
Connector pods
- By default, connector pods launch without resource limits. To add resource limits, configure the Docker resource limits section of the .env file in the kube/overlays directory (see the sketch below).
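A hedged sketch of that section of the .env file; the variable names below are the standard Airbyte job-container settings, and the values are examples only:
# Connector pod resource requests and limits (illustrative values).
JOB_MAIN_CONTAINER_CPU_REQUEST=0.5
JOB_MAIN_CONTAINER_CPU_LIMIT=1
JOB_MAIN_CONTAINER_MEMORY_REQUEST=512Mi
JOB_MAIN_CONTAINER_MEMORY_LIMIT=1Gi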
Volume sizes
- To specify different volume sizes for the persistent volumes backing Airbyte, modify the kube/resources/volume-* files.
Increase job parallelism
The ability to run jobs in parallel (getting specs, checking connections, discovering schemas, and performing syncs) is limited by a few factors. The airbyte-worker pods pick up and execute jobs, so increasing the number of workers allows more jobs to be processed.
To create more worker pods, increase the number of replicas for the airbyte-worker deployment. For examples of increasing worker pods with a Kustomization patch, see airbyte/kube/overlays/dev-integration-test/kustomization.yaml and airbyte/kube/overlays/dev-integration-test/parallelize-worker.yaml.
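A minimal sketch of such a patch, assuming the deployment is named airbyte-worker (the replica count is illustrative):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airbyte-worker
spec:
  replicas: 3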
The TEMPORAL_WORKER_PORTS variable in the .env file sets the ports exposed for job communication; each exposed port can serve one job running in parallel. If there are not enough ports available, jobs might halt or fail to complete until ports free up.
You can also cap the number of parallel jobs of each type that run on a pod. Set the MAX_SPEC_WORKERS, MAX_CHECK_WORKERS, MAX_DISCOVER_WORKERS, and MAX_SYNC_WORKERS variables in the worker pod deployment, not in the .env file. You can use these variables to create separate worker deployments for each type of worker with different resource allocations.
Cloud Logging
Airbyte writes logs to two directories: the app-logging directory and the job-logging directory. App logs, server logs, and scheduler logs are written to the app-logging directory. Job logs are written to the job-logging directory. Both directories live at the top level of the bucket; for example, the app-logging directory may live at s3://log-bucket/app-logging. We recommend having a dedicated logging bucket and not using it for other purposes.
Airbyte publishes logs every minute, so it's normal to see minute-long log delays. Cloud storage services do not support append operations, and each publisher creates its own log files, which means you will have hundreds of files in your log bucket.
Each log file is uncompressed and named {yyyyMMddHH24mmss}_{podname}_{UUID}.
To view logs, navigate to the relevant folder and download the file for the time period you want.
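If your logs go to S3, a sketch of fetching a job log with the AWS CLI; the bucket follows the example above and the key is illustrative:
# List the job-logging tree, then download one log file.
aws s3 ls s3://log-bucket/job-logging/ --recursive
aws s3 cp s3://log-bucket/job-logging/<path_to_log_file> ./logs.log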
Use external databases
You can configure a custom database instead of a simple postgres container in Kubernetes. This separate instance (AWS RDS or Google Cloud SQL) should be easier and safer to maintain than Postgres on your cluster.
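A hedged sketch of pointing Airbyte at an external Postgres instance, assuming the standard DATABASE_* variables in the overlay's .env file (the values are illustrative):
DATABASE_HOST=<your_database_host>
DATABASE_PORT=5432
DATABASE_USER=<your_database_user>
DATABASE_PASSWORD=<your_database_password>
DATABASE_DB=<your_database_name>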
Customize Airbyte's Manifests
We use Kustomize to allow configuration for different environments. Our shared resources are in the kube/resources directory. We recommend defining overlays for each environment and creating your own overlay to customize your deployments. The overlay can live in your own version control system.
An example kustomization.yaml file:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
  - https://github.com/airbytehq/airbyte.git/kube/overlays/stable?ref=master
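To deploy from your own overlay, point kubectl at its directory:
# Apply a custom overlay (the path is illustrative).
kubectl apply -k path/to/your/overlay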
View Raw Manifests
To view the manifests that Kustomize applies to your Kubernetes cluster for a specific overlay, run kubectl kustomize kube/overlays/stable.
Helm Charts
For detailed information about Helm Charts, refer to the charts readme file.
Operator Guide
View API server logs
To view real-time logs, run kubectl logs deployments/airbyte-server. You can also download logs from the Admin tab.
Connector Container Logs
All logs can be accessed by viewing the scheduler logs. For connector container logs, use the Airbyte UI or Airbyte API to isolate the logs for a specific job attempt and make them easier to understand. Connector pods launched by Airbyte do not relay logs directly to Kubernetes logging; you must access these logs through Airbyte.
Resize Volumes
To resize a volume, change the .spec.resources.requests.storage value. After re-applying, extend the mount (if that operation is supported for your mount type). For a production deployment, track the usage of volumes to ensure they don't run out of space.
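A hedged sketch of requesting a larger size on one of Airbyte's volume claims, assuming your storage class supports volume expansion (the claim name matches the sample manifest below and the size is illustrative):
# Request a larger size for the config volume claim.
kubectl patch pvc airbyte-volume-configs -p '{"spec":{"resources":{"requests":{"storage":"1Gi"}}}}'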
Copy Files in Volumes
To copy files to or from a pod, use the kubectl cp command.
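For example, to download a job log from the server pod (the pod name matches the examples below and is illustrative):
# Copy a log file out of the server pod.
kubectl cp airbyte-server-6b5747df5c-bj4fx:/tmp/workspace/8/0/logs.log ./logs.log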
List Files
To list files, run:
kubectl exec -it airbyte-server-6b5747df5c-bj4fx -- ls /tmp/workspace/8
Read Files
To read files, run:
kubectl exec -it airbyte-server-6b5747df5c-bj4fx -- cat /tmp/workspace/8/0/logs.log
Persistent storage on Google Kubernetes Engine (GKE) regional cluster
Running Airbyte on a GKE regional cluster requires enabling persistent regional storage. Start by enabling the CSI driver on GKE and add storageClassName: standard-rwo to the volume-configs.yaml file.
Sample volume-configs.yaml file:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airbyte-volume-configs
  labels:
    airbyte: volume-configs
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi
  storageClassName: standard-rwo
Troubleshooting
If you encounter any issues, reach out to our community on Slack.