Pipelines with AWS DMS

AWS DMS to Kinesis to CrateDB.

What’s Inside

AWS Infrastructure Setup

The following walkthrough describes a full deployment of AWS DMS including relevant outbound data processors for demonstration purposes.

To run it in production, you are welcome to derive from it and adapt it to your own needs. Your mileage may vary. If you need support, don’t hesitate to ask for help.

Install

Install LorryStream.

pip install --upgrade 'lorrystream[carabas]'

Acquire the IaC driver program.

wget https://github.com/daq-tools/lorrystream/raw/main/examples/aws/rds_postgresql_kinesis_lambda_oci_cratedb.py

Configure

Configure the endpoint and replication settings within the source code of the IaC program you just acquired.

Deploy

First, prepare an AWS ECR repository for publishing the OCI image that contains your downstream processor element. This element consumes the replication data stream from Amazon Kinesis and writes it into CrateDB. To learn how this works, please visit the documentation section about the ECR Repository.

Configure the CrateDB database sink address.

export SINK_SQLALCHEMY_URL='crate://admin:dZ..qB@example.eks1.eu-west-1.aws.cratedb.net:4200/?ssl=true'
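Before deploying, it can help to verify that the sink address is well-formed. The following is a minimal sketch using only the Python standard library; the helper name `describe_sink_url` is hypothetical and not part of LorryStream. It checks only the shape of the URL and does not contact the database.

```python
import os
from urllib.parse import parse_qs, urlsplit


def describe_sink_url(url: str) -> dict:
    """Split a CrateDB SQLAlchemy URL into its parts, masking the password."""
    parts = urlsplit(url)
    if parts.scheme != "crate":
        raise ValueError(f"Expected a 'crate://' URL, got scheme {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("The sink URL does not include a hostname")
    query = parse_qs(parts.query)
    return {
        "username": parts.username,
        # Never echo the real password.
        "password": "***" if parts.password else None,
        "hostname": parts.hostname,
        "port": parts.port,
        # Reports whether the URL carries `ssl=true`.
        "ssl": query.get("ssl", ["false"])[0].lower() == "true",
    }


# Only inspect the variable when it is actually set.
sink_url = os.environ.get("SINK_SQLALCHEMY_URL")
if sink_url:
    print(describe_sink_url(sink_url))
```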

Invoke the IaC driver program in order to deploy relevant resources on AWS.

python rds_postgresql_kinesis_lambda_oci_cratedb.py

After the deployment has succeeded, the program prints a response that includes relevant information about the entrypoints to the software stack you’ve just created.

Result of CloudFormation deployment:
psql command: psql "postgresql://dynapipe:secret11@testdrive-dms-postgresql-dev-db.czylftvqn1ed.eu-central-1.rds.amazonaws.com:5432/postgres"
RDS Instance ARN: arn:aws:rds:eu-central-1:831394476016:db:testdrive-dms-postgresql-dev-db
Stream ARN: arn:aws:kinesis:eu-central-1:831394476016:stream/testdrive-dms-postgresql-dev-stream
Replication ARN: arn:aws:dms:eu-central-1:831394476016:replication-config:EAM3JEHXGBGZBPN5PLON7NPDEE
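The ARNs in this output follow the general AWS layout `arn:partition:service:region:account-id:resource`. If you want to process them in scripts, a small parser along these lines may be sufficient. It is a sketch, and the names `Arn` and `parse_arn` are made up for illustration.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN into its five components after the literal `arn` prefix."""
    # Split at most five times, because the resource part may itself contain colons.
    fields = arn.split(":", 5)
    if len(fields) != 6 or fields[0] != "arn":
        raise ValueError(f"Not a valid ARN: {arn!r}")
    return Arn(*fields[1:])


stream = parse_arn(
    "arn:aws:kinesis:eu-central-1:831394476016:stream/testdrive-dms-postgresql-dev-stream"
)
print(stream.service, stream.region, stream.resource)
```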

Note

Please note this is a demonstration stack, deviating from typical real-world situations.

  • Contrary to this stack, which includes an RDS PostgreSQL instance, in a real-world situation a database instance will usually be up and running already, so the remaining task is just to configure the Kinesis Data Stream and consume it.

  • Contrary to this stack, which uses AWS Lambda to host the downstream processor element, when aiming for better cost-effectiveness you would typically run the corresponding code on a dedicated computing environment.
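When consuming the stream with your own code, each Kinesis record carries one DMS message. The sketch below assumes the JSON layout that AWS DMS documents for Kinesis targets, with a `data` object holding the row and a `metadata` object describing the event. The function name `decode_dms_message` and the sample payload are illustrative only, and this is not how LorryStream implements its processor.

```python
import json


def decode_dms_message(payload: bytes) -> dict:
    """Extract operation, table coordinates, and row data from one DMS message."""
    message = json.loads(payload)
    metadata = message.get("metadata", {})
    return {
        # Assumed metadata keys, following the DMS message format for Kinesis targets.
        "record_type": metadata.get("record-type"),
        "operation": metadata.get("operation"),
        "schema": metadata.get("schema-name"),
        "table": metadata.get("table-name"),
        "row": message.get("data", {}),
    }


# Illustrative payload, not captured from a real stream.
sample = json.dumps(
    {
        "data": {"id": 1, "name": "foo"},
        "metadata": {
            "record-type": "data",
            "operation": "insert",
            "schema-name": "public",
            "table-name": "demo",
        },
    }
).encode("utf-8")

event = decode_dms_message(sample)
print(event["operation"], event["schema"], event["table"], event["row"])
```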

Operations

Please consult the AWS DMS Handbook to learn about commands suitable for operating the AWS DMS engine.

Usage

DMS

AWS DMS provides full-load and full-load-and-cdc migration types. For a full-load-and-cdc task, AWS DMS migrates table data, and then applies data changes that occur on the source, automatically establishing continuous replication.

When starting a replication task using StartReplicationTask, you can use the following values for --start-replication-task-type; see also start-replication-task:

start-replication:

The only valid value for the first run of the task when the migration type is full-load or full-load-and-cdc.

resume-processing:

Not applicable for any full-load task, because you can’t resume partially loaded tables during the full load phase. Use it to replicate the changes from the last stop position.

reload-target:

For a full-load-and-cdc task, load all the tables again, and start capturing source changes.
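The rules above can be encoded as a small guard that runs before calling the API. This is a sketch under assumptions: `check_start_type` and `start_task` are hypothetical helper names, the guard rejects only the combinations that the descriptions above explicitly rule out, and the task ARN is something you need to supply yourself. The `boto3` call uses the `start_replication_task` operation of the DMS client.

```python
VALID_START_TYPES = ("start-replication", "resume-processing", "reload-target")


def check_start_type(migration_type: str, start_type: str, first_run: bool) -> None:
    """Raise ValueError for combinations ruled out by the descriptions above."""
    if start_type not in VALID_START_TYPES:
        raise ValueError(f"Unknown start type: {start_type!r}")
    if (
        first_run
        and migration_type in ("full-load", "full-load-and-cdc")
        and start_type != "start-replication"
    ):
        raise ValueError("The first run of the task requires 'start-replication'")
    if migration_type == "full-load" and start_type == "resume-processing":
        raise ValueError("'resume-processing' is not applicable to full-load tasks")


def start_task(task_arn: str, migration_type: str, start_type: str, first_run: bool = False):
    """Validate the start type, then start the replication task."""
    check_start_type(migration_type, start_type, first_run)
    # Imported lazily, so the guard above also works without boto3 installed.
    import boto3

    client = boto3.client("dms")
    return client.start_replication_task(
        ReplicationTaskArn=task_arn,
        StartReplicationTaskType=start_type,
    )
```

For example, `check_start_type("full-load-and-cdc", "reload-target", first_run=False)` passes silently, while `check_start_type("full-load", "resume-processing", first_run=False)` raises a `ValueError`.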

Migration by DMS Source

This section enumerates specific information to consider when aiming to use DMS for your database as a source element.