Pipelines with AWS DMS¶
AWS DMS to Kinesis to CrateDB.
What’s Inside¶
Using Amazon Kinesis Data Streams as a target for AWS Database Migration Service
An IaC driver program based on AWS CloudFormation technologies using the cottonformation Python API. It can be used to set up infrastructure on AWS without much ado.
DMS: Full load and CDC
DMS Source: RDS PostgreSQL
DMS Target: Amazon Kinesis
CDC Target: CrateDB Cloud
AWS Infrastructure Setup¶
The following walkthrough describes a full deployment of AWS DMS including relevant outbound data processors for demonstration purposes.
In order to run it in production, you are welcome to derive from it and tweak it for your own purposes. YMMV. If you need support, don’t hesitate to ask for help.
Install¶
Install LorryStream.
pip install --upgrade 'lorrystream[carabas]'
Acquire IaC driver program.
wget https://github.com/daq-tools/lorrystream/raw/main/examples/aws/rds_postgresql_kinesis_lambda_oci_cratedb.py
Configure¶
Please configure endpoint and replication settings within the source code of the IaC program you just acquired, and presented next.
Deploy¶
First, prepare an AWS ECR repository for publishing the OCI image including your downstream processor element that is consuming the replication data stream from Amazon Kinesis, and runs it into CrateDB. To learn about how this works, please visit the documentation section about the ECR Repository.
Configure CrateDB database sink address.
export SINK_SQLALCHEMY_URL='crate://admin:dZ..qB@example.eks1.eu-west-1.aws.cratedb.net:4200/?ssl=true'
Invoke the IaC driver program in order to deploy relevant resources on AWS.
python examples/aws/rds_postgresql_kinesis_lambda_oci_cratedb.py
After deployment succeeded, you will be presented a corresponding response including relevant information about entrypoints to the software stack you’ve just created.
Result of CloudFormation deployment:
psql command: psql "postgresql://dynapipe:secret11@testdrive-dms-postgresql-dev-db.czylftvqn1ed.eu-central-1.rds.amazonaws.com:5432/postgres"
RDS Instance ARN: arn:aws:rds:eu-central-1:831394476016:db:testdrive-dms-postgresql-dev-db
Stream ARN: arn:aws:kinesis:eu-central-1:831394476016:stream/testdrive-dms-postgresql-dev-stream
Replication ARN: arn:aws:dms:eu-central-1:831394476016:replication-config:EAM3JEHXGBGZBPN5PLON7NPDEE
Note
Please note this is a demonstration stack, deviating from typical real-world situations.
Contrary to this stack, which includes an RDS PostgreSQL instance, a database instance will already be up and running, so the remaining task is to just configure the Kinesis Data Stream and consume it.
Contrary to this stack, which uses AWS Lambda to host the downstream processor element, when aiming for better cost-effectiveness, you will run corresponding code on a dedicated computing environment.
Operations¶
Please consult the AWS DMS Handbook to learn about commands suitable for operating the AWS DMS engine.
Usage¶
DMS¶
AWS DMS provides full-load and full-load-and-cdc migration types.
For a full-load-and-cdc task, AWS DMS migrates table data, and then applies
data changes that occur on the source, automatically establishing continuous
replication.
When starting a replication task using StartReplicationTask, you can use those
possible values for --start-replication-task-type, see also start-replication-task:
- start-replication:
The only valid value for the first run of the task when the migration type is
full-loadorfull-load-and-cdc- resume-processing:
Not applicable for any full-load task, because you can’t resume partially loaded tables during the full load phase. Use it to replicate the changes from the last stop position.
- reload-target:
For a
full-load-and-cdctask, load all the tables again, and start capturing source changes.
Migration by DMS Source¶
This section enumerates specific information to consider when aiming to use DMS for your database as a source element.