DynamoDB CDC to CrateDB using DynamoDB Streams Kinesis Adapter
Introduction
DynamoDB Streams captures a time-ordered sequence of item-level modifications in any DynamoDB table and stores this information in a log for up to 24 hours.
Applications can access this log and view the data items as they appeared before and after they were modified, in near-real time.
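To illustrate the before and after views, here is a minimal sketch that compares the `OldImage` and `NewImage` of a single stream record. The record layout follows the DynamoDB Streams record format; the function name `changed_attributes` and the sample values are illustrative assumptions. Both images are only present when the stream view type is `NEW_AND_OLD_IMAGES`.

```python
def changed_attributes(record):
    """Return {attribute: (old, new)} for attributes that differ between images."""
    images = record.get("dynamodb", {})
    old = images.get("OldImage", {})
    new = images.get("NewImage", {})
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


# Hypothetical MODIFY record; values use DynamoDB's typed attribute notation.
sample_record = {
    "eventName": "MODIFY",
    "dynamodb": {
        "Keys": {"id": {"S": "sensor-1"}},
        "OldImage": {"id": {"S": "sensor-1"}, "temperature": {"N": "42.5"}},
        "NewImage": {"id": {"S": "sensor-1"}, "temperature": {"N": "43.1"}},
    },
}

print(changed_attributes(sample_record))
```

Attributes that exist in only one image show up with `None` on the other side, which covers added and removed attributes as well.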
About
A change data capture (CDC) pipeline consisting of a DynamoDB egress CDC processor that sinks data into the CrateDB OLAP database, using the DynamoDB Streams Kinesis Adapter (GitHub).
Using the Amazon Kinesis Adapter is the recommended way to consume streams from Amazon DynamoDB.
– Using the DynamoDB Streams Kinesis adapter to process stream records
What’s Inside
On a compute environment of your choice that supports Python, a traditional KCL v2 application uses the client-side DynamoDB Streams Kinesis Adapter to subscribe to a DynamoDB change stream, which presents itself as a Kinesis stream, in order to receive the published CDC operations log messages.
On the egress side, the application re-materializes the items of the operations log into any database with SQLAlchemy support.
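The re-materialization step can be sketched roughly as follows. This is not the project's actual implementation: the function names, the `demo` table name, and the flat one-column-per-attribute mapping are assumptions made for illustration. The sketch decodes only a subset of DynamoDB attribute types and assumes the stream delivers `NewImage` for `INSERT` and `MODIFY` events.

```python
from decimal import Decimal


def decode_value(attr):
    """Decode one typed DynamoDB attribute value, e.g. {"N": "42.5"}."""
    (type_tag, value), = attr.items()
    if type_tag == "S":
        return value
    if type_tag == "N":
        number = Decimal(value)
        return int(number) if number == number.to_integral_value() else float(number)
    if type_tag == "BOOL":
        return bool(value)
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [decode_value(v) for v in value]
    if type_tag == "M":
        return decode_item(value)
    raise NotImplementedError(f"Unsupported attribute type: {type_tag}")


def decode_item(item):
    """Decode a whole image or key map into plain Python values."""
    return {name: decode_value(attr) for name, attr in item.items()}


def to_statement(record, table="demo"):
    """Translate one stream record into (sql, params) using named bind parameters."""
    event = record["eventName"]
    keys = decode_item(record["dynamodb"]["Keys"])
    where = " AND ".join(f'"{name}" = :{name}' for name in keys)
    if event == "REMOVE":
        return f'DELETE FROM "{table}" WHERE {where}', keys
    item = decode_item(record["dynamodb"]["NewImage"])
    if event == "INSERT":
        columns = ", ".join(f'"{name}"' for name in item)
        placeholders = ", ".join(f":{name}" for name in item)
        return f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})', item
    if event == "MODIFY":
        # Only non-key attributes are updated; assumes at least one exists.
        assignments = ", ".join(
            f'"{name}" = :{name}' for name in item if name not in keys
        )
        return f'UPDATE "{table}" SET {assignments} WHERE {where}', item
    raise ValueError(f"Unknown event name: {event}")
```

The `:name` placeholders match the named bind parameter style accepted by SQLAlchemy's `text()` construct, so a statement could be run with `connection.execute(sqlalchemy.text(sql), params)`. Attribute names that are not valid bind parameter names would need extra handling that this sketch omits.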
Dead End!
# HACK: The later assignment overrides the earlier one to select the backend.
# Kinesis backend.
multi_lang_daemon_class = "software.amazon.kinesis.multilang.MultiLangDaemon"
# DynamoDB backend.
# https://github.com/awslabs/dynamodb-streams-kinesis-adapter/issues/46#issuecomment-1260222792
multi_lang_daemon_class = "com.amazonaws.services.dynamodbv2.streamsadapter.StreamsMultiLangDaemon"
Q: It looks like the “DynamoDB Streams Kinesis Adapter” project is dead?
https://github.com/awslabs/dynamodb-streams-kinesis-adapter/issues/40
https://github.com/awslabs/dynamodb-streams-kinesis-adapter/issues/42
One option would be to try this by downgrading to KCL v1. We are not sure whether it is worth trying, though.
A: An upgrade to KCL v2 will probably happen at some point in the future.