An introduction and overview about Singer

About

An introduction to the Singer ecosystem of data pipeline components for composable open source ETL, revolving around topics in data integration.

All of Singer, Meltano, PipelineWise, and Airbyte provide components and integration engines adhering to the Singer specification.

On the database integration side, the connectors of Singer and Meltano are based on SQLAlchemy.

Overview

Stitch

Developers shouldn’t have to write ETL scripts.

Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required. Set up in minutes unlimited data volume during trial 5 million rows of data free, forever.

“With Stitch, we were able to get our data streaming to our warehouse in minutes; and it requires zero engineering maintenance.”

Stitch is a cloud-first, developer-focused platform for rapidly moving data. Stitch was acquired by Talend in November 2018 and operates as an independent business unit.

Singer

The open-source standard for writing scripts that move data.

Singer is an open source specification and software framework for ETL/ELT data exchange between a range of different systems. For talking to SQL databases, it employs a metadata subsystem based on SQLAlchemy.

Singer reads and writes Singer-formatted JSONL messages, following the Singer Spec.

The Singer specification was started in 2016 by Stitch Data. It specified a data transfer format that would allow any number of data systems, called taps, to send data to any data destinations, called targets. Airbyte was incorporated in 2020 and created their own specification that was heavily inspired by Singer. There are differences, but the core of each specification is sending new-line delimited JSON data from STDOUT of a tap to STDIN of a target.

PipelineWise

PipelineWise is another Data Pipeline Framework using the Singer.io specification to ingest and replicate data from various sources to various destinations. The list of PipelineWise Taps include another bunch of high-quality data-source and -sink components.

Data Mill

Data Mill helps organizations utilize modern data infrastructure and data science to power analytics, products, and services.

Meltano

Unlock all the data that powers your data platform.

Say goodbye to writing, maintaining, and scaling your own API integrations with Meltano’s declarative code-first data integration engine, bringing a number of APIs and DBs to the table.

Meltano builds upon Singer technologies, uses configuration files in YAML syntax instead of JSON, adds an improved SDK and other components, and runs the central addon registry, meltano | Hub.

dbt

data build tool (dbt) is an open-source command line tool that helps analysts and engineers transform data in their warehouse more effectively.

History

It started at RJMetrics in 2016 as a solution to add basic transformation capabilities to Stitch (acquired by Talend in 2018). The earliest versions of dbt allowed analysts to contribute to the data transformation process following the best practices of software engineering.

Estuary

MEET THE FASTEST, MOST RELIABLE ETL.

The only platform built from the ground up for truly real-time ETL and ELT data integration, set up in minutes.

Captures

  • Captures extract data from an endpoint using a connectors.

  • Estuary builds and maintains many real-time connectors for various technology systems, such as database change data capture (CDC) connectors.

  • Captures run continuously: As soon as new documents are made available at the endpoint resources, Flow validates their schema and adds them to the appropriate collection. Captures can process documents up to 16 MB in size.

Batch sources

Flow supports running both first and third party connectors to batch sources including Singer, Meltano and Airbyte as well as natively-written Estuary connectors. These connectors tend to focus on SaaS APIs, and do not offer real-time streaming integrations. Flow runs the connector at regular intervals to capture updated documents.

Qlik

Make better use of enterprise assets. With Qlik Talend® Data Integration and Quality you can drive AI innovation, intelligent decisions, and business modernization.

Stitch can replicate data from all your sources to a central warehouse. From there, it’s easy to use Qlik to perform the in-depth analysis you need.

Accelerate data replication, ingestion and streaming across a wide variety of heterogeneous databases, data warehouses, and big data platforms. You’ll be able to move your data easily, securely and efficiently with minimal operational impact.

In 2023, Qlik acquired Talend.

Omni Analytics

All jobs are data jobs.

Business intelligence that speaks your language Explore, visualize, and model data your way with Omni. From spreadsheets to SQL—in a single platform.

Data tool proliferation has gone off-the-rails. There’s now a tool for every data need, but they all require compromises.

Founded in 2022. RJMetrics alum Chris Merrick is behind Omni.

https://omni.co/

Evaluations

Singer vs. Meltano

Meltano as a framework fills many gaps and makes Singer convenient to actually use. It is impossible to outline all details and every difference, so we will focus on the “naming things” aspects for now.

Both ecosystems use different names for the same elements. That may be confusing at first, but it is easy to learn: For the notion of data source vs. data sink, common to all pipeline systems in one way or another, Singer uses the terms tap vs. target, while Meltano uses extractor vs. loader. Essentially, they are the same things under different names.

Ecosystem

Data source

Data sink

Singer

Tap

Target

Meltano

Extractor

Loader

In Singer jargon, you tap data from a source, and send it to a target. In Meltano jargon, you extract data from a source, and then load it into the target system.

Singer and Airbyte criticism

Timeline

June 2, 2016

Oct 26, 2016

Christopher Merrick submits first commit on Stitch. https://github.com/stitchdata/python-stitch-client/commit/bcc84f232

Oct 28, 2016

Christopher Merrick submits first commit of the Python Singer implementation. https://github.com/singer-io/singer-python/commit/64990dd0ae

Nov 17, 2016

Christopher Merrick submits first commit of the Singer specification. https://github.com/singer-io/getting-started/commit/f780adab0

November 7, 2018

Talend to Acquire Stitch, a leader in self-service Cloud data integration.

June 30, 2021

GitLab spins out open source data integration platform Meltano

October 6, 2021

Estuary helps enterprises harness historical and real-time data pipelines

August 16, 2022

Introducing Omni, the new generation of business intelligence, founded by Chris Merrick, Colin Zima, and Jamie Davidson.

May 16, 2023

Qlik Acquires Talend, combining its Best-in-Class Data Integration with Talend’s Leading Data Transformation, Quality, and Governance Capabilities. Talend and Qlik’s Data Integration and Quality solutions automate the delivery of trusted, business-ready data, enabling smarter decisions, operational efficiency, and innovation.

May 24, 2024

The co-founders of Omni Analytics, CEO Colin Zima, President Jamie Davidson, and CTO Christopher Merrick have spent a decade building data products, such as Looker and Stitch. They bring a wealth of experience in business intelligence, semantic layers, cloud data management, and customer-first support.

Comparisons