Source Connectors
Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. Instead of writing custom producer code to pull data from a database, you use a Source Connector.
1. What is a Source Connector?
A Source Connector is a pre-built plugin that knows how to talk to a specific external system.
- No Code: You configure the connector using a simple JSON file.
- Scalable: Connect runs as a cluster, and tasks are distributed across nodes.
- Fault Tolerant: Connect periodically commits its position (offset) in the external system. If a worker fails, its tasks restart on another node and resume from the last committed offset, so some records may be re-delivered but none are skipped (at-least-once delivery by default).
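The fault-tolerance point can be sketched in a few lines of plain Python, with a dict standing in for Connect's internal offset-storage topic (all names here are illustrative, not the real Connect API):

```python
# Hypothetical offset store standing in for Kafka Connect's internal
# offsets topic. It outlives any single "worker" in this toy model.
offset_store = {}

def poll_source(rows, connector_name):
    """Return only rows past the last committed offset, then commit."""
    last = offset_store.get(connector_name, -1)
    new_rows = [r for i, r in enumerate(rows) if i > last]
    if rows:
        offset_store[connector_name] = len(rows) - 1  # commit position
    return new_rows

table = ["row-0", "row-1", "row-2"]
print(poll_source(table, "pg-source"))  # → ['row-0', 'row-1', 'row-2']

# Simulate a crash/restart: the offset store survives, so the next poll
# resumes after row-2 instead of re-reading the whole table.
table += ["row-3"]
print(poll_source(table, "pg-source"))  # → ['row-3']
```

Because the offset lives outside the worker process, a restarted task picks up from the commit point rather than from the beginning.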
2. Low-Latency Ingestion (CDC)
The most powerful type of source connector is a Change Data Capture (CDC) connector, like Debezium.
- The Problem: Polling a database table for new rows adds query load, introduces latency between polls, and cannot observe "Delete" operations, since a deleted row simply stops appearing in results.
- The CDC Solution: Debezium reads the database’s Transaction Log (e.g., MySQL Binlog or Postgres WAL). It streams every Insert, Update, and Delete in real-time as a Kafka event.
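The polling-versus-CDC difference can be shown with a toy comparison: two table snapshots versus a change log. (This is an illustration only; the event shapes are simplified and are not Debezium's actual message format.)

```python
# Two snapshots of the same table, taken by successive polls:
snapshot_t0 = {1: "pencil", 2: "stapler"}
snapshot_t1 = {1: "pencil", 3: "notebook"}  # row 2 deleted, row 3 inserted

# Polling only sees rows that exist at poll time. Detecting the delete
# requires diffing entire snapshots, which gets expensive for big tables:
polled_new = set(snapshot_t1) - set(snapshot_t0)
polled_gone = set(snapshot_t0) - set(snapshot_t1)

# A change log (like the events Debezium derives from the WAL or binlog)
# records every operation explicitly, deletes included:
change_log = [
    {"op": "d", "id": 2},                     # delete captured as an event
    {"op": "c", "id": 3, "row": "notebook"},  # insert captured as an event
]
deletes = [e for e in change_log if e["op"] == "d"]
print(deletes)  # → [{'op': 'd', 'id': 2}]
```

With the log, each operation arrives as its own event in order, so consumers see deletes and intermediate updates that a poll-based reader would miss entirely.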
3. Configuration Example
To stream data from a Postgres database into Kafka:
{
  "name": "postgres-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "postgres",
    "database.dbname": "inventory",
    "table.include.list": "public.orders",
    "topic.prefix": "dbserver1"
  }
}
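To create the connector, you submit this JSON to a Connect worker's REST API. A minimal sketch in Python (the /connectors endpoint and default port 8083 come from the Kafka Connect REST API; the hostname and the commented-out call are assumptions about your deployment, and credentials are omitted here for brevity):

```python
import json

# Mirror the config shown above as a Python dict.
config = {
    "name": "postgres-source",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "localhost",
        "database.dbname": "inventory",
        "table.include.list": "public.orders",
        "topic.prefix": "dbserver1",
    },
}
body = json.dumps(config).encode()
print(json.loads(body)["name"])  # → postgres-source

# Submitting it requires a running Connect worker, e.g.:
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:8083/connectors", data=body,
#       headers={"Content-Type": "application/json"})
#   urllib.request.urlopen(req)
```

Once accepted, the worker assigns tasks and the connector begins streaming changes from public.orders into Kafka topics prefixed with dbserver1.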
4. Interactive: Connect Pipeline
Watch as a row in the database automatically becomes a message in Kafka.
5. Key Benefit
Source connectors decouple your core application logic from the “grunt work” of data movement. Your engineers can focus on building features, while Kafka Connect ensures that data from all your legacy systems is available in the event stream for real-time processing.