Schema Evolution Rules

Imagine a Black Friday sale. You’re processing thousands of checkout events per second when a product manager requests a new promo_code field on the payload. You deploy the schema update, and suddenly the legacy analytics pipeline, which hasn’t been updated in months, starts crashing on the new events. The analytics team is blind, and you’re reverting code in a panic.

This is the core problem of Schema Evolution: how do you change data structures over time without breaking the consumers already running in production or losing the ability to read the data you have already stored?

1. Compatibility Modes

The Schema Registry supports several “Compatibility Modes” that determine which changes it accepts when you register a new version of a schema.

Backward Compatibility (Default)

A new version of a schema can be used to read data written with the previous version.

  • Rule: You can delete fields or add optional fields (fields with a default value).
  • Upgrade Order: Upgrade Consumers first, then Producers.

Forward Compatibility

The previous version of the schema can be used to read data written with the new version.

  • Rule: You can add fields or delete optional fields.
  • Upgrade Order: Upgrade Producers first, then Consumers.

Full Compatibility

The new schema is both backward and forward compatible with the previous version.

  • Rule: You can only add/delete optional fields.
  • Upgrade Order: Any order.
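To make the reader/writer relationship concrete, here is a minimal Python sketch of how an Avro-style reader resolves a record written with a different schema version. It is a simplification under stated assumptions: flat records, no type checking, and a hypothetical `resolve` helper where a schema is just a dict of field name to default value (this is not a real Avro library API). Running it in both directions shows the Backward and Forward cases; a change that succeeds in both is Full compatible.

```python
# Minimal sketch of Avro-style record resolution (simplified: flat records,
# no type checking). Hypothetical model: a schema is a dict mapping
# field name -> default value, with the sentinel REQUIRED meaning "no default".
REQUIRED = object()


def resolve(record, writer_fields, reader_fields):
    """Read a record written with writer_fields using reader_fields."""
    result = {}
    for name, default in reader_fields.items():
        if name in writer_fields:
            # The field exists in the written data: take it as-is.
            result[name] = record[name]
        elif default is not REQUIRED:
            # The field is missing from the data: fall back to the reader's default.
            result[name] = default
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    # Fields present only in the writer schema are silently ignored.
    return result


v1 = {"order_id": REQUIRED, "amount": REQUIRED}
v2 = {"order_id": REQUIRED, "amount": REQUIRED, "promo_code": None}  # optional field added

old_event = {"order_id": "A1", "amount": 42}
new_event = {"order_id": "B2", "amount": 10, "promo_code": "BF24"}

# Backward: the new reader (v2) reads old data (v1); promo_code gets its default.
print(resolve(old_event, v1, v2))  # {'order_id': 'A1', 'amount': 42, 'promo_code': None}
# Forward: the old reader (v1) reads new data (v2); promo_code is ignored.
print(resolve(new_event, v2, v1))  # {'order_id': 'B2', 'amount': 10}
```

Adding a field without a default (for example `"age": REQUIRED` to the reader) makes the first call raise, which is exactly the change the Backward rule forbids.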

Transitive Compatibility (The Real World)

By default, the rules above only check compatibility against the immediately previous version (e.g., v3 is checked against v2). However, in production, you might have consumers running code built for v1 reading data produced by v3.
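The v1-consumer/v3-producer scenario can be made concrete with a small sketch of what checking against every earlier version adds. Assumptions: each schema version is a hypothetical dict of field name to whether that field has a default, and only the forward direction (old readers, new data) is checked while types are ignored; the registry’s real check is more thorough.

```python
# Hypothetical model: each schema version maps field name -> has_default.
def forward_compatible(new, old):
    """Old readers can read new data if every field the old schema
    requires (no default) is still written by the new schema."""
    return all(name in new for name, has_default in old.items() if not has_default)


def check(new, history, transitive):
    # Non-transitive modes compare only against the latest registered version;
    # transitive modes compare against every registered version.
    targets = history if transitive else history[-1:]
    return all(forward_compatible(new, old) for old in targets)


v1 = {"id": False, "email": False}  # email is required
v2 = {"id": False, "email": True}   # email now has a default
v3 = {"id": False}                  # email deleted

history = [v1, v2]
print(check(v3, history, transitive=False))  # True: v3 is only compared with v2
print(check(v3, history, transitive=True))   # False: v1 readers still need email
```

Each step looks safe on its own, yet consumers still running v1 code would fail on v3 data.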

To ensure safety across the entire history of schemas, you must use Transitive modes (BACKWARD_TRANSITIVE, FORWARD_TRANSITIVE, or FULL_TRANSITIVE). This enforces the evolution rules against all previously registered versions, not just the latest one.
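Assuming the registry is Confluent Schema Registry, the compatibility mode can be set per subject through its REST API with `PUT /config/{subject}` and a JSON body such as `{"compatibility": "FULL_TRANSITIVE"}`. The sketch below only builds that request with Python’s standard library; the registry URL (`http://localhost:8081`) and the subject name (`checkout-events-value`) are assumptions for illustration.

```python
import json
import urllib.request

# Assumptions: a Confluent Schema Registry at localhost:8081 and a
# hypothetical subject name for the checkout events' value schema.
REGISTRY_URL = "http://localhost:8081"
SUBJECT = "checkout-events-value"


def build_compatibility_request(registry_url, subject, level):
    """Build (but do not send) a PUT /config/{subject} request."""
    body = json.dumps({"compatibility": level}).encode("utf-8")
    return urllib.request.Request(
        f"{registry_url}/config/{subject}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


req = build_compatibility_request(REGISTRY_URL, SUBJECT, "FULL_TRANSITIVE")
# To apply it against a running registry:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```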


2. Why “Backward” Is the Default

In most event-driven systems, the consumers’ ability to reprocess the history already stored in a topic is the most important factor. Backward compatibility lets you upgrade consumers to the newest schema first and still have them read older events, so an analytics job can rewind to the beginning of a topic after an upgrade without failing. (To guarantee this across every version ever written, not just the previous one, use BACKWARD_TRANSITIVE.)


3. Best Practices for Evolution

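  1. Always use Default Values: When adding a new field, always provide a default value. A reader on the new schema uses that default to fill in the field when it reads old records that don’t contain it, which keeps the change backward compatible and lets you delete the field later without breaking forward compatibility.
  2. Never rename fields: Renaming is effectively a “Delete” + “Add.” Readers looking for the new name won’t find it in old records, so the change fails backward compatibility unless the new field has a default, and even then every old value is silently replaced by that default.
  3. Keep it simple: Favor small, additive changes. The more convoluted a schema’s evolution history, the harder it is for every consumer to reason about the data over years of development.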
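Applying the default-value rule to the promo_code field from the opening example, a v2 checkout schema in Avro’s JSON syntax might look like the following. The record name and the other fields (`CheckoutEvent`, `order_id`, `amount_cents`) are hypothetical; Python is used only to build and print the schema. The field is a union with "null" listed first so that a null default is valid under Avro’s rule for union defaults.

```python
import json

# Hypothetical checkout-event schema (v2) in Avro's JSON schema syntax.
# promo_code is a ["null", "string"] union with a null default, so readers
# on this schema can still read v1 events that have no promo_code.
checkout_v2 = {
    "type": "record",
    "name": "CheckoutEvent",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount_cents", "type": "long"},
        {"name": "promo_code", "type": ["null", "string"], "default": None},
    ],
}

print(json.dumps(checkout_v2, indent=2))
```

Python’s `None` serializes to JSON `null`, which is what the registry receives as the field’s default.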

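4. Compatibility Checker

Here is how four common changes fare under the Backward rule, where the new schema must be able to read data written with the old one:

  • Add mandatory field "age" (INT): Not backward compatible. Old records have no "age" value, and the new schema has no default to fill in.
  • Add optional field "age" (INT, default 0): Backward compatible. Old records are read with age = 0.
  • Delete optional field "bio": Backward compatible. The new schema simply ignores "bio" in old records.
  • Rename "name" to "fullName": Not backward compatible (assuming "fullName" has no default). Old records contain no "fullName" value to read.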
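The Backward rule can be checked mechanically for these four changes. This is a minimal Python sketch, assuming each schema is a hypothetical dict of field name to whether the field has a default (types are ignored); `backward_compatible` is an illustrative helper, not the registry’s API.

```python
# Simplified backward check: a new schema can read old data if every field
# it adds has a default; fields it deletes are simply skipped when reading.
def backward_compatible(old, new):
    return all(has_default for name, has_default in new.items() if name not in old)


base = {"name": False, "bio": True}  # "bio" is optional (has a default)

changes = {
    'Add mandatory "age"': {**base, "age": False},
    'Add optional "age" (default 0)': {**base, "age": True},
    'Delete optional "bio"': {"name": False},
    'Rename "name" to "fullName"': {"fullName": False, "bio": True},
}

for label, new in changes.items():
    verdict = "compatible" if backward_compatible(base, new) else "NOT compatible"
    print(f"{label}: {verdict}")
```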

5. Summary

Schema Evolution is the “Superpower” of using a formal serialization format like Avro or Protobuf together with a schema registry. It allows your data team and your engineering team to move at different speeds, safe in the knowledge that their data contracts are enforced every time a new schema version is registered, so an incompatible change is rejected instead of reaching consumers.