Schema Evolution Rules
You’ve decided to add a timestamp field to your user events. How do you do this without crashing existing consumers that were written months ago? This is the core problem of schema evolution.
1. Compatibility Modes
The Schema Registry supports several “Compatibility Modes” that dictate how a schema can be changed.
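As a rough illustration, the mode is usually set per subject through the registry's REST API. The sketch below assumes Confluent Schema Registry's `PUT /config/{subject}` endpoint; the registry URL and the subject name `user-events-value` are placeholders, and the request is only built, not sent.

```python
import json
import urllib.request

# Assumptions for illustration: a registry on localhost:8081 and a subject
# called "user-events-value". Replace both with your own values.
REGISTRY_URL = "http://localhost:8081"
SUBJECT = "user-events-value"


def build_set_compatibility_request(subject, mode):
    """Build (but do not send) a request that sets a subject's compatibility mode."""
    body = json.dumps({"compatibility": mode}).encode("utf-8")
    return urllib.request.Request(
        url=f"{REGISTRY_URL}/config/{subject}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


request = build_set_compatibility_request(SUBJECT, "BACKWARD")
print(request.get_method(), request.full_url)
# To apply it against a running registry: urllib.request.urlopen(request)
```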
Backward Compatibility (Default)
A new version of a schema can be used to read data written with old schemas.
- Rule: You can delete fields or add optional fields (fields with a default value).
- Upgrade Order: Upgrade Consumers first, then Producers.
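A minimal sketch of why adding an optional field is backward compatible, using the timestamp example from the introduction. It models a schema as a plain list of fields rather than using a real Avro library; `read_with` and the field names are hypothetical and only mimic the spirit of schema resolution.

```python
# Simplified model (an assumption): a schema is a list of fields, and a field
# is optional when it carries a "default" key.
SCHEMA_V1 = [
    {"name": "user_id"},
    {"name": "action"},
]

# v2 adds an optional "timestamp" field with a default value.
SCHEMA_V2 = SCHEMA_V1 + [{"name": "timestamp", "default": None}]


def read_with(reader_schema, record):
    """Project a decoded record onto the reader's schema, filling defaults."""
    result = {}
    for field in reader_schema:
        name = field["name"]
        if name in record:
            result[name] = record[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return result


# A record written months ago with v1, read by a consumer upgraded to v2.
old_record = {"user_id": 42, "action": "login"}
upgraded_view = read_with(SCHEMA_V2, old_record)
print(upgraded_view)  # {'user_id': 42, 'action': 'login', 'timestamp': None}
```

If the new field had no default, `read_with` would raise for every old record, which is the failure a backward compatibility check is meant to catch before the schema is registered.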
Forward Compatibility
Old versions of the schema can be used to read data written with the new schema.
- Rule: You can add fields or delete optional fields.
- Upgrade Order: Upgrade Producers first, then Consumers.
Full Compatibility
The new schema is both backward and forward compatible with the previous one.
- Rule: You can only add or delete optional fields.
- Upgrade Order: Any order.
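The three rules above can be sketched as a small checker. This is a simplified model, not the registry's actual algorithm: it looks only at field names and defaults, ignores type changes, and compares a new schema against a single previous version. The function names are hypothetical.

```python
# Simplified model (an assumption): a schema is a list of fields, and a field
# is optional when it has a "default" key. Types and nesting are ignored.

def can_read(reader, writer):
    """Return True if `reader` can decode data written with `writer`."""
    writer_names = {field["name"] for field in writer}
    return all(
        field["name"] in writer_names or "default" in field
        for field in reader
    )


def is_compatible(old, new, mode):
    """Check a proposed schema `new` against the previous schema `old`."""
    backward = can_read(reader=new, writer=old)  # new consumers, old data
    forward = can_read(reader=old, writer=new)   # old consumers, new data
    if mode == "BACKWARD":
        return backward
    if mode == "FORWARD":
        return forward
    if mode == "FULL":
        return backward and forward
    raise ValueError(f"unknown mode: {mode}")


V1 = [{"name": "user_id"}, {"name": "action"}]
changes = {
    "add optional field": V1 + [{"name": "timestamp", "default": None}],
    "add required field": V1 + [{"name": "timestamp"}],
    "delete required field": [{"name": "user_id"}],
}

MODES = ("BACKWARD", "FORWARD", "FULL")
report = {
    label: {mode: is_compatible(V1, new, mode) for mode in MODES}
    for label, new in changes.items()
}
for label, verdicts in report.items():
    print(label, verdicts)
```

In this model, adding an optional field passes all three modes, adding a required field passes only FORWARD, and deleting a required field passes only BACKWARD.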
2. Why “Backward” Is the Default
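In most event-driven systems, a consumer’s ability to reprocess existing history matters most. Backward compatibility guarantees that a consumer upgraded to the new schema can still read records written with the previous one, so your analytics jobs can be upgraded first and rewound over the topic’s history without failing. Note that plain backward compatibility checks a new schema only against the latest registered version; to guarantee that the entire history stays readable, use the transitive variant (BACKWARD_TRANSITIVE in Confluent Schema Registry).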
3. Best Practices for Evolution
- Always use Default Values: When adding a new field, always provide a default value. A field with a default can be added, and later removed, under any of the compatibility modes above.
- Never rename fields: Renaming is essentially a “Delete” + “Add,” which often breaks compatibility.
- Keep it simple: The more complex your evolution rules, the harder it is to maintain the system over years of development.
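To see why a rename behaves like a “Delete” + “Add,” here is a small sketch that diffs two schemas by field name. It uses a simplified list-of-fields model rather than a real serialization library; `diff_fields` and the field names `ts` and `timestamp` are hypothetical.

```python
# Simplified model (an assumption): a schema is a list of fields, and a field
# is optional when it has a "default" key.

def diff_fields(old, new):
    """Return (removed, added) fields when moving from `old` to `new`."""
    old_names = {field["name"] for field in old}
    new_names = {field["name"] for field in new}
    removed = [field for field in old if field["name"] not in new_names]
    added = [field for field in new if field["name"] not in old_names]
    return removed, added


OLD = [{"name": "user_id"}, {"name": "ts"}]
RENAMED = [{"name": "user_id"}, {"name": "timestamp"}]  # "ts" renamed

removed, added = diff_fields(OLD, RENAMED)

# New readers cannot fill an added field that has no default from old data.
breaks_backward = any("default" not in field for field in added)
# Old readers cannot fill a removed field that has no default from new data.
breaks_forward = any("default" not in field for field in removed)

print([f["name"] for f in removed], [f["name"] for f in added])
print(breaks_backward, breaks_forward)
```

Because the schema carries no link between the old and new names, the rename is indistinguishable from dropping `ts` and adding an unrelated required field, so it fails in both directions here.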
4. Summary
Schema evolution is the “superpower” of using a formal serialization format like Avro or Protobuf. It allows your data team and your engineering team to move at different speeds, safe in the knowledge that their data contracts are checked every time a new schema version is registered.