We've come full circle. The whole idea of data lakes was that you could land data without worrying about its schema, but the move towards more managed, governed lakes built on Delta means we need to apply a schema again. So how do we balance evolving schemas with the need for managed structures?
The new schema drift features in Databricks Auto Loader take a decent stab at this problem: when reading from JSON sources, we can now pull the attributes we want into a known schema and keep everything else as a JSON string, which we can then extract further details from. In this week's video, Simon looks at the new feature, how it works, and one or two of its limitations.
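To make the "known schema plus a JSON string for everything else" idea concrete, here's a minimal plain-Python sketch of the concept. It is not Auto Loader's code or API. On Databricks, as we understand it, Auto Loader puts these leftovers in a rescued data column (named `_rescued_data` by default). The `KNOWN_FIELDS` schema and the `split_record` helper below are hypothetical, purely for illustration.

```python
import json

# Hypothetical "known" schema: the attributes we want as typed columns.
KNOWN_FIELDS = {"id": int, "name": str}

def split_record(raw: str, rescued_col: str = "_rescued_data") -> dict:
    """Illustrative only: project one JSON record onto KNOWN_FIELDS and keep
    every other attribute (plus any known attribute with an unexpected type)
    as a single JSON string, mimicking the idea of a rescued data column."""
    record = json.loads(raw)
    row, rescued = {}, {}
    for key, value in record.items():
        expected = KNOWN_FIELDS.get(key)
        if expected is not None and isinstance(value, expected):
            row[key] = value          # fits the known schema
        else:
            rescued[key] = value      # unknown attribute or type mismatch
    # Known fields missing (or rescued) from the record become None, i.e. null.
    for key in KNOWN_FIELDS:
        row.setdefault(key, None)
    row[rescued_col] = json.dumps(rescued) if rescued else None
    return row

row = split_record('{"id": 1, "name": "a", "colour": "red", "size": {"w": 2}}')
# Extract a further detail from the rescued JSON string later on.
width = json.loads(row["_rescued_data"])["size"]["w"]
```

The design choice mirrors the trade-off in the video: downstream consumers get stable, typed columns, and nothing is silently dropped when the source drifts. The cost is that the extra attributes stay as an untyped string until someone parses them.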
As always, don't forget to like & subscribe!