Coming from a data warehousing and BI background, Franco Patano wanted a catalogue of the Lakehouse that includes schema and profiling statistics. He created the Lakehouse Data Profiler notebook, which uses Python and SQL to analyze the data and generate schema and statistics tables. He then uses the new SQL Analytics product from Databricks to build dashboards that visualize the profiling statistics, and discusses how to use these dashboards to optimize JOINs and other operations.
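To make the idea of "schema and statistics tables" concrete, here is a minimal sketch of a profiler. It is not the Lakehouse Data Profiler notebook itself: it uses Python's built-in `sqlite3` module as a stand-in for a Lakehouse so it runs anywhere, and the function name `profile_table`, the `orders` table, and the chosen statistics (row count, null count, distinct count, min, max) are all assumptions made for illustration.

```python
import sqlite3


def profile_table(conn, table):
    """Return (schema_rows, stats_rows) for one table.

    Hypothetical helper: SQLite stands in for the Lakehouse here.
    """
    cur = conn.cursor()
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    schema = [
        (table, row[1], row[2])
        for row in cur.execute(f'PRAGMA table_info("{table}")').fetchall()
    ]
    row_count = cur.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]

    stats = []
    for _, column, _col_type in schema:
        nulls, distinct, min_v, max_v = cur.execute(
            f'SELECT SUM("{column}" IS NULL), COUNT(DISTINCT "{column}"), '
            f'MIN("{column}"), MAX("{column}") FROM "{table}"'
        ).fetchone()
        stats.append({
            "table": table,
            "column": column,
            "row_count": row_count,
            "null_count": nulls or 0,  # SUM returns NULL on an empty table
            "distinct_count": distinct,
            "min": min_v,
            "max": max_v,
        })
    return schema, stats


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10.0), (1, 20.0), (2, None), (None, 5.0)],
    )
    schema_rows, stats_rows = profile_table(conn, "orders")
    for row in schema_rows:
        print(row)
    for row in stats_rows:
        print(row)
```

Persisting rows like these into two tables would give a dashboard something to query. Per-column null and distinct counts on candidate join keys are the kind of statistic that could inform JOIN decisions, though the talk's exact metrics are not stated here.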
[Lightning talk from Data + AI Summit 2020. Speaker: Franco Patano]
![](https://i.ytimg.com/vi/58nT52VTzsQ/maxresdefault.jpg)