In this tutorial we continue with Pytest and integrate data quality tests into our Python ETL pipeline. In the previous session, we covered the Pytest basics and developed data quality tests, then ran them against a dataset to check our assumptions. In this session we integrate those tests into the Python ETL pipeline we developed earlier.
ETL Pipeline video: [ Link ]
Previous Pytest video: [ Link ]
Medium article on the subject: [ Link ]
Errata in the tests: One of the viewers pointed out that the null check was always returning true. It has been revised so that the check evaluates to false, and the test fails, when nulls are present. The test_null_check function is updated as follows:
def test_null_check(df):
    # Fails when any ProductKey value is null
    assert df['ProductKey'].notnull().all()
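To see why the revised check catches nulls, here is a minimal sketch. The sample DataFrames and the null_check helper are made up for illustration; only the ProductKey column and the notnull().all() expression come from the errata above.

```python
import pandas as pd


def null_check(df: pd.DataFrame) -> bool:
    # Same expression as the revised test: True only when no ProductKey is null
    return bool(df['ProductKey'].notnull().all())


# Hypothetical sample data, not taken from the video's dataset
clean = pd.DataFrame({'ProductKey': [1, 2, 3]})
dirty = pd.DataFrame({'ProductKey': [1, None, 3]})

clean_ok = null_check(clean)  # no nulls present
dirty_ok = null_check(dirty)  # one null present
print(clean_ok, dirty_ok)
```

With an assert in front of the same expression, the clean frame passes and the frame containing a null makes the test fail.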
Link to GitHub repo (code): [ Link ]
#etltesting #dataquality #python
Pytest Docs: [ Link ]
---------------------------------------------
Topics covered in this video:
0:00 - Introduction to ETL Testing
0:33 - Update Base Script
2:06 - Pipeline Testing Directory Structure
2:31 - ETL Pipeline Test Script
2:59 - Pytest Fixture
4:02 - Data Quality Tests
5:30 - Data Check
5:46 - Up Next
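
The fixture and data quality steps listed above could look roughly like the sketch below. It is an outline under stated assumptions, not the code from the video: the extract function, the sample rows, the ProductName column, and the test_col_exists and test_unique_check names are hypothetical; only ProductKey and test_null_check appear in this description.

```python
import pandas as pd
import pytest


def extract() -> pd.DataFrame:
    # Hypothetical stand-in for the pipeline's extract step
    return pd.DataFrame(
        {'ProductKey': [1, 2, 3], 'ProductName': ['A', 'B', 'C']}
    )


@pytest.fixture
def df() -> pd.DataFrame:
    # Each test receives a freshly extracted DataFrame
    return extract()


def test_col_exists(df):
    # The key column must be present before other checks make sense
    assert 'ProductKey' in df.columns


def test_null_check(df):
    # Fails when any ProductKey value is null
    assert df['ProductKey'].notnull().all()


def test_unique_check(df):
    # Fails when ProductKey contains duplicate values
    assert df['ProductKey'].is_unique
```

Saved under a name Pytest collects by default, for example test_pipeline.py (a hypothetical filename), running pytest in that directory would pick up the three tests and inject the df fixture into each.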