Pandas: Can you rewrite this line to use df.loc?

I have been working through Reindert-Jan Ekker's excellent Pluralsight course Pandas Playbook: Manipulating Data. Right at the end of Demo: Detecting and Inspecting Missing Values, part of Module 5, Cleaning Data, he asks the following question:

# Can you rewrite this line to use df.loc?
df['MIN_TEMP_GROUND'].drop(every_6th_row).isnull().all() 
True

This code checks that data for the MIN_TEMP_GROUND column only appears in every 6th row. It drops every 6th row, so the remaining rows should all be null. The code confirms this when it executes and returns True.
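To make the check concrete, here is a minimal sketch on a made-up frame. The data and the way every_6th_row is built are assumptions on my part; the course defines its own every_6th_row, which is not reproduced here.

```python
import numpy as np
import pandas as pd

# Toy data (assumed, not the course dataset): 18 rows, all NaN
# except rows 5, 11 and 17, which hold a reading.
values = np.full(18, np.nan)
values[5::6] = [1.5, 0.2, -0.7]
df = pd.DataFrame({"MIN_TEMP_GROUND": values})

# Hypothetical stand-in for the course's every_6th_row:
# the index labels 5, 11, 17.
every_6th_row = df.index[5::6]

# Drop the rows that hold data, then confirm everything left is null.
result = df["MIN_TEMP_GROUND"].drop(every_6th_row).isnull().all()
print(result)
```

If any of the remaining rows held a value, isnull().all() would return False instead.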

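One problem with the existing solution is that it chains several method calls to obtain the answer, and it might be more efficient to use loc instead. After kicking this around for a couple of hours and beginning to spin my wheels, I asked a question on Stack Overflow. The answer used loc in conjunction with the % (modulo) operator to select every row that is not a 6th row and check that those rows were null. Note that this relies on the index being the default integer index (0, 1, 2, ...), so that the rows holding data are those where the index modulo 6 equals 5.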

df.loc[(df.index % 6) != 5, 'MIN_TEMP_GROUND'].isnull().all()
True
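The following is a small sketch of the loc approach on made-up data, not the course dataset. It also shows one possible variation, which is my own assumption and not part of the Stack Overflow answer: if the index is not 0, 1, 2, ... (dates, for example), the mask can be built from row positions with np.arange instead of from the index labels.

```python
import numpy as np
import pandas as pd

# Toy data (assumed): 12 rows, with readings only in rows 5 and 11.
values = np.full(12, np.nan)
values[5::6] = [3.1, 2.4]
df = pd.DataFrame({"MIN_TEMP_GROUND": values})

# Boolean mask: True for every row that is NOT the 6th in its block of six.
mask = (df.index % 6) != 5
loc_result = df.loc[mask, "MIN_TEMP_GROUND"].isnull().all()

# Variation for a non-integer index (here, daily dates):
# build the mask from row positions rather than index labels.
dated = df.set_index(pd.date_range("2020-01-01", periods=len(df), freq="D"))
pos_mask = (np.arange(len(dated)) % 6) != 5
pos_result = dated.loc[pos_mask, "MIN_TEMP_GROUND"].isnull().all()

print(loc_result, pos_result)
```

Because loc takes the row mask and the column label in a single call, the selection happens in one step instead of selecting the column first and dropping rows afterwards.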

Acknowledgements

Reindert-Jan Ekker, whose Pluralsight catalogue of courses I can recommend.

Henry Ecker, for generously answering my Stack Overflow question.
