Ordering rows in Pandas Data Frame and Bars in Plotly Bar Chart (by day of the week or any other user defined order)

I have time series data in my Pandas Data Frame, and I want to present an aggregation of the data by day of the week in an orderly fashion: sorted not alphabetically, but the way humans would order the days, starting from Monday and working our way to Saturday and Sunday.

After a little searching, I learned how to order data in a Data Frame based on any arbitrary, user defined ordering. The trick is ‘categorical data’: a limited, and usually fixed, number of possible values that may have a strong (meaningful) order. The lexical order of a categorical variable may not be the same as the logical order (“one”, “two”, “three”). By converting to a categorical and specifying an order on the categories, sorting and min/max will use the logical order instead of the lexical order; see the Pandas documentation on categorical data listed under Resources.
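To make the difference between lexical and logical order concrete, here is a minimal sketch. The sample values are made up for illustration and are not part of my data set.

```python
import pandas as pd

# Hypothetical sample values, chosen so lexical and logical order differ
s = pd.Series(["three", "one", "two", "one"])

# Plain strings sort alphabetically (lexical order)
lexical = s.sort_values().tolist()

# An ordered categorical sorts by position in the categories list (logical order)
number_type = pd.CategoricalDtype(categories=["one", "two", "three"], ordered=True)
ordered = s.astype(number_type)
logical = ordered.sort_values().tolist()

print(lexical)   # ['one', 'one', 'three', 'two']
print(logical)   # ['one', 'one', 'two', 'three']
print(ordered.min(), ordered.max())  # one three
```

With plain strings, "three" sorts before "two"; once the column is an ordered categorical, sorting and min/max follow the order of the categories list.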

This StackOverflow thread showed me the way: https://stackoverflow.com/questions/47741400/pandas-dataframe-group-and-sort-by-weekday

The starting point is a data frame with time series data – data stamped by date:

image

The data is not sorted in any way.

I want to aggregate: grouping by day of the week, I want to calculate the mean value for deathCount, and I want to present the results ordered by day of the week, using the categorical ordering rather than the lexical ordering.

Using the formal categorical type route, I get the result I desire:

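from pandas.api.types import CategoricalDtype

# the order of this list defines the sort order of the categories
cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
cat_type = CategoricalDtype(categories=cats, ordered=True)
# dt.day_name() replaces dt.weekday_name, which was removed in pandas 1.0
data['Day of Week'] = data['date'].dt.day_name()
data['Day of Week'] = data['Day of Week'].astype(cat_type)
# group on the categorical column; numeric_only=True leaves out the date column
df_weekday = data.groupby('Day of Week', observed=False).mean(numeric_only=True)
df_weekday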

image

I define the CategoricalDtype called cat_type and explicitly set the type of the Day of Week column to this categorical type. This defines the ordering of this column.

By changing the order of the weekday names in the cats list, I can define a different ordering. It is mine to govern!
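As a quick illustration of a different ordering, the sketch below lets the week start on Sunday. The data frame is a small made-up stand-in for my real data (14 consecutive days with invented deathCount values), so the numbers mean nothing.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical data: 14 consecutive days starting Monday 2020-01-06
data = pd.DataFrame({
    "date": pd.date_range("2020-01-06", periods=14, freq="D"),
    "deathCount": range(14),
})

# A week that starts on Sunday instead of Monday
cats = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
cat_type = CategoricalDtype(categories=cats, ordered=True)

# Weekday names as an ordered categorical column
data["Day of Week"] = data["date"].dt.day_name().astype(cat_type)

# Grouping on the categorical column returns the groups in category order
df_weekday = data.groupby("Day of Week", observed=False)["deathCount"].mean()
print(df_weekday)
```

Only the cats list changed; the rest of the code is the same as before, and the result now starts with Sunday.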

In this case, a simpler – less formal, less clear perhaps – option is available through the reindex operation that I can perform on a Data Frame:

cats = [ 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
# create a new data frame with the mean death counts grouped by day of the week
# reindex is used to order the week days in a logical order (learned from https://stackoverflow.com/questions/47741400/pandas-dataframe-group-and-sort-by-weekday)
# dt.day_name() replaces dt.weekday_name, which was removed in pandas 1.0
df_weekday = data.groupby(data['date'].dt.day_name()).mean(numeric_only=True).reindex(cats)
df_weekday

image

The effect is the same – by reindexing the data frame using the cats list, I order the data frame’s rows in the order prescribed by the list.
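The sketch below shows the before and after of the reindex step on a small made-up data frame (ten consecutive days with invented deathCount values, not my real data).

```python
import pandas as pd

cats = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical data: ten consecutive days starting Wednesday 2020-01-01
data = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=10, freq="D"),
    "deathCount": [5, 3, 8, 1, 9, 2, 7, 4, 6, 0],
})

# Grouping on plain weekday names yields an alphabetically sorted index
by_name = data.groupby(data["date"].dt.day_name())["deathCount"].mean()
print(by_name.index.tolist())

# reindex rearranges the rows to follow the order of the cats list
df_weekday = by_name.reindex(cats)
print(df_weekday.index.tolist())
```

One difference with the categorical route is worth knowing: reindex only rearranges this one result. A label in cats that does not occur in the data shows up as a row with NaN, and the ordering is not remembered by the column itself.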

After ensuring the rows in data frame df_weekday are in a meaningful order, I can plot the bar chart with the bars in a meaningful order:

image

 

Resources

Pandas Documentation on Categorical Data: https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html

Pandas Doc on reindex: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html

About Author

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director and Oracle Developer Champion. Solution architect and developer on diverse areas including SQL, JavaScript, Kubernetes & Docker, Machine Learning, Java, SOA and microservices, events in various shapes and forms and many other things. Author of the Oracle Press book Oracle SOA Suite 12c Handbook. Frequent presenter on user groups and community events and conferences such as JavaOne, Oracle Code, CodeOne, NLJUG JFall and Oracle OpenWorld.
