Here's an example of an untidy dataset. The column headers are storing values
of an implicit variable, income, rather than variable names. This violates the
"variables are in columns" principle of tidy data.
import pandas as pd
pew = pd.read_csv("https://raw.githubusercontent.com/nickhould/tidy-data-python/master/data/pew-raw.csv")
pew
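We can fix this using the melt function in pandas. It gathers the income
columns into two columns: one holding the former column headers (named with
var_name) and one holding the cell values (named "value" by default). This
function is important. You will use it over and over for tidying.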
tidy_pew = pd.melt(pew, id_vars=["religion"], var_name="income")
tidy_pew
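To see what melt does without downloading anything, here is a minimal sketch on a toy table. The religions, income brackets, and counts below are made up for illustration and are not the Pew figures; the value_name argument is optional and is used here only to give the value column a descriptive name.

```python
import pandas as pd

# Hypothetical toy data with the same shape problem as pew:
# the income brackets are column headers instead of values.
wide = pd.DataFrame({
    "religion": ["A", "B"],
    "<$10k": [1, 2],
    "$10-20k": [3, 4],
})

# Gather the income columns into (income, count) pairs, one row per pair.
tidy = pd.melt(wide, id_vars=["religion"], var_name="income", value_name="count")
print(tidy)
```

Each row of the result is one observation: a religion, an income bracket, and a count. Two rows and two income columns in the wide table become four rows in the tidy one.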
If you ever want to go back to the earlier format, you can use pivot. This will
only rarely be the case though (e.g., you decide to run some specialized
algorithm that expects different income levels in columns).
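# Passing values="value" keeps the resulting column labels flat;
# without it, pivot returns hierarchical (MultiIndex) columns.
pivot_pew = (pd.pivot(tidy_pew, index="religion", columns="income", values="value")
             .reset_index())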
pivot_pew
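As a rough check that pivot undoes melt, the sketch below round-trips a toy table. The data and the names wide, tidy, and back are hypothetical. Note that pivot orders the new columns by their labels and leaves a name on the column index, so the sketch restores the original column order and drops that name before comparing.

```python
import pandas as pd

# Hypothetical toy data, not the Pew figures.
wide = pd.DataFrame({
    "religion": ["A", "B"],
    "<$10k": [1, 2],
    "$10-20k": [3, 4],
})

tidy = wide.melt(id_vars=["religion"], var_name="income", value_name="count")

back = (tidy.pivot(index="religion", columns="income", values="count")
            .reset_index()               # turn the religion index back into a column
            .rename_axis(None, axis=1))  # drop the leftover "income" label on the columns

# Restore the original column order before comparing.
back = back[list(wide.columns)]
print(back)
```

One caveat: pivot raises an error if any (religion, income) pair appears more than once in the tidy table, so it only works as an inverse when each pair is unique.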