# Groupby given percentiles of the values of the chosen DataFrame column

Imagine that I have a `DataFrame` with columns that contain only real values.

``````>> df
col1   col2      col3
0     0.907609     82  4.207991
1     3.743659   1523  6.488842
2     2.358696    324  5.092592
3     0.006793      0  0.000000
4    19.319746  11969  7.405685
``````

I want to group it by quartiles (or any other percentiles specified by me) of the chosen column (e.g., `col1`), to perform some operations on these groups. Ideally, I would like to do something like:

``````df.groupy( quartiles_of_col1 ).mean()  # not working, how to code quartiles_of_col1?
``````

The output should give the mean of each of the columns for four groups corresponding to the quartiles of `col1`. Is this possible with the `groupby` command? What’s the simplest way of achieving it?

I don’t have a computer to test it right now, but I think you can do it by: `df.groupby(pd.cut(df.col0, np.percentile(df.col0, [0, 25, 75, 90, 100]), include_lowest=True)).mean()`. Will update after 150mins.

Some explanations:

``````In :
#use np.percentile to get the bin edges of any percentile you want
np.percentile(df.col0, [0, 25, 75, 90, 100])
Out:
[0.0067930000000000004,
0.907609,
3.7436589999999996,
13.089311200000001,
19.319745999999999]
In :
#Need to use include_lowest=True
print df.groupby(pd.cut(df.col0, np.percentile(df.col0, [0, 25, 75, 90, 100]), include_lowest=True)).mean()
col0     col1      col2
col0
[0.00679, 0.908]   0.457201     41.0  2.103996
(0.908, 3.744]     3.051177    923.5  5.790717
(3.744, 13.0893]        NaN      NaN       NaN
(13.0893, 19.32]  19.319746  11969.0  7.405685
In :
#Or the smallest values will be skiped
print df.groupby(pd.cut(df.col0, np.percentile(df.col0, [0, 25, 75, 90, 100]))).mean()
col0     col1      col2
col0
(0.00679, 0.908]   0.907609     82.0  4.207991
(0.908, 3.744]     3.051177    923.5  5.790717
(3.744, 13.0893]        NaN      NaN       NaN
(13.0893, 19.32]  19.319746  11969.0  7.405685
``````