ENH: enable skipna on groupby reduction ops #15675

Open
9 tasks
jreback opened this issue Mar 13, 2017 · 12 comments · Fixed by #60741 · May be fixed by #60752
Labels: API - Consistency (Internal Consistency of API/Behavior), Enhancement, Groupby, Master Tracker (High level tracker for similar issues), Reduction Operations (sum, mean, min, max, etc.)

Comments

@jreback
Contributor

jreback commented Mar 13, 2017

Edit [rhshadrach]: The following methods do not have a skipna argument in groupby, but the Series/DataFrame variants do.

  • max
  • mean
  • median
  • min
  • prod
  • sem
  • std
  • sum
  • var

#15674

In [19]: import pandas as pd
    ...: import numpy as np
    ...: d = {'l':  ['left', 'right', 'left', 'right', 'left', 'right'],
    ...:      'r': ['right', 'left', 'right', 'left', 'right', 'left'],
    ...:      'v': [-1, 1, -1, 1, -1, np.nan]}
    ...: df = pd.DataFrame(d)
    ...: 

In [20]: df.groupby('l').v.sum()
Out[20]: 
l
left    -3.0
right    2.0
Name: v, dtype: float64

In [21]: df.groupby('l').v.apply(lambda x: x.sum(skipna=False))
Out[21]: 
l
left    -3.0
right    NaN
Name: v, dtype: float64

Ideally, [21] could be written as:
df.groupby('l').v.sum(skipna=False)

@jreback jreback added Compat pandas objects compatability with Numpy or Python functions Difficulty Intermediate Groupby labels Mar 13, 2017
@jreback jreback added this to the Next Major Release milestone Mar 13, 2017
@mayukh18

Can I take this up?

@jreback
Contributor Author

jreback commented Mar 13, 2017

that would be great @mayukh18 !

@jorisvandenbossche jorisvandenbossche removed the Compat pandas objects compatability with Numpy or Python functions label Mar 13, 2017
@mayukh18

@jreback I was thinking about putting a passthru check for specifically the how == 'add' case a level above and after the actual cython implementation. Does this seem like a good idea?

@jreback
Contributor Author

jreback commented Mar 14, 2017

no, you simply need to add this on to _groupby_function near the top. Needs testing for all ops which support skipna (pretty much all numeric ops)

@mayukh18

Adding this in _groupby_function will not effect any change for ops like mean, median etc., as they are implemented separately. So do those need skipna too? Meanwhile, the first and last ops don't seem to require skipna, but they will be affected by modifying _groupby_function. I am new to this, so pardon me if I am asking silly questions.

@jreback
Contributor Author

jreback commented Mar 14, 2017

@mayukh18 yes the implementation is a bit muddled.

for methods which are defined with an actual function (e.g. median/mean), this is very easy: just add skipna to the validation function where numeric_only is now; that will pass it through.

we can still use _groupby_function, what I would do is this.

add the default for skipna (so do this on all but first/last)
sum = _groupby_function('sum', 'add', np.sum, skipna=True)

then add skipna=None to the signature of _groupby_function, which adds the arg to kwargs if it's not None (so it gets passed).
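The idea in the comment above can be sketched in plain Python. This is only an illustration of the pattern, not the pandas internals: the simplified factory signature, the reducers `_sum` and `_first`, and the dict-of-lists `groups` input are all hypothetical names chosen for this example.

```python
import math


def _groupby_function(name, reducer, skipna=None):
    """Hypothetical stand-in for the factory discussed above (not pandas code)."""

    def method(groups, **kwargs):
        # Add the default to kwargs only when the op declared one,
        # and let an explicit caller value win.
        if skipna is not None:
            kwargs.setdefault("skipna", skipna)
        return {key: reducer(values, **kwargs) for key, values in groups.items()}

    method.__name__ = name
    return method


def _sum(values, skipna=True):
    # Drop NaN values first when skipna is requested.
    if skipna:
        values = [v for v in values if not math.isnan(v)]
    return float(sum(values))


def _first(values):
    # Takes no skipna argument, so it must not receive one.
    return values[0]


gb_sum = _groupby_function("sum", _sum, skipna=True)
gb_first = _groupby_function("first", _first)  # no skipna default declared

groups = {"left": [-1.0, -1.0, -1.0], "right": [1.0, 1.0, math.nan]}
default_result = gb_sum(groups)               # NaN skipped by default
strict_result = gb_sum(groups, skipna=False)  # NaN propagates for "right"
first_result = gb_first(groups)               # unaffected by skipna
```

Because the default is only injected when it is not None, ops such as first/last keep their existing signature, which is the concern raised earlier in the thread.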

@mullenkamp

This is a surprisingly old issue, but this functionality would be really nice and consistent with the non-groupby methods.

@vladu
Contributor

vladu commented Jun 29, 2022

Indeed, I believe this is not just a "nice-to-have" any more, but a necessary step before this FutureWarning can be resolved by users:

FutureWarning: Using the level keyword in DataFrame and Series aggregations is deprecated and will be removed in a future version. Use groupby instead. df.sum(level=1) should use df.groupby(level=1).sum().

As long as DataFrame.sum() and DataFrame.groupby().sum() (and other agg functions) have inconsistent APIs, dropping the level kwarg from non-grouped classes isn't really a good step, IMO.
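To illustrate the gap, here is a small sketch using a Series with a two-level index (the data and the names `skipped` and `propagated` are made up for this example). The replacement suggested by the warning skips NaN by default, and without a skipna keyword on the groupby reduction, propagating NaN needs a workaround such as agg with a lambda:

```python
import numpy as np
import pandas as pd

# Illustrative data with a two-level index.
idx = pd.MultiIndex.from_tuples(
    [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")],
    names=["outer", "inner"],
)
s = pd.Series([1.0, 2.0, 3.0, np.nan], index=idx)

# Replacement suggested by the FutureWarning; NaN is skipped by default.
skipped = s.groupby(level=1).sum()

# Workaround for propagating NaN when the groupby reduction
# does not accept skipna directly.
propagated = s.groupby(level=1).agg(lambda g: g.sum(skipna=False))
```

The lambda route goes through Python-level aggregation, so it is likely slower than a built-in reduction on large data.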

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@rhshadrach rhshadrach added Master Tracker High level tracker for similar issues Reduction Operations sum, mean, min, max, etc. labels Jan 26, 2024
@rhshadrach rhshadrach self-assigned this Jan 26, 2024
@rhshadrach rhshadrach removed their assignment Apr 14, 2024
@andremcorreia
Contributor

take

@tiago-firmino

take

andremcorreia added a commit to andremcorreia/pandas that referenced this issue May 27, 2024
Added a skipna argument to the groupby reduction ops sum, prod, min, max, mean, median, var, std and sem
Added relevant tests
Updated whatsnew to reflect changes

Co-authored-by: Tiago Firmino <tiago.esteves.firmino@tecnico.ulisboa.pt>
andremcorreia added a commit to andremcorreia/pandas that referenced this issue May 27, 2024
Added a skipna argument to the groupby reduction ops:
  sum, prod, min, max, mean, median, var, std and sem
Added relevant tests
Updated whatsnew to reflect changes

Co-authored-by: Tiago Firmino <tiago.esteves.firmino@tecnico.ulisboa.pt>
@fboerman

It seems this is still not available; it would be great to have!

@rhshadrach
Member

Contributions are welcome!
