Fix arrow groupby na #60777
Force-pushed from 0e61611 to 66330ee.
# dictionary encode does nothing if an already encoded array is given
data = data.cast(data.type.value_type)
I think you're explaining why you have the cast here, but I'm not quite understanding it. Can you give a bit more detail? (Just as a response here; we can decide later whether the comment should be updated.)
If the input array is already a dictionary array, the dictionary_encode function does nothing, so the null_encoding parameter is ignored:
https://arrow.apache.org/docs/python/generated/pyarrow.compute.dictionary_encode.html
Casting the array to its value type and then re-encoding it ensures that null_encoding is applied when NA values are wanted in the output.
Am I correct that this only needs to be done for pyarrow<15.0.0? If that is the case, can you do this conditionally based on the version of PyArrow available?
from pandas.compat.pyarrow import pa_version_under15p0
doc/source/whatsnew/v3.0.0.rst
- Fixed bug in :meth:`ArrowExtensionArray.factorize` where NA values were dropped when input was dictionary-encoded even when dropna was set to False(:issue:`60567`)
Can you move this to an appropriate section below? I think ExtensionArray.
Force-pushed from 66330ee to 273d85f.
Force-pushed from 273d85f to 226ae6c.
doc/source/whatsnew/v3.0.0.rst file for the bug fix.