Extract true gpu object from cudf.pandas proxy object #6273
/okay to test
@@ -231,7 +233,9 @@ def fit_transform(self, y, z=None) -> cudf.Series:
        This is functionally equivalent to (but faster than)
        `LabelEncoder().fit(y).transform(y)`
        """

        if cudf.get_option("mode.pandas_compatible"):
cc: @vyasr @mroeschke @bdice @dantegd I'm debating whether this check should live inside the cudf.Series (also DataFrame, Index, etc.) constructor itself. I know I reverted the change in rapidsai/cudf#17629, but after looking at how often cuml calls the cudf.Series/DataFrame constructors, I'm having second thoughts about a special utility (to check and extract the true GPU object) in cuml vs. baking this utility into the cudf classic constructors. I'm leaning towards the latter, looking at cuml specifically.
I think a function in cuML might not be a bad idea. I was thinking we could encapsulate the functionality into a function like
def create_cudf_series(y):
    if cudf.get_option("mode.pandas_compatible"):
        if is_proxy_object(y):
            y = y.as_gpu_object()
    y = cudf.Series(y)
    return y
for cuDF objects that we could use as a one-liner, though I'm not sure if we are targeting other codebases with this?
> for cuDF objects that we could use as a one-liner, though I'm not sure if we are targeting other codebases with this?
Yes, this is a utility I was planning to add to cuml and other libraries, but we will have to add create_cudf_DataFrame, create_cudf_Index, etc. too. Plus, we would need to keep duplicating and constantly maintaining all the parameters of Series & DataFrame in these utilities. I'm thinking this might end up being a headache for consumers of cudf, and libraries might push the cudf<->cudf.pandas interop back as a technical detail of cudf classic.
I know keeping the cudf.pandas<->cudf interop inside cudf classic might look complex, but it feels simpler than having to change many libraries and maintain those utilities.
Hmm, I don't really see why separate create_cudf_* methods would be needed?
At least for cuml, IIRC, a lot of the API accepts a generic argument and calls a cudf constructor on it, and therefore we just need a utility to ensure a cudf.pandas argument is turned into a GPU argument before calling the cudf constructor (equivalent to doing e.g. cudf.Series(cudf.Series(...))).
So I think a function like
def maybe_extract_cudf_pandas(arg):
    if isinstance_cudf_pandas(arg, (pd.Series, pd.DataFrame, pd.Index, np.ndarray)):
        return arg.as_gpu_object()
    return arg
would need to be defined in cuml (and possibly any RAPIDS library that follows the cuml approach to using cudf).
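A minimal, GPU-free sketch of the "unwrap once, then call the normal constructor" idea suggested above. `FakeGpuSeries`, `FakeProxy`, `maybe_extract`, and `to_series` are hypothetical stand-ins so the snippet runs without cudf. Only the `as_gpu_object()` method name is taken from this discussion, and the duck-typed check is an assumption that stands in for the `isinstance_cudf_pandas` test.

```python
class FakeGpuSeries:
    """Hypothetical stand-in for a GPU-backed cudf.Series."""

    def __init__(self, data):
        # Building from another FakeGpuSeries copies its values, loosely
        # mirroring the cudf.Series(cudf.Series(...)) case mentioned above.
        if isinstance(data, FakeGpuSeries):
            self.values = list(data.values)
        else:
            self.values = list(data)


class FakeProxy:
    """Hypothetical stand-in for a cudf.pandas proxy object."""

    def __init__(self, gpu_obj):
        self._gpu_obj = gpu_obj

    def as_gpu_object(self):
        return self._gpu_obj


def maybe_extract(arg):
    """Return the wrapped GPU object for proxies; pass anything else through."""
    extract = getattr(arg, "as_gpu_object", None)
    return extract() if callable(extract) else arg


def to_series(arg):
    """One generic entry point instead of one wrapper per constructor."""
    return FakeGpuSeries(maybe_extract(arg))
```

Because the helper only touches the argument, it does not have to mirror any constructor parameters, which is the maintenance cost raised earlier in the thread for per-constructor `create_cudf_*` wrappers.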
After some offline discussion we're going to try and monkey-patch cudf inside cudf.pandas to support this use case.
Closing this PR in favor of rapidsai/cudf#17878.
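For readers unfamiliar with the monkey-patching approach mentioned here, the general pattern can be sketched as below. This is only an illustration with hypothetical stand-in classes (`FakeSeries`, `FakeProxy`) and a hypothetical helper (`patch_constructor`); it is not taken from rapidsai/cudf#17878 and may differ from what that PR actually does.

```python
import functools


class FakeSeries:
    """Hypothetical stand-in for a library constructor such as cudf.Series."""

    def __init__(self, data, name=None):
        self.data = list(data)
        self.name = name


class FakeProxy:
    """Hypothetical stand-in for a cudf.pandas proxy object."""

    def __init__(self, gpu_obj):
        self._gpu_obj = gpu_obj

    def as_gpu_object(self):
        return self._gpu_obj


def _unwrap(arg):
    # Duck-typed check (an assumption): anything exposing as_gpu_object()
    # is treated as a proxy and replaced by the object it wraps.
    extract = getattr(arg, "as_gpu_object", None)
    return extract() if callable(extract) else arg


def patch_constructor(cls):
    """Wrap cls.__init__ so proxy arguments are unwrapped before it runs."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def patched_init(self, *args, **kwargs):
        args = tuple(_unwrap(a) for a in args)
        kwargs = {k: _unwrap(v) for k, v in kwargs.items()}
        original_init(self, *args, **kwargs)

    cls.__init__ = patched_init
    return original_init  # returned so the caller can undo the patch
```

After `patch_constructor(FakeSeries)`, a call such as `FakeSeries(FakeProxy([1, 2, 3]), name="y")` reaches the original constructor with the unwrapped list, so downstream callers need no changes.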
Fixes: #6232
This PR is WIP