Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add anti-joins to pandas.merge #42916

Open
rj-global opened this issue Aug 6, 2021 · 7 comments · May be fixed by #60732
Open

ENH: Add anti-joins to pandas.merge #42916

rj-global opened this issue Aug 6, 2021 · 7 comments · May be fixed by #60732
Assignees
Labels
Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@rj-global
Copy link

Is your feature request related to a problem?

Pandas does not allow anti-joins. It would be helpful to have these added
Pyspark has implemented this on its 'join' command.
https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-join.html

Describe the solution you'd like

Add anti-left and anti-right joins to 'how'
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html

API breaking implications

None

Describe alternatives you've considered

You can replicate an anti-join by doing an outer join, and filtering.

My code does work but sometimes gives too many columns - not sure why. Also my code relies on the column name not already containing '_drop' - which obviously could be an issue.

Additional context

[add any other context, code examples, or references to existing implementations about the feature request here]

import pandas as pd
def left_anti_join(df1,df2,left_on,right_on):
   """ Perfoms left anti join of two data frames"""
   df=pd.merge(df1,df2,left_on=left_on,right_on=right_on,suffixes=('','_drop'), how='outer', indicator=True)
   df=df[df['_merge']=='left_only']
   df.drop([col for col in df.columns if '_drop' in col],axis=1, inplace=True)
   df.drop(columns="_merge", inplace=True)
   return df
@rj-global rj-global added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 6, 2021
@attack68
Copy link
Contributor

attack68 commented Aug 6, 2021

You (or an implementer) might be interested in this pattern also:

df_l = pd.DataFrame({"A": [1,2,3]}, index=["a", "b", "c"])
df_r = pd.DataFrame({"B": [1,2,4]}, index=["a", "b", "d"])

m = pd.merge(df_l, df_r, left_index=True, right_index=True, how="outer")

# left anti join
m.loc[(m.index.isin(df_l.index)&~m.index.isin(df_r.index))]

# right anti join
m.loc[(m.index.isin(df_r.index)&~m.index.isin(df_l.index))]

# full anti join
m.loc[~(m.index.isin(df_r.index)&m.index.isin(df_l.index))]

@rj-global
Copy link
Author

Very elegant. 👍

@rhshadrach rhshadrach added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Aug 7, 2021
@ehsantn
Copy link

ehsantn commented Aug 11, 2021

Similarly, Pandas should ideally provide semi join as well.

@jreback
Copy link
Contributor

jreback commented Aug 11, 2021

anyone in the community can contribute here, this is howbdeatures are added

  • core can provide review

@lithomas1 lithomas1 added this to the Contributions Welcome milestone Aug 11, 2021
@lithomas1 lithomas1 removed the Needs Triage Issue that has not been reviewed by a pandas team member label Aug 11, 2021
@debnathshoham
Copy link
Member

take

@samukweku
Copy link
Contributor

take

@samukweku samukweku removed their assignment Dec 18, 2023
@snitish snitish linked a pull request Jan 19, 2025 that will close this issue
5 tasks
@snitish
Copy link
Contributor

snitish commented Jan 19, 2025

take

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
10 participants