ENH: Add anti-joins to pandas.merge #42916
rj-global added the Enhancement and Needs Triage labels on Aug 6, 2021.
You (or an implementer) might also be interested in this pattern:
import pandas as pd
df_l = pd.DataFrame({"A": [1, 2, 3]}, index=["a", "b", "c"])
df_r = pd.DataFrame({"B": [1, 2, 4]}, index=["a", "b", "d"])
m = pd.merge(df_l, df_r, left_index=True, right_index=True, how="outer")
# left anti join: index labels present in df_l but not in df_r
m.loc[m.index.isin(df_l.index) & ~m.index.isin(df_r.index)]
# right anti join: index labels present in df_r but not in df_l
m.loc[m.index.isin(df_r.index) & ~m.index.isin(df_l.index)]
# full anti join: index labels present in exactly one of the two frames
m.loc[~(m.index.isin(df_r.index) & m.index.isin(df_l.index))]
Very elegant. 👍
Similarly, pandas should ideally provide a semi join as well.
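For readers unfamiliar with the term, a left semi join keeps the rows of the left frame that have at least one match in the right frame, without adding any right-hand columns or duplicating rows. A minimal sketch for the single-key case, using `Series.isin`; the frames and the column name `key` are made up for illustration, and this is not a proposed pandas API:

```python
import pandas as pd

# Hypothetical example data; "key" is an assumed join column.
left = pd.DataFrame({"key": [1, 2, 3], "A": ["x", "y", "z"]})
right = pd.DataFrame({"key": [1, 1, 2, 4], "B": ["p", "q", "r", "s"]})

# Left semi join: rows of `left` whose key appears in `right`.
# Duplicate keys in `right` do not duplicate rows of `left`.
semi = left[left["key"].isin(right["key"])]

# The complementary filter gives a left anti join on the same key.
anti = left[~left["key"].isin(right["key"])]
```

This only covers a single key column; multi-column keys need a merge-based approach instead.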
Anyone in the community can contribute here; this is how features are added.
lithomas1 removed the Needs Triage label on Aug 11, 2021.
take
take
take
Is your feature request related to a problem?
pandas does not support anti-joins. It would be helpful to have these added.
PySpark already supports this through its 'join' command.
https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-join.html
Describe the solution you'd like
Add anti-left and anti-right joins to 'how'
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html
API breaking implications
None
Describe alternatives you've considered
You can replicate an anti-join by doing an outer join and then filtering.
My code works but sometimes returns too many columns, and I am not sure why. It also relies on no column name already containing '_drop', which could obviously be an issue.
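One way to sketch this workaround without a '_drop' suffix is to merge against only the key columns of the right frame and filter on the `indicator` column that `merge` can add. The helper name `left_anti_join` and the example frames are hypothetical, and this is not the code referred to above. It assumes the left frame has no column already named `_merge`:

```python
import pandas as pd


def left_anti_join(left, right, on):
    """Return rows of `left` whose key(s) in `on` have no match in `right`.

    Hypothetical helper for illustration only, not part of pandas.
    """
    on = [on] if isinstance(on, str) else list(on)
    # Keep only the de-duplicated key columns of `right`, so the merge
    # cannot add extra columns or multiply rows of `left`.
    keys = right[on].drop_duplicates()
    # indicator=True adds a "_merge" column: "left_only" or "both" here.
    merged = left.merge(keys, on=on, how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")


left = pd.DataFrame({"key": [1, 2, 3], "A": ["x", "y", "z"]})
right = pd.DataFrame({"key": [1, 2, 4], "B": ["p", "q", "r"]})
result = left_anti_join(left, right, on="key")
```

Because only the key columns of the right frame take part in the merge, the result has exactly the columns of the left frame. Note that `merge` does not preserve the left frame's original index.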
Additional context
[add any other context, code examples, or references to existing implementations about the feature request here]