Use Intersect to Narrow Iterate Range and Reduce Memory Allocation #9271

Open: wants to merge 4 commits into base: main

gooohgb
Contributor

@gooohgb gooohgb commented Jan 9, 2025

Description

In our index data, some hot keys are associated with a large number of UIDs, but the set filtered by our function is relatively small. During performance testing on this dataset, I noticed that pl.Uids consumes a significant amount of CPU time for slice memory allocation. Therefore, I propose an optimization in the Uids function to leverage the range provided by Intersect to reduce the scope of temporary result sets, thereby minimizing memory allocation.
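The idea described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the posting list is modelled as a plain sorted `[]uint64`, whereas the real list stores UIDs in a packed form, and `intersectNarrowed` is a hypothetical helper name, not a function from the codebase. It shows how the minimum and maximum of the intersect set bound the part of the list that needs to be visited, and how the result slice can be sized from the smaller input.

```go
package main

import (
	"fmt"
	"sort"
)

// intersectNarrowed is a hypothetical helper (not part of the project).
// Both inputs must be sorted in ascending order. It returns the UIDs that
// appear in both posting and filter, visiting only the part of posting
// that lies inside [filter[0], filter[len(filter)-1]].
func intersectNarrowed(posting, filter []uint64) []uint64 {
	if len(posting) == 0 || len(filter) == 0 {
		return nil
	}
	lo, hi := filter[0], filter[len(filter)-1]

	// Binary-search the window of posting that can possibly match.
	start := sort.Search(len(posting), func(i int) bool { return posting[i] >= lo })
	end := sort.Search(len(posting), func(i int) bool { return posting[i] > hi })
	if start >= end {
		return nil
	}
	window := posting[start:end]

	// The result cannot be longer than the smaller input, so allocate once
	// with that capacity instead of sizing for the whole posting list.
	capHint := len(window)
	if len(filter) < capHint {
		capHint = len(filter)
	}
	out := make([]uint64, 0, capHint)

	// Standard two-pointer merge over the narrowed window.
	i, j := 0, 0
	for i < len(window) && j < len(filter) {
		switch {
		case window[i] < filter[j]:
			i++
		case window[i] > filter[j]:
			j++
		default:
			out = append(out, window[i])
			i++
			j++
		}
	}
	return out
}

func main() {
	posting := []uint64{1, 5, 9, 20, 30, 40, 50, 1000}
	filter := []uint64{9, 25, 40}
	fmt.Println(intersectNarrowed(posting, filter)) // prints [9 40]
}
```

With a hot key holding many UIDs and a small filter, only the window between the filter's bounds is scanned, and the temporary result is capped at the filter's length.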

@gooohgb gooohgb requested a review from a team as a code owner January 9, 2025 09:22
@harshil-goel
Contributor

Thanks a lot for the input. We recently reworked how the list works, and there is now a better function you could use instead. As it stands, there would be an issue when the UID range is very wide but the number of UIDs in it is still small. You can now simply call list.FindPosting(uid) to check whether a UID is present. However, we shouldn't use it when the set of UIDs to intersect is bigger than the UIDs actually present in the list, so we could add a check using ApproxLen(). Let me know if you are willing to make the change, or should we?
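A rough sketch of the check suggested here, under loud assumptions: `uidList`, `sliceList`, `Has`, and `All` are hypothetical stand-ins invented for this illustration, and `Has` only plays the role that `list.FindPosting(uid)` is described as playing above. The real signatures of `FindPosting` and `ApproxLen` are not reproduced here and may differ. The sketch probes the list once per UID when the intersect set is smaller than the list's approximate length, and falls back to a merge scan otherwise.

```go
package main

import (
	"fmt"
	"sort"
)

// uidList is a hypothetical stand-in for a posting list. Has stands in for
// a per-UID presence check, ApproxLen for a cheap size estimate.
type uidList interface {
	Has(uid uint64) bool
	ApproxLen() int
	All() []uint64 // full sorted UID slice (the expensive path)
}

// sliceList is a toy implementation backed by a sorted slice.
type sliceList []uint64

func (l sliceList) Has(uid uint64) bool {
	i := sort.Search(len(l), func(i int) bool { return l[i] >= uid })
	return i < len(l) && l[i] == uid
}
func (l sliceList) ApproxLen() int { return len(l) }
func (l sliceList) All() []uint64  { return l }

// intersect returns the UIDs of filter (sorted ascending) that are present
// in l, plus a flag reporting whether the point-lookup path was taken.
func intersect(l uidList, filter []uint64) ([]uint64, bool) {
	if len(filter) < l.ApproxLen() {
		// Few candidates, large list: probe each candidate individually.
		out := make([]uint64, 0, len(filter))
		for _, uid := range filter {
			if l.Has(uid) {
				out = append(out, uid)
			}
		}
		return out, true
	}
	// Candidate set is at least as large as the list: one merge scan.
	all := l.All()
	out := make([]uint64, 0, len(all))
	i, j := 0, 0
	for i < len(all) && j < len(filter) {
		switch {
		case all[i] < filter[j]:
			i++
		case all[i] > filter[j]:
			j++
		default:
			out = append(out, all[i])
			i++
			j++
		}
	}
	return out, false
}

func main() {
	hot := sliceList{1, 5, 9, 20, 30, 40, 50, 1000}
	fmt.Println(intersect(hot, []uint64{9, 25, 40}))
	fmt.Println(intersect(sliceList{3, 7}, []uint64{1, 2, 3, 4, 5}))
}
```

The plain `len(filter) < l.ApproxLen()` comparison is the simplest possible threshold; a real implementation might prefer a ratio, since each lookup costs more than one step of a scan.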

@gooohgb
Contributor Author

gooohgb commented Jan 9, 2025

Thanks for your suggestion! I'm glad to contribute to this project. I’d like to make this change myself.

@gooohgb
Contributor Author

gooohgb commented Jan 9, 2025

By the way, I’d like to ask if this PR (#9218) is likely to be merged. The optimization of the eq execution plan seems to significantly improve performance.

@harshil-goel
Copy link
Contributor

Yeah, that PR is scheduled to be merged. We are still evaluating and reviewing it.
