Broken paging in management api searcher endpoint #17920

Closed
nikolaj-kaplan opened this issue Jan 9, 2025 · 6 comments

@nikolaj-kaplan

Which Umbraco version are you using? (Please write the exact version, example: 10.1.0)

15.0.0

Bug summary

The service /umbraco/management/api/v1/searcher/{searcher}/query returns total: 0 and an empty items array once I page far enough (skip=9000 still works, skip=10000 does not), even though I have more than 60.000 documents in my index.

Specifics

I have created a bit more than 60.000 articles in my Umbraco instance.
They seem to be indexed correctly: at least, the InternalIndex shows a document count of 65.221, but paging stops returning results before I reach 10.000.

Steps to reproduce

A search for *:* with skip=0&take=1

> curl "https://<my-domain>/umbraco/management/api/v1/searcher/InternalIndex/query?term=*%3A*&skip=0&take=1" -H "authorization: Bearer Oa-KsBIAwBEaOauvcX4W7qXBkYhMIH-gEAk05J-pr3w"

returns

{
  "total":65221,
  "items":[{"id":"35455","score":1,"fieldCount":32, ...}]
}

When I adjust the request to skip=9000&take=1, things still look correct. But setting skip=10000 returns

{
  "total":0,
  "items":[]
}

Sidenote: When setting skip=0&take=0 I get

{
  "total":0,
  "items":[]
}

But would expect to get

{
  "total":65221,
  "items":[]
}

But that is probably a different bug.

Expected result / actual result

As long as I have more than 10.000 documents in the index, I expect to get an array with a single item in both cases:

  • skip=0&take=1
  • skip=10000&take=1

I expect the total to be the same no matter what my skip and take are set to. Even take=0 should return the correct total.
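A minimal Python sketch of a client-side check for these expectations. It assumes a hypothetical fetch(skip, take) function that calls the query endpoint and returns (total, items), mirroring the JSON shape above; doc_count is the document count the index reports (65.221 here). The in-memory fetch is only a stand-in used to exercise the checker itself.

```python
from typing import Callable, List, Tuple

# Hypothetical client call: fetch(skip, take) -> (total, items),
# mirroring the {"total": ..., "items": [...]} response shown above.
Fetch = Callable[[int, int], Tuple[int, List[str]]]


def check_paging(fetch: Fetch, doc_count: int, take: int = 1000) -> List[str]:
    """Page through every hit, failing if the total or paging misbehaves."""
    # take=0 should still report the real total and return no items.
    total, items = fetch(0, 0)
    if total != doc_count or items:
        raise AssertionError(f"take=0 gave total={total}, expected {doc_count}")

    collected: List[str] = []
    skip = 0
    while skip < doc_count:
        total, items = fetch(skip, take)
        # The total must not depend on skip/take.
        if total != doc_count:
            raise AssertionError(f"skip={skip} gave total={total}, expected {doc_count}")
        # Every page before the end must contain at least one item.
        if not items:
            raise AssertionError(f"skip={skip} returned no items")
        collected.extend(items)
        skip += len(items)
    return collected


def in_memory_fetch(hits: List[str]) -> Fetch:
    """Stand-in for an endpoint that behaves as expected."""
    def fetch(skip: int, take: int) -> Tuple[int, List[str]]:
        return len(hits), hits[skip:skip + take]
    return fetch


hits = [str(i) for i in range(65_221)]
print(len(check_paging(in_memory_fetch(hits), len(hits))))
```

Pointed at the real endpoint through a fetch wrapper, the same checker would fail at the first empty page past the cap.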


github-actions bot commented Jan 9, 2025

Hi there @nikolaj-kaplan!

Firstly, a big thank you for raising this issue. Every piece of feedback we receive helps us to make Umbraco better.

We really appreciate your patience while we wait for our team to have a look at this, but we wanted to let you know that we see this and share with you the plan for what comes next.

  • We'll assess whether this issue relates to something that has already been fixed in a later version of the release that it has been raised for.
  • If it's a bug, is it related to a release that we are actively supporting or is it related to a release that's in the end-of-life or security-only phase?
  • We'll replicate the issue to ensure that the problem is as described.
  • We'll decide whether the behavior is an issue or if the behavior is intended.

We wish we could work with everyone directly and assess your issue immediately but we're in the fortunate position of having lots of contributions to work with and only a few humans who are able to do it. We are making progress though and in the meantime, we will keep you in the loop and let you know when we have any questions.

Thanks, from your friendly Umbraco GitHub bot 🤖 🙂

Migaroez self-assigned this Jan 9, 2025
@Migaroez
Contributor

Reproduced in 15.1.1
Investigating

@Migaroez
Contributor

@Shazwazza I found public const int AbsoluteMaxResults = 10000; in Examine.Search.QueryOptions, which we use in the searcher endpoints. Might this be why the Examine searcher behaves as described above?
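If the searcher only ever collects the top skip + take hits, and that number is clamped to AbsoluteMaxResults, any page starting at or beyond 10000 comes back empty. A simplified Python model of that hypothesis follows; it is an assumption about the internals, not Examine's actual code, and it does not try to explain the total: 0 in the response.

```python
from typing import List

# Value of the constant quoted above from Examine.Search.QueryOptions.
ABSOLUTE_MAX_RESULTS = 10_000


def capped_page(hits: List[str], skip: int, take: int,
                max_results: int = ABSOLUTE_MAX_RESULTS) -> List[str]:
    """Simplified model: collect the top skip + take hits, clamped to
    max_results, then drop the first `skip` of them."""
    collected = hits[:min(skip + take, max_results)]
    return collected[skip:skip + take]


hits = [f"doc-{i}" for i in range(65_221)]
print(capped_page(hits, 0, 1))       # ['doc-0']
print(capped_page(hits, 9_000, 1))   # ['doc-9000']
print(capped_page(hits, 10_000, 1))  # []
```

The model reproduces the reported pattern: skip=9000 still yields an item, skip=10000 yields nothing, regardless of how many documents are indexed.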

Migaroez added the state/needs-more-info label (We don't have enough information to give a good reply) Jan 10, 2025
@Shazwazza
Contributor

@Migaroez Yes, sorry, I should have added this to the breaking changes listed in the 3.4 release: https://github.com/Shazwazza/Examine/releases/tag/v3.4.0

The change is here:

Shazwazza/Examine@v3.3.0...v3.4.0#diff-6163da3b70013c885f9d65cf187942e1f3915a0c342e6ddaec0abcbe6548ea0d

Elasticsearch also has this restriction (its index.max_result_window setting defaults to 10,000) because paging over more than this many hits can cause performance penalties. Part of the 3.4 changes were optimizations to prevent performance issues (also based on benchmarks). Prior to this change, an extra search was executed first to find the total document count, which was then used for the actual search. This meant that every search actually ran three searches, which caused a lot of unwanted overhead.

In Examine, you can use the SearchAfter feature: https://shazwazza.github.io/Examine/articles/paging.html#deep-paging
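For readers unfamiliar with the idea, search-after (keyset) paging can be sketched generically in Python. This illustrates the technique under simplified assumptions and is not Examine's SearchAfter API: hits are (score, doc id) pairs, the sort order is score descending with doc id as a unique tie-breaker, and each page asks only for the take hits that sort strictly after the last hit of the previous page.

```python
import heapq
from typing import List, Optional, Tuple

Hit = Tuple[float, int]  # (score, doc id); doc ids are assumed unique


def sort_key(hit: Hit) -> Tuple[float, int]:
    """Score descending, then doc id ascending as a tie-breaker."""
    score, doc_id = hit
    return (-score, doc_id)


def search_after(hits: List[Hit], after: Optional[Hit], take: int) -> List[Hit]:
    """Return the next `take` hits that sort strictly after `after`.

    heapq.nsmallest keeps only `take` candidates at a time, so the memory
    needed per page stays at `take` instead of growing to skip + take.
    """
    candidates = (h for h in hits
                  if after is None or sort_key(h) > sort_key(after))
    return heapq.nsmallest(take, candidates, key=sort_key)


def page_all(hits: List[Hit], take: int) -> List[Hit]:
    """Walk the whole result set, feeding each page's last hit into the next."""
    results: List[Hit] = []
    after: Optional[Hit] = None
    while True:
        page = search_after(hits, after, take)
        if not page:
            return results
        results.extend(page)
        after = page[-1]  # the "search after" value for the next request


hits = [(float(i % 7), i) for i in range(25_000)]
print(len(page_all(hits, 1_000)))  # 25000
```

Because no page ever needs more than take candidates, nothing forces a fixed cap on depth, which is why this style of paging keeps working past 10k.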

I've updated the Skip_Take test in Examine to highlight the differences in what you are seeing: https://github.com/Shazwazza/Examine/blob/8275c802883de8decc8bb09543e95c508b89b251/src/Examine.Test/Examine.Lucene/Search/FluentApiTests.cs#L2568. The test uses the normal skip/take approach, which bottoms out when paging past the 10k limit, whereas paging with SearchAfter does work.

But ... this will be problematic for the management/search API, because the API would have to return the resulting SearchAfter values to the client so it can pass them back in on the next request, and those values are implementation specific.
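One common way to hand such values to an API client is an opaque cursor token that the client echoes back on the next request. A hypothetical Python sketch follows; the field names and the base64url-encoded JSON format are assumptions for illustration, not what the management API does. The example values reuse the score and id from the response earlier in this issue.

```python
import base64
import json
from typing import Tuple


def encode_cursor(score: float, doc_id: int) -> str:
    """Pack the last hit's sort values into an opaque base64url token."""
    payload = json.dumps({"score": score, "doc": doc_id}, separators=(",", ":"))
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def decode_cursor(cursor: str) -> Tuple[float, int]:
    """Recover the sort values from a token the client sent back."""
    data = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    return float(data["score"]), int(data["doc"])


token = encode_cursor(1.0, 35455)
print(decode_cursor(token))  # (1.0, 35455)
```

The client never interprets the token, which keeps Lucene-specific details such as scores out of the public contract, although a cursor can still go stale if the index changes between requests.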

We can fix this by either:

  • Making QueryOptions.AbsoluteMaxResults configurable.
  • Adding another option to force pre-calculating max results, which would restore the pre-3.4 behavior but incur the overhead of an additional query.
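The two options can be compared with a toy Python model. Everything here is hypothetical (ToySearcher, max_results and precount are illustrative names, not Examine 3.6.0 settings); it only shows that raising the cap or pre-counting both make deep pages reachable, and that pre-counting costs one extra query per search.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_MAX_RESULTS = 10_000  # the hard cap discussed above


@dataclass
class ToySearcher:
    """Hypothetical searcher illustrating the two proposed fixes."""
    hits: List[int]
    max_results: int = DEFAULT_MAX_RESULTS  # option 1: configurable cap
    precount: bool = False                  # option 2: count hits first
    queries_run: int = 0

    def page(self, skip: int, take: int) -> List[int]:
        limit = self.max_results
        if self.precount:
            self.queries_run += 1  # extra query only to learn the real total
            limit = len(self.hits)
        self.queries_run += 1      # the paged query itself
        collected = self.hits[:min(skip + take, limit)]
        return collected[skip:skip + take]


hits = list(range(65_221))
print(ToySearcher(hits).page(10_000, 1))                       # []
print(ToySearcher(hits, max_results=100_000).page(10_000, 1))  # [10000]
s = ToySearcher(hits, precount=True)
print(s.page(60_000, 1), s.queries_run)                        # [60000] 2
```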

But ... doing this will be Lucene implementation specific (i.e. using LuceneQueryOptions and not just QueryOptions). That should still be fine and compatible with other Examine implementations, since LuceneQueryOptions is a subclass of QueryOptions anyway.

I'm assuming there is some use case for paging over this many search results. I suspect it's fairly rare, but I'm happy to make the changes above.

@Shazwazza
Contributor

@Migaroez I've published Examine 3.6.0 with info on how you can now make this work https://github.com/Shazwazza/Examine/releases/tag/v3.6.0

@Migaroez
Contributor

Fixed in #17977

Migaroez added the release/15.2.0 label and removed the state/needs-more-info label (We don't have enough information to give a good reply) Jan 21, 2025