Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ObjectId Sidebar Filter Fixes #5415

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from
Open

Conversation

benjaminpkane
Copy link
Contributor

@benjaminpkane benjaminpkane commented Jan 21, 2025

What changes are proposed in this pull request?

Allow invalid ObjectId searches

  • Allow > 24 hexadecimal character searches

  • Allow non-hexadecimal character searches

  • Fixes cursor matching for id fields in embedded lists, e.g. ground_truth.detections.id

How is this patch tested? If it is not, please explain why.

GraphQL tests

Release Notes

  • Fixed hard errors related to invalid ID searches in the sidebar

What areas of FiftyOne does this PR affect?

  • App: FiftyOne application changes
  • Build: Build and test infrastructure changes
  • Core: Core fiftyone Python library changes
  • Documentation: FiftyOne documentation changes
  • Other

Summary by CodeRabbit

  • New Features

    • Enhanced search functionality for ObjectId fields
    • Improved query capabilities in lightning server
    • Added more flexible search matching criteria
  • Bug Fixes

    • Improved handling of ObjectId search validation
    • Fixed search term processing for complex queries
  • Tests

    • Added new unit tests for ObjectId lightning queries
    • Expanded test coverage for search functionality

The changes optimize search performance and provide more robust handling of ObjectId fields across the application's query and filtering systems.

@benjaminpkane benjaminpkane added bug Bug fixes app Issues related to App features labels Jan 21, 2025
@benjaminpkane benjaminpkane self-assigned this Jan 21, 2025
Copy link
Contributor

coderabbitai bot commented Jan 21, 2025

Walkthrough

This pull request introduces improvements to handling object ID searches across multiple components. The changes focus on enhancing the search functionality for object ID fields in the FiftyOne platform. The modifications span several files, including utility functions, server-side query logic, and test cases. The primary goal is to provide more robust and flexible handling of object ID searches, with specific attention to validation, matching, and query performance.

Changes

File Change Summary
app/packages/core/src/components/Filters/StringFilter/state.ts Added isObjectIdString import and updated stringSearchResults selector to handle object ID field searches
app/packages/core/src/components/Filters/StringFilter/useOnSelect.ts Updated type imports for RecoilState and Result
app/packages/utilities/src/type-check.ts Modified isObjectIdString function to support strict and non-strict object ID validation
fiftyone/server/lightning.py Enhanced distinct query functionality with new search handling for object ID fields
tests/unittests/lightning_tests.py Added test class for object ID lightning queries and updated query execution methods

Possibly related PRs

Suggested Labels

enhancement

Suggested Reviewers

  • minhtuev
  • CamronStaley

Poem

🐰 Hop, hop, through the code we go,
Object IDs dancing to and fro,
Searching with precision and might,
Making queries clean and bright!
A rabbit's quest for perfect parse, 🔍
Filtering magic without a sparse!

✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@@ -423,6 +425,23 @@ async def _do_distinct_pipeline(
return values


def _add_search(query: DistinctQuery):
# strip chars after 24
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean sidebar only filters by first 24 chars? So long file path filters could potentially not include the file name?

What's the purpose of limiting to the first 24

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good comment. I refactored this incorrectly. It should only apply to ObjectId fields

I think I will add frontend validation to omit these queries completely. That is a more holistic approach.

An object id is always a 24 char hexadecimal string. The search results can be omitted when the search does not comply

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will undraft when I'm happy with the solution

@benjaminpkane
Copy link
Contributor Author

Undrafting

@benjaminpkane benjaminpkane marked this pull request as ready for review January 30, 2025 22:34
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
tests/unittests/lightning_tests.py (1)

1175-1218: Enhance test coverage for ObjectId searches.

The current test only covers basic scenarios. Consider adding tests for:

  1. Invalid hex characters
  2. Empty strings
  3. Null values
  4. Mixed case hex strings
  5. Strings shorter than 24 characters

Here's an example of additional test cases:

@drop_async_dataset
async def test_object_ids_edge_cases(self, dataset: fo.Dataset):
    keys = _add_samples(
        dataset,
        dict(id="0" * 24),
        dict(id="a" * 24),
        dict(id="f" * 24)
    )
    
    # Test invalid hex
    result = await _execute(
        query, dataset, fo.ObjectIdField, keys,
        frames=False, search="0123456789abcdefghijklmn"
    )
    for path in result.data["lightning"]:
        self.assertListEqual(path["values"], [])
    
    # Test empty string
    result = await _execute(
        query, dataset, fo.ObjectIdField, keys,
        frames=False, search=""
    )
    for path in result.data["lightning"]:
        self.assertListEqual(path["values"], [])
    
    # Test mixed case
    result = await _execute(
        query, dataset, fo.ObjectIdField, keys,
        frames=False, search="A" * 24
    )
    for path in result.data["lightning"]:
        self.assertListEqual(path["values"], ["a" * 24])
app/packages/core/src/components/Filters/StringFilter/state.ts (1)

60-63: Improve code organization and documentation.

Consider extracting the ObjectId validation logic to a separate function and adding documentation.

Apply this diff:

+      // Helper function to validate ObjectId search strings
+      const isValidObjectIdSearch = (path: string, search: string): boolean => {
+        return !(get(fos.isObjectIdField(path)) && !isObjectIdString(search, false));
+      };
+
-      // for object id searches, skip request when the string is not <= 24 hex
-      if (get(fos.isObjectIdField(path)) && !isObjectIdString(search, false)) {
+      if (!isValidObjectIdSearch(path, search)) {
         return { values: [] };
       }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd1cfe8 and 8b28480.

📒 Files selected for processing (5)
  • app/packages/core/src/components/Filters/StringFilter/state.ts (2 hunks)
  • app/packages/core/src/components/Filters/StringFilter/useOnSelect.ts (1 hunks)
  • app/packages/utilities/src/type-check.ts (1 hunks)
  • fiftyone/server/lightning.py (4 hunks)
  • tests/unittests/lightning_tests.py (5 hunks)
✅ Files skipped from review due to trivial changes (1)
  • app/packages/core/src/components/Filters/StringFilter/useOnSelect.ts
🧰 Additional context used
📓 Path-based instructions (2)
app/packages/core/src/components/Filters/StringFilter/state.ts (1)

Pattern **/*.{ts,tsx}: Review the Typescript and React code for conformity with best practices in React, Recoil, Graphql, and Typescript. Highlight any deviations.

app/packages/utilities/src/type-check.ts (1)

Pattern **/*.{ts,tsx}: Review the Typescript and React code for conformity with best practices in React, Recoil, Graphql, and Typescript. Highlight any deviations.

🪛 Ruff (0.8.2)
fiftyone/server/lightning.py

348-348: Do not assign a lambda expression, use a def

Rewrite matcher as a def

(E731)


351-351: Do not assign a lambda expression, use a def

Rewrite matcher as a def

(E731)


354-354: Do not assign a lambda expression, use a def

Rewrite matcher as a def

(E731)


437-437: Do not use bare except

(E722)

Comment on lines +17 to +18
export function isObjectIdString(value: string, strict = true) {
return isHex(value) && strict ? value.length === 24 : value.length <= 24;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve the ObjectId validation logic.

The current implementation has a few issues that could be improved:

  1. The regex pattern should be anchored and shouldn't use the global flag
  2. The function logic can be simplified

Apply this diff to improve the implementation:

-export function isObjectIdString(value: string, strict = true) {
-  return isHex(value) && strict ? value.length === 24 : value.length <= 24;
+export function isObjectIdString(value: string, strict = true) {
+  if (!value || (strict ? value.length !== 24 : value.length > 24)) {
+    return false;
+  }
+  return /^[0-9a-f]+$/.test(value);
+}

This implementation:

  • Validates the length first (fail fast)
  • Uses an anchored regex without global flag
  • Handles empty/null values
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export function isObjectIdString(value: string, strict = true) {
return isHex(value) && strict ? value.length === 24 : value.length <= 24;
export function isObjectIdString(value: string, strict = true) {
if (!value || (strict ? value.length !== 24 : value.length > 24)) {
return false;
}
return /^[0-9a-f]+$/.test(value);
}

Comment on lines +348 to +354
matcher = lambda v: False
if query.search:
match = query.search
matcher = lambda v: match not in v
if query.is_object_id_field:
match = match[:_TWENTY_FOUR]
matcher = lambda v: v < match
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Replace lambda functions with regular functions.

Using lambda functions for complex logic reduces readability and maintainability. Convert these to regular functions.

Apply this diff:

-    matcher = lambda v: False
+    def matcher(v):
+        return False

     if query.search:
         match = query.search
-        matcher = lambda v: match not in v
+        def matcher(v):
+            return match not in v

         if query.is_object_id_field:
             match = match[:_TWENTY_FOUR]
-            matcher = lambda v: v < match
+            def matcher(v):
+                return v < match
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
matcher = lambda v: False
if query.search:
match = query.search
matcher = lambda v: match not in v
if query.is_object_id_field:
match = match[:_TWENTY_FOUR]
matcher = lambda v: v < match
def matcher(v):
return False
if query.search:
match = query.search
def matcher(v):
return match not in v
if query.is_object_id_field:
match = match[:_TWENTY_FOUR]
def matcher(v):
return v < match
🧰 Tools
🪛 Ruff (0.8.2)

348-348: Do not assign a lambda expression, use a def

Rewrite matcher as a def

(E731)


351-351: Do not assign a lambda expression, use a def

Rewrite matcher as a def

(E731)


354-354: Do not assign a lambda expression, use a def

Rewrite matcher as a def

(E731)

Comment on lines +428 to +442
def _add_search(query: DistinctQuery):
# strip chars after 24
if query.is_object_id_field:
search = query.search[:_TWENTY_FOUR]
add = (_TWENTY_FOUR - len(search)) * "0"
if add:
search = f"{search}{add}"
try:
value = {"$gte": ObjectId(search)}
except:
# search is not valid
value = {"$lt": ObjectId("0" * _TWENTY_FOUR)}
else:
value = Regex(f"^{search}")
return {"$match": {query.path: value}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Improve error handling in ObjectId search logic.

The current implementation has a few issues:

  1. Uses bare except clause which can hide errors
  2. Doesn't validate the hex string before attempting ObjectId conversion

Apply this diff:

 def _add_search(query: DistinctQuery):
-    # strip chars after 24
     if query.is_object_id_field:
         search = query.search[:_TWENTY_FOUR]
         add = (_TWENTY_FOUR - len(search)) * "0"
         if add:
             search = f"{search}{add}"
         try:
+            if not all(c in '0123456789abcdef' for c in search.lower()):
+                raise ValueError("Invalid hex string")
             value = {"$gte": ObjectId(search)}
-        except:
-            # search is not valid
+        except (ValueError, TypeError) as e:
+            # search is not a valid hex string or ObjectId
             value = {"$lt": ObjectId("0" * _TWENTY_FOUR)}
     else:
         value = Regex(f"^{search}")
     return {"$match": {query.path: value}}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _add_search(query: DistinctQuery):
# strip chars after 24
if query.is_object_id_field:
search = query.search[:_TWENTY_FOUR]
add = (_TWENTY_FOUR - len(search)) * "0"
if add:
search = f"{search}{add}"
try:
value = {"$gte": ObjectId(search)}
except:
# search is not valid
value = {"$lt": ObjectId("0" * _TWENTY_FOUR)}
else:
value = Regex(f"^{search}")
return {"$match": {query.path: value}}
def _add_search(query: DistinctQuery):
if query.is_object_id_field:
search = query.search[:_TWENTY_FOUR]
add = (_TWENTY_FOUR - len(search)) * "0"
if add:
search = f"{search}{add}"
try:
if not all(c in '0123456789abcdef' for c in search.lower()):
raise ValueError("Invalid hex string")
value = {"$gte": ObjectId(search)}
except (ValueError, TypeError) as e:
# search is not a valid hex string or ObjectId
value = {"$lt": ObjectId("0" * _TWENTY_FOUR)}
else:
value = Regex(f"^{search}")
return {"$match": {query.path: value}}
🧰 Tools
🪛 Ruff (0.8.2)

437-437: Do not use bare except

(E722)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
app Issues related to App features bug Bug fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants