Limit frames lookup pipeline for grid #5450

Open: 6 commits to merge into base: develop

benjaminpkane (Contributor) commented Jan 30, 2025

What changes are proposed in this pull request?

Limit the frames lookup pipeline in the grid samples request to a single frame, since only the first frame is needed.

How is this patch tested? If it is not, please explain why.

Server tests

Release Notes

  • Fixed sample field filtering on large video samples in the grid

What areas of FiftyOne does this PR affect?

  • App: FiftyOne application changes
  • Build: Build and test infrastructure changes
  • Core: Core fiftyone Python library changes
  • Documentation: FiftyOne documentation changes
  • Other

Summary by CodeRabbit

  • New Features

    • Introduced an option to limit the number of frames processed during data operations, giving users improved control over data handling.
  • Refactor

    • Streamlined sample retrieval by modularizing frame and group filtering logic for more consistent and efficient processing.
  • Tests

    • Expanded test coverage to ensure the enhanced frame and sample processing features work reliably.

@benjaminpkane benjaminpkane added the enhancement Code enhancement label Jan 30, 2025
@benjaminpkane benjaminpkane self-assigned this Jan 30, 2025
coderabbitai bot (Contributor) commented Jan 30, 2025

Walkthrough

This update introduces a new parameter to control the number of frames processed in dataset operations. The limit_frames parameter is added to the _pipeline and _attach_frames_pipeline methods within the Dataset and DatasetView classes. Significant refactoring in the server’s sample pagination code modularizes frame and group handling into helper functions. New unit tests ensure that these changes in frame limiting and pipeline construction are correctly validated.

Changes

| File(s) | Change Summary |
|---|---|
| fiftyone/core/dataset.py, fiftyone/core/view.py | Added a new parameter (limit_frames/limit) to the _pipeline and _attach_frames_pipeline methods to control frame attachment in data processing. |
| fiftyone/server/samples.py | Refactored paginate_samples by extracting frame and group logic into new helper functions: get_samples_pipeline, _handle_frames, _has_frame_filtering, _handle_groups. |
| tests/unittests/server_samples_tests.py | Introduced unit tests with a new test class and helper functions (_get_lookup_pipeline, _get_slice_frames_pipeline) to validate sample pipeline and frame filtering. |

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant DS as Dataset/_pipeline
    participant Attach as _attach_frames_pipeline
    participant View as DatasetView/_pipeline

    Client->>DS: Call _pipeline(limit_frames=value)
    DS->>Attach: Call _attach_frames_pipeline(limit=value)
    Attach-->>DS: Return modified pipeline
    DS-->>Client: Return aggregated pipeline result

sequenceDiagram
    participant API as API Endpoint
    participant Server as Server Samples
    participant Helpers as Pipeline Helper Functions
    participant Tests as Unit Tests

    API->>Server: Request paginated samples
    Server->>Helpers: Call get_samples_pipeline (with filters)
    Helpers-->>Server: Return constructed lookup & frame pipelines
    Server-->>API: Return results
    Tests->>Server: Execute various filtering scenarios
    Server-->>Tests: Validate output

Possibly related PRs

  • Optimize dataset first/head and last/tail  #5407: The changes in the main PR are related to the modifications in the _attach_frames_pipeline method, which is also updated in the retrieved PR, indicating a direct connection in their functionality.
  • Omit slice filter for frame collection QP #5207: The changes in the main PR are related to the modifications of the _attach_frames_pipeline method, which is also referenced in the retrieved PR's updates to the lightning_resolver function, indicating a shared focus on frame handling within the dataset processing logic.
  • Remove potential for misspelling expected Dataset fields #4885: The changes in the main PR are related to the modifications in the _attach_frames_pipeline method within the Dataset class, which are also reflected in the updates to the _attach_frames_pipeline method calls in the DatasetView class in the retrieved PR.

Suggested labels

bug, app

Suggested reviewers

  • brimoor
  • sashankaryal

Poem

Hey there, I'm a coding bunny, hopping with delight,
New limits on frames keep our data light.
Pipelines refined, helpers in play,
Tests all pass in a brilliant display.
With carrots and code, we hop into the day!
🥕🐰 Happy coding!


@swheaton (Contributor)

It does get the frames if you hover over the sample to play it. Is that a separate query I'm guessing?

benjaminpkane (Contributor Author) commented Jan 30, 2025

> It does get the frames if you hover over the sample to play it. Is that a separate query I'm guessing?

Yeah, playback frames are streamed from /frames

@benjaminpkane benjaminpkane marked this pull request as ready for review February 3, 2025 15:06
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/unittests/server_samples_tests.py (2)

17-54: Test coverage looks good but could be enhanced.

The test cases cover three important scenarios:

  1. No filters
  2. Sample-level filters
  3. Frame-level filters

However, consider adding edge cases:

  • Empty video samples
  • Videos with single frame
  • Videos with multiple frames but no labels

56-80: Helper functions are well-structured but could be more robust.

The pipeline construction helpers are clear and focused. However, consider adding validation for edge cases:

  • Negative limit values
  • Empty input arrays
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ffd7f97 and 1506726.

📒 Files selected for processing (4)
  • fiftyone/core/dataset.py (4 hunks)
  • fiftyone/core/view.py (4 hunks)
  • fiftyone/server/samples.py (2 hunks)
  • tests/unittests/server_samples_tests.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
fiftyone/server/samples.py

220-224: Use return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters) instead of for loop

Replace with return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)

(SIM110)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: e2e / test-e2e
  • GitHub Check: build
🔇 Additional comments (7)
tests/unittests/server_samples_tests.py (1)

1-16: LGTM!

The imports and copyright header are properly structured.

fiftyone/server/samples.py (1)

105-105: LGTM!

The pipeline construction with frame limiting is correctly implemented.

fiftyone/core/view.py (2)

1548-1548: LGTM!

The limit_frames parameter is properly added to the method signature and correctly passed through the pipeline.

Also applies to: 1715-1715


1649-1651: LGTM!

The frame limiting is correctly implemented in the frame attachment pipeline calls.

Also applies to: 1656-1658

fiftyone/core/dataset.py (3)

7489-7489: LGTM! Parameter addition aligns with PR objectives.

The addition of the limit_frames parameter to the _pipeline method signature is well-placed and will enable controlling the number of frames processed.


7568-7568: LGTM! Parameter propagation is correct.

The addition of the limit parameter to _attach_frames_pipeline properly enables the frame limiting functionality at the pipeline level.


7599-7607: LGTM! Frame limiting implementation is clean and efficient.

The implementation correctly adds a $limit stage to the MongoDB pipeline when a limit is provided. This efficiently limits frame processing at the database level rather than in application code.
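To make the database-level limit concrete, here is a minimal sketch of a `$lookup` stage whose sub-pipeline gains a `$limit` stage only when a limit is passed. The helper name `build_frames_lookup`, the collection name, and the `$_id` join key are assumptions for illustration and are not taken from the FiftyOne source.

```python
from typing import Any, Dict, List, Optional


def build_frames_lookup(
    frames_collection: str, limit: Optional[int] = None
) -> Dict[str, Any]:
    """Build a $lookup stage that joins frames onto samples.

    Hypothetical helper for illustration; not FiftyOne's actual code.
    """
    sub_pipeline: List[Dict[str, Any]] = [
        # Match frames that belong to the current sample
        {"$match": {"$expr": {"$eq": ["$$sample_id", "$_sample_id"]}}},
        # Sort so that the first frame comes first
        {"$sort": {"frame_number": 1}},
    ]
    if limit is not None:
        # Cap the joined frames inside the sub-pipeline, i.e. in the database
        sub_pipeline.append({"$limit": limit})

    return {
        "$lookup": {
            "from": frames_collection,
            "let": {"sample_id": "$_id"},  # assumed join key
            "pipeline": sub_pipeline,
            "as": "frames",
        }
    }
```

Calling `build_frames_lookup("frames.samples.example", limit=1)` yields a stage whose sub-pipeline ends in `{"$limit": 1}`, while omitting `limit` leaves the full lookup unchanged.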

@benjaminpkane (Contributor Author)

Undrafting

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
fiftyone/server/samples.py (2)

221-230: Optimize frame filtering check using any().

The function can be made more concise and efficient.

 def _has_frame_filtering(filters: t.Optional[JSON]):
     if not filters:
         return False
-
-    for path in filters:
-        if path.startswith(SampleCollection._FRAMES_PREFIX):
-            return True
-
-    return False
+    return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)
🧰 Tools
🪛 Ruff (0.8.2)

225-229: Use return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters) instead of for loop

Replace with return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)

(SIM110)
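For reference, a self-contained version of the `any()` form suggested above might look like the following sketch. The `FRAMES_PREFIX` constant stands in for `SampleCollection._FRAMES_PREFIX`, and its value of `"frames."` is an assumption here.

```python
from typing import Any, Mapping, Optional

FRAMES_PREFIX = "frames."  # assumed value of SampleCollection._FRAMES_PREFIX


def has_frame_filtering(filters: Optional[Mapping[str, Any]]) -> bool:
    """Return True if any filter path targets a frame-level field."""
    if not filters:
        return False

    # Iterating a mapping yields its keys, i.e. the filter paths
    return any(path.startswith(FRAMES_PREFIX) for path in filters)
```

The early return for empty or missing filters is kept so that `None` never reaches the generator expression.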


232-237: Improve readability of boolean expression.

The group selection logic would be clearer with parentheses to explicitly show operator precedence.

 def _handle_groups(sample_filter: t.Optional[SampleFilter]):
     return dict(
-        manual_group_select=sample_filter
-        and sample_filter.group
-        and (sample_filter.group.id and not sample_filter.group.slices)
+        manual_group_select=(
+            sample_filter
+            and sample_filter.group
+            and (sample_filter.group.id and not sample_filter.group.slices)
+        )
     )
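One detail the parenthesized form does not change: a chained `and` expression evaluates to its last evaluated operand, so `manual_group_select` can end up as `None`, an object, or a string rather than a strict boolean. The sketch below uses hypothetical stand-ins for `SampleFilter` and its group filter to show a variant wrapped in `bool()`; whether the consumer of `manual_group_select` needs a strict boolean is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GroupFilter:  # hypothetical stand-in for the real group filter type
    id: Optional[str] = None
    slices: Optional[List[str]] = None


@dataclass
class SampleFilter:  # hypothetical stand-in for the real SampleFilter
    group: Optional[GroupFilter] = None


def handle_groups(sample_filter: Optional[SampleFilter]) -> dict:
    # bool() collapses None, empty strings, and objects into True/False
    manual = bool(
        sample_filter
        and sample_filter.group
        and sample_filter.group.id
        and not sample_filter.group.slices
    )
    return dict(manual_group_select=manual)
```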
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1506726 and 01fbf7f.

📒 Files selected for processing (1)
  • fiftyone/server/samples.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
fiftyone/server/samples.py

225-229: Use return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters) instead of for loop

Replace with return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)

(SIM110)

⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-app
  • GitHub Check: build / build
  • GitHub Check: e2e / test-e2e
  • GitHub Check: build
🔇 Additional comments (3)
fiftyone/server/samples.py (3)

105-105: LGTM! Good refactoring.

The extraction of pipeline construction logic into a separate function improves code organization and readability.


183-200: Well-structured pipeline orchestration!

The function effectively coordinates frame and group handling with clear separation of concerns and type hints.


202-219: Excellent implementation of frame limiting!

The function intelligently limits frame lookup to just one frame for grid thumbnails while preserving full lookup capability when needed (for filtering or custom stages). This aligns perfectly with the PR's optimization goals.
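The decision described above can be sketched roughly as follows. This is not the PR's `_handle_frames`; the function name, the returned keys, and the `"frames."` prefix value are assumptions chosen to mirror the `attach_frames` and `limit_frames` names used in this review.

```python
from typing import Any, Dict, Mapping, Optional, Sequence

FRAMES_PREFIX = "frames."  # assumed value of SampleCollection._FRAMES_PREFIX


def handle_frames(
    has_frame_fields: bool,
    filters: Optional[Mapping[str, Any]],
    stages: Optional[Sequence[Any]],
) -> Dict[str, Any]:
    """Hypothetical sketch: choose frame lookup options for a grid request."""
    if not has_frame_fields:
        # No frame schema, so there is nothing to attach
        return {"attach_frames": False}

    frame_filtering = bool(filters) and any(
        path.startswith(FRAMES_PREFIX) for path in filters
    )
    # Frame filters or extra view stages may need every frame
    full_lookup = frame_filtering or bool(stages)
    if full_lookup:
        return {"attach_frames": True}

    # Otherwise only the first frame is needed for the grid thumbnail
    return {"attach_frames": True, "limit_frames": 1}
```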

attach_frames = view.get_frame_field_schema() is not None
full_lookup = attach_frames and (_has_frame_filtering(filters) or stages)

# Only return the first frame of each video sample for the grid thumbnail
Contributor:

Would this be a problem for clips views? Or is that handled elsewhere?

Contributor Author:

I will double-check the behavior tomorrow morning. Thanks.

Contributor Author:

The lookup for a clips view looks like this. It works as expected.

    {
        '$lookup': {
            'from': 'frames.samples....',
            'let': {
                'sample_id': '$_sample_id',
                'first': {'$arrayElemAt': ['$support', 0]},
                'last': {'$arrayElemAt': ['$support', 1]},
            },
            'pipeline': [
                {
                    '$match': {
                        '$expr': {
                            '$and': [
                                {'$eq': ['$$sample_id', '$_sample_id']},
                                {'$gte': ['$frame_number', '$$first']},
                                {'$lte': ['$frame_number', '$$last']},
                            ],
                        },
                    },
                },
                {'$sort': {'frame_number': 1}},
            ],
            'as': 'frames',
        },
    }

Contributor Author:

Actually, the limit is currently disabled for any view. Hmm.

@benjaminpkane benjaminpkane marked this pull request as draft February 4, 2025 15:18
@benjaminpkane benjaminpkane marked this pull request as ready for review February 6, 2025 15:11
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
fiftyone/server/samples.py (2)

231-240: Optimize frame filtering check using any().

The function can be more concise while maintaining readability.

 def _has_frame_filtering(filters: t.Optional[JSON]):
     if not filters:
         return False
 
-    for path in filters:
-        if path.startswith(SampleCollection._FRAMES_PREFIX):
-            return True
-
-    return False
+    return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)
🧰 Tools
🪛 Ruff (0.8.2)

235-239: Use return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters) instead of for loop

Replace with return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)

(SIM110)


242-247: Improve readability of group selection logic.

The boolean expression could be more readable with intermediate variables.

 def _handle_groups(sample_filter: t.Optional[SampleFilter]):
+    has_group = sample_filter and sample_filter.group
+    is_manual_select = has_group and sample_filter.group.id and not sample_filter.group.slices
     return dict(
-        manual_group_select=sample_filter
-        and sample_filter.group
-        and (sample_filter.group.id and not sample_filter.group.slices)
+        manual_group_select=is_manual_select
     )
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01fbf7f and 72100a4.

📒 Files selected for processing (1)
  • fiftyone/server/samples.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
fiftyone/server/samples.py

235-239: Use return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters) instead of for loop

Replace with return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)

(SIM110)

⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: test / test-app
  • GitHub Check: build / build
  • GitHub Check: lint / eslint
  • GitHub Check: e2e / test-e2e
  • GitHub Check: build
🔇 Additional comments (3)
fiftyone/server/samples.py (3)

105-107: Clean refactoring of pipeline construction logic!

Good job extracting the pipeline construction into a separate function, improving code modularity.


185-206: Well-structured pipeline orchestration!

The function cleanly coordinates frame and group handling while maintaining the original view context.


208-229: Verify clips view behavior with frame limiting.

The implementation looks good and achieves the PR objective. However, based on the previous discussion in PR comments about clips views, we should verify that the frame limiting doesn't affect clips view functionality.

✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Description: Search for clips view related code to verify frame limiting impact

# Search for clips view pipeline construction
ast-grep --pattern 'clips_view = $$$'

# Search for frame limiting in clips context
rg -A 5 'clips.*frame'

Length of output: 19845


Clips view behavior validated – frame limiting remains confined to grid thumbnails.

The tests and documentation references (e.g., in tests/unittests/video_tests.py and docs examples) confirm that clips views (generated via to_clips) continue to work as expected. The frame limiting logic in _handle_frames (which sets limit_frames to 1 only when necessary) does not intrude on the clips view functionality.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
tests/unittests/server_samples_tests.py (3)

28-54: Consider adding edge cases to strengthen test coverage.

While the current test cases cover the main scenarios, consider adding:

  1. Empty dataset test
  2. Multiple video samples test
  3. Test with invalid/malformed filters
  4. Test with frame numbers beyond video length

Would you like me to help generate these additional test cases?


57-76: LGTM! Well-structured MongoDB lookup pipeline.

The function correctly constructs the lookup pipeline with proper MongoDB operators and optional frame limiting.

Consider adding a docstring to explain:

  • Purpose of the function
  • Parameters, especially the limit parameter
  • Return value structure
 def _get_lookup_pipeline(dataset: fo.Dataset, limit=False):
+    """Constructs a MongoDB lookup pipeline for frame retrieval.
+
+    Args:
+        dataset: The FiftyOne dataset
+        limit: If True, limits the frame lookup to first frame
+
+    Returns:
+        list: MongoDB pipeline stages for frame lookup
+    """

78-82: LGTM! Clear and focused frame slicing pipeline.

The function provides a simple and effective way to slice the frames array to get only the first frame.

Consider adding a docstring:

 def _get_slice_frames_pipeline():
+    """Constructs a MongoDB pipeline stage to slice frames array.
+
+    Returns:
+        list: MongoDB pipeline stage that slices frames array to first frame
+    """
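As an illustration of what such a slicing stage could look like, the sketch below builds an `$addFields` stage with MongoDB's `$slice` operator and pairs it with a pure-Python emulation of its effect. The function names and the choice of `$addFields` are assumptions; the PR's actual helper may be written differently.

```python
from typing import Any, Dict, List


def slice_frames_pipeline(count: int = 1) -> List[Dict[str, Any]]:
    """Hypothetical sketch: keep only the first `count` attached frames."""
    return [{"$addFields": {"frames": {"$slice": ["$frames", count]}}}]


def apply_slice(document: Dict[str, Any], count: int = 1) -> Dict[str, Any]:
    """Pure-Python emulation of the stage above, for illustration only."""
    sliced = dict(document)
    sliced["frames"] = document.get("frames", [])[:count]
    return sliced
```

Unlike a `$limit` inside the lookup, slicing after the lookup still joins every frame first, which is what allows frame-level filters to see the full set before the array is trimmed.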
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 72100a4 and 6afb1a0.

📒 Files selected for processing (1)
  • tests/unittests/server_samples_tests.py (1 hunks)
🔇 Additional comments (1)
tests/unittests/server_samples_tests.py (1)

20-26: LGTM! Clear and appropriate test sample setup.

The video sample is properly configured with frame-level classifications, providing a good base for testing different filtering scenarios.

import fiftyone as fo
from fiftyone.server.samples import get_samples_pipeline

from decorators import drop_async_dataset
Contributor:

💡 Verification agent

🧩 Analysis chain

Verify the functionality of the drop_async_dataset decorator.

The test method uses a custom decorator @drop_async_dataset whose implementation is not visible. Please ensure that this decorator properly handles the async dataset cleanup.

Let's check the decorator's implementation:

Also applies to: 18-19


🏁 Script executed:

#!/bin/bash
# Search for the decorator implementation
rg -A 10 "drop_async_dataset" --type py

Length of output: 12636


Action Required: Ensure Consistent Async Dataset Cleanup

The current implementation of the drop_async_dataset decorator in tests/unittests/decorators.py creates a dataset and only deletes it if an exception is raised. This means that when a test passes without throwing an exception, the dataset is not dropped, which contradicts the intended behavior suggested by the decorator’s name and docstring.

Issues to Address:

  • In the wrapper function, the deletion of the dataset is performed only inside the exception block.
  • To safely clean up the dataset regardless of the test outcome, consider using a finally clause to always call dataset.delete().
  • Review the decorator’s usage to ensure that tests relying on a clean-up of the dataset maintain the expected isolation.
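A minimal sketch of the `finally`-based cleanup suggested above is shown below. `FakeDataset` is a hypothetical stand-in for a FiftyOne dataset so the example runs without a database, and the way the dataset is passed to the test function is an assumption about how the real decorator in tests/unittests/decorators.py is used.

```python
import asyncio
from functools import wraps


class FakeDataset:
    """Hypothetical stand-in for fo.Dataset that records deletion."""

    def __init__(self):
        self.deleted = False

    def delete(self):
        self.deleted = True


def drop_async_dataset(func):
    """Create a dataset for an async test and always delete it afterwards."""

    @wraps(func)
    async def wrapper(*args, **kwargs):
        dataset = FakeDataset()
        try:
            return await func(*args, dataset, **kwargs)
        finally:
            # Runs whether the test passed, failed, or raised
            dataset.delete()

    return wrapper
```

With this shape, any exception from the test still propagates to the test runner after the dataset has been deleted.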
