Limit frames lookup pipeline for grid #5450

benjaminpkane · 2025-01-30T15:30:06Z

What changes are proposed in this pull request?

Limit the lookup pipeline for frames in grid samples request to one, as only the first frame is needed

How is this patch tested? If it is not, please explain why.

Server tests

Release Notes

Fixed sample field filtering on large video samples in the grid

What areas of FiftyOne does this PR affect?

App: FiftyOne application changes
Build: Build and test infrastructure changes
Core: Core fiftyone Python library changes
Documentation: FiftyOne documentation changes
Other

Summary by CodeRabbit

New Features
- Introduced an option to limit the number of frames processed during data operations, giving users improved control over data handling.
Refactor
- Streamlined sample retrieval by modularizing frame and group filtering logic for more consistent and efficient processing.
Tests
- Expanded test coverage to ensure the enhanced frame and sample processing features work reliably.

coderabbitai · 2025-01-30T15:30:12Z

Walkthrough

This update introduces a new parameter to control the number of frames processed in dataset operations. The limit_frames parameter is added to the _pipeline and _attach_frames_pipeline methods within the Dataset and DatasetView classes. Significant refactoring in the server’s sample pagination code modularizes frame and group handling into helper functions. New unit tests ensure that these changes in frame limiting and pipeline construction are correctly validated.

Changes

File(s)	Change Summary
`fiftyone/core/dataset.py` `fiftyone/core/view.py`	Added a new parameter (`limit_frames`/`limit`) to the `_pipeline` and `_attach_frames_pipeline` methods to control frame attachment in data processing.
`fiftyone/server/samples.py`	Refactored `paginate_samples` by extracting frame and group logic into new helper functions: `get_samples_pipeline`, `_handle_frames`, `_has_frame_filtering`, `_handle_groups`.
`tests/unittests/server_samples_tests.py`	Introduced unit tests with a new test class and helper functions (`_get_lookup_pipeline`, `_get_slice_frames_pipeline`) to validate sample pipeline and frame filtering.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant DS as Dataset/_pipeline
    participant Attach as _attach_frames_pipeline
    participant View as DatasetView/_pipeline

    Client->>DS: Call _pipeline(limit_frames=value)
    DS->>Attach: Call _attach_frames_pipeline(limit=value)
    Attach-->>DS: Return modified pipeline
    DS-->>Client: Return aggregated pipeline result

sequenceDiagram
    participant API as API Endpoint
    participant Server as Server Samples
    participant Helpers as Pipeline Helper Functions
    participant Tests as Unit Tests

    API->>Server: Request paginated samples
    Server->>Helpers: Call get_samples_pipeline (with filters)
    Helpers-->>Server: Return constructed lookup & frame pipelines
    Server-->>API: Return results
    Tests->>Server: Execute various filtering scenarios
    Server-->>Tests: Validate output

Possibly related PRs

Optimize dataset first/head and last/tail #5407: The changes in the main PR are related to the modifications in the _attach_frames_pipeline method, which is also updated in the retrieved PR, indicating a direct connection in their functionality.
Omit slice filter for frame collection QP #5207: The changes in the main PR are related to the modifications of the _attach_frames_pipeline method, which is also referenced in the retrieved PR's updates to the lightning_resolver function, indicating a shared focus on frame handling within the dataset processing logic.
Remove potential for misspelling expected Dataset fields #4885: The changes in the main PR are related to the modifications in the _attach_frames_pipeline method within the Dataset class, which are also reflected in the updates to the _attach_frames_pipeline method calls in the DatasetView class in the retrieved PR.

Suggested labels

bug, app

Suggested reviewers

brimoor
sashankaryal

Poem

Hey there, I'm a coding bunny, hopping with delight,
New limits on frames keep our data light.
Pipelines refined, helpers in play,
Tests all pass in a brilliant display.
With carrots and code, we hop into the day!
🥕🐰 Happy coding!

Tip

🌐 Web search-backed reviews and chat

We have enabled web search-based reviews and chat for all users. This feature allows CodeRabbit to access the latest documentation and information on the web.
You can disable this feature by setting web_search: false in the knowledge_base settings.
Please share any feedback in the Discord discussion.

✨ Finishing Touches

📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

swheaton · 2025-01-30T15:36:01Z

It does get the frames if you hover over the sample to play it. Is that a separate query I'm guessing?

benjaminpkane · 2025-01-30T15:46:35Z

It does get the frames if you hover over the sample to play it. Is that a separate query I'm guessing?

Yeah, playback frames are streamed from /frames

https://github.com/voxel51/fiftyone/blob/develop/fiftyone/server/routes/frames.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

tests/unittests/server_samples_tests.py (2)

17-54: Test coverage looks good but could be enhanced.

The test cases cover three important scenarios:

No filters

Sample-level filters

Frame-level filters

However, consider adding edge cases:

Empty video samples

Videos with single frame

Videos with multiple frames but no labels

56-80: Helper functions are well-structured but could be more robust.

The pipeline construction helpers are clear and focused. However, consider adding validation for edge cases:

Negative limit values

Empty input arrays

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ffd7f97 and 1506726.

📒 Files selected for processing (4)

fiftyone/core/dataset.py (4 hunks)
fiftyone/core/view.py (4 hunks)
fiftyone/server/samples.py (2 hunks)
tests/unittests/server_samples_tests.py (1 hunks)

🧰 Additional context used

🪛 Ruff (0.8.2)

fiftyone/server/samples.py

220-224: Use return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters) instead of for loop

Replace with return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)

(SIM110)

⏰ Context from checks skipped due to timeout of 90000ms (2)

GitHub Check: e2e / test-e2e
GitHub Check: build

🔇 Additional comments (7)

tests/unittests/server_samples_tests.py (1)

1-16: LGTM!

The imports and copyright header are properly structured.

fiftyone/server/samples.py (1)

105-105: LGTM!

The pipeline construction with frame limiting is correctly implemented.

fiftyone/core/view.py (2)

1548-1548: LGTM!

The limit_frames parameter is properly added to the method signature and correctly passed through the pipeline.

Also applies to: 1715-1715

1649-1651: LGTM!

The frame limiting is correctly implemented in the frame attachment pipeline calls.

Also applies to: 1656-1658

fiftyone/core/dataset.py (3)

7489-7489: LGTM! Parameter addition aligns with PR objectives.

The addition of the limit_frames parameter to the _pipeline method signature is well-placed and will enable controlling the number of frames processed.

7568-7568: LGTM! Parameter propagation is correct.

The addition of the limit parameter to _attach_frames_pipeline properly enables the frame limiting functionality at the pipeline level.

7599-7607: LGTM! Frame limiting implementation is clean and efficient.

The implementation correctly adds a $limit stage to the MongoDB pipeline when a limit is provided. This efficiently limits frame processing at the database level rather than in application code.

benjaminpkane · 2025-02-03T15:21:41Z

Undrafting

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

fiftyone/server/samples.py (2)
221-230: Optimize frame filtering check using any().

The function can be made more concise and efficient.
 def _has_frame_filtering(filters: t.Optional[JSON]):
     if not filters:
         return False
-
-    for path in filters:
-        if path.startswith(SampleCollection._FRAMES_PREFIX):
-            return True
-
-    return False
+    return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)
🧰 Tools

🪛 Ruff (0.8.2)

225-229: Use return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters) instead of for loop

Replace with return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)

(SIM110)

232-237: Improve readability of boolean expression.

The group selection logic would be clearer with parentheses to explicitly show operator precedence.
 def _handle_groups(sample_filter: t.Optional[SampleFilter]):
     return dict(
-        manual_group_select=sample_filter
-        and sample_filter.group
-        and (sample_filter.group.id and not sample_filter.group.slices)
+        manual_group_select=(
+            sample_filter
+            and sample_filter.group
+            and (sample_filter.group.id and not sample_filter.group.slices)
+        )
     )

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1506726 and 01fbf7f.

📒 Files selected for processing (1)

fiftyone/server/samples.py (2 hunks)

🧰 Additional context used

🪛 Ruff (0.8.2)

fiftyone/server/samples.py

225-229: Use return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters) instead of for loop

Replace with return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)

(SIM110)

⏰ Context from checks skipped due to timeout of 90000ms (5)

GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
GitHub Check: test / test-app
GitHub Check: build / build
GitHub Check: e2e / test-e2e
GitHub Check: build

🔇 Additional comments (3)

fiftyone/server/samples.py (3)

105-105: LGTM! Good refactoring.

The extraction of pipeline construction logic into a separate function improves code organization and readability.

183-200: Well-structured pipeline orchestration!

The function effectively coordinates frame and group handling with clear separation of concerns and type hints.

202-219: Excellent implementation of frame limiting!

The function intelligently limits frame lookup to just one frame for grid thumbnails while preserving full lookup capability when needed (for filtering or custom stages). This aligns perfectly with the PR's optimization goals.

kaixi-wang · 2025-02-03T22:47:22Z

fiftyone/server/samples.py

+    attach_frames = view.get_frame_field_schema() is not None
+    full_lookup = attach_frames and (_has_frame_filtering(filters) or stages)
+
+    # Only return the first frame of each video sample for the grid thumbnail


would this be a problem for clips views? Or is that handled elsewhere?

I will double check behavior tomorrow am. Ty

The lookup for a clips view looks like this. It works as expected

{ '$lookup': { 'from': 'frames.samples....', 'let': { 'sample_id': '$_sample_id', 'first': {'$arrayElemAt': ['$support', 0]}, 'last': {'$arrayElemAt': ['$support', 1]}, }, 'pipeline': [ { '$match': { '$expr': { '$and': [ {'$eq': ['$$sample_id', '$_sample_id']}, {'$gte': ['$frame_number', '$$first']}, {'$lte': ['$frame_number', '$$last']}, ], }, }, }, {'$sort': {'frame_number': 1}}, ], 'as': 'frames', }, }

Actually limit is disabled for any view currently. Hmm

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

fiftyone/server/samples.py (2)
231-240: Optimize frame filtering check using any().

The function can be more concise while maintaining readability.
 def _has_frame_filtering(filters: t.Optional[JSON]):
     if not filters:
         return False
 
-    for path in filters:
-        if path.startswith(SampleCollection._FRAMES_PREFIX):
-            return True
-
-    return False
+    return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)
🧰 Tools

🪛 Ruff (0.8.2)

235-239: Use return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters) instead of for loop

Replace with return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)

(SIM110)

242-247: Improve readability of group selection logic.

The boolean expression could be more readable with intermediate variables.
 def _handle_groups(sample_filter: t.Optional[SampleFilter]):
+    has_group = sample_filter and sample_filter.group
+    is_manual_select = has_group and sample_filter.group.id and not sample_filter.group.slices
     return dict(
-        manual_group_select=sample_filter
-        and sample_filter.group
-        and (sample_filter.group.id and not sample_filter.group.slices)
+        manual_group_select=is_manual_select
     )

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01fbf7f and 72100a4.

📒 Files selected for processing (1)

fiftyone/server/samples.py (2 hunks)

🧰 Additional context used

🪛 Ruff (0.8.2)

fiftyone/server/samples.py

235-239: Use return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters) instead of for loop

Replace with return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)

(SIM110)

⏰ Context from checks skipped due to timeout of 90000ms (8)

GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
GitHub Check: test / test-app
GitHub Check: build / build
GitHub Check: lint / eslint
GitHub Check: e2e / test-e2e
GitHub Check: build

🔇 Additional comments (3)

fiftyone/server/samples.py (3)

105-107: Clean refactoring of pipeline construction logic!

Good job extracting the pipeline construction into a separate function, improving code modularity.

185-206: Well-structured pipeline orchestration!

The function cleanly coordinates frame and group handling while maintaining the original view context.

208-229: Verify clips view behavior with frame limiting.

The implementation looks good and achieves the PR objective. However, based on the previous discussion in PR comments about clips views, we should verify that the frame limiting doesn't affect clips view functionality.
✅ Verification successful

🏁 Script executed:
#!/bin/bash
# Description: Search for clips view related code to verify frame limiting impact

# Search for clips view pipeline construction
ast-grep --pattern 'clips_view = $$$'

# Search for frame limiting in clips context
rg -A 5 'clips.*frame'
Length of output: 19845

Clips view behavior validated – frame limiting remains confined to grid thumbnails.

The tests and documentation references (e.g., in tests/unittests/video_tests.py and docs examples) confirm that clips views (generated via to_clips) continue to work as expected. The frame limiting logic in _handle_frames (which sets limit_frames to 1 only when necessary) does not intrude on the clips view functionality.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

tests/unittests/server_samples_tests.py (3)
28-54: Consider adding edge cases to strengthen test coverage.

While the current test cases cover the main scenarios, consider adding:

Empty dataset test

Multiple video samples test

Test with invalid/malformed filters

Test with frame numbers beyond video length

Would you like me to help generate these additional test cases?

57-76: LGTM! Well-structured MongoDB lookup pipeline.

The function correctly constructs the lookup pipeline with proper MongoDB operators and optional frame limiting.

Consider adding docstring to explain:

Purpose of the function

Parameters, especially the limit parameter

Return value structure
 def _get_lookup_pipeline(dataset: fo.Dataset, limit=False):
+    """Constructs a MongoDB lookup pipeline for frame retrieval.
+
+    Args:
+        dataset: The FiftyOne dataset
+        limit: If True, limits the frame lookup to first frame
+
+    Returns:
+        list: MongoDB pipeline stages for frame lookup
+    """
78-82: LGTM! Clear and focused frame slicing pipeline.

The function provides a simple and effective way to slice the frames array to get only the first frame.

Consider adding docstring:
 def _get_slice_frames_pipeline():
+    """Constructs a MongoDB pipeline stage to slice frames array.
+
+    Returns:
+        list: MongoDB pipeline stage that slices frames array to first frame
+    """

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 72100a4 and 6afb1a0.

📒 Files selected for processing (1)

tests/unittests/server_samples_tests.py (1 hunks)

🔇 Additional comments (1)

tests/unittests/server_samples_tests.py (1)

20-26: LGTM! Clear and appropriate test sample setup.

The video sample is properly configured with frame-level classifications, providing a good base for testing different filtering scenarios.

coderabbitai · 2025-02-06T15:48:06Z

tests/unittests/server_samples_tests.py

+import fiftyone as fo
+from fiftyone.server.samples import get_samples_pipeline
+
+from decorators import drop_async_dataset


💡 Verification agent

🧩 Analysis chain

Verify the functionality of the drop_async_dataset decorator.

The test method uses a custom decorator @drop_async_dataset whose implementation is not visible. Please ensure that this decorator properly handles the async dataset cleanup.

Let's check the decorator's implementation:

Also applies to: 18-19

🏁 Script executed:

#!/bin/bash # Search for the decorator implementation rg -A 10 "drop_async_dataset" --type py

Length of output: 12636

Action Required: Ensure Consistent Async Dataset Cleanup

The current implementation of the drop_async_dataset decorator in tests/unittests/decorators.py creates a dataset and only deletes it if an exception is raised. This means that when a test passes without throwing an exception, the dataset is not dropped, which contradicts the intended behavior suggested by the decorator’s name and docstring.

Issues to Address:

In the wrapper function, the deletion of the dataset is performed only inside the exception block.

To safely clean up the dataset regardless of the test outcome, consider using a finally clause to always call dataset.delete().

Review the decorator’s usage to ensure that tests relying on a clean-up of the dataset maintain the expected isolation.

limit frames for grid

507a080

benjaminpkane added the enhancement Code enhancement label Jan 30, 2025

benjaminpkane self-assigned this Jan 30, 2025

cleanup, add tests

1506726

benjaminpkane marked this pull request as ready for review February 3, 2025 15:06

coderabbitai bot reviewed Feb 3, 2025

View reviewed changes

benjaminpkane added 2 commits February 3, 2025 14:11

Merge branch 'develop' into limit-frames

ccd8fd0

fix e2e

01fbf7f

coderabbitai bot reviewed Feb 3, 2025

View reviewed changes

kaixi-wang reviewed Feb 3, 2025

View reviewed changes

benjaminpkane marked this pull request as draft February 4, 2025 15:18

add clip limit support

72100a4

benjaminpkane marked this pull request as ready for review February 6, 2025 15:11

coderabbitai bot reviewed Feb 6, 2025

View reviewed changes

update test

6afb1a0

coderabbitai bot reviewed Feb 6, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Limit frames lookup pipeline for grid #5450

Limit frames lookup pipeline for grid #5450

benjaminpkane commented Jan 30, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 30, 2025 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

swheaton commented Jan 30, 2025

benjaminpkane commented Jan 30, 2025 •

edited

Loading

coderabbitai bot left a comment

benjaminpkane commented Feb 3, 2025

coderabbitai bot left a comment

kaixi-wang Feb 3, 2025

benjaminpkane Feb 3, 2025

benjaminpkane Feb 4, 2025

benjaminpkane Feb 4, 2025

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot Feb 6, 2025

Limit frames lookup pipeline for grid #5450

Are you sure you want to change the base?

Limit frames lookup pipeline for grid #5450

Conversation

benjaminpkane commented Jan 30, 2025 • edited by coderabbitai bot Loading

What changes are proposed in this pull request?

How is this patch tested? If it is not, please explain why.

Release Notes

What areas of FiftyOne does this PR affect?

Summary by CodeRabbit

coderabbitai bot commented Jan 30, 2025 • edited Loading

Walkthrough

Changes

Sequence Diagram(s)

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

swheaton commented Jan 30, 2025

benjaminpkane commented Jan 30, 2025 • edited Loading

coderabbitai bot left a comment

Choose a reason for hiding this comment

benjaminpkane commented Feb 3, 2025

coderabbitai bot left a comment

Choose a reason for hiding this comment

kaixi-wang Feb 3, 2025

Choose a reason for hiding this comment

benjaminpkane Feb 3, 2025

Choose a reason for hiding this comment

benjaminpkane Feb 4, 2025

Choose a reason for hiding this comment

benjaminpkane Feb 4, 2025

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Feb 6, 2025

Choose a reason for hiding this comment

benjaminpkane commented Jan 30, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 30, 2025 •

edited

Loading

benjaminpkane commented Jan 30, 2025 •

edited

Loading