-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit frames lookup pipeline for grid #5450
base: develop
Are you sure you want to change the base?
Conversation
WalkthroughThis update introduces a new parameter to control the number of frames processed in dataset operations. The Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant DS as Dataset/_pipeline
participant Attach as _attach_frames_pipeline
participant View as DatasetView/_pipeline
Client->>DS: Call _pipeline(limit_frames=value)
DS->>Attach: Call _attach_frames_pipeline(limit=value)
Attach-->>DS: Return modified pipeline
DS-->>Client: Return aggregated pipeline result
sequenceDiagram
participant API as API Endpoint
participant Server as Server Samples
participant Helpers as Pipeline Helper Functions
participant Tests as Unit Tests
API->>Server: Request paginated samples
Server->>Helpers: Call get_samples_pipeline (with filters)
Helpers-->>Server: Return constructed lookup & frame pipelines
Server-->>API: Return results
Tests->>Server: Execute various filtering scenarios
Server-->>Tests: Validate output
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Tip 🌐 Web search-backed reviews and chat
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
It does get the frames if you hover over the sample to play it. Is that a separate query I'm guessing? |
Yeah, playback frames are streamed from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/unittests/server_samples_tests.py (2)
17-54
: Test coverage looks good but could be enhanced.The test cases cover three important scenarios:
- No filters
- Sample-level filters
- Frame-level filters
However, consider adding edge cases:
- Empty video samples
- Videos with single frame
- Videos with multiple frames but no labels
56-80
: Helper functions are well-structured but could be more robust.The pipeline construction helpers are clear and focused. However, consider adding validation for edge cases:
- Negative limit values
- Empty input arrays
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
fiftyone/core/dataset.py
(4 hunks)fiftyone/core/view.py
(4 hunks)fiftyone/server/samples.py
(2 hunks)tests/unittests/server_samples_tests.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
fiftyone/server/samples.py
220-224: Use return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)
instead of for
loop
Replace with return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)
(SIM110)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: e2e / test-e2e
- GitHub Check: build
🔇 Additional comments (7)
tests/unittests/server_samples_tests.py (1)
1-16
: LGTM!The imports and copyright header are properly structured.
fiftyone/server/samples.py (1)
105-105
: LGTM!The pipeline construction with frame limiting is correctly implemented.
fiftyone/core/view.py (2)
1548-1548
: LGTM!The
limit_frames
parameter is properly added to the method signature and correctly passed through the pipeline.Also applies to: 1715-1715
1649-1651
: LGTM!The frame limiting is correctly implemented in the frame attachment pipeline calls.
Also applies to: 1656-1658
fiftyone/core/dataset.py (3)
7489-7489
: LGTM! Parameter addition aligns with PR objectives.The addition of the
limit_frames
parameter to the_pipeline
method signature is well-placed and will enable controlling the number of frames processed.
7568-7568
: LGTM! Parameter propagation is correct.The addition of the
limit
parameter to_attach_frames_pipeline
properly enables the frame limiting functionality at the pipeline level.
7599-7607
: LGTM! Frame limiting implementation is clean and efficient.The implementation correctly adds a
$limit
stage to the MongoDB pipeline when a limit is provided. This efficiently limits frame processing at the database level rather than in application code.
Undrafting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
fiftyone/server/samples.py (2)
221-230
: Optimize frame filtering check usingany()
.The function can be made more concise and efficient.
def _has_frame_filtering(filters: t.Optional[JSON]): if not filters: return False - - for path in filters: - if path.startswith(SampleCollection._FRAMES_PREFIX): - return True - - return False + return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)🧰 Tools
🪛 Ruff (0.8.2)
225-229: Use
return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)
instead offor
loopReplace with
return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)
(SIM110)
232-237
: Improve readability of boolean expression.The group selection logic would be clearer with parentheses to explicitly show operator precedence.
def _handle_groups(sample_filter: t.Optional[SampleFilter]): return dict( - manual_group_select=sample_filter - and sample_filter.group - and (sample_filter.group.id and not sample_filter.group.slices) + manual_group_select=( + sample_filter + and sample_filter.group + and (sample_filter.group.id and not sample_filter.group.slices) + ) )
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
fiftyone/server/samples.py
(2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
fiftyone/server/samples.py
225-229: Use return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)
instead of for
loop
Replace with return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)
(SIM110)
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
- GitHub Check: test / test-app
- GitHub Check: build / build
- GitHub Check: e2e / test-e2e
- GitHub Check: build
🔇 Additional comments (3)
fiftyone/server/samples.py (3)
105-105
: LGTM! Good refactoring.The extraction of pipeline construction logic into a separate function improves code organization and readability.
183-200
: Well-structured pipeline orchestration!The function effectively coordinates frame and group handling with clear separation of concerns and type hints.
202-219
: Excellent implementation of frame limiting!The function intelligently limits frame lookup to just one frame for grid thumbnails while preserving full lookup capability when needed (for filtering or custom stages). This aligns perfectly with the PR's optimization goals.
attach_frames = view.get_frame_field_schema() is not None | ||
full_lookup = attach_frames and (_has_frame_filtering(filters) or stages) | ||
|
||
# Only return the first frame of each video sample for the grid thumbnail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this be a problem for clips views? Or is that handled elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will double check behavior tomorrow am. Ty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lookup for a clips view looks like this. It works as expected
{
'$lookup': {
'from': 'frames.samples....',
'let': {
'sample_id': '$_sample_id',
'first': {'$arrayElemAt': ['$support', 0]},
'last': {'$arrayElemAt': ['$support', 1]},
},
'pipeline': [
{
'$match': {
'$expr': {
'$and': [
{'$eq': ['$$sample_id', '$_sample_id']},
{'$gte': ['$frame_number', '$$first']},
{'$lte': ['$frame_number', '$$last']},
],
},
},
},
{'$sort': {'frame_number': 1}},
],
'as': 'frames',
},
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually limit
is disabled for any view currently. Hmm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
fiftyone/server/samples.py (2)
231-240
: Optimize frame filtering check usingany()
.The function can be more concise while maintaining readability.
def _has_frame_filtering(filters: t.Optional[JSON]): if not filters: return False - for path in filters: - if path.startswith(SampleCollection._FRAMES_PREFIX): - return True - - return False + return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)🧰 Tools
🪛 Ruff (0.8.2)
235-239: Use
return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)
instead offor
loopReplace with
return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)
(SIM110)
242-247
: Improve readability of group selection logic.The boolean expression could be more readable with intermediate variables.
def _handle_groups(sample_filter: t.Optional[SampleFilter]): + has_group = sample_filter and sample_filter.group + is_manual_select = has_group and sample_filter.group.id and not sample_filter.group.slices return dict( - manual_group_select=sample_filter - and sample_filter.group - and (sample_filter.group.id and not sample_filter.group.slices) + manual_group_select=is_manual_select )
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
fiftyone/server/samples.py
(2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
fiftyone/server/samples.py
235-239: Use return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)
instead of for
loop
Replace with return any(path.startswith(SampleCollection._FRAMES_PREFIX) for path in filters)
(SIM110)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
- GitHub Check: test / test-app
- GitHub Check: build / build
- GitHub Check: lint / eslint
- GitHub Check: e2e / test-e2e
- GitHub Check: build
🔇 Additional comments (3)
fiftyone/server/samples.py (3)
105-107
: Clean refactoring of pipeline construction logic!Good job extracting the pipeline construction into a separate function, improving code modularity.
185-206
: Well-structured pipeline orchestration!The function cleanly coordinates frame and group handling while maintaining the original view context.
208-229
: Verify clips view behavior with frame limiting.The implementation looks good and achieves the PR objective. However, based on the previous discussion in PR comments about clips views, we should verify that the frame limiting doesn't affect clips view functionality.
✅ Verification successful
🏁 Script executed:
#!/bin/bash # Description: Search for clips view related code to verify frame limiting impact # Search for clips view pipeline construction ast-grep --pattern 'clips_view = $$$' # Search for frame limiting in clips context rg -A 5 'clips.*frame'Length of output: 19845
Clips view behavior validated – frame limiting remains confined to grid thumbnails.
The tests and documentation references (e.g., in
tests/unittests/video_tests.py
and docs examples) confirm that clips views (generated viato_clips
) continue to work as expected. The frame limiting logic in_handle_frames
(which setslimit_frames
to 1 only when necessary) does not intrude on the clips view functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
tests/unittests/server_samples_tests.py (3)
28-54
: Consider adding edge cases to strengthen test coverage.While the current test cases cover the main scenarios, consider adding:
- Empty dataset test
- Multiple video samples test
- Test with invalid/malformed filters
- Test with frame numbers beyond video length
Would you like me to help generate these additional test cases?
57-76
: LGTM! Well-structured MongoDB lookup pipeline.The function correctly constructs the lookup pipeline with proper MongoDB operators and optional frame limiting.
Consider adding docstring to explain:
- Purpose of the function
- Parameters, especially the
limit
parameter- Return value structure
def _get_lookup_pipeline(dataset: fo.Dataset, limit=False): + """Constructs a MongoDB lookup pipeline for frame retrieval. + + Args: + dataset: The FiftyOne dataset + limit: If True, limits the frame lookup to first frame + + Returns: + list: MongoDB pipeline stages for frame lookup + """
78-82
: LGTM! Clear and focused frame slicing pipeline.The function provides a simple and effective way to slice the frames array to get only the first frame.
Consider adding docstring:
def _get_slice_frames_pipeline(): + """Constructs a MongoDB pipeline stage to slice frames array. + + Returns: + list: MongoDB pipeline stage that slices frames array to first frame + """
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/unittests/server_samples_tests.py
(1 hunks)
🔇 Additional comments (1)
tests/unittests/server_samples_tests.py (1)
20-26
: LGTM! Clear and appropriate test sample setup.The video sample is properly configured with frame-level classifications, providing a good base for testing different filtering scenarios.
import fiftyone as fo | ||
from fiftyone.server.samples import get_samples_pipeline | ||
|
||
from decorators import drop_async_dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Verify the functionality of the drop_async_dataset
decorator.
The test method uses a custom decorator @drop_async_dataset
whose implementation is not visible. Please ensure that this decorator properly handles the async dataset cleanup.
Let's check the decorator's implementation:
Also applies to: 18-19
🏁 Script executed:
#!/bin/bash
# Search for the decorator implementation
rg -A 10 "drop_async_dataset" --type py
Length of output: 12636
Action Required: Ensure Consistent Async Dataset Cleanup
The current implementation of the drop_async_dataset
decorator in tests/unittests/decorators.py
creates a dataset and only deletes it if an exception is raised. This means that when a test passes without throwing an exception, the dataset is not dropped, which contradicts the intended behavior suggested by the decorator’s name and docstring.
Issues to Address:
- In the
wrapper
function, the deletion of the dataset is performed only inside the exception block. - To safely clean up the dataset regardless of the test outcome, consider using a
finally
clause to always calldataset.delete()
. - Review the decorator’s usage to ensure that tests relying on a clean-up of the dataset maintain the expected isolation.
What changes are proposed in this pull request?
Limit the lookup pipeline for frames in grid samples request to one, as only the first frame is needed
How is this patch tested? If it is not, please explain why.
Server tests
Release Notes
What areas of FiftyOne does this PR affect?
fiftyone
Python library changesSummary by CodeRabbit
New Features
Refactor
Tests