Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Direct frame collection aggregations for expanded sample view #5386

Open
wants to merge 14 commits into
base: develop
Choose a base branch
from

Conversation

benjaminpkane
Copy link
Contributor

@benjaminpkane benjaminpkane commented Jan 14, 2025

What changes are proposed in this pull request?

Circumvents MongoDB lookup limitations by adding an optimize_frames pathway to Aggregation pipelines for large video samples in the modal sidebar

How is this patch tested? If it is not, please explain why.

Aggregation tests

Release Notes

  • Optimized sidebar frame label filtering in the expanded view for large videos

What areas of FiftyOne does this PR affect?

  • App: FiftyOne application changes
  • Build: Build and test infrastructure changes
  • Core: Core fiftyone Python library changes
  • Documentation: FiftyOne documentation changes
  • Other

Summary by CodeRabbit

  • New Features
    • Enhanced overall performance for processing video and frame-level data with optimized aggregation and view handling.
    • Introduced asynchronous pipelines and improved multi-collection querying, streamlining data operations.
    • Extended filtering options to support both sample and frame-level labels.
  • Tests
    • Expanded test coverage to ensure the reliability of new aggregation and filtering optimizations.

@benjaminpkane benjaminpkane added bug Bug fixes enhancement Code enhancement server Issues related to server features or changes labels Jan 14, 2025
@benjaminpkane benjaminpkane self-assigned this Jan 14, 2025
Copy link
Contributor

coderabbitai bot commented Jan 14, 2025

Walkthrough

The changes update the React import in the index file and enhance the ErrorBoundary component with additional tracking and cleanup logic. Extensive modifications were made across the aggregation and pipeline modules, introducing an optimize_frames parameter to various methods in core, dataset, ODM, stages, view, and server files. These updates include new asynchronous aggregation support, multi-collection handling, and frame-level filtering improvements. Additionally, related tests were extended with new assertions to validate the optimized aggregation behavior.

Changes

Files Change Summary
app/.../index.tsx, app/.../ErrorBoundary/ErrorBoundary.tsx Changed React import from type-only to standard; added useTrackEvent import and updated effect dependencies to improve cleanup in error handling.
fiftyone/core/{aggregations.py, collections.py, dataset.py, odm/database.py, stages.py, view.py}, fiftyone/server/{aggregations.py, routes/values.py, view.py} Introduced the optimize_frames parameter to various aggregation, pipeline, and view methods; added async aggregation support, multi-collection handling, and frame-level filtering adjustments along with necessary signature updates.
tests/unittests/aggregation_tests.py Added new assertions for aggregate functions with the optimize_frames option, covering counts, distinct values, min/max, and NaN/infinite handling.

Sequence Diagram(s)

sequenceDiagram
    participant C as Client Request
    participant V as DatasetView / API Endpoint
    participant SC as SampleCollection
    participant DB as Database Aggregator
    C->>V: Request aggregation with optimize_frames=true
    V->>SC: Build aggregation pipeline (optimize_frames=true)
    SC->>SC: Evaluate optimize_frames flag and build optimized pipeline
    SC->>DB: Execute optimized aggregation pipeline
    DB-->>SC: Return aggregation results
    SC-->>V: Pass results to view
    V-->>C: Respond with aggregation data
Loading

Suggested labels

enhancement

Suggested reviewers

  • brimoor
  • minhtuev

Poem

I'm a rabbit hopping with cheer,
Noticing changes both far and near.
Code pathways now streamlined and neat,
Frame optimizations make our work a treat.
With a twitch of my nose and a skip in my beat,
I celebrate this pull request so sweet! 🐇💕

Tip

🌐 Web search-backed reviews and chat
  • We have enabled web search-based reviews and chat for all users. This feature allows CodeRabbit to access the latest documentation and information on the web.
  • You can disable this feature by setting web_search: false in the knowledge_base settings.
  • Please share any feedback in the Discord discussion.
✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@benjaminpkane
Copy link
Contributor Author

On hold to complete #5214

@benjaminpkane benjaminpkane removed the enhancement Code enhancement label Feb 4, 2025
@benjaminpkane benjaminpkane marked this pull request as ready for review February 6, 2025 23:07
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (12)
fiftyone/server/aggregations.py (1)

153-167: Ambiguous variable name ‘l’.
Renaming it would improve clarity, e.g. “hidden_label” or “hl”.

-                for l in form.hidden_labels
+                for hidden_label in form.hidden_labels
🧰 Tools
🪛 Ruff (0.8.2)

162-162: Ambiguous variable name: l

(E741)

fiftyone/core/stages.py (1)

1421-1430: Consider adding validation for _frames parameter.

The _frames parameter is stored but not validated. Consider adding validation to ensure it's a boolean value.

def __init__(
    self,
    labels=None,
    ids=None,
    tags=None,
    fields=None,
    omit_empty=True,
    _frames=False,
):
+   if not isinstance(_frames, bool):
+       raise ValueError("_frames must be a boolean value")
    self._frames = _frames
fiftyone/core/collections.py (6)

10059-10073: Add docstring for the new optimize_frames parameter

The optimize_frames parameter has been added to the aggregate method but lacks documentation in the docstring. Add documentation to explain its purpose and usage.

 def aggregate(self, aggregations, optimize_frames=False):
     """Aggregates one or more
     :class:`fiftyone.core.aggregations.Aggregation` instances.

     Note that it is best practice to group aggregations into a single call
     to :meth:`aggregate`, as this will be more efficient than performing
     multiple aggregations in series.

     Args:
         aggregations: an :class:`fiftyone.core.aggregations.Aggregation` or
             iterable of :class:`fiftyone.core.aggregations.Aggregation`
             instances
+        optimize_frames (False): whether to optimize the pipeline for a direct
+            frame collection call, when possible

     Returns:
         an aggregation result or list of aggregation results corresponding
         to the input aggregation(s)
     """

10155-10208: Add docstring for the new async method

The newly added _async_aggregate method lacks a docstring explaining its purpose, parameters, and return value.

 async def _async_aggregate(self, aggregations, optimize_frames=False):
+    """Asynchronously aggregates one or more Aggregation instances.
+
+    Args:
+        aggregations: an Aggregation or iterable of Aggregation instances
+        optimize_frames (False): whether to optimize the pipeline for a direct
+            frame collection call, when possible
+
+    Returns:
+        an aggregation result or list of aggregation results corresponding
+        to the input aggregation(s)
+    """

10258-10270: Improve error handling in _build_big_pipeline

The method should validate the aggregation parameter and handle potential errors when building the pipeline.

 def _build_big_pipeline(self, aggregation, optimize_frames=False):
+    if not isinstance(aggregation, foa.Aggregation):
+        raise TypeError(
+            "Expected aggregation of type %s; found %s"
+            % (foa.Aggregation, type(aggregation))
+        )
+
     pipeline, is_frames = self._resolve_aggregation(
         aggregation, big_field="values", optimize_frames=optimize_frames
     )
     return (
         self._pipeline(
             pipeline=pipeline,
             attach_frames=aggregation._needs_frames(self),
             group_slices=aggregation._needs_group_slices(self),
             optimize_frames=is_frames,
         ),
         is_frames,
     )

10344-10367: Add validation for collection types in _resolve_collections

The method should validate that the collections are compatible with the requested operations.

 def _resolve_collections(
     self,
     collections,
     idx_map,
     facet_pipelines,
     pipelines,
     is_frames,
     sync=True,
 ):
+    if not collections:
+        raise ValueError("No collections provided")
+
+    if not isinstance(is_frames, dict):
+        raise TypeError(
+            "Expected is_frames to be a dict; found %s" % type(is_frames)
+        )
+
     dataset = self._dataset
     conn = foo.get_db_conn() if sync else foo.get_async_db_conn()
     frame_coll = (
         conn[dataset._frame_collection_name]
         if dataset._frame_collection_name
         else None
     )
     sample_coll = conn[dataset._sample_collection_name]

10155-10208: Consider adding retry logic for async operations

The async aggregation methods could benefit from retry logic to handle transient MongoDB connection issues.

async def _async_aggregate_with_retry(self, aggregations, optimize_frames=False, max_retries=3):
    """Asynchronously aggregates with retry logic for transient failures."""
    for attempt in range(max_retries):
        try:
            return await self._async_aggregate(aggregations, optimize_frames=optimize_frames)
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(1 * (attempt + 1))  # Exponential backoff

Also applies to: 10331-10343


10059-10073: Enhance frame optimization architecture

The frame optimization feature is well-implemented but could benefit from a more cohesive architecture:

  1. Consider creating a dedicated class to handle frame optimization logic
  2. Add metrics/telemetry to measure optimization effectiveness
  3. Consider caching optimization decisions for similar aggregation patterns

Also applies to: 10155-10208, 10258-10270, 10331-10343, 10344-10367

fiftyone/server/routes/values.py (1)

51-51: Consistent optimization flag usage.

The optimize_frames parameter is correctly set based on sample_id presence and consistently passed through the aggregation pipeline.

Consider documenting the performance implications of this optimization in the function's docstring.

Also applies to: 73-73

app/packages/components/src/components/ErrorBoundary/ErrorBoundary.tsx (1)

35-38: Improved modal cleanup.

The useLayoutEffect hook now properly cleans up modal state and includes the necessary clearModal dependency.

Consider extracting the modal ID to a constant to avoid magic strings:

+const MODAL_ID = "modal";
 useLayoutEffect(() => {
   clearModal();
-  document.getElementById("modal")?.classList.remove("modalon");
+  document.getElementById(MODAL_ID)?.classList.remove("modalon");
 }, [clearModal]);
fiftyone/core/odm/database.py (1)

316-333: Update docstring to better reflect the collections parameter.

The docstring should be updated to better explain how the collections parameter works with multiple pipelines.

-    Args:
-        collections: a single ``pymongo.collection.Collection`` or
-            ``motor.motor_asyncio.AsyncIOMotorCollection``, or an iterable of
+    Args:
+        collections: a single ``pymongo.collection.Collection`` or
+            ``motor.motor_asyncio.AsyncIOMotorCollection``, or an iterable of
+            collections matching the length of pipelines. When multiple collections
+            are provided, each pipeline is executed against its corresponding collection.
fiftyone/core/dataset.py (1)

7496-7538: Consider adding parameter documentation

While the implementation is correct, the optimize_frames parameter is missing from the method's docstring. Consider adding documentation to explain its purpose and behavior.

 def _pipeline(
     self,
     pipeline=None,
     media_type=None,
     attach_frames=False,
     detach_frames=False,
     frames_only=False,
     support=None,
     group_slice=None,
     group_slices=None,
     detach_groups=False,
     groups_only=False,
     manual_group_select=False,
     optimize_frames=False,
     post_pipeline=None,
 ):
+    """Builds an aggregation pipeline for this dataset.
+
+    Args:
+        optimize_frames (False): whether to optimize frame processing by
+            disabling frame attachment, detachment and frame-only operations.
+            This can improve performance for large video samples.
+    """
🧰 Tools
🪛 Ruff (0.8.2)

7517-7517: Avoid equality comparisons to False; use if not attach_frames: for false checks

Replace with not attach_frames

(E712)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a1a2d5 and cf33581.

📒 Files selected for processing (12)
  • app/packages/app/src/index.tsx (2 hunks)
  • app/packages/components/src/components/ErrorBoundary/ErrorBoundary.tsx (2 hunks)
  • fiftyone/core/aggregations.py (39 hunks)
  • fiftyone/core/collections.py (15 hunks)
  • fiftyone/core/dataset.py (2 hunks)
  • fiftyone/core/odm/database.py (4 hunks)
  • fiftyone/core/stages.py (7 hunks)
  • fiftyone/core/view.py (7 hunks)
  • fiftyone/server/aggregations.py (6 hunks)
  • fiftyone/server/routes/values.py (2 hunks)
  • fiftyone/server/view.py (12 hunks)
  • tests/unittests/aggregation_tests.py (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{ts,tsx}`: Review the Typescript and React code for co...

**/*.{ts,tsx}: Review the Typescript and React code for conformity with best practices in React, Recoil, Graphql, and Typescript. Highlight any deviations.

  • app/packages/components/src/components/ErrorBoundary/ErrorBoundary.tsx
  • app/packages/app/src/index.tsx
🪛 Ruff (0.8.2)
fiftyone/server/aggregations.py

162-162: Ambiguous variable name: l

(E741)

fiftyone/core/collections.py

22-22: asyncio imported but unused

Remove unused import: asyncio

(F401)

⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: e2e / test-e2e
  • GitHub Check: build
🔇 Additional comments (113)
fiftyone/server/aggregations.py (10)

19-19: New import for pipeline stages.
This import cleanly introduces the needed module reference.


31-37: Defining allowed stages for frame optimization.
good-to-have check whether any additional stages might benefit from this set in the future.


130-135: Opt-in for frame optimization.
Assigning and passing the optimize_frames flag is consistent, ensuring other functions can honor the optimization hint.


142-145: Forwarding frame optimization to the viewer.
Ensure all code paths calling make_optimized_select_view properly handle an empty list of sample IDs.


146-150: Group slice adaptation with optimize_frames.
Good extension for mixed group contexts. The updated call indicates careful integration with existing group logic.


174-176: Passing optimize_frames to _async_aggregate.
Harmonizes with the aggregator’s overall logic for potential performance gains.


213-218: New class for counting existence.
Looks well-structured, succinct, and consistent with naming in the codebase.


220-237: Implementing _count_expanded_fields.
Counts fields accurately, including nested subfields. The loop properly unwraps list fields.


239-259: Extended signature to accommodate optimization.
Parameter passing remains consistent, ensuring form inputs are captured thoroughly.


354-373: Robust gating for frame optimization.
This function carefully checks sample IDs, group context, and media type before finalizing optimization.

fiftyone/server/view.py (18)

84-84: Added optimize_frames default param.
Providing a default of False ensures backward compatibility.


103-103: Docstring updated for optimize_frames.
Clear documentation fosters maintainability.


133-135: Selective view creation with optimize_frames.
Ensures minimal overhead when selecting a single sample. Consider verifying multi-ID usage.


144-144: Propagating optimize_frames to get_extended_view.
Consistent with prior usage. This helps unify stage handling.


161-161: New parameter in get_view signature.
No immediate issues. Keep an eye on branching complexity as more parameters gather.


171-171: Extended docstring for pipeline optimization.
Increasing clarity about new parameter is helpful for future maintainers.


203-205: Applying optimize_frames in _make_match_stage.
Good integration with existing matching logic. Watch for potential side effects on frames.


210-214: Extending field filter stages with optimize_frames.
Ensures field-level filtering remains consistent with frame-based optimization.


215-219: Extending label filter stages with optimize_frames.
Nicely parallels the prior field-based logic.


373-373: Function signature for _make_match_stage.
Adding the param clarifies usage and maintains parity with other methods.


376-376: Iterating paths.
No direct issues. This line complements the new parameter.


392-393: Stripping "frames." prefix when optimize_frames is True.
Logical approach to transform paths for direct frame references.


401-401: _make_field_filter_stages signature updated.
Naming remains consistent with parallel match stage changes.


403-403: Handling frames in field filter setup.
Trimming "frames." from set_field is correct. Confirm no collisions with top-level fields.

Also applies to: 417-418


425-426: Filtering expression assignment.
Ensuring the final expression is stored in a local variable.


432-432: Label filter stage’s new signature.
Mirrors the field filter approach.


434-436: Iterating label paths with frame optimization.
Consistent with naming and usage in _iter_paths.


472-472: Matching frames in label filters.
Using _frames param gives consistent coverage for frame-based labels.

fiftyone/core/view.py (5)

8-8: Minor change near the file header.
No functional impact seen.


155-155: Optional optimize_frames in _pipeline.
This addition is consistent with the other references to pipeline logic.


1642-1645: Disabling frame attachment when optimize_frames is True.
Effectively bypasses normal frame lookups, presumably optimizing direct frame queries.


1946-1946: Added optimize_frames param to make_optimized_select_view.
Ensures synergy across higher-level calls.


1994-1997: Conditional logic for optimize_frames in selection.
Switching to match on _sample_id is a valid approach for single-sample optimization. Confirm correct results if multiple sample_ids are passed.

fiftyone/core/aggregations.py (56)

8-8: No functional changes detected at line 8.
No issues observed here; this appears to be a minor, possibly whitespace-related adjustment.


133-133: Added “optimize_frames” parameter to base Aggregation.to_mongo().
This new parameter will allow specialized handling for frame-level optimizations. The method signature addition looks sound.


141-142: Docstring updated to mention the new “optimize_frames” parameter.
The documentation clearly explains the new parameter’s purpose. This is a welcome addition.


404-412: Introduced frame optimization in Bounds.to_mongo().
• The method now returns (pipeline, is_frame_field) when optimize_frames is True.
• The overall logic seems consistent with the base class approach.
• Recommended additional check: Ensure that external invocations expecting only pipeline gracefully handle or ignore the tuple return structure.


581-581: Count.to_mongo signature changed to accept “optimize_frames”.
No immediate concerns with the parameter signature itself.


583-584: Short-circuit for empty field_or_expr with optimize_frames check.
Implementation appears straightforward; it conditionally returns a tuple for frames. No issues found.


586-586: Branch for optimize_frames returns a tuple.
Matches the logic in other aggregations. Repeated pattern, no concerns.


588-589: Parsing additional arguments in _parse_field_and_expr.
Ensures the aggregator can process frames if optimize_frames is enabled. This is consistent with the new logic.


595-596: Carrying optimize_frames through other aggregator logic.
Preserves the pattern of returning (pipeline, is_frame_field). Implementation is consistent.


603-605: Final return adjusts to return tuple if optimize_frames is True.
This matches the convention elsewhere in the code. Looks good.


770-778: CountValues.to_mongo updated to accept and handle “optimize_frames”.
• The changes follow the same pattern: a conditional return of (pipeline, is_frame_field).
• Make sure calling code properly handles the tuple return if frames are optimized.


784-785: Minor pipeline assembly for _first is None scenario.
No logic issues noted. The approach is consistent with prior usage.


798-798: Transition to an additional pipeline step.
No errors in syntax or structure. This extends logic for the aggregator’s grouping.


799-807: Building grouped result in CountValues with boundary checks.
Implementation systematically sets up the pipeline to track counts and results. Code is clear and consistent.


808-810: Returning (pipeline, is_frame_field) when optimize_frames is True.
Mirrors the same approach from other aggregations. Straightforward.


812-812: Starting match logic for “_selected” filtering block.
The single-line change is minor, no issues.


813-834: Handling search and selected sets in CountValues aggregator.
• Merges multiple conditions into a $match with $expr.
• Additional sorting/limit pipeline steps are introduced.
• The chunk aligns well with the aggregator’s design.
• Only caution: ensure no performance regressions when dealing with large sets.


838-838: Adding a new group stage for final result retrieval.
This is consistent with the aggregator’s final output. No issues found.


852-854: Refined sorts and limit for retrieval.
Implementation is correct for advanced filtering. Matches the aggregator’s conventions.


866-867: Conditional return of pipeline, is_frame_field in CountValues.
Same “optimize_frames” pattern as other methods. Good consistency.


990-990: Distinct.to_mongo signature receives optimize_frames param.
No direct concerns, logic remains parallel to other aggregations.


991-998: Distinct aggregator pipeline logic updated.
• The pipeline returns a tuple if optimize_frames is True.
• Code structure is standard: $match, $group, $unwind, $sort, $group.
• Implementation is consistent, no errors detected.


1004-1005: Return statement reflects optimize_frames conditions in Distinct.
Maintains the code’s uniform approach. Looks good.


1022-1025: Completes pipeline with final grouping in Distinct aggregator.
No issues. Flow remains consistent.


1172-1178: FacetAggregations.to_mongo gains optimize_frames awareness.
• The code runs _parse_field_name with the new parameter.
• Implementation uses a facet-based aggregator.
• The approach is consistent with prior aggregator modifications.


1183-1192: FacetAggregations building sub-aggregation pipelines with optimize_frames.
• If frames are targeted, we retrieve (pipeline, is_frame_field) from each aggregator.
• Properly merges sub-pipelines into a single $facet stage.
• Overall design fits the newly introduced parameter.


1195-1198: Returning final pipeline with or without is_frame_field.
No issues with the final conditional logic. This preserves consistent usage across the codebase.


1255-1256: HistogramValues constructor argument handling for safe/optimize_frames.
No structural concerns noted. The additional parameter is in line with the aggregator’s intended design.


1405-1406: HistogramValues.to_mongo signature updated.
Method includes “optimize_frames”, consistent with the aggregator pattern.


1407-1413: Pipeline preparation for histogram with possible frame fields.
Checks are made for toString conversions, sets up $bucket or $bucketAuto. Implementation is correct.


1418-1419: Conditional return of (pipeline, is_frame_field) in histogram aggregator.
Mimics the new aggregator pattern. No issues.


1459-1465: Appending final grouping and returning pipeline, potentially with is_frame_field.
No concerns. The aggregator’s logic is complete.


1503-1510: Handling None bounds in histogram aggregator.
The function ensures default [-1, -1] if no data is present. Continues to align with safe aggregator usage.


1553-1553: No direct change besides possibly whitespace.
No logic differences found here.


1660-1667: Min.to_mongo updated for optimize_frames.
• The aggregator returns a tuple if frames are to be optimized.
• Pipeline logic remains straightforward, aggregating via $min.


1673-1674: Handling the final pipeline return with (pipeline, is_frame_field).
Kept consistent with aggregator changes. No issue.


1685-1688: Max.to_mongo introduces the optimize_frames conditional.
Same pattern: if optimize_frames is True, return the tuple. Implementation is consistent.


1793-1800: Max aggregator pipeline and tuple-based return aligned with new logic.
No concerns, structure matches previous aggregator updates.


1807-1808: Completes the method with a conditional return.
All good here, uniform usage of optimize_frames.


1819-1820: Return block for Max aggregator ensures consistent approach.
Straightforward continuation of the pattern.


1917-1925: Mean.to_mongo gains optimize_frames logic.
• Introduces the optional tuple return.
• The aggregator uses $avg for final computation.
• Implementation is consistent with other “to_mongo” changes.


1931-1932: Returning (pipeline, is_frame_field) if frames are optimized.
No issues identified. Matches aggregator design.


1941-1944: Final pipeline stage for Mean aggregator.
Implementation concerns are minimal; logic is consistent with the framework.


2005-2008: Upgraded Quantiles aggregator: new param in signature and doc.
It now can process optimize_frames consistently, no logic issues visible.


2065-2073: Quantiles.to_mongo extended with frame-aware logic.
• The aggregator can exit with (pipeline, is_frame_field).
• Method is aligned with other aggregator changes.


2079-2080: Conditional return for frames in Quantiles aggregator.
Consistent usage, no issues.


2103-2106: Completing pipeline in Quantiles.
No performance or correctness concerns found; the aggregator remains logically consistent.


2124-2124: No functional changes at line 2124.
Likely whitespace or short comment. Nothing to address.


3153-3154: Including “optimize_frames” in _parse_field_and_expr signature.
Critical to ensure aggregator logic can parse frame-related contexts. This is essential for the newly introduced parameter.


3166-3168: Extract prefix from expr if provided, then incorporate optimize_frames.
Approach to unify field name and expression logic is consistent. No issues flagged.


3198-3199: Calls “_make_set_field_pipeline” with optimize_frames.
Ensures the aggregator logic can manipulate the pipeline for frames if needed. Implementation matches prior usage.


3201-3201: Handling pipeline initialization.
No issues. This is a draft to build final aggregator pipeline steps.


3236-3244: Frame-based logic for top-level or sub-level fields.
Ensures that if is_frame_field and not optimize_frames, we handle frames with other fallback approach. The logic is consistent with the rest of the pipeline building.


3241-3242: Adjust path references for “frames” prefix.
Implementation helps manage references to frames at the correct path. Looks correct.


3243-3244: Inject “frames” into other_list_fields.
Ensures correct unwinding or grouping for frames. No contradictions identified.


3282-3289: Return final parsing details (path, pipeline, other_list_fields, etc.).
This includes the new is_frame_field logic, consistent with aggregator expansions.

fiftyone/core/stages.py (10)

1364-1372: Added _frames parameter to enhance frame filtering capabilities.

The addition of the _frames parameter to the ExcludeLabels class constructor enables more granular control over frame-level filtering. This is a good enhancement that aligns with the PR's objective of optimizing frame label filtering.


1397-1398: Store _frames parameter as instance variable.

Good practice to store the parameter as an instance variable for later use in filtering operations.


1466-1469: Enhanced frame field filtering logic with _frames parameter.

The code now uses the _frames parameter to determine the prefix for frame field filtering. This is a good improvement that provides more flexibility in frame handling.


1559-1605: Updated FilterLabels instantiation to include _frames parameter.

The code correctly passes the _frames parameter when creating new FilterLabels instances, maintaining consistency in frame handling throughout the filtering pipeline.

🧰 Tools
🪛 Ruff (0.8.2)

1589-1589: Comparison to None should be cond is not None

Replace with cond is not None

(E711)


2338-2339: Added _frames parameter to FilterLabels class.

The addition of the _frames parameter to the FilterLabels class is consistent with the changes in ExcludeLabels. This maintains a uniform approach to frame handling across related classes.


2344-2345: Store _frames parameter in FilterLabels class.

Good practice to store the parameter as an instance variable, consistent with the pattern used in ExcludeLabels.


2421-2430: Enhanced frame filtering logic with conditional branching.

The code now uses the _frames parameter to determine the appropriate filtering pipeline. This is a good improvement that provides more granular control over frame-level filtering.


2467-2468: Updated prefix handling for frame field filtering.

The code now uses the _frames parameter to determine the appropriate prefix for frame field filtering. This is a good enhancement that provides more flexibility in frame handling.


2520-2526: Robust parameter validation in _validate_params.

The validation logic for the filter parameter is thorough and handles all expected types. Good practice to validate parameters early.


2488-2492: Consider adding unit tests for _needs_group_slices method.

The _needs_group_slices method handles important group slice functionality but appears to lack dedicated unit tests. Consider adding tests to verify its behavior.

fiftyone/core/collections.py (1)

22-22: Remove unused import

The asyncio import is used in the _async_aggregate method, so this is a false positive from the static analyzer.

🧰 Tools
🪛 Ruff (0.8.2)

22-22: asyncio imported but unused

Remove unused import: asyncio

(F401)

app/packages/app/src/index.tsx (1)

4-4: Verify React version compatibility.

The change from type-only import to regular import suggests runtime usage of React. Please ensure this change aligns with the React version in use and its requirements.

✅ Verification successful

🌐 Web query:

What is the latest version of React and its import requirements?

💡 Result:

The latest version of React as of February 2025 is **React 19**, released in April 2024[6][4]. Here are its key import requirements and changes:

### Import Requirements
1. **JSX Transform**
   - No longer need to import `React` for JSX usage due to automatic JSX runtime[2][6]
   - Example component without explicit import:
     ```jsx
     function Button() {
       return <button>Click</button> // Works without "import React"
     }
     ```

2. **Hooks/API Imports**
   - Two valid patterns:
     ```javascript
     // Namespace import (recommended by some)
     import * as React from 'react';
     const [state, setState] = React.useState();
     
     // Named imports (official recommendation)
     import { useState } from 'react';
     const [state, setState] = useState();
     ```

3. **TypeScript Requirements**
   - Must use updated types:
     ```bash
     npm install @types/react@^19.0.0 @types/react-dom@^19.0.0
     ```

### Key Changes from Previous Versions
| Feature               | React 18                          | React 19                          |
|-----------------------|-----------------------------------|-----------------------------------|
| Root API              | `ReactDOM.render()`               | `ReactDOM.createRoot()`[3][9]     |
| Default JSX Transform | Optional                          | Mandatory[6]                     |
| Server Components      | Experimental                      | Stable Feature[1][4]             |
| Async Transitions      | Start with `startTransition`      | Automatic with Actions API[6]    |

To upgrade:
```bash
npm install --save-exact react@^19.0.0 react-dom@^19.0.0
```

Notable improvements include automatic batching improvements[1], compiler optimizations[4], and enhanced developer tools for debugging concurrent features[1][4]. Projects using older patterns like `ReactDOM.render()` will still work but need migration to new root API for full feature access[3][9].

Citations:

- 1: https://dev.to/sovannaro/react-18-vs-react-19-key-differences-to-know-for-2025-1614
- 2: https://www.epicreact.dev/importing-react-through-the-ages
- 3: https://www.w3schools.com/react/react_upgrade.asp
- 4: https://graffersid.com/react-js-trends/
- 5: https://react.dev/learn/importing-and-exporting-components
- 6: https://react.dev/blog/2024/04/25/react-19-upgrade-guide
- 7: https://dev.to/holasoymalva/why-i-decided-to-stop-working-with-reactjs-in-2025-4d1l
- 8: https://create-react-app.dev/docs/importing-a-component/
- 9: https://legacy.reactjs.org/blog/2022/03/08/react-18-upgrade-guide.html
- 10: https://legacy.reactjs.org/versions

React Version Compatibility Confirmed

Based on the latest React 19 release and its import requirements, using a regular (runtime) import in your entry point is acceptable. Note the following:

• React 19 now supports an automatic JSX runtime that eliminates the need to import React explicitly for JSX transformation.
• However, if your application uses React APIs (e.g., hooks) or relies on non-JSX runtime functionality, the explicit import remains valid.
• Ensure that your build configuration (Babel/TypeScript settings, etc.) and your type dependencies (@types/react and @types/react-dom) are updated to version 19 for full compatibility.

fiftyone/server/routes/values.py (1)

48-50: LGTM! Added sample_id to filter constructor.

The addition of sample_id to the SampleFilter constructor improves filtering capabilities.

app/packages/components/src/components/ErrorBoundary/ErrorBoundary.tsx (1)

1-1: LGTM! Added analytics tracking.

The useTrackEvent import enables proper error tracking.

tests/unittests/aggregation_tests.py (4)

111-113: LGTM! Added frame count optimization test.

The test properly validates the frame count with optimization enabled.


175-183: LGTM! Added frame count values optimization tests.

The tests thoroughly validate the count values aggregation with frame optimization for both direct and transformed (upper case) label values.

Also applies to: 190-200


243-249: LGTM! Added distinct frame optimization test.

The test properly validates distinct label aggregation with frame optimization.


309-314: LGTM! Added min/max frame optimization tests.

The tests properly validate min and max confidence aggregations with frame optimization.

Also applies to: 358-363

fiftyone/core/odm/database.py (4)

24-24: LGTM! Import addition is well-placed.

The addition of pymongo.collection import is necessary for type checking in the updated aggregate function and is correctly placed with other pymongo-related imports.


315-357: LGTM! Implementation changes are well-structured.

The changes to support both single and multiple collections are implemented cleanly with proper type checking and consistent handling of both sync and async paths.


360-368: LGTM! Thread pool aggregation is properly updated.

The changes to _do_pooled_aggregate maintain efficient thread pool usage while supporting the new collections parameter.


371-377: LGTM! Async aggregation is properly updated.

The changes to _do_async_pooled_aggregate correctly handle multiple collections while maintaining asynchronous execution patterns.

fiftyone/core/dataset.py (2)

7496-7497: Parameter addition looks good

The new optimize_frames parameter is added with a sensible default value of False.


7534-7538: Optimization logic looks correct

The implementation properly disables frame attachment, detachment and frame-only processing when optimize_frames=True. This optimization will help improve performance for large video samples by avoiding unnecessary frame processing.

fiftyone/core/collections.py Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug fixes server Issues related to server features or changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant