
Main -> develop #5435

Merged
brimoor merged 48 commits into develop from main-merge on Jan 25, 2025

Conversation


@brimoor brimoor commented Jan 25, 2025

Addresses merge conflicts from #5434.

Summary by CodeRabbit

FiftyOne Release Notes

  • New Features

    • Added support for custom metrics in evaluation workflows across multiple domains (classification, detection, segmentation, regression).
    • Enhanced evaluation framework to allow user-defined metrics.
    • Improved sample retrieval methods for datasets and views.
    • Introduced new evaluation metric classes for better customization.
  • Bug Fixes

    • Refined sample collection methods to handle edge cases.
    • Improved error handling in evaluation and metric computation processes.
  • Documentation

    • Added sections on custom evaluation metrics and backends in the user guide.
    • Updated docstrings and method signatures to reflect new functionality.
  • Chores

    • Refactored import statements and utility functions.
    • Removed deprecated test methods.
    • Simplified code structure in several modules.

@brimoor brimoor requested a review from a team January 25, 2025 00:42

coderabbitai bot commented Jan 25, 2025

Walkthrough

This pull request introduces comprehensive enhancements to FiftyOne's evaluation and metrics infrastructure across multiple modules. The changes focus on adding support for custom metrics, improving evaluation configurations, and refactoring sample retrieval methods in core dataset and view classes. Key modifications span evaluation utilities, dataset core classes, and various utility modules, with a particular emphasis on providing more flexible and extensible evaluation capabilities.

Changes

  • app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx: Added types and function for custom metrics handling
  • fiftyone/core/collections.py, fiftyone/core/dataset.py, fiftyone/core/view.py: Refactored sample retrieval methods (first, last, head, tail, one)
  • fiftyone/utils/eval/*: Added custom_metrics support across evaluation modules for detection, classification, regression, and segmentation
  • fiftyone/operators/evaluation_metric.py: Introduced base classes for evaluation metrics configuration and methods
  • docs/source/release-notes.rst: Added release notes for versions 2.5.0 and 1.3.0
  • fiftyone/__public__.py: Removed import of SampleView
  • fiftyone/operators/__init__.py: Added imports for EvaluationMetricConfig and EvaluationMetric
  • fiftyone/server/utils.py: Introduced cache_dataset function for caching datasets
  • tests/unittests/dataset_tests.py: Removed tests related to sample retrieval methods
  • tests/unittests/operators/registry_tests.py: Removed test for invalid operator type handling
  • docs/source/user_guide/evaluation.rst: Added sections on custom evaluation metrics and backends

Suggested labels

documentation


@brimoor brimoor mentioned this pull request Jan 25, 2025
7 tasks

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🔭 Outside diff range comments (1)
fiftyone/utils/eval/detection.py (1)

Line range hint 290-486: Dynamic field usage caution

When registering samples and cleaning up in DetectionEvaluation, multiple dynamic fields (eval_key, tp_field, fn_field, etc.) are added or removed. If your dataset is concurrently accessed in multi-threaded scenarios or by multiple processes, you might encounter race conditions on field creation or deletion. Consider synchronization mechanisms or documenting the concurrency constraints.

🧹 Nitpick comments (28)
fiftyone/core/dataset.py (2)

Line range hint 1318-1331: Consider using f-string for error message.

The implementation is good, but the error message could be more readable using an f-string.

-                    "Expected one matching sample, but found %d matches"
-                    % len(view)
+                    f"Expected one matching sample, but found {len(view)} matches"

1238-1240: Preserve exception context in error handling.

The bare raise in the except clause loses the original exception context. Consider preserving it for better debugging.

-        except ValueError:
-            raise ValueError("%s is empty" % self.__class__.__name__)
+        except ValueError as err:
+            raise ValueError("%s is empty" % self.__class__.__name__) from err
🧰 Tools
🪛 Ruff (0.8.2)

1239-1239: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

fiftyone/core/collections.py (2)

926-929: Improve exception handling in first()

Consider using raise ... from None to explicitly indicate that this is a new exception rather than a re-raised one.

-            raise ValueError("%s is empty" % self.__class__.__name__)
+            raise ValueError("%s is empty" % self.__class__.__name__) from None
🧰 Tools
🪛 Ruff (0.8.2)

929-929: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


1013-1031: Improve exception handling in one()

Consider using raise ... from None to explicitly indicate that these are new exceptions rather than re-raised ones.

-            raise ValueError("No samples match the given expression")
+            raise ValueError("No samples match the given expression") from None

-                raise ValueError(
-                    "Expected one matching sample, but found %d matches"
-                    % len(view)
-                )
+                raise ValueError(
+                    "Expected one matching sample, but found %d matches"
+                    % len(view)
+                ) from None
🧰 Tools
🪛 Ruff (0.8.2)

1019-1019: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

plugins/panels/model_evaluation/__init__.py (4)

17-17: Remove the unused import.
Static analysis indicates fiftyone.core.fields is not used. Eliminating unused imports keeps dependencies clear.

-import fiftyone.core.fields as fof
🧰 Tools
🪛 Ruff (0.8.2)

17-17: fiftyone.core.fields imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


151-156: Consider logging exceptions in get_custom_metrics.
It silently returns None when exceptions occur, which could hamper debugging. Logging the exception would improve visibility.

+import logging

 def get_custom_metrics(self, results):
     try:
         return results.custom_metrics
     except Exception as e:
+        logging.warning(f"Error retrieving custom metrics: {e}")
         return None

604-675: Expand documentation for the segmentation evaluation flow.
This large block supports various segmentation use cases. Adding inline docstrings or comments would aid readability when handling the different view_type paths.


698-707: Refine the naming of _get_mask_targets.
The logic for retrieving mask targets is sound, but consider using a name like _retrieve_mask_targets to better convey intent.

fiftyone/operators/evaluation_metric.py (2)

12-43: Clarify parameter docstring and usage

In EvaluationMetricConfig, consider explicitly documenting how eval_types is intended to be used. The parameter references possible evaluation types, but no further usage is described in this file. This additional clarity will help developers understand when and how to supply these types.
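For illustration, a rough sketch of the kind of docstring guidance being requested. MyMetricConfig is hypothetical, and the base class constructor signature is whatever evaluation_metric.py defines in this PR (it may already accept eval_types directly):

import fiftyone.operators as foo


class MyMetricConfig(foo.EvaluationMetricConfig):
    """Configuration for a hypothetical custom metric.

    Args:
        eval_types (None): an optional list of evaluation types to which this
            metric applies, e.g. ``["regression", "classification"]``. If
            omitted, the metric is assumed to apply to all evaluation types
    """

    def __init__(self, eval_types=None, **kwargs):
        super().__init__(**kwargs)
        self.eval_types = eval_types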


45-80: Keep method stubs well-documented

In the EvaluationMetric base class, get_parameters, parse_parameters, and compute are placeholders that raise NotImplementedError or provide no functionality. To improve clarity for future maintainers, consider adding docstring notes or comments explaining typical usage and what subclasses should implement.
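As a hedged example of what a subclass might provide (the signatures below are guesses based on the stub names mentioned above; the actual interface is defined in evaluation_metric.py):

import fiftyone.operators as foo


class MeanAbsoluteErrorMetric(foo.EvaluationMetric):
    """A toy metric illustrating what subclasses are expected to implement."""

    def get_parameters(self, ctx, inputs):
        # Declare any user-configurable parameters for this metric (none here)
        pass

    def compute(self, samples, results, **kwargs):
        # Derive the metric value from the evaluation results; the ytrue/ypred
        # attribute names follow the regression results discussed elsewhere in
        # this review
        errors = [abs(yt - yp) for yt, yp in zip(results.ytrue, results.ypred)]
        return sum(errors) / len(errors) if errors else None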

fiftyone/utils/eval/regression.py (5)

Line range hint 43-115: Enhance validation of custom_metrics parameter

When adding custom_metrics to evaluate_regressions, ensure you handle type-checking and potential edge cases (e.g., strings vs. lists vs. dictionaries). This will help avoid runtime errors if callers provide invalid inputs.
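One possible shape for such validation, as a sketch only (this is not the PR's actual implementation):

def _normalize_custom_metrics(custom_metrics):
    # Coerce the supported input shapes (str, list/tuple, dict) into a
    # canonical dict mapping metric URIs to kwargs, and fail fast otherwise
    if custom_metrics is None:
        return None

    if isinstance(custom_metrics, str):
        custom_metrics = [custom_metrics]

    if isinstance(custom_metrics, (list, tuple)):
        return {uri: None for uri in custom_metrics}

    if isinstance(custom_metrics, dict):
        return dict(custom_metrics)

    raise ValueError(
        "custom_metrics must be a str, list, or dict; found %s"
        % type(custom_metrics)
    )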


121-137: External usage clarity

RegressionEvaluationConfig now supports custom_metrics. Document more explicitly how users can pass in custom metrics (list vs. dict) and give examples. This fosters a smoother onboarding for new contributors.
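For example, usage along these lines could be documented. The metric URI and kwargs below are placeholders, and the dataset is assumed to contain regression predictions in a predictions field:

import fiftyone as fo

dataset = fo.load_dataset("my-regression-dataset")  # placeholder dataset

# As a list of metric operator URIs (default parameters)
dataset.evaluate_regressions(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
    custom_metrics=["@voxel51/metric-examples/absolute_error"],  # placeholder URI
)

# As a dict mapping operator URIs to parameter dicts
dataset.evaluate_regressions(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
    custom_metrics={
        "@voxel51/metric-examples/absolute_error": {"lower_is_better": True},  # placeholder kwargs
    },
)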


254-268: Suggest explicit example for SimpleEvaluationConfig

The docstring references custom metrics but does not illustrate usage. Provide a short code snippet demonstrating how to specify and access custom metrics to improve developer clarity.
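A short sketch of the specify-and-access flow; the attribute and method names follow those referenced elsewhere in this review, and the URI is a placeholder:

results = dataset.evaluate_regressions(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
    custom_metrics=["@voxel51/metric-examples/absolute_error"],  # placeholder
)

results.print_metrics()        # standard metrics plus any custom metrics
print(results.custom_metrics)  # raw custom metric values, keyed by operator URI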


Line range hint 381-421: Assess performance for large-scale usage

RegressionResults stores arrays (ytrue, ypred, etc.) in memory. For very large datasets, memory consumption may spike. Consider providing documentation or checks for extremely large data, especially given the introduction of multiple custom metrics that may perform additional computations.


587-590: Adopt the static analysis suggestion for custom_metrics

At lines 587 and 641 in _parse_config, replace kwargs.get("custom_metrics", None) with kwargs.get("custom_metrics") as suggested by static analysis. This minor improvement clarifies the code and removes the redundant None argument.

- custom_metrics = kwargs.get("custom_metrics", None)
+ custom_metrics = kwargs.get("custom_metrics")

Also applies to: 641-644

🧰 Tools
🪛 Ruff (0.8.2)

587-587: Use kwargs.get("custom_metrics") instead of kwargs.get("custom_metrics", None)

Replace kwargs.get("custom_metrics", None) with kwargs.get("custom_metrics")

(SIM910)

fiftyone/utils/eval/detection.py (3)

Line range hint 139-173: Validate custom metrics input type

In evaluate_detections(), custom_metrics can be a list of metrics or a dict. To improve reliability, consider adding a brief check or docstring examples showing acceptable formats. This fosters better usage across the detection evaluation ecosystem.


Line range hint 239-272: Improve coverage for new config parameters

DetectionEvaluationConfig now includes custom_metrics. Make sure your test suite validates scenarios where custom metrics are defined, used, renamed, or cleaned up, so that potential schema or logic mismatches are quickly detected.

Would you like a suggested test snippet covering multiple custom metrics with different parameter types?
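For reference, a rough sketch of such a test; the dataset helper, metric URI, and parameters are placeholders rather than existing operators:

def test_custom_metrics(self):
    dataset = self._make_detection_dataset()  # hypothetical helper

    results = dataset.evaluate_detections(
        "predictions",
        gt_field="ground_truth",
        eval_key="eval",
        custom_metrics={
            "@voxel51/metric-examples/example_metric": {"value": "spam"},
        },
    )

    # The custom metric should be reported alongside the standard metrics;
    # the exact key depends on how custom metrics are reported
    metrics = results.metrics()
    self.assertIn("example_metric", metrics)

    # Renaming and deleting the run should carry the custom metric fields along
    dataset.rename_evaluation("eval", "eval2")
    dataset.delete_evaluation("eval2")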


531-532: Ensure thorough removal of custom metric fields

cleanup_custom_metrics() is invoked at line 531. If external code references these fields later, or the stored data is used for a second pass, exceptions may be triggered. Add a cautionary note in the docstring that any processes depending on these fields must complete before cleanup is called.

fiftyone/utils/eval/base.py (3)

85-85: Use key in dict instead of key in dict.keys()

Per the static analysis hints (SIM118), it is more concise to iterate directly on the dictionary rather than its .keys() method.

-        for metric in self._get_custom_metrics().keys():
+        for metric in self._get_custom_metrics():

Also applies to: 101-101, 113-113

🧰 Tools
🪛 Ruff (0.8.2)

85-85: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


179-179: Rename unused loop control variable

The variable uri is not referenced in the loop body; renaming it to _ or _uri clarifies that it is unused.

-        for uri, metric in self.custom_metrics.items():
+        for _uri, metric in self.custom_metrics.items():
🧰 Tools
🪛 Ruff (0.8.2)

179-179: Loop control variable uri not used within loop body

Rename unused uri to _uri

(B007)


57-75: Guard against operator misconfiguration

While already handled by try/except, consider logging more context (e.g., operator config) or re-raising with a custom exception when an operator is missing required parameters, to aid debugging.
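One possible shape for that, sketched under the assumption that the surrounding try/except wraps operator resolution; the variable names are illustrative only:

import logging

import fiftyone.operators as foo

logger = logging.getLogger(__name__)

# metric_uri and metric_kwargs come from the custom_metrics dict being iterated
try:
    operator = foo.get_operator(metric_uri)
    value = operator.compute(samples, results, **(metric_kwargs or {}))
except Exception as e:
    logger.warning(
        "Failed to compute custom metric '%s' with kwargs %s: %s",
        metric_uri,
        metric_kwargs,
        e,
    )
    raise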

fiftyone/utils/eval/segmentation.py (3)

602-605: Use kwargs.get("custom_metrics")

As per the static analysis hint (SIM910), replace kwargs.get("custom_metrics", None) with kwargs.get("custom_metrics") for cleaner code.

-    custom_metrics = kwargs.get("custom_metrics", None)
+    custom_metrics = kwargs.get("custom_metrics")
🧰 Tools
🪛 Ruff (0.8.2)

602-602: Use kwargs.get("custom_metrics") instead of kwargs.get("custom_metrics", None)

Replace kwargs.get("custom_metrics", None) with kwargs.get("custom_metrics")

(SIM910)


389-389: Document matches usage

The matches list captures pixel-level matches for each segmentation pair. Consider documenting its definition more formally (e.g., in method docstrings) for future maintainers.


432-442: Potential memory usage concerns

Appending every (gt_label, pred_label, pixel_count, gt_id, pred_id) could be memory-intensive for large masks. If performance is a concern, consider optional sampling or summary stats to reduce memory overhead.

fiftyone/utils/eval/classification.py (1)

926-926: Use kwargs.get("custom_metrics")

Per the static analysis hint (SIM910), we can simplify code by omitting the default argument when calling .get().

-    custom_metrics = kwargs.get("custom_metrics", None)
+    custom_metrics = kwargs.get("custom_metrics")
🧰 Tools
🪛 Ruff (0.8.2)

926-926: Use kwargs.get("custom_metrics") instead of kwargs.get("custom_metrics", None)

Replace kwargs.get("custom_metrics", None) with kwargs.get("custom_metrics")

(SIM910)

fiftyone/operators/__init__.py (1)

15-15: Add new imports to __all__.

The imported EvaluationMetricConfig and EvaluationMetric classes should be added to __all__ to make them available at the package level.

 from .evaluation_metric import EvaluationMetricConfig, EvaluationMetric
+__all__ = [
+    "EvaluationMetricConfig",
+    "EvaluationMetric",
+    # ... existing entries ...
+]
🧰 Tools
🪛 Ruff (0.8.2)

15-15: .evaluation_metric.EvaluationMetricConfig imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


15-15: .evaluation_metric.EvaluationMetric imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

fiftyone/core/fields.py (1)

1624-1684: LGTM! Well-implemented color conversion utilities.

The color conversion functions are well-documented and efficiently implemented. Consider a minor optimization for rgb_array_to_int:

-    return (
-        np.left_shift(mask[:, :, 0], 16, dtype=int)
-        + np.left_shift(mask[:, :, 1], 8, dtype=int)
-        + mask[:, :, 2]
-    )
+    return np.left_shift(mask[:, :, 0], 16) | np.left_shift(mask[:, :, 1], 8) | mask[:, :, 2]

Using bitwise OR operations can be more efficient than addition for combining bit-shifted values.

app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx (1)

1782-1828: Consider these improvements for better type safety and performance.

  1. The CustomMetrics type could be more specific about the key type:
-  [operatorUri: string]: CustomMetric;
+  [operatorUri: `${string}/${string}`]: CustomMetric;
  2. Add input validation to handle undefined/null inputs:
 function formatCustomMetricRows(evaluationMetrics, comparisonMetrics) {
+  if (!evaluationMetrics) return [];
   const results = [] as SummaryRow[];
  3. Consider memoizing the function to prevent unnecessary recalculations:
-function formatCustomMetricRows(evaluationMetrics, comparisonMetrics) {
+const formatCustomMetricRows = React.useMemo(() => (evaluationMetrics, comparisonMetrics) => {
   // ... existing implementation
-}
+}, []);
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08976be and f891b81.

📒 Files selected for processing (23)
  • app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx (3 hunks)
  • docs/source/release-notes.rst (1 hunks)
  • fiftyone/__public__.py (1 hunks)
  • fiftyone/core/collections.py (5 hunks)
  • fiftyone/core/dataset.py (6 hunks)
  • fiftyone/core/fields.py (1 hunks)
  • fiftyone/core/view.py (0 hunks)
  • fiftyone/operators/__init__.py (1 hunks)
  • fiftyone/operators/evaluation_metric.py (1 hunks)
  • fiftyone/operators/registry.py (4 hunks)
  • fiftyone/server/utils.py (1 hunks)
  • fiftyone/utils/eval/activitynet.py (5 hunks)
  • fiftyone/utils/eval/base.py (6 hunks)
  • fiftyone/utils/eval/classification.py (20 hunks)
  • fiftyone/utils/eval/coco.py (5 hunks)
  • fiftyone/utils/eval/detection.py (16 hunks)
  • fiftyone/utils/eval/openimages.py (5 hunks)
  • fiftyone/utils/eval/regression.py (18 hunks)
  • fiftyone/utils/eval/segmentation.py (23 hunks)
  • plugins/panels/model_evaluation/__init__.py (6 hunks)
  • tests/unittests/dataset_tests.py (0 hunks)
  • tests/unittests/operators/registry_tests.py (2 hunks)
  • tests/unittests/utils_tests.py (1 hunks)
💤 Files with no reviewable changes (2)
  • tests/unittests/dataset_tests.py
  • fiftyone/core/view.py
🧰 Additional context used
📓 Path-based instructions (1)
app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx (1)

Pattern **/*.{ts,tsx}: Review the Typescript and React code for conformity with best practices in React, Recoil, Graphql, and Typescript. Highlight any deviations.

🪛 Ruff (0.8.2)
fiftyone/__public__.py

166-166: .core.sample.Sample imported but unused

Remove unused import: .core.sample.Sample

(F401)

fiftyone/operators/__init__.py

15-15: .evaluation_metric.EvaluationMetricConfig imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


15-15: .evaluation_metric.EvaluationMetric imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

fiftyone/utils/eval/detection.py

641-641: Use kwargs.get("custom_metrics") instead of kwargs.get("custom_metrics", None)

Replace kwargs.get("custom_metrics", None) with kwargs.get("custom_metrics")

(SIM910)

plugins/panels/model_evaluation/__init__.py

17-17: fiftyone.core.fields imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

fiftyone/utils/eval/segmentation.py

602-602: Use kwargs.get("custom_metrics") instead of kwargs.get("custom_metrics", None)

Replace kwargs.get("custom_metrics", None) with kwargs.get("custom_metrics")

(SIM910)

fiftyone/utils/eval/regression.py

587-587: Use kwargs.get("custom_metrics") instead of kwargs.get("custom_metrics", None)

Replace kwargs.get("custom_metrics", None) with kwargs.get("custom_metrics")

(SIM910)

fiftyone/utils/eval/base.py

85-85: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


101-101: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


113-113: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


179-179: Loop control variable uri not used within loop body

Rename unused uri to _uri

(B007)

fiftyone/utils/eval/classification.py

926-926: Use kwargs.get("custom_metrics") instead of kwargs.get("custom_metrics", None)

Replace kwargs.get("custom_metrics", None) with kwargs.get("custom_metrics")

(SIM910)

fiftyone/core/dataset.py

1239-1239: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

fiftyone/core/collections.py

929-929: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


1019-1019: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-app
  • GitHub Check: e2e / test-e2e
  • GitHub Check: lint / eslint
  • GitHub Check: build / build
  • GitHub Check: build
🔇 Additional comments (44)
fiftyone/__public__.py (1)

166-166: Verify if Sample import can be safely removed.

The static analysis tool flags Sample as an unused import. However, since this is a public interface file and there are significant changes to sample handling, we should verify if this import is required for backward compatibility.

Let's verify the usage of Sample in the codebase:

🧰 Tools
🪛 Ruff (0.8.2)

166-166: .core.sample.Sample imported but unused

Remove unused import: .core.sample.Sample

(F401)

fiftyone/core/dataset.py (4)

1228-1228: LGTM! Good refactoring.

The change simplifies the code by delegating to the parent class implementation, reducing duplication while maintaining functionality.


1237-1241: LGTM! Clean implementation.

The change improves readability by using Python's slice syntax while maintaining proper error handling for empty datasets.

🧰 Tools
🪛 Ruff (0.8.2)

1239-1239: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


1255-1258: LGTM! Nice Pythonic implementation.

The change improves both readability and performance by using list comprehension and array slicing.


1272-1275: LGTM! Clean and efficient implementation.

The change improves both readability and performance by using list comprehension and negative array slicing.

fiftyone/core/collections.py (1)

Line range hint 926-1031: LGTM! Clean implementation of collection access methods

The implementations of first(), last(), head(), tail() and one() are clean and efficient:

  • first()/last() use iterator/slice access
  • head()/tail() use simple list comprehensions
  • one() provides good error handling for common cases
  • All methods have consistent error handling
plugins/panels/model_evaluation/__init__.py (8)

13-13: No concerns about this import usage.
The ObjectId import is utilized in _to_object_ids, which is appropriate.


351-359: Initialize segmentation results cautiously.
The call to _init_segmentation_results is fine. Ensure no redundant re-initialization is performed when working with segmentation data.


382-382: New custom_metrics key aligns with the introduced method.
Adding "custom_metrics": self.get_custom_metrics(results) consistently populates evaluation data with user-defined metrics.


709-771: Safeguard against potential KeyError in _init_segmentation_results.
If a label appears in results.ytrue or results.ypred but not in results.classes_map, a KeyError may occur. Consider adding checks for missing keys.
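For example, a defensive-lookup sketch along those lines (paraphrased, not copied from the panel code):

for yt, yp in zip(results.ytrue, results.ypred):
    gt_class = results.classes_map.get(yt)
    pred_class = results.classes_map.get(yp)
    if gt_class is None or pred_class is None:
        # Unknown label; skip (or log) rather than raising KeyError
        continue

    # ...proceed with initialization for known classes...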


772-789: Check for nonexistent class keys in _get_segmentation_class_ids.
Use caution when accessing results._classes_map[x]. Missing entries could trigger KeyError.


791-797: Ensure class mapping exists in _get_segmentation_conf_mat_ids.
If x or y is missing from results._classes_map, an exception may occur. Handling unexpected keys can make the function more robust.


799-816: TP/FP/FN logic appears well-structured.
The conditions for true positives, false negatives, and false positives align with the established segmentation approach.


818-819: Conversion of IDs to ObjectId is straightforward and correct.
Using ObjectId for database referencing is consistent with the rest of the system.

fiftyone/operators/evaluation_metric.py (3)

1-7: Docstring consistency check

The file header and module docstring appear consistent with the rest of the codebase. However, confirm that the year range 2017-2025 is correct and update it if necessary.


95-115: Consider additional error handling

In rename(), a failure in field renaming (e.g., if a field no longer exists or has unexpected naming) may cause unexpected dataset states. Consider handling such errors gracefully or logging them more explicitly to aid debugging.


116-133: Potential data consistency issues when cleaning up

cleanup() removes metric fields from the dataset. Verify that downstream processes expecting these fields aren’t impacted if cleanup is called prematurely. Consider adding checks or warnings if removing these fields might conflict with in-progress evaluations or reliant operations.

fiftyone/utils/eval/regression.py (2)

Line range hint 144-236: Base class extension alignment

RegressionEvaluation includes calls like compute_custom_metrics(), rename_custom_metrics(), and cleanup_custom_metrics() from its base. Verify that all relevant behaviors are covered in unit tests to confirm no oversight in edge cases where the user-defined metrics might conflict with standard fields.


Line range hint 447-492: Check edge cases for combined standard + custom metrics

When merging standard metrics (MSE, R^2, etc.) with custom metrics in metrics(), confirm that naming conflicts (like overlapping metric keys) are handled gracefully. For instance, a user-defined metric named mean_squared_error could overwrite the standard metric.
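A hedged sketch of one way to surface such collisions when building the merged dict; the actual metrics() implementation may handle this differently:

import logging

logger = logging.getLogger(__name__)


def _merge_metrics(standard_metrics, custom_metrics):
    # Warn when a custom metric would shadow a standard metric of the same name
    merged = dict(standard_metrics)
    for key, value in (custom_metrics or {}).items():
        if key in merged:
            logger.warning(
                "Custom metric '%s' overrides a standard metric with the same name",
                key,
            )
        merged[key] = value

    return merged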

fiftyone/utils/eval/detection.py (3)

23-27: Consistent import usage

You now import additional base classes from .base. If the goal is to unify all evaluation methods under these classes, ensure that any future renames or reorganizations keep their references consistent across all evaluation modules to avoid confusion or import cycles.


233-236: Compute custom metrics consistently

When calling compute_custom_metrics(), verify that the detection evaluation pipeline has completed other computations (like enumerating TP/FP/FN) first. This ensures the custom metric computations include necessary data.


Line range hint 538-553: Type alignment in DetectionResults

DetectionResults now inherits from BaseClassificationResults. This is a logical approach for storing arrays of true/pred labels. Still, confirm that all detection-specific attributes (e.g., bounding boxes, IoU arrays) remain fully compatible when combined with classification-based functionality.

fiftyone/utils/eval/base.py (2)

45-53: Validate custom_metrics usage

The _get_custom_metrics method returns potentially user-defined metrics. Make sure any conflicts with reserved metric names or special keys are handled gracefully.


161-188: Print method looks good

The print_metrics and _print_metrics methods effectively display the computed metrics. Usage of tabulate is a nice touch for readability.

🧰 Tools
🪛 Ruff (0.8.2)

179-179: Loop control variable uri not used within loop body

Rename unused uri to _uri

(B007)

fiftyone/utils/eval/segmentation.py (1)

476-480: Validate missing logic

Conditionally setting missing = classes[0] if values[0] in (0, "#000000") else None may inadvertently lump uninitialized or unknown classes into background. Verify that this defaulting won't cause confusion in large-scale or multi-class setups.

fiftyone/utils/eval/classification.py (3)

368-373: Ensure k is valid

When k >= len(_logits), top-k evaluation logic is forced to treat all logits as present. Confirm that k cannot exceed the number of classes in common usage and handle potential mismatches gracefully (e.g., if logits is empty).
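An illustrative guard only, not the module's actual logic:

def _validate_top_k(k, logits, classes):
    # Ensure logits exist and clamp k to the number of classes before
    # performing top-k evaluation
    if logits is None or len(logits) == 0:
        raise ValueError("top-k evaluation requires logits")

    return min(k, len(classes))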


713-713: New classification results class extends base results

Good approach. The additional parameters for custom_metrics integrate well with the base results. Make sure there's thorough unit test coverage for these new results flows.


Line range hint 770-785: Implicit negative class assumption

The constructor uses missing=classes[0] to mark negative classes by default. For tasks with reversed class ordering, ensure that custom workflows or class reordering do not break this logic.

tests/unittests/operators/registry_tests.py (1)

12-23: LGTM!

The reformatting of mock object definitions improves code readability while maintaining the test's functionality.

fiftyone/operators/registry.py (2)

9-10: LGTM!

The addition of eta.core.utils import supports the enhanced type checking functionality.


111-114: Verify dynamic class loading.

The implementation now supports dynamic class loading for operator types. Ensure that proper error handling is in place for invalid class paths.

Run this script to verify the error handling:

✅ Verification successful

Dynamic class loading implementation is sufficiently safe

The current implementation is adequately safe as it's used in an optional type filtering context where failures gracefully result in no matching operators. The isinstance check provides an additional safety layer, ensuring type compatibility regardless of class resolution success.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Test error handling for invalid class paths in list_operators.

# Test: Search for try-except blocks around get_class calls
rg -A 5 "get_class.*try|try.*get_class" .

Length of output: 43


Script:

#!/bin/bash
# Find implementation and usage of etau.get_class
ast-grep --pattern 'def get_class'

# Search for broader error handling around operator type handling
rg -B 3 -A 3 "isinstance.*type|if.*type.*:|type\s*=\s*" fiftyone/operators/

# Find imports to locate etau module
rg "^from.*etau.*import|^import.*etau" fiftyone/

Length of output: 19165

tests/unittests/utils_tests.py (4)

447-448: LGTM!

The added comment clearly indicates that this is not a unit test and warns about database modifications.


452-475: LGTM!

The test setup is well-structured with:

  • Clear test data creation
  • Proper cleanup of existing test data
  • Well-defined test configs with different versions

479-487: LGTM!

The assertions properly handle different scenarios:

  • Database admin: Verifies automatic cleanup
  • Non-admin: Verifies duplicates remain
  • Both: Verifies config consistency

488-490: LGTM!

The cleanup in the finally block ensures test isolation by removing test configs.

fiftyone/utils/eval/activitynet.py (2)

43-44: LGTM! Clean implementation of custom metrics support.

The addition of the custom_metrics parameter is well-documented and properly integrated with the superclass.

Also applies to: 55-55, 59-64


334-334: LGTM! Consistent implementation of custom metrics support.

The changes in the results class maintain consistency with the evaluation config class.

Also applies to: 351-352, 361-362

fiftyone/utils/eval/coco.py (2)

71-72: LGTM! Consistent implementation of custom metrics support.

The changes maintain consistency with the ActivityNet implementation.

Also applies to: 89-89, 93-98


275-275: LGTM! Consistent implementation of custom metrics support.

The changes maintain consistency with other detection results classes.

Also applies to: 292-293, 302-303

fiftyone/utils/eval/openimages.py (2)

73-74: LGTM! Consistent implementation of custom metrics support.

The changes maintain consistency with other evaluation config classes.

Also applies to: 94-94, 98-103


299-299: LGTM! Consistent implementation of custom metrics support.

The changes maintain consistency with other detection results classes.

Also applies to: 314-315, 324-325

app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx (1)

489-489: LGTM!

The spread operator is used correctly to include custom metric rows in the summary table.

docs/source/release-notes.rst (2)

Line range hint 22-1164: LGTM!

The release notes are well-structured and provide comprehensive documentation of changes across different components. The content follows documentation best practices with appropriate cross-references and explanations.


6-14: ⚠️ Potential issue

Fix incorrect release date in Teams 2.5.0 notes.

The release date is set to January 24, 2025, which is in the future. This should be updated to reflect the actual release date.

-*Released January 24, 2025*
+*Released January 24, 2024*

Likely invalid or redundant comment.

Comment on lines +641 to +644
custom_metrics = kwargs.get("custom_metrics", None)
if etau.is_str(custom_metrics):
    kwargs["custom_metrics"] = [custom_metrics]


🛠️ Refactor suggestion

Adopt the static analysis hint for custom_metrics retrieval

Similar to regression.py, remove the redundant second argument in kwargs.get("custom_metrics", None):

- custom_metrics = kwargs.get("custom_metrics", None)
+ custom_metrics = kwargs.get("custom_metrics")
📝 Committable suggestion


Suggested change
-custom_metrics = kwargs.get("custom_metrics", None)
+custom_metrics = kwargs.get("custom_metrics")
 if etau.is_str(custom_metrics):
     kwargs["custom_metrics"] = [custom_metrics]
🧰 Tools
🪛 Ruff (0.8.2)

641-641: Use kwargs.get("custom_metrics") instead of kwargs.get("custom_metrics", None)

Replace kwargs.get("custom_metrics", None) with kwargs.get("custom_metrics")

(SIM910)

Comment on lines +52 to +65
def cache_dataset(dataset):
    """Caches the given dataset.

    This method ensures that subsequent calls to
    :func:`fiftyone.core.dataset.load_dataset` in async calls will return this
    dataset singleton.

    See :meth:`load_and_cache_dataset` for additional details.

    Args:
        dataset: a :class:`fiftyone.core.dataset.Dataset`
    """
    _cache[dataset.name] = dataset


🛠️ Refactor suggestion

Add type hints and input validation.

The function should include type hints and validate its input:

-def cache_dataset(dataset):
+def cache_dataset(dataset: "fiftyone.core.dataset.Dataset") -> None:
     """Caches the given dataset.
 
     This method ensures that subsequent calls to
     :func:`fiftyone.core.dataset.load_dataset` in async calls will return this
     dataset singleton.
 
     See :meth:`load_and_cache_dataset` for additional details.
 
     Args:
         dataset: a :class:`fiftyone.core.dataset.Dataset`
+
+    Raises:
+        ValueError: If dataset is None or not a Dataset instance
     """
+    if dataset is None:
+        raise ValueError("Dataset cannot be None")
+    if not isinstance(dataset, fod.Dataset):
+        raise ValueError("Expected a Dataset instance")
     _cache[dataset.name] = dataset

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +18 to +21
FiftyOne 1.3.0
--------------
*Released January 24, 2025*


⚠️ Potential issue

Fix incorrect release date in FiftyOne 1.3.0 notes.

The release date is set to January 24, 2025, which is in the future. This should be updated to reflect the actual release date.

-*Released January 24, 2025*
+*Released January 24, 2024*
📝 Committable suggestion


Suggested change
 FiftyOne 1.3.0
 --------------
-*Released January 24, 2025*
+*Released January 24, 2024*


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
docs/source/user_guide/evaluation.rst (2)

2004-2188: Documentation for custom evaluation metrics looks great!

The section is well-structured and provides comprehensive coverage of both using and developing custom metrics. The examples are clear and the implementation details are thorough.

Consider adding a note about error handling best practices when implementing the compute() method, such as handling edge cases where the evaluation results might be invalid or incomplete.
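For instance, the note could include a defensive compute() sketch like the following; the attribute names follow the regression results discussed in this review, and the snippet is illustrative rather than the documented API:

def compute(self, samples, results, **kwargs):
    ytrue = getattr(results, "ytrue", None)
    ypred = getattr(results, "ypred", None)

    # Bail out gracefully on invalid or incomplete results
    if ytrue is None or ypred is None or len(ytrue) == 0 or len(ytrue) != len(ypred):
        return None

    errors = [abs(yt - yp) for yt, yp in zip(ytrue, ypred)]
    return sum(errors) / len(errors)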


Line range hint 2189-2999: Documentation for custom evaluation backends is comprehensive!

The section effectively covers all evaluation types and provides clear instructions for implementing custom backends. The configuration documentation is particularly thorough.

Consider adding a complete working example of a custom backend implementation for at least one evaluation type. This would help users better understand how all the pieces fit together in practice.
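As a starting point, a compact skeleton for the regression type might look like this; the class names, constructor signatures, and evaluate_samples() hook are assumptions modeled on the regression module touched in this PR, not the documented API:

import fiftyone.utils.eval.regression as four


class MyRegressionEvaluationConfig(four.RegressionEvaluationConfig):
    """Config for a hypothetical custom regression backend."""

    def __init__(self, pred_field, gt_field, tolerance=0.1, **kwargs):
        super().__init__(pred_field, gt_field, **kwargs)
        self.tolerance = tolerance

    @property
    def method(self):
        return "my-backend"


class MyRegressionEvaluation(four.RegressionEvaluation):
    """Evaluation method for the hypothetical backend."""

    def evaluate_samples(self, samples, eval_key=None, progress=None):
        # Compute per-sample errors within self.config.tolerance and return a
        # four.RegressionResults instance
        raise NotImplementedError("sketch only")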

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f891b81 and 8b8cdcf.

⛔ Files ignored due to path filters (1)
  • docs/source/images/evaluation/custom-evaluation-metric.png is excluded by !**/*.png, !**/*.png
📒 Files selected for processing (2)
  • docs/source/release-notes.rst (1 hunks)
  • docs/source/user_guide/evaluation.rst (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-app
  • GitHub Check: build / build
  • GitHub Check: lint / eslint
  • GitHub Check: e2e / test-e2e
  • GitHub Check: build
🔇 Additional comments (2)
docs/source/release-notes.rst (2)

6-15: LGTM! Well-structured release notes for Teams 2.5.0.

The release notes are properly formatted and provide clear information about the changes in this version.


18-21: ⚠️ Potential issue

Fix incorrect release date in FiftyOne 1.3.0 notes.

The release date is set to January 24, 2025, which is in the future. This should be updated to reflect the actual release date.

Apply this diff to fix the date:

-*Released January 24, 2025*
+*Released January 24, 2024*

Likely invalid or redundant comment.

@brimoor brimoor merged commit e12fce4 into develop Jan 25, 2025
13 checks passed
@brimoor brimoor deleted the main-merge branch January 25, 2025 15:03