---
orphan: true
---
This CHANGELOG refers to the time this project was maintained internally by XXII under the name "Libia".
Since the commit history has been removed for security reasons,
this changelog is kept for informational purposes and should not be modified.
The new CHANGELOG is [here](changelog.md).

All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Pin numpy version to be <2
- Add `Dataset.remove_invalid_images` and `Dataset.remove_invalid_annotations` methods.
- Add `mark_origin` and `overwrite_origin` options to `Dataset.merge` method (see the sketch below)
- Add `from_pascalVOC_detection` and `from_pascalVOC_generic` functions to load Pascal VOC datasets
- Add `dataset_regression` fixture for pytest that will test that datasets are the same
- Add more examples to documentation
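A minimal usage sketch of the entries above. Import paths and signatures are assumptions inferred from this changelog, since Libia's internal API is not available here:

```python
# Hypothetical sketch; import path and signatures are assumptions
# inferred from the changelog entries above.
from libia.dataset import from_pascalVOC_detection

train = from_pascalVOC_detection("voc/annotations", images_root="voc/images")
extra = from_pascalVOC_detection("extra/annotations", images_root="extra/images")

# Drop images and annotations that fail validation (assumed to return Self)
train = train.remove_invalid_images().remove_invalid_annotations()

# mark_origin presumably tags each row with the dataset it came from, and
# overwrite_origin presumably controls whether existing tags are replaced.
merged = train.merge(extra, mark_origin=True, overwrite_origin=False)
```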
- Fix spelling errors
- Upgrade minimum Python version to 3.10, so long Python 3.9!
- Upgrade pre-commit template and run it
- Change most dataset method return types to `Self` instead of simply `"Dataset"`
- Change classmethod `Dataset.from_template` to be a simple method. Note that this change is not breaking, as `Dataset.from_template(input_dataset, **kwargs)` is equivalent to `input_dataset.from_template(**kwargs)`
- `from_coco` and `from_crowdhuman` both try to intelligently parse the annotation file path to extract both the dataset name and the split name, thanks to a new function `libia.dataset.io.common.parse_annotation_name`
- `Dataset.merge` now automatically converts the images root of a dataset to absolute if the other is also absolute
- `to_fiftyone` methods (for dataset and evaluator) now accept an `existing` option to handle an existing dataset. You can now erase the existing dataset before uploading yours, or raise an error if it exists. Possibly breaking: the default behaviour of `to_fiftyone` methods was "update" and is now "error" (see the sketch below)
- `Dataset.match_index` now accepts a dataset as well as an image dataframe like before
- `Dataset.remap_from_other` now accepts `remove_not_mapped` and `remove_emptied_images` options to remove classes that are not present in the other dataset.
- `Evaluator` now accepts a prediction label map that is neither a subset nor a superset of the ground truth label map, and will assume only false negatives and false positives for the non-mutual classes.
- `dummy_dataset` now accepts options `keypoints_share` and `add_confidence` to make crowd datasets and predictions
- `Dataset.add_annotations` and `annotations_appender.append` now accept more flexible attribute shapes, which are then broadcast together.
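A hypothetical sketch of the new `existing` option, with its accepted values guessed from the wording above ("erase", "error", and the former default "update"); the `dummy_dataset` import path is also an assumption:

```python
# Hypothetical sketch; import path and option values are assumptions
# based on the entries above.
from libia.utils.doc_utils import dummy_dataset

ds = dummy_dataset(keypoints_share=0.5, add_confidence=True)  # new options

ds.to_fiftyone("libia-demo", existing="erase")  # replace any existing dataset
# existing="update" was the old default; existing="error" is the new default
# and raises if a fiftyone dataset named "libia-demo" already exists.
```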
- Add the possibility to test dataset equality modulo columns that are all NaNs
- Add a warning message when the label map is incomplete, and complete it with the simple id -> str(id) mapping for missing ids (see the sketch below)
- Add `check_exhaustive` option to `Dataset.check` and `assert_images_valid` functions
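The id -> str(id) completion above amounts to the following plain-Python illustration (not Libia code):

```python
# Complete an incomplete label map with the id -> str(id) fallback
# described above (plain-Python illustration).
label_map = {0: "person", 2: "car"}  # id 1 is missing
present_ids = [0, 1, 2]
completed = {i: label_map.get(i, str(i)) for i in present_ids}
print(completed)  # {0: 'person', 1: '1', 2: 'car'}
```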
- Fix c2p CLI tool to effectively remove a detection when it is modified
- `Dataset.remove_empty_images` now keeps the dataset name
- Add docs for darknet IO
- Suppress some FutureWarning from pandas during tests
- Fix bug for caipy when split is `pd.NA` instead of `None` or `np.nan`
- Fix bug when loading caipy with `splits_to_read` set to non-existing splits
- Fix code spelling
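For context on the `pd.NA` fix above: pandas has three distinct missing-value sentinels that all register as missing but are different objects, which is why code checking only for `None` or `np.nan` could break:

```python
import numpy as np
import pandas as pd

# All three sentinels count as missing for pandas...
splits = pd.Series(["train", None, np.nan, pd.NA])
print(splits.isna().tolist())  # [False, True, True, True]

# ...but they are distinct objects, so an `is None` check misses pd.NA
print(splits[3] is None)   # False
print(splits[3] is pd.NA)  # True
```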
- Add input format option for COCO loading, making it possible to load XY coordinates instead of just bounding boxes
- Add `from_coco_keypoints` function for loading COCO data with points and only one class.
- Add compatibility with caipyjson tags and attributes, and more generally any kind of nested dictionary
- Add column booleanizer (and debooleanizer) to go from a column of object lists to columns of boolean values for better queries (see the pandas sketch after this list)
- Add Crowd detection evaluator with Mean Average Error metric for count
- Add reindex function
- Add `from_mot` function for loading datasets in MOT format. See https://motchallenge.net/instructions/
- Add a method to compute confusion matrix for DetectionEvaluator
- Add yolov7 compatibility with a `Dataset.to_yolov7` method.
- Add automatic compliance with schema when saving to caipy
- Add compatibility with caipy splits independently indexed
- Add iterator helper methods to `Dataset` like `Dataset.iter_images` and `Dataset.iter_splits` to make it easier to iterate by a specific attribute
- When loaded with a schema, `from_caipy` automatically sets missing arrays to the empty list and other fields to their default value specified in the schema, whenever at least one sample in the caipy folder has the field set to a particular value in its caipyjson file, avoiding NaN values in the resulting dataframe.
- Add `to_parquet` and `from_parquet` methods to save and load datasets efficiently with pyarrow.
- Add dataframe booleanized columns broadcasting functions, useful for merging datasets
- Add better error messages when calling check functions from `utils.testing`
- Add `remap_from_other` method to remap the label map to match another dataset.
- Add `realign_label_map` argument in `Dataset.merge` to avoid incompatible label maps errors
- Add `assert_columns_properly_normalized` for caipy json reading
- Add `Dataset.empty()` method to create the same dataset object as before, but with an empty dataframe of annotations. This is useful when creating a prediction dataset.
- Add `AnnotationAppender.reset()` and `AnnotationAppender.finish()` methods to be able to use the annotation appender outside a context manager
- Add `category_ids_mapping` optional argument to `AnnotationAppender` and related functions in order to remap the category ids from predictions
- Add `flatten_paths` option to the cAIpy export function, which lets you save a dataset without subfolders.
- Add `c2f` standalone script to quickly open a caipy dataset in fiftyone
- Add `from_files` function, similar to `from_folder` but for when you already know what files or file patterns you want in the root folder.
- Add `difftools` in `libia.utils` to compute the difference between datasets. Useful when we want to update something related to it (like fiftyone)
- Add `libia.utils.doc_utils` for examples in docstrings, with a dummy dataset creator
- Add Examples in all methods of the `Dataset` object.
- Add `Dataset.reset_index_from_mapping` method to remap the index of images and annotations dataframes
- BREAKING: Remove `Dataset.reindex` method and rename it `Dataset.match_index` to avoid confusion with `pandas.reindex`
- Add "See Also" admonitions in many methods to link methods together and to see the related tutorial each time
- Add schemas tutorial
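The column booleanizer entry above can be illustrated with plain pandas; this is a concept demo of the idea, not Libia's actual implementation:

```python
import pandas as pd

# Concept demo: turn a column of tag lists into boolean columns,
# which makes queries like "all images tagged night" straightforward.
annotations = pd.DataFrame({"tags": [["car", "night"], ["person"], []]})
booleanized = annotations["tags"].str.join("|").str.get_dummies().astype(bool)
print(booleanized)
#      car  night  person
# 0   True   True   False
# 1  False  False    True
# 2  False  False   False
```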
- Caipy save is much faster
- Up-to-date dependencies
- `from_coco` function now has a `label_map` option in case the categories field is empty in the input json
- `from_coco` assumes `category_id` to be 0 in case it is absent from annotation fields. It will error if it's not absent from ALL annotations though.
- BREAKING: `Evaluator.predictions` renamed to `Evaluator.predictions_dictionary` for better clarity
- BREAKING: `DetectionEvaluator.compute_matches` and `DetectionEvaluator.compute_precision_recall` have changed their `predictions` option to `predictions_names` for better clarity.
- `Dataset.merge` now tries to fuse dataframes with overlapping ids, as long as the common subset is the same
- `Dataset.reset_index` now accepts a `start_image_id`.
- BREAKING: `Dataset.dataset_path` is deprecated in favor of `Dataset.images_root`, similar to `Evaluator`.
- Introduce the optional `dataset_name` attribute, to be used when the dataset name is not the folder name of the images root but can be deduced from the loader function, e.g. in `from_caipy`
- Dataset merging now merges image indexes before concatenating the annotations. Useful when merging a dataset with annotations and the same dataset with pre-annotations.
- Refactor dataset merge logic in a dedicated module
- Dataset addition falls back to `realign_label_map` in merge when an `IncompatibleLabelMapsError` is raised (see the sketch below)
- Add `create_split_folder` option in `dataset_to_darknet` function and related `Dataset` methods, allowing to save all images of a particular split in its dedicated folder.
- `Dataset.get_split` now accepts a `None` value to get all images with a null split value if needed.
- BREAKING: `Dataset.remap_from_DataFrame` renamed to `Dataset.remap_from_dataframe`
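A hypothetical sketch of the addition fallback described above; the helper import and exact semantics are assumptions based on this changelog:

```python
# Hypothetical sketch; import path and semantics are assumptions.
from libia.utils.doc_utils import dummy_dataset

ds_a = dummy_dataset()
ds_b = dummy_dataset()

# With incompatible label maps, plain merge raises
# IncompatibleLabelMapsError unless realignment is requested:
merged = ds_a.merge(ds_b, realign_label_map=True)

# Dataset addition now falls back to that realignment automatically:
combined = ds_a + ds_b
```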
- Replace warning types from `UserWarning` to the right warning type (`DeprecationWarning` or `RuntimeWarning`)
- Add pandas-style `Dataset.loc`, `Dataset.iloc`, `Dataset.loc_annot` and `Dataset.iloc_annot` indexers, along with `filter_images` and `filter_annotations` methods (see the sketch after this list)
- Add `record_fo_ids` option in `Dataset.to_fiftyone` and `DetectionEvaluator.to_fiftyone` methods to keep track of fiftyone's UUID of each corresponding image and annotation.
- Add markdownlint pre-commit hook (and make markdown documents compliant with it)
- Add `--watch` argument in `caipy_to_fiftyone` script to perform a live update of fiftyone datasets each time a file is modified in the caipy dataset. Useful when constructing a dataset progressively.
- Add `start_annotations_id` option to `Dataset.reset_index` method.
- Add supplementary checks and formatting to the `Dataset` basic constructor.
- Add more explanations to the crowd counting tutorial.
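A hypothetical sketch of the pandas-style indexers added above; the `images`/`annotations` dataframe attributes and the column names used here are assumptions:

```python
# Hypothetical sketch; attribute and column names are assumptions
# inferred from the entries above.
from libia.utils.doc_utils import dummy_dataset  # assumed import path

ds = dummy_dataset()

first_images = ds.iloc[:10]        # images selected by position
one_image = ds.loc[42]             # image selected by index label
some_annots = ds.iloc_annot[:100]  # annotations selected by position
cars = ds.loc_annot[ds.annotations["category_id"] == 2]

# filter_images / filter_annotations presumably keep the images and
# annotations dataframes consistent with each other after filtering.
non_empty = ds.filter_images(ds.images["width"] > 0)
```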
- Get split does not rely on split being present in annotations anymore
- CrowdHuman head visibility is unknown
- Class remapping is now compatible when label map is only a subset of remap dict
- PNG to JPG conversion now works for RGBA images (note that the alpha channel will be lost; see the Pillow sketch after this list)
- `to_yolov5` now automatically converts split values like `eval` and `valid` to their yolov5-accepted equivalents (resp. `test` and `val`)
- Fix `DetectionEvaluator.matches` being tied to the class instead of the instance.
- Fix dependencies problem: sklearn is in core dependencies and matplotlib in the optional "plot-utils" group
- Fix yolov7 problem: image paths in txt files are also absolute. Please don't use yolov7 export if you don't need to, the dataset specs are terrible.
- Diverse PyCharm warnings fixed
- Type hint of `from_folder` improved
- `from_folder` method does not crash when the folder is empty, but returns an empty dataset with a warning.
- Warnings and pyright errors from the latest pandas version are suppressed
- Use tight layout for confusion matrix plot result
- Use json normalize when loading COCO so that it can be converted to fiftyone
- Skip processing steps when converting an empty dataset to fiftyone or when appending empty annotations to the dataset with the annotation appender context manager
- Prevent annotations index from being reset when using the annotations appender
- Prevent loss of dataset name when calling `merge`, `reset_index`, `remap_classes`
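The RGBA fix above boils down to dropping the alpha channel before saving as JPEG, since the format cannot store it; in plain Pillow terms:

```python
from PIL import Image

# JPEG cannot store an alpha channel, so an RGBA image is converted to RGB
# first; the kind of conversion the PNG-to-JPG fix above presumably performs.
rgba = Image.new("RGBA", (4, 4), (255, 0, 0, 128))
rgba.convert("RGB").save("out.jpg")  # the alpha channel is lost here
```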
- `libia.model` subpackage (dead legacy code) was deleted
- Add CrowdHuman loading module. See https://www.crowdhuman.org/
- Add `darknet_generic` loading module
- Add more tests to improve coverage
- Introduce a `BBOX_COLUMN_NAMES` convention for bounding box column names in the dataset's annotation dataframe
- Sum of datasets is now functional and tested (it was not working before)
- Fix bug regarding confidence subsampling for PR curves
- Proper extremal point for PR curves
- Caipy split stays None if no split is given when loading and data is in root
- Caipy save keeps attributes added at runtime when saving
- Add remove empty images method to dataset
- Add remove emptied images option in remap classes
- Add remove not mapped classes option in remap classes (not mapped were always removed before)
- Add `f_scores_betas` to compute all wanted F-scores: F1, F0.5, F2, etc. (see the sketch after this list)
- PR curves are now indexed by recall with 101 evenly spaced values between 0 and 1 by default. The old behaviour can be retrieved by setting the option `index_column` to `None`.
- Reworked evaluation demo
- Improved documentation
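For reference, the F-scores computed by `f_scores_betas` follow the standard F-beta definition (the formula is standard; how the function exposes it is an assumption):

```python
# Standard F-beta score: beta > 1 weighs recall more, beta < 1 weighs
# precision more; beta = 1 is the usual F1.
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

precision, recall = 0.8, 0.6
for beta in (0.5, 1.0, 2.0):  # the F0.5, F1 and F2 mentioned above
    print(f"F{beta:g} = {f_score(precision, recall, beta):.3f}")
```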
- Add bounding box converter
- Add image folder io, for when input is simply a folder with images but no annotations
- Load caipy generic does not have to specify an image folder anymore
- Conversion to fiftyone for datasets and evaluators
- Bugfix regarding the annotation index when it is duplicated
- Group continuous data with either interval labels (by default), mid-point, mean point or median point (see the pandas sketch after this list)
- BREAKING: evaluation predictions and matches are now dictionaries and can be used to evaluate multiple prediction sets at the same time
- BREAKING: group type alias is now either a column or a `ContinuousGroup` object (a dictionary that does the same thing but with better checking)
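The continuous-data grouping above can be illustrated with plain pandas; a concept demo, assuming `ContinuousGroup` wraps this kind of binning:

```python
import pandas as pd

# Concept demo: bin a continuous column, labelling groups either by
# interval (the default mentioned above) or by the interval mid-point.
box_heights = pd.Series([3.0, 12.0, 25.0, 48.0])
by_interval = pd.cut(box_heights, bins=[0, 10, 30, 50])
by_midpoint = by_interval.map(lambda interval: interval.mid)
print(by_interval.tolist())   # intervals such as (0, 10] and (10, 30]
print(by_midpoint.tolist())   # [5.0, 20.0, 20.0, 40.0]
```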
- Fix several failing pyright tests because pandas stubs was updated
- Add caipy generic format
- Add testing module in utils
- More thorough tests for io
- More complete notebook for demo_dataset
- pre-commit's flake8 repo URL was moved from GitLab to GitHub
- Dataset evaluation tool: see tutorials/demo_evaluation
- Dataset split tool: see tutorials/demo_split
- New code checkers, including pyright and pandas stubs
- Features: Merge, Class remapping, etc.