fixed issue in precision converting annotations with "force_mask=True" #1746

Open
wants to merge 6 commits into
base: develop

@0xD4rky 0xD4rky commented Dec 16, 2024

Description

When supervision loads YOLO annotations with force_masks=True, it converts the normalized polygon coordinates from the YOLO text files into pixel coordinates (multiplying by image width/height) and then back into normalized coordinates when saving them out. During this round trip, integer casting or rounding may occur, which shifts the polygon coordinates slightly. The result is “crooked” or misaligned masks.
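
To make the round-trip effect concrete, here is a minimal sketch in plain NumPy. It does not reproduce supervision's actual loading code: the helper `round_trip` and the sample coordinate are hypothetical, chosen so that the pixel value has a large fractional part.

```python
import numpy as np


def round_trip(x_norm: float, size: int, truncate: bool) -> float:
    """Normalized -> integer pixel -> normalized (hypothetical helper)."""
    x_px = x_norm * size
    # Truncation drops the fractional part; rounding picks the nearest pixel.
    x_int = int(x_px) if truncate else int(np.round(x_px))
    return x_int / size


x, width = 0.4578, 640  # 0.4578 * 640 = 292.992 px
err_trunc = abs(round_trip(x, width, truncate=True) - x)
err_round = abs(round_trip(x, width, truncate=False) - x)
print(f"truncation error: {err_trunc:.6f}")
print(f"rounding error:   {err_round:.6f}")
```

Truncation can move a vertex by up to one full pixel, while rounding to the nearest pixel moves it by at most half a pixel.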

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How has this change been tested, please provide a testcase or example of how you tested the change?

Minimal Reproducible Code:

import numpy as np
import cv2

resolution_wh = (640, 480)  
relative_polygon = np.array([
    [0.25, 0.4],
    [0.25, 0.6],
    [0.45, 0.6],
    [0.45, 0.4]
], dtype=np.float32)

def polygon_to_mask(polygon: np.ndarray, resolution_wh: tuple[int, int]) -> np.ndarray:
    """
    New approach: Convert to int at the last moment.
    """
    polygon_int = np.round(polygon).astype(np.int32)
    mask = np.zeros((resolution_wh[1], resolution_wh[0]), dtype=np.uint8)
    cv2.fillPoly(mask, [polygon_int], 1)
    return mask

def old_polygon_processing(relative_polygon: np.ndarray, resolution_wh: tuple[int,int]) -> np.ndarray:
    """
    Old (problematic) approach: Cast to int too early.
    """
    polygons = (relative_polygon * np.array(resolution_wh)).astype(int)
    return polygon_to_mask(polygons, resolution_wh)

def new_polygon_processing(relative_polygon: np.ndarray, resolution_wh: tuple[int,int]) -> np.ndarray:
    """
    New (improved) approach: Keep floats until mask creation.
    """
    polygons = relative_polygon * np.array(resolution_wh, dtype=np.float32)
    return polygon_to_mask(polygons, resolution_wh)

old_mask = old_polygon_processing(relative_polygon, resolution_wh)
cv2.imwrite("old_mask.png", old_mask.astype(np.uint8)*255)  

new_mask = new_polygon_processing(relative_polygon, resolution_wh)
cv2.imwrite("new_mask.png", new_mask.astype(np.uint8)*255) 

difference = np.bitwise_xor(old_mask, new_mask)
print("Number of differing pixels:", difference.sum())

# Instructions for analysis:
# 1. Open old_mask.png and new_mask.png.
# 2. Check whether the polygon edges appear more accurate in new_mask.png.
# 3. The printed count only measures how much the two masks disagree with each other;
#    judging which one is less distorted requires comparing each against a ground-truth mask.

Docs

The Docs haven't been updated yet, I need to check the validity of the PR with the maintainers first!

@CLAassistant

CLAassistant commented Dec 16, 2024

CLA assistant check
All committers have signed the CLA.

@0xD4rky 0xD4rky changed the title Resolving Issue #368 ["force_mask = True"] fixed issue in precision converting annotations with "force_mask=True" Dec 17, 2024
@SkalskiP
Collaborator

Hi @0xD4rky 👋🏻 thanks a lot for your interest in our library. It's true that the YOLO format requires normalized box coordinates and masks, and that loading and re-saving a dataset can introduce distortions. We would like to minimize the level of these distortions.

However, before we decide to introduce any changes to supervision datasets, I need to see that your proposed solution actually minimizes the distortions. The test you attached only shows that the masks processed in two different ways are different. However, there is no reference point to the source polygon. That is, we don't know if and by how much the output polygon differs from the input one.

I would like to see a test where we have the source .txt file with annotations. This file is loaded and then saved back to disk. We can then compare the level of distortion.

@0xD4rky
Author

0xD4rky commented Dec 17, 2024

Thanks @SkalskiP for pointing out the need to verify the change; I forgot to include the verification. I created a sample label file to show how the polygon's coordinates shifted before the change and how the change handles polygon rounding.

Below is the code I used to analyze the changes in the polygon's coordinates.

import os

import cv2
import numpy as np
import supervision as sv

# Build a minimal YOLO dataset layout: images/, labels/ and data.yaml.
test_dir = "test_annotation"
os.makedirs(test_dir, exist_ok=True)
images_dir = os.path.join(test_dir, "images")
labels_dir = os.path.join(test_dir, "labels")
os.makedirs(images_dir, exist_ok=True)
os.makedirs(labels_dir, exist_ok=True)

data_yaml_path = os.path.join(test_dir, "data.yaml")

with open(data_yaml_path, "w") as f:
    f.write("train: ./\nval: ./\nnames: ['class0']\n")

# Write a blank 640x480 image so the loader has a resolution to scale against.
image_name = "example.jpg"
image_path = os.path.join(images_dir, image_name)
dummy_img = np.zeros((480, 640, 3), dtype=np.uint8)
cv2.imwrite(image_path, dummy_img)

original_polygon = [
    "0 0.25 0.4 0.25 0.6 0.45 0.6 0.45 0.4\n"
]

label_path = os.path.join(labels_dir, "example.txt")
with open(label_path, "w") as f:
    f.writelines(original_polygon)

ds = sv.DetectionDataset.from_yolo(
    images_directory_path=images_dir,
    annotations_directory_path=labels_dir,
    data_yaml_path=data_yaml_path,
    force_masks=True
)

ds.as_yolo(annotations_directory_path=labels_dir)
with open(label_path, "r") as f:
    processed_lines = f.readlines()
processed_polygon_line = processed_lines[0].strip()

def parse_yolo_polygon(line):
    vals = line.split()
    cls = vals[0]
    coords = list(map(float, vals[1:]))
    return cls, np.array(coords, dtype=float).reshape(-1, 2)

orig_cls, orig_coords = parse_yolo_polygon(original_polygon[0])
proc_cls, proc_coords = parse_yolo_polygon(processed_polygon_line)

print("Original Polygon Coordinates (Normalized):")
print(orig_coords)
print("Processed Polygon Coordinates (Normalized):")
print(proc_coords)

differences = np.linalg.norm(orig_coords - proc_coords, axis=1)
avg_difference = np.mean(differences)
max_difference = np.max(differences)

print("Average per-point difference:", avg_difference)
print("Max per-point difference:", max_difference)

We start with a known polygon in normalized YOLO coordinates. After loading and saving via supervision, we compare the polygon coordinates before and after. By computing the numeric difference, we get a quantitative measure of how much the polygon has been distorted.
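
The direct subtraction `orig_coords - proc_coords` assumes the saved polygon keeps the same vertex count and order as the original, which may not hold once a polygon has been rasterized to a mask and extracted again. As a hedged alternative, the sketch below measures, for each original vertex, the distance to the nearest processed vertex. The function `nearest_vertex_error` and the `processed` array are hypothetical illustrations, not output from supervision.

```python
import numpy as np


def nearest_vertex_error(original: np.ndarray, processed: np.ndarray) -> np.ndarray:
    """For each original vertex, distance to the closest processed vertex."""
    # Pairwise differences, shape (N_original, N_processed, 2).
    diffs = original[:, None, :] - processed[None, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    return distances.min(axis=1)


original = np.array([[0.25, 0.4], [0.25, 0.6], [0.45, 0.6], [0.45, 0.4]])
# Hypothetical processed polygon: different start vertex, one vertex slightly shifted.
processed = np.array([[0.45, 0.6], [0.4484375, 0.4], [0.25, 0.4], [0.25, 0.6]])

errors = nearest_vertex_error(original, processed)
print("Average per-point difference:", errors.mean())
print("Max per-point difference:", errors.max())
```

This metric tolerates reordered or extra vertices, but it can understate distortion when an unrelated processed vertex happens to lie close to an original one.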

  • Results before the changes: [screenshot, 2024-12-17 11:23:05 PM]
  • Results after the changes: [screenshot, 2024-12-17 11:23:39 PM]

You can see that, with the changes applied, the processed polygon coordinates stay close to the original coordinates.

  • One extra point: I will also change the _polygons_to_masks function to add mask = mask[None, ...], so that the mask has shape (1, H, W) instead of (H, W).
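
For reference, a minimal NumPy sketch of what that indexing does on its own, outside of `_polygons_to_masks` (the mask size here is an arbitrary assumption):

```python
import numpy as np

mask = np.zeros((480, 640), dtype=bool)  # single mask, shape (H, W)
batched = mask[None, ...]                # adds a leading axis: (1, H, W)
print(mask.shape, batched.shape)
```

Indexing with `None` returns a view, so the extra axis is added without copying the mask data.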

@0xD4rky
Author

0xD4rky commented Jan 24, 2025

hello @SkalskiP please review the changes once you have time, thanks!
