-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle permission error properly when checking for file #856
base: main
Are you sure you want to change the base?
Conversation
Currently, we had blanket catch for exception when trying to check the file using _isfile. As a result, the exception stacktrace was repeated and catching the exception in script was difficult as we had to capture different exception. This convert the error to datachain native error that can be captured safely and proceed accordingly. This is first step toward handling #600
Deploying datachain-documentation with Cloudflare Pages
|
I'm not sure I understand this ... could you give an example please? |
|
Yes, as you see we had
with supressed lint. That means no matter what the error we were returning false. So, if we did not have the permission, the script proceeded. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #856 +/- ##
==========================================
- Coverage 87.53% 87.45% -0.08%
==========================================
Files 128 128
Lines 11369 11377 +8
Branches 1538 1539 +1
==========================================
- Hits 9952 9950 -2
- Misses 1029 1036 +7
- Partials 388 391 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
This code looks problematic but uncaught because we just ignored everything with except. The _isfile doesn't work in windows for glob as a result as seen in https://github.com/iterative/datachain/actions/runs/12948522550/job/36117365039?pr=856 |
except FileNotFoundError: | ||
return False | ||
except REMOTE_ERRORS as e: | ||
raise ClientError( | ||
message=str(e), | ||
error_code=getattr(e, "code", None), | ||
) from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to catch other exceptions here? 🤔
yep, but that's why I guess we were just ignoring a wide range of exceptions and the rest of the code is built to handle that path as a non-file should we just try to catch Remote exceptions that are relevant to credentials, etc. And then do a wide open |
@@ -90,6 +91,10 @@ def _isfile(client: "Client", path: str) -> bool: | |||
Returns True if uri points to a file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shcheklein, was _isfile()
added just for making it work with Google Cloud?
If so, I'd recommend extending GCSFileSystem._is_file
(my preference), or add GCSClient._is_file
.
Whitelisting exceptions on a fs-agnostic/generic code is not a good thing to do.
get_listing
could raise any unknown exceptions that's not FileNotFoundError
in _is_file
as a ClientError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, I think it is a general code (there are special zero-byte files across all clouds I think) - we need to filter them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, we stopped doing these checks in dvc. We ignore files that end with slashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, so that's the point of this call - to always treat it as a directory AFAIR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess in some cases (not sure if this works the same across all clients) it returns a file/
even if ask it to check file
if zero-byte object exists
@@ -90,6 +91,10 @@ def _isfile(client: "Client", path: str) -> bool: | |||
Returns True if uri points to a file | |||
""" | |||
try: | |||
if "://" in path: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this needed, btw?
main > #856 (this) > #857
Currently, we had blanket catch for exception when trying to check the
file using _isfile. As a result, the exception stacktrace was repeated
and catching the exception in script was difficult as we had to capture
different exception. This convert the error to datachain native error
that can be captured safely and proceed accordingly.
This is first step toward handling #600