Handle permission error properly when checking for file #856

Open
wants to merge 4 commits into
base: main

amritghimire
Contributor

@amritghimire amritghimire commented Jan 24, 2025

main > #856 (this) > #857

Previously, we had a blanket catch for exceptions when checking for a
file using _isfile. As a result, the exception stack trace was repeated,
and catching the exception in a script was difficult because we had to
handle several different exception types. This PR converts the error to
a DataChain-native error that can be caught safely, so the script can
proceed accordingly.

This is the first step toward handling #600.

@amritghimire amritghimire self-assigned this Jan 24, 2025

cloudflare-workers-and-pages bot commented Jan 24, 2025

Deploying datachain-documentation with Cloudflare Pages

Latest commit: 911d360
Status: ✅  Deploy successful!
Preview URL: https://8ca60744.datachain-documentation.pages.dev
Branch Preview URL: https://amrit-handle-permission-erro.datachain-documentation.pages.dev


@amritghimire amritghimire requested review from ilongin, dreadatour and a team January 24, 2025 03:20
@shcheklein
Member

Previously, we had a blanket catch for exceptions when checking for a
file using _isfile. As a result, the exception stack trace was repeated,
and catching the exception in a script was difficult because we had to
handle several different exception types.

I'm not sure I understand this ... could you give an example please?

@amritghimire
Contributor Author

Previously, we had a blanket catch for exceptions when checking for a
file using _isfile. As a result, the exception stack trace was repeated,
and catching the exception in a script was difficult because we had to
handle several different exception types.

I'm not sure I understand this ... could you give an example please?

https://github.com/iterative/datachain/pull/856/files#diff-ba026f8084c104850332b7e62ae0741b251f65b67d0271ce5870f4794a1346bfL102

@amritghimire
Contributor Author

#856 (files)

is it an example?

Yes. As you can see, we had a bare

except:

with the lint warning suppressed. That means we returned False no matter what the error was. So if we did not have permission, the script simply proceeded as if the path were not a file.
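To make the problem concrete, here is a minimal sketch of the pattern described above. It is not the actual DataChain code: isfile_blanket, denied and missing are hypothetical stand-ins, and the storage call is simulated with plain Python exceptions.

```python
def isfile_blanket(check, path):
    """Old pattern: any failure at all is reported as 'not a file'."""
    try:
        return check(path)
    except:  # noqa: E722 - bare except, kept only to illustrate the problem
        return False


def denied(path):
    # Hypothetical stand-in for a storage call that fails on missing permissions.
    raise PermissionError(f"access denied: {path}")


def missing(path):
    # Hypothetical stand-in for a storage call on a path that does not exist.
    raise FileNotFoundError(path)


# Both failures collapse into the same answer, so the caller cannot tell
# "no such file" apart from "not allowed to look".
result_denied = isfile_blanket(denied, "s3://bucket/private.csv")
result_missing = isfile_blanket(missing, "s3://bucket/nope.csv")
```

Both results are False, which is why a script running without permission would carry on instead of failing.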


codecov bot commented Jan 24, 2025

Codecov Report

Attention: Patch coverage is 90.90909% with 1 line in your changes missing coverage. Please review.

Project coverage is 87.45%. Comparing base (13e5c13) to head (911d360).

Files with missing lines | Patch % | Lines
src/datachain/lib/listing.py | 85.71% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #856      +/-   ##
==========================================
- Coverage   87.53%   87.45%   -0.08%     
==========================================
  Files         128      128              
  Lines       11369    11377       +8     
  Branches     1538     1539       +1     
==========================================
- Hits         9952     9950       -2     
- Misses       1029     1036       +7     
- Partials      388      391       +3     
Flag Coverage Δ
datachain 87.43% <90.90%> (-0.05%) ⬇️


@amritghimire
Contributor Author

https://github.com/iterative/datachain/pull/767/files#diff-ba026f8084c104850332b7e62ae0741b251f65b67d0271ce5870f4794a1346bfR102-R103

This code looks problematic, but the problem went unnoticed because we ignored everything with a bare except. As a result, _isfile does not work on Windows for glob patterns, as seen in https://github.com/iterative/datachain/actions/runs/12948522550/job/36117365039?pr=856

Comment on lines +107 to +113
    except FileNotFoundError:
        return False
    except REMOTE_ERRORS as e:
        raise ClientError(
            message=str(e),
            error_code=getattr(e, "code", None),
        ) from e
Contributor


Do we want to catch other exceptions here? 🤔
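For readers following the question, a self-contained sketch of what the narrowed handler implies may help. The ClientError class and the REMOTE_ERRORS tuple below are stand-ins defined only for this example; the real definitions in DataChain are not shown in this thread and may differ.

```python
class ClientError(Exception):
    """Stand-in for a DataChain-native error; the real class may differ."""

    def __init__(self, message, error_code=None):
        super().__init__(message)
        self.message = message
        self.error_code = error_code


# Assumption: the real REMOTE_ERRORS lists cloud SDK exception types.
REMOTE_ERRORS = (PermissionError, ConnectionError)


def isfile_narrow(check, path):
    """Return False only for a missing path; wrap known remote failures."""
    try:
        return check(path)
    except FileNotFoundError:
        return False
    except REMOTE_ERRORS as e:
        raise ClientError(
            message=str(e),
            error_code=getattr(e, "code", None),
        ) from e


def denied(path):
    # Hypothetical storage call that fails because of missing permissions.
    raise PermissionError("access denied")


# A script now needs to catch one exception type only.
try:
    isfile_narrow(denied, "gs://bucket/secret")
    outcome = "proceeded"
except ClientError as err:
    outcome = f"stopped: {err.message}"
```

In this sketch, any exception that is neither FileNotFoundError nor listed in REMOTE_ERRORS propagates unchanged, which is the behavior the question above is probing.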

@shcheklein
Copy link
Member

This code looks problematic, but the problem went unnoticed because we ignored everything with a bare except. As a result, _isfile does not work on Windows for glob patterns, as seen in https://github.com/iterative/datachain/actions/runs/12948522550/job/36117365039?pr=856

yep, but that's why I guess we were just ignoring a wide range of exceptions and the rest of the code is built to handle that path as a non-file

should we just try to catch Remote exceptions that are relevant to credentials, etc. And then do a wide open except:? wdyt?
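One way to read this suggestion, as a rough sketch: raise on credential-related failures first, then fall back to a wide catch that treats everything else as a non-file. CREDENTIAL_ERRORS and ClientError are hypothetical names defined here for illustration, and the sketch uses except Exception rather than a bare except so that KeyboardInterrupt and SystemExit are not swallowed.

```python
class ClientError(Exception):
    """Stand-in for a DataChain-native error; the real class may differ."""


# Assumption: which exception types count as credential-related.
CREDENTIAL_ERRORS = (PermissionError,)


def isfile_lenient(check, path):
    """Surface credential problems; treat every other failure as a non-file."""
    try:
        return check(path)
    except CREDENTIAL_ERRORS as e:
        raise ClientError(str(e)) from e
    except Exception:
        # Wide fallback: the rest of the code handles this path as a non-file.
        return False


def broken_glob(path):
    # Hypothetical failure unrelated to credentials, e.g. an invalid pattern.
    raise ValueError(f"cannot stat {path}")


def denied(path):
    # Hypothetical credential failure.
    raise PermissionError("access denied")


swallowed = isfile_lenient(broken_glob, "file:///tmp/*.csv")

try:
    isfile_lenient(denied, "s3://bucket/private.csv")
    surfaced = False
except ClientError:
    surfaced = True
```

The order of the handlers matters: the credential handler must come before the wide one, otherwise it would never be reached.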

@@ -90,6 +91,10 @@ def _isfile(client: "Client", path: str) -> bool:
Returns True if uri points to a file
Member

@skshetry skshetry Jan 26, 2025


@shcheklein, was _isfile() added just for making it work with Google Cloud?

If so, I'd recommend extending GCSFileSystem._is_file (my preference), or adding GCSClient._is_file.

Whitelisting exceptions in fs-agnostic/generic code is not a good thing to do.
get_listing could raise any unknown exception that's not FileNotFoundError in _is_file as a ClientError.
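A rough sketch of the alternative proposed here: keep the generic check minimal and let a backend-specific client override it. All names below (Client, GCSClient, is_file, BackendQuirkError) are hypothetical and are not DataChain's or gcsfs's actual classes or signatures.

```python
class BackendQuirkError(Exception):
    """Hypothetical error that only one backend raises."""


class Client:
    """Hypothetical generic client that knows nothing about backend quirks."""

    def __init__(self, info):
        # info is a callable mapping a path to a dict with a "type" key.
        self._info = info

    def is_file(self, path):
        try:
            return self._info(path)["type"] == "file"
        except FileNotFoundError:
            return False


class GCSClient(Client):
    """Hypothetical backend-specific client that handles its own quirk."""

    def is_file(self, path):
        try:
            return super().is_file(path)
        except BackendQuirkError:
            # Only this backend knows that this error means "not a file".
            return False


def quirky_info(path):
    raise BackendQuirkError(path)


gcs_result = GCSClient(quirky_info).is_file("gs://bucket/dir/")

try:
    Client(quirky_info).is_file("gs://bucket/dir/")
    generic_raised = False
except BackendQuirkError:
    generic_raised = True
```

With this split, the generic code path never needs a list of backend exceptions; each client decides what its own errors mean.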

Member


no, I think it is general code (there are special zero-byte files across all clouds, I think) - we need to filter them

Member


Btw, we stopped doing these checks in dvc. We ignore files that end with slashes.

Member


yes, so that's the point of this call - to always treat it as a directory AFAIR

Member


I guess in some cases (not sure if this works the same across all clients) it returns file/ even if we ask it to check file, when a zero-byte object exists
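The filtering discussed in this thread could look roughly like the sketch below. It assumes listing entries are dicts with "name" and "size" keys, loosely modeled on fsspec-style listing details, and it assumes a directory placeholder is a zero-byte object whose name ends with a slash. drop_directory_markers is a hypothetical helper, not an existing function.

```python
def drop_directory_markers(entries):
    """Remove zero-byte objects whose name ends with '/' from a listing."""
    return [
        entry
        for entry in entries
        if not (entry["name"].endswith("/") and entry["size"] == 0)
    ]


# Made-up listing for illustration only.
listing = [
    {"name": "bucket/dir/", "size": 0},        # placeholder for a "directory"
    {"name": "bucket/dir/a.csv", "size": 12},  # regular file
    {"name": "bucket/empty.txt", "size": 0},   # real empty file, kept
]

kept = [entry["name"] for entry in drop_directory_markers(listing)]
```

Checking both the trailing slash and the size keeps genuine empty files in the listing while dropping the placeholders.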

@@ -90,6 +91,10 @@ def _isfile(client: "Client", path: str) -> bool:
    Returns True if uri points to a file
    """
    try:
        if "://" in path:
Member


why is this needed, btw?

4 participants