-
Notifications
You must be signed in to change notification settings - Fork 324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sort output of S3_File.list
and Enso_File.list
#11929
base: develop
Are you sure you want to change the base?
Conversation
@@ -187,7 +187,7 @@ type S3_File | |||
S3_File.Value (S3_Path.Value bucket key) self.credentials | |||
files = pair.second . map key-> | |||
S3_File.Value (S3_Path.Value bucket key) self.credentials | |||
sub_folders + files | |||
sub_folders + files . sort on=.s3_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this just sort files
but not sub_folders
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also we could just sort on .path
and then we wouldn't need to implement the custom comparator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it better to not use a comparator? I was thinking of adding a comparator for S3_File
as well -- if a user might have a vector of them, then they might want to sort it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it sorted the files and sub-folders properly:
s3://enso-data-samples/Store Data.xlsx
s3://enso-data-samples/data_2500_rows.csv
s3://enso-data-samples/examples/
s3://enso-data-samples/locations.json
s3://enso-data-samples/products.csv
s3://enso-data-samples/sample.xml
s3://enso-data-samples/spreadsheet.xls
s3://enso-data-samples/spreadsheet.xlsx
s3://enso-data-samples/tableau/
s3://enso-data-samples/transactions.csv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still add parentheses, the precedence of +
vs .
is not something that everyone knows immediately, so I think making it clear with parentheses will make it better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it better to not use a comparator? I was thinking of adding a comparator for
S3_File
as well -- if a user might have a vector of them, then they might want to sort it.
Hmm, I guess we could do that yeah. But still I'd rely on comparing the path
as Text
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
type Vector_Comparator | ||
compare x y = | ||
min_length = x.length.min y.length | ||
when_prefixes_equal = | ||
## At this point, if the vectors are the same length, then they are | ||
identical; otherwise, the shorter one is lesser. | ||
if x.length == y.length then Ordering.Equal else | ||
Ordering.compare x.length y.length | ||
go i = | ||
if i >= min_length then when_prefixes_equal else | ||
x_elem = x.at i | ||
y_elem = y.at i | ||
Ordering.compare x_elem y_elem . and_then <| | ||
@Tail_Call go (i + 1) | ||
k = go 0 | ||
k | ||
|
||
hash x = Default_Comparator.hash_builtin x | ||
|
||
Comparable.from (that : Vector) = Comparable.new that Vector_Comparator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this just re-implements Vector_Lexicographic_Order.compare
?
I think we did not register it into Comparable
on purpose - comparing vectors is not unambiguously defined operation - there's multiple orderings that may seem right depending on context. So we did not want to provide any default one to avoid user confusion and require to choose one explicitly (we implement only lexicographic one but for example an (incomplate) ordering could be the point-wise order.
Of course we can decide that it is beneficial to include a default <
for vectors. But I think the original decision was not wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The downside to making the comparable is that if a user accidentally nests vectors, and sorts them, they won't know they made a mistake. But it will be easy to see that the data is nested. Without a comparator, they'll get an error.
The upside is that they'll be able to sort vectors containing more things.
@JaroslavTulach You asked if modern languages allow this; Haskell does:
main = do
msp $ [1, 2] < [3, 4]
=> True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the Vector_Comparator
since it's redundant and it's a separate question.
r . should_equal (r.sort on=.path) | ||
r . should_equal (r.sort on=.enso_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enso_path
is private, I don't think the second check can work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
test/AWS_Tests/src/S3_Spec.enso
Outdated
group_builder.specify "list should sort its output" <| | ||
r = root.list | ||
r.should_be_a Vector | ||
r . should_equal (r.sort on=.s3_path) | ||
r . should_equal (r.sort on=(x-> x.s3_path.to_text)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, .s3_path
is private and implementation detail that should be a black box in the tests, use path
that returns Text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to .path
I removed the changes to vector, and added |
@@ -616,3 +616,11 @@ translate_file_errors related_file result = | |||
s3_path = S3_Path.Value error.bucket error.key | |||
s3_file = S3_File.Value s3_path related_file.credentials | |||
Error.throw (File_Error.Not_Found s3_file) | |||
|
|||
type S3_File_Comparator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type should probably be marked as PRIVATE
as we don't want it to show up in CB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please mark the functions as private.
hash x = S3_Path_Comparator.hash_builtin x | ||
hash_builtin x = @Builtin_Method "Default_Comparator.hash_builtin" | ||
|
||
Comparable.from that:S3_Path = Comparable.new that S3_Path_Comparator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparable.from that:S3_Path = Comparable.new that S3_Path_Comparator | |
## PRIVATE | |
Comparable.from that:S3_Path = Comparable.new that S3_Path_Comparator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
;-)
It would be easier to change the IDE to not display from
methods in the component browser than adding this ## PRIVATE
at every conversion method occurrence.
compare x y = Ordering.compare x.bucket y.bucket . and_then (Ordering.compare x.key y.key) | ||
|
||
hash x = S3_Path_Comparator.hash_builtin x | ||
hash_builtin x = @Builtin_Method "Default_Comparator.hash_builtin" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the linter's benefit please tag ## PRIVATE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw. why does the S3_Path_Comparator
contain a method hash_builtin
? It does not seem right.
I'm not sure we have a rule for this, but every builtin should have at most one (or perhaps, exactly one) registration.
Instead, I think we should just delegate to default comparator there.
compare x y = Ordering.compare x.bucket y.bucket . and_then (Ordering.compare x.key y.key) | |
hash x = S3_Path_Comparator.hash_builtin x | |
hash_builtin x = @Builtin_Method "Default_Comparator.hash_builtin" | |
compare x y = Ordering.compare x.bucket y.bucket . and_then (Ordering.compare x.key y.key) | |
hash x = Default_Comparator.hash_builtin x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry the suggestion I gave you above won't work because Default_Comparator
is private.
Still registering a duplicated builtin to get around private check is not right. Moreover there are movements (#6282) to allow builtins only from the Base
library, so this would not work then.
I guess the correct solution then may be to delegate to the default comparator by computing the hash of the element:
compare x y = Ordering.compare x.bucket y.bucket . and_then (Ordering.compare x.key y.key) | |
hash x = S3_Path_Comparator.hash_builtin x | |
hash_builtin x = @Builtin_Method "Default_Comparator.hash_builtin" | |
compare x y = Ordering.compare x.bucket y.bucket . and_then (Ordering.compare x.key y.key) | |
## PRIVATE | |
hash x = | |
key = x.key | |
Comparable.from key . comparator . hash key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved just the hash builtin to Ordering.enso:
## PRIVATE
default_comparator_hash_builtin x = @Builtin_Method "Default_Comparator.hash_builtin"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like it would be more efficient to call the builtin directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved just the hash builtin to Ordering.enso:
## PRIVATE default_comparator_hash_builtin x = @Builtin_Method "Default_Comparator.hash_builtin"
That still does not prepare us for #6282. Once that issue is addressed, builtin methods cannot be exposed from Base directly. You would need to create a method:
default_comparator_hash x = default_comparator_hash_builtin x
that would be wrapping the builtin, then you could export default_comparator_hash
as part of public API.
Still, I'm not quite sure if this is the correct approach.
I think the correct approach to creating a custom hash function is to rely on Comparable.from
on the elements of the custom type and then combining the resulting hashes (if there's >1 element). We currently perhaps lack a generic function to combine hashes - that's perhaps something that could be added to the public API of Base. We could just compute the hash of the array of hashes to combine, I think that should work okay.
Seems like it would be more efficient to call the builtin directly.
On a hot path the overhead should be totally negligible. But I guess that would be a question to the engine team. Regardless, the guidelines from the engine team were to not export builtin methods as public API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For implementing a generic hash function, I would prefer to use the existing implementations in HashCodeNode
rather than create another, different generic one in Enso.
What I am currently doing:
S3_File.enso:
type S3_File_Comparator
...
hash x = default_comparator_hash x
Ordering.enso:
default_comparator_hash x = Default_Comparator.hash x
This results in a stack overflow.
@JaroslavTulach should I try to use HashCodeNode
for a generic hash implementation?
compare x y = Ordering.compare x.s3_path y.s3_path | ||
|
||
hash x = S3_File_Comparator.hash_builtin x | ||
hash_builtin x = @Builtin_Method "Default_Comparator.hash_builtin" | ||
|
||
Comparable.from that:S3_File = Comparable.new that S3_File_Comparator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we tag these as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
compare x y = Ordering.compare x.enso_path y.enso_path | ||
|
||
hash x = Enso_File_Comparator.hash_builtin x | ||
hash_builtin x = @Builtin_Method "Default_Comparator.hash_builtin" | ||
|
||
Comparable.from that:Enso_File = Comparable.new that Enso_File_Comparator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
type Enso_Path_Comparator | ||
compare x y = Vector_Lexicographic_Order.compare x.path_segments y.path_segments | ||
|
||
hash x = Enso_Path_Comparator.hash_builtin x | ||
hash_builtin x = @Builtin_Method "Default_Comparator.hash_builtin" | ||
|
||
Comparable.from that:Enso_Path = Comparable.new that Enso_Path_Comparator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
compare x y = Ordering.compare x.s3_path y.s3_path | ||
|
||
## PRIVATE | ||
hash x = default_comparator_hash x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not hash x = Ordering.hash x.s3_path
to be consistent with compare
?
I've removed the comparators. |
@@ -451,7 +449,7 @@ type Enso_File | |||
file = Enso_File.Value (self.enso_path.resolve asset.name) | |||
Asset_Cache.update file asset | |||
file | |||
results.sort | |||
results.sort on=(f-> f.enso_path.path_segments) by=Vector_Lexicographic_Order.compare |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is OK, although I'd still prefer to just do on=.path
for all backends.
The catch is - if we ever make list
with recursive=True
allow to cross data links, Enso_File.list
could still return S3_File
instances if there's a data link pointing towards an S3 bucket. So a file-specific approach to sorting would break in such scenario whereas a .path
based one will always be universal.
Although I'm not sure how likely the above scenario is, so for now this is also OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
results.sort on=.path
looks simple.
Also:
Comparable
forVector
.Closes #11899
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
Scala,
Java,
TypeScript,
and
Rust
style guides. In case you are using a language not listed above, follow the Rust style guide.
or the Snowflake database integration, a run of the Extra Tests has been scheduled.