Prepare for a new version #145

Open
wants to merge 10 commits into
base: master

@asinghvi17 (Collaborator) commented Apr 2, 2024

This PR combines several PRs that are currently open. Below is a summary of changes:

  • Add metadata to the data frame returned by dataset, indicating that it was generated by RDatasets.jl and recording its package and dataset name as a Tuple. This is essentially a call to DataFrames.metadata!(df, "RDatasets.jl", (package_name, dataset_name)).
  • Add a description function to RDatasets, make it readable in the REPL
    • Make this function discoverable, document it.
  • Bump RData.jl compat to 1.
  • Add instructions for data addition and improve data addition script
  • Bump version to v0.8

PRs #135 from @frankier and #124 from @jbrea are incorporated here.
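
To make the metadata bullet concrete, here is a minimal sketch of writing and reading such an origin tag. It assumes DataFrames.jl 1.4 or later (where the metadata API is available); the data frame `df` and the `("datasets", "iris")` tuple are illustrative stand-ins, not the PR's actual code path.

```julia
using DataFrames  # assumption: DataFrames ≥ 1.4, which provides the metadata API

# Illustrative stand-in for a data frame that `dataset` would return.
df = DataFrame(a = 1:3)

# Attach the origin tag under the "RDatasets.jl" key (metadata style defaults to :default).
metadata!(df, "RDatasets.jl", ("datasets", "iris"))

# Read the tag back; the stored value is the tuple itself.
origin = metadata(df, "RDatasets.jl")
package_name, dataset_name = origin
```

Because the tag lives on the data frame object, it may not survive every transformation that builds a new data frame, so downstream code should check for the key before relying on it.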

@codecov-commenter commented Apr 2, 2024

Codecov Report

Attention: Patch coverage is 18.18182%, with 9 lines in your changes missing coverage. Please review.

Project coverage is 63.88%. Comparing base (b1a5959) to head (4aac673).

| Files | Patch % | Lines |
|---|---|---|
| src/dataset.jl | 18.18% | 9 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##           master     #145       +/-   ##
===========================================
- Coverage   83.33%   63.88%   -19.45%     
===========================================
  Files           3        4        +1     
  Lines          24       36       +12     
===========================================
+ Hits           20       23        +3     
- Misses          4       13        +9     


asinghvi17 and others added 6 commits April 2, 2024 08:44
Co-authored-by: jbrea <jbrea@users.noreply.github.com>
* Streamline adding a new dataset

 * Add instructions to README for adding a new dataset
 * Add scripts to update the dataset metadata
 * Add update_doc method to only add a single dataset
 * Add HTML documentation generation to update_doc
 * Change update_doc to correctly round trip quotes in the metadata CSV

* Sort datasets CSV

* Allow datasets with a .RData extension as well as .rda

---------

Co-authored-by: Frankie Robertson <frankie@robertson.name>
This allows them to be displayed in a much better way in the REPL.
@asinghvi17 asinghvi17 requested a review from bkamins April 2, 2024 13:46
@asinghvi17 asinghvi17 marked this pull request as ready for review April 2, 2024 13:52
Project.toml (outdated review thread)
@bkamins (Contributor) commented Apr 4, 2024

The changes look sensible. I left one comment. I am not a maintainer of this package (and I do not know its internals). Maybe @nalimilan knows who is familiar enough with the internals to approve it. Thank you for working on it.

@kdpsingh

Appreciate everyone's work on this package.

@nalimilan (Member) left a comment

Sorry for the delay! I have a few comments.

A type to hold the content of a dataset description.

The main purpose of its existence is to provide a way to display the content
differently in HTML and markdown contexts.

Suggested change
- differently in HTML and markdown contexts.
+ differently in HTML and Markdown contexts.
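
The idea of a wrapper type that renders differently per display context can be sketched with MIME-specific `show` methods. This is only an illustration under stated assumptions: `DescriptionSketch` and its `content` field are hypothetical names, the text is assumed to be stored as Markdown source, and the PR's actual `RDatasetDescription` may store and convert its content differently.

```julia
# Hypothetical stand-in for a description wrapper; not the PR's actual type.
struct DescriptionSketch
    content::String  # assumption: description text kept as Markdown source
end

# Plain-text display, used by the REPL.
Base.show(io::IO, ::MIME"text/plain", d::DescriptionSketch) = print(io, d.content)

# Markdown-aware front ends receive the source unchanged.
Base.show(io::IO, ::MIME"text/markdown", d::DescriptionSketch) = print(io, d.content)

# HTML front ends receive an escaped, preformatted block.
function Base.show(io::IO, ::MIME"text/html", d::DescriptionSketch)
    escaped = replace(d.content, "&" => "&amp;", "<" => "&lt;", ">" => "&gt;")
    print(io, "<pre>", escaped, "</pre>")
end

d = DescriptionSketch("# iris\nEdgar Anderson's iris data")
html = repr(MIME("text/html"), d)      # what an HTML front end would request
md = repr(MIME("text/markdown"), d)    # what a Markdown front end would request
```

Each front end asks for the richest MIME type it supports, so a single object can serve the REPL, notebooks, and documentation builds without the caller choosing a format.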

Comment on lines +42 to +44

!!! note Unexported
This function is left deliberately unexported, since the name is pretty common.

This isn't a standard pattern AFAIK. Better to mark the function as public via `@compat public description` in the same place as the exports. This is available since Compat 3.47.0 and 4.10.0. `packages` could also be added to that list BTW.

Suggested change
- !!! note Unexported
- This function is left deliberately unexported, since the name is pretty common.
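
A minimal sketch of what the reviewer's proposal could look like, assuming Compat.jl (3.47.0 or later on the 3.x line, or 4.10.0 or later) is a dependency. The module name `RDatasetsSketch` and the stub function bodies are hypothetical placeholders, not the package's real code.

```julia
# Hypothetical stand-in module; the real package layout may differ.
module RDatasetsSketch

using Compat: @compat  # assumption: Compat ≥ 3.47.0 or ≥ 4.10.0

export dataset                         # exported: usable unqualified after `using`
@compat public description, packages   # public API, but must be called qualified

# Placeholder definitions so the sketch is self-contained.
dataset(package_name, dataset_name) = nothing
description(args...) = nothing
packages() = nothing

end # module
```

With this arrangement callers would write `RDatasetsSketch.description(...)`, and the name's public status is declared in code rather than in a docstring note. On Julia versions without the `public` keyword, the macro is documented to be a no-op.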


Invoke this function in exactly the same way you would invoke `dataset` to get the dataset itself.

This object prints well in the REPL, and can also be shown as markdown or HTML.

Suggested change
- This object prints well in the REPL, and can also be shown as markdown or HTML.
+ This object prints well in the REPL, and can also be shown as Markdown or HTML.

RDatasets.description(package_name::AbstractString, dataset_name::AbstractString)
RDatasets.description(df::DataFrame) # only call this on dataframes from RDatasets!

Returns an `RDatasetDescription` object containing the description of the dataset.

Suggested change
- Returns an `RDatasetDescription` object containing the description of the dataset.
+ Return an `RDatasetDescription` object containing the description of the dataset.


"""
RDatasets.description(package_name::AbstractString, dataset_name::AbstractString)
RDatasets.description(df::DataFrame) # only call this on dataframes from RDatasets!

Put this information in the docstring body instead. Also say what happens if that's not the case.

Suggested change
- RDatasets.description(df::DataFrame) # only call this on dataframes from RDatasets!
+ RDatasets.description(df::DataFrame)

error("Unable to locate dataset file $rdaname or $csvname")
end
# Finally, inject metadata into the dataframe to indicate origin:
DataFrames.metadata!(dataset, "RDatasets.jl", (string(package_name), string(dataset_name)))

The `DataFrames.` qualifier is not needed AFAICT:

Suggested change
- DataFrames.metadata!(dataset, "RDatasets.jl", (string(package_name), string(dataset_name)))
+ metadata!(dataset, "RDatasets.jl", (string(package_name), string(dataset_name)))

The main purpose of its existence is to provide a way to display the content
differently in HTML and markdown contexts.

Invoked through [`RDatasets.description`](@ref).

Suggested change
- Invoked through [`RDatasets.description`](@ref).
+ Obtained through [`RDatasets.description`](@ref).

Comment on lines +59 to +60
if "RDatasets.jl" in DataFrames.metadatakeys(df)
package_name, dataset_name = DataFrames.metadata(df, "RDatasets.jl")

Suggested change
- if "RDatasets.jl" in DataFrames.metadatakeys(df)
- package_name, dataset_name = DataFrames.metadata(df, "RDatasets.jl")
+ if "RDatasets.jl" in metadatakeys(df)
+ package_name, dataset_name = metadata(df, "RDatasets.jl")
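
The guarded lookup above, together with the earlier request to document what happens for data frames that did not come from `dataset`, could be sketched as follows. The helper name `rdatasets_origin` and the choice of `ArgumentError` are assumptions made for illustration; the PR's actual behavior in the missing-metadata case is not shown in this thread.

```julia
using DataFrames  # assumption: DataFrames ≥ 1.4 (metadata API)

# Hypothetical helper, not part of RDatasets.jl: recover the origin tag or fail loudly.
function rdatasets_origin(df::AbstractDataFrame)
    if "RDatasets.jl" in metadatakeys(df)
        package_name, dataset_name = metadata(df, "RDatasets.jl")
        return (String(package_name), String(dataset_name))
    end
    # Assumed failure mode: an explicit error rather than a silent fallback.
    throw(ArgumentError("data frame has no \"RDatasets.jl\" metadata; was it returned by `dataset`?"))
end

tagged = DataFrame(x = 1:3)
metadata!(tagged, "RDatasets.jl", ("datasets", "iris"))
origin = rdatasets_origin(tagged)

untagged = DataFrame(x = 1:3)  # calling rdatasets_origin(untagged) throws ArgumentError
```

Raising a descriptive error keeps the restriction ("only use this on data frames returned from `dataset`") enforceable in code, so the docstring only has to state the consequence.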

@@ -1,12 +1,13 @@
name = "RDatasets"
uuid = "ce6b1742-4840-55fa-b093-852dadbb1d8b"
version = "0.7.7"
version = "0.8.0"

Probably a good occasion to tag 1.0.0. Clearly the package is stable enough.

Suggested change
- version = "0.8.0"
+ version = "1.0.0"

Comment on lines +21 to +22
RDatasets.description(iris) # only use this on DataFrames returned from `dataset`!
```

Suggested change
- RDatasets.description(iris) # only use this on DataFrames returned from `dataset`!
- ```
+ RDatasets.description(iris)
+ ```
+ Only use the latter on data frames returned from `dataset`.

Labels
None yet
Projects
None yet
5 participants