cutData() refinements #407

Open
wants to merge 8 commits into
base: master

Conversation

jack-davison
Collaborator

@jack-davison jack-davison commented Jan 31, 2025

This supersedes #406; it fixes #404 and goes a bit further.

This is a rewrite of cutData() that makes it more modern and less destructive, and cleans up legacy code that wasn't going anywhere.

Features

  • Added names, which defines the names of the columns appended to the data.

  • Added suffix, an alternative to names, which will append a suffix to any column that would overwrite an existing column.

  • The function will now error if you feed it something that is neither a column nor an in-built function.

  • The function now consistently messages if it filters your data.
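
The naming and validation rules above can be sketched in base R. This is a minimal illustration, not the code in this PR: resolve_names() and check_type() are hypothetical helpers, and the fall-through when neither names nor suffix is given (returning the type names unchanged) is an assumption.

```r
# Hypothetical helpers illustrating the rules described above;
# they are NOT the functions added in this PR.

# Decide which column names the new columns should get.
resolve_names <- function(type, existing, names = NULL, suffix = NULL) {
  if (!is.null(names)) {
    # `names` must supply one name per requested type
    stopifnot(length(names) == length(type))
    return(names)
  }
  if (!is.null(suffix)) {
    # only rename the outputs that would overwrite an existing column
    clash <- type %in% existing
    type[clash] <- paste0(type[clash], suffix)
  }
  # assumption: with neither argument, the type names are used as-is
  type
}

# Error if a requested type is neither a built-in nor a column.
check_type <- function(type, existing, builtins) {
  bad <- setdiff(type, c(builtins, existing))
  if (length(bad) > 0) {
    stop(
      "type '", paste(bad, collapse = "', '"),
      "' is neither a built-in option, nor a column in x.",
      call. = FALSE
    )
  }
  invisible(type)
}

cols <- c("date", "no2", "o3")
new_cols <- resolve_names(c("o3", "no2", "weekday"), cols, suffix = "_cuts")
```

With the columns from the reprex in this thread, only o3 and no2 clash with existing columns, so weekday keeps its own name.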

Refactoring/Fixes

  • All of the different in-built functions now operate on vectors and exist in child functions. This makes them easier to test, and stops them from being as destructive.

  • We're already importing lubridate, so we're now using it for other things too (e.g., weekdays). It already handles details like weekday names differing between locales.

  • Dropped the idea of adding an "openair" source attribute.

  • Dropped the special treatment of "site" - there was seemingly nothing special about it.

  • Dropped documentation references to "ws", which I don't think has ever been treated specially by cutData().
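
As a rough sketch of the "operate on vectors" idea, assuming lubridate is installed: cut_weekday() and add_weekday() below are hypothetical names, not the child functions in this PR. The helper takes a plain date-time vector, so it can be tested without building a data frame, and the wrapper only appends a column.

```r
# Hypothetical names for illustration; not the helpers added in this PR.

# Vector in, vector out: easy to unit test in isolation.
cut_weekday <- function(x) {
  # label = TRUE returns an ordered factor of day names;
  # abbr = FALSE gives full names in the current locale
  lubridate::wday(x, label = TRUE, abbr = FALSE)
}

# Thin wrapper: appends the result as a new column and leaves
# the existing columns untouched.
add_weekday <- function(data, name = "weekday") {
  data[[name]] <- cut_weekday(data$date)
  data
}

dates <- as.POSIXct(c("1998-01-01 00:00:00", "1998-01-03 12:00:00"), tz = "UTC")
wd <- cut_weekday(dates)
out <- add_weekday(data.frame(date = dates), name = "wday")
```

Because the label comes from lubridate, the result is an ordered factor, which matches the <ord> column type shown for the weekday column in the reprex in this thread.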

- roxygen2md

- crosslinks

- remove incorrect items
- drop weird dups ("site" and "ws" aren't special)

- use lubridate where possible

- move everything to testable functions
@jack-davison jack-davison changed the title from "Feat cut data refinement" to "cutData() refinements" Jan 31, 2025
@jack-davison
Collaborator Author

jack-davison commented Jan 31, 2025

Examples:

devtools::load_all()
#> ℹ Loading openair

dat <- head(openair::mydata) |> dplyr::select(date, no2, o3)

cutData(dat)
#> # A tibble: 6 × 4
#>   date                  no2    o3 default                           
#>   <dttm>              <int> <int> <fct>                             
#> 1 1998-01-01 00:00:00    39     1 01 January 1998 to 01 January 1998
#> 2 1998-01-01 01:00:00    NA    NA 01 January 1998 to 01 January 1998
#> 3 1998-01-01 02:00:00    NA     3 01 January 1998 to 01 January 1998
#> 4 1998-01-01 03:00:00    52     3 01 January 1998 to 01 January 1998
#> 5 1998-01-01 04:00:00    78     2 01 January 1998 to 01 January 1998
#> 6 1998-01-01 05:00:00    42     0 01 January 1998 to 01 January 1998

cutData(dat, names = c("limits"))
#> # A tibble: 6 × 4
#>   date                  no2    o3 limits                            
#>   <dttm>              <int> <int> <fct>                             
#> 1 1998-01-01 00:00:00    39     1 01 January 1998 to 01 January 1998
#> 2 1998-01-01 01:00:00    NA    NA 01 January 1998 to 01 January 1998
#> 3 1998-01-01 02:00:00    NA     3 01 January 1998 to 01 January 1998
#> 4 1998-01-01 03:00:00    52     3 01 January 1998 to 01 January 1998
#> 5 1998-01-01 04:00:00    78     2 01 January 1998 to 01 January 1998
#> 6 1998-01-01 05:00:00    42     0 01 January 1998 to 01 January 1998

cutData(dat, c("o3", "no2", "weekday"), names = c("o3_breaks", "no2_cuts", "wday"))
#> Warning: ! Removing 1 rows due to missing `o3` data.
#> Warning: ! Removing 1 rows due to missing `no2` data.
#> # A tibble: 4 × 6
#>   date                  no2    o3 o3_breaks no2_cuts       wday    
#>   <dttm>              <int> <int> <fct>     <fct>          <ord>   
#> 1 1998-01-01 00:00:00    39     1 o3 0 to 1 no2 39 to 41.2 Thursday
#> 2 1998-01-01 03:00:00    52     3 o3 2 to 3 no2 47 to 58.5 Thursday
#> 3 1998-01-01 04:00:00    78     2 o3 1 to 2 no2 58.5 to 78 Thursday
#> 4 1998-01-01 05:00:00    42     0 o3 0 to 1 no2 41.2 to 47 Thursday

cutData(dat, c("o3", "no2", "weekday"), suffix = c("_cuts"))
#> Warning: ! Removing 1 rows due to missing `o3` data.
#> ! Removing 1 rows due to missing `no2` data.
#> # A tibble: 4 × 6
#>   date                  no2    o3 o3_cuts   no2_cuts       weekday 
#>   <dttm>              <int> <int> <fct>     <fct>          <ord>   
#> 1 1998-01-01 00:00:00    39     1 o3 0 to 1 no2 39 to 41.2 Thursday
#> 2 1998-01-01 03:00:00    52     3 o3 2 to 3 no2 47 to 58.5 Thursday
#> 3 1998-01-01 04:00:00    78     2 o3 1 to 2 no2 58.5 to 78 Thursday
#> 4 1998-01-01 05:00:00    42     0 o3 0 to 1 no2 41.2 to 47 Thursday

cutData(dat, c("o3", "nox"), suffix = c("_cuts"))
#> Warning: ! Removing 1 rows due to missing `o3` data.
#> Error:
#> ✖ type 'nox' is neither a built-in option, nor a column in x.
#> ℹ Built-ins: default, year, hour, month, season, week, weekday, wd, weekend,
#>   monthyear, yearmonth, bstgmt, gmtbst, dst, daylight, seasonyear, and
#>   yearseason
#> ℹ Names in x: date, no2, o3, and o3_cuts

Created on 2025-01-31 with reprex v2.1.1

@jack-davison jack-davison mentioned this pull request Feb 1, 2025
11 tasks
@jack-davison jack-davison added the "enhancement" (Ideas for new features for openair) and "utilities 🛠" (Openair data utilities, e.g., timeAverage) labels Feb 1, 2025
Development

Successfully merging this pull request may close these issues.

cutData can overwrite earlier created columns with unexpected data