Feature requests #593

Open
2 tasks done
chrisbrownlie opened this issue Feb 5, 2025 · 0 comments
Prework

Proposal

Thanks for the great package! I've been getting to grips with it recently and had some things I wanted to do that don't currently appear to be possible - or at least are not straightforward to do. I've also checked the issues list for all the below and couldn't see anything. For info I'm pretty much solely using the VALID-I workflow as described in the docs.

For ease I've compiled them here in one issue and would greatly appreciate if you could:
(a) let me know if there is already an easy way to do this with {pointblank} that I'm not aware of,
(b) if not, would you consider adding it as a feature, and
(c) if yes to (b), would you welcome a pull request for it, as I'm happy to have a go at doing so

Let me know if you'd prefer me to open separate issues for anything! :)

Validation functions

  • An opposite/complement to conjointly(): i.e. 'if any of these individual checks pass, the overall check passes'. I figure it's possible with preconditions/specially(), but I would find a separately()/alternatively()/or()-type function useful.
  • Control over data extracts. One use case is to add a data extract to specially(), but in general it would be good to be able to specify 'this is how to represent failing checks' via an extract_fn argument (or similar) to validation functions. As another example, I am trying to validate some very wide datasets and it would be useful to only include relevant columns in data extracts so that I can easily see why/where a check failed.
  • Custom metadata for each validation step, i.e. a metadata list argument (or similar) to all validation functions, which would be added somehow to the agent - either to the validation_set tibble or as a separate element. My use case is to add a custom 'type' to each check, to group them when doing custom reports/investigations.
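To make the first and third points concrete, here's a rough sketch of how I imagine the API could look. To be clear, separately(), extract_fn and metadata below are all hypothetical names I've made up for illustration - none of them exist in {pointblank} today:

```r
library(pointblank)

# HYPOTHETICAL API SKETCH -- separately(), extract_fn and metadata
# do not exist in {pointblank}; names are illustrative only
agent <- small_table |>
  create_agent() |>
  # passes if *any* of the sub-validations pass (complement of conjointly())
  separately(
    ~ col_vals_gt(., columns = a, value = 5),
    ~ col_vals_lt(., columns = c, value = 2)
  ) |>
  # restrict the data extract for failing rows to the relevant columns
  col_vals_not_null(
    columns = b,
    extract_fn = function(failed_rows) dplyr::select(failed_rows, a, b)
  ) |>
  # attach custom metadata to this step, surfaced somewhere on the agent
  col_is_character(
    columns = b,
    metadata = list(type = "structural", owner = "data-team")
  ) |>
  interrogate()
```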

Reports

  • keep and arrange_by arguments to get_multiagent_report(), behaving the same as the arguments for get_agent_report()
  • Control over display of checks in agent reports. I'd quite like to group some checks together - particularly when it's the same check, as in the example below - or group passing checks together. For example, below it would be nice if all columns that passed could be combined into a single row in the report; for some wide datasets I want to check that >100 columns are all numeric, but I'm only interested in those that fail when I look at the agent report:
library(pointblank)
agent <- iris |>
  create_agent() |>
  col_is_numeric(starts_with("S")) |>
  interrogate()

Created on 2025-02-05 with reprex v2.1.0
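For the first report point, I'm imagining the call would simply mirror what get_agent_report() already offers - i.e. keep and arrange_by below are real arguments of get_agent_report(), but (as far as I can tell) hypothetical for get_multiagent_report():

```r
# HYPOTHETICAL: keep and arrange_by exist on get_agent_report()
# but not currently on get_multiagent_report()
get_multiagent_report(
  multiagent,
  keep = "fail_states",    # only show steps with failing units
  arrange_by = "severity"  # worst failures first
)
```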

Other

  • Some functionality for using a database to track validations. My use case: we have validations that run on survey data - let's say we get tens of responses each week, and at the end of each week we 'interrogate' the responses we've received. We have easy access to a SQL Server database, and ideally I'd like to run the {pointblank} interrogations in a CI pipeline which outputs an email of all the failing checks. So my dream is to have a way to: save an agent specification to a table (or a row in a table); and/or save the results of interrogations as row(s) in a database table, to track changes over time. This is maybe less of a feature request, as it might be difficult/beyond the scope of the package, but I'd very much welcome any thoughts on how you might go about this!
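In case it helps, here's a rough sketch of what I have in mind for the database side. I'm assuming per-step interrogation results can be pulled off the agent via get_agent_x_list() (the exact x-list element names below are my best guess) and then written out with {DBI}; the connection details and table/column names are purely illustrative:

```r
library(pointblank)
library(DBI)

agent <- small_table |>
  create_agent(label = "weekly-survey-checks") |>
  col_vals_not_null(columns = a) |>
  interrogate()

# Pull per-step results from the agent's x-list
# (element names here are my best guess at the x-list structure)
xl <- get_agent_x_list(agent)
results <- data.frame(
  run_time   = Sys.time(),
  step       = xl$i,
  check_type = xl$type,
  n_passed   = xl$n_passed,
  f_passed   = xl$f_passed,
  warn       = xl$warn
)

# Append this week's results to a tracking table on the SQL server
# (connection details illustrative only)
con <- dbConnect(odbc::odbc(), dsn = "survey_db")
dbWriteTable(con, "pointblank_results", results, append = TRUE)
dbDisconnect(con)
```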