Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unit tests should check for vector parameters #109

Open
xarxziux opened this issue Sep 5, 2018 · 8 comments
Open

Unit tests should check for vector parameters #109

xarxziux opened this issue Sep 5, 2018 · 8 comments

Comments

@xarxziux
Copy link
Contributor

xarxziux commented Sep 5, 2018

Having completed all the core exercises and most of the side exercises on the R track, it's become clear to me that while Exercisms testing model works well enough for general-purpose languages, it breaks down for languages that do things a little differently like R. The general design of the tests is that they assume that the student is going to produce a function that accepts a single value. To test several input values, the student's function is called several times with different single values.

This works fine for a general-purpose language like Python or JavaScript, but it breaks down for R which has no concept of single values. Instead, a declaration like x <- 1 does not create an integer or number value, but a numeric vector of size 1. This is a very important concept in R and one that needs to be emphasised to students of the language. I didn't have a firm grasp of this concept myself until recently, despite using R in a professional capacity for some time. Unfortunately the unit tests as they stand don't require the student to "think in R" and allow them to assume that the input values are single-element vectors. I've come to realise while working through the track that I've been guilty of this myself.

The solution to this is quite straightforward: in each exercise, in addition to the current tests, there should be one final test that takes all the values from the previous tests, combines them into a vector (or list, if appropriate) and sends that vector or list to the students function. This will check if they've made the mistake of assuming that the input is a single value, or if they've respected the R way of doing things. This may well break existing solutions (including most of mine), but I think it's a change worth making.

I'm happy to submit the PR's for this myself, but I'd like to see if you agree with my observations and think this is worth doing first.

@jonmcalder
Copy link
Member

jonmcalder commented Sep 5, 2018

Thanks for the feedback!

This is an interesting point that you raise and so I get where you are coming from. I'm not entirely sure I agree with the conclusion that you're jumping to here though (i.e. to extend all the tests to explicitly check for this).

This is a very important concept in R and one that needs to be emphasised to students of the language.

I agree that this is an important concept, but I also want to be sensitive to the spectrum of users of R. Often newer users of R are not necessarily aiming to be R developers (although that may change over time). Maybe for the audience of Exercism users learning/practicing R they will typically tend to be more towards the developer end of the spectrum (especially if they know other programming languages already, but this is not always the case).

My gut feel is that maybe this concept of things in R generally being vectorized by default should be made explicit later on in the R track and thus only introduced into the later tests? Alternatively maybe even just making these optional extra tests to pass (maybe with motivations in one or two key exercises as to why this concept is important/valuable for R developers).

@katrinleinweber, @jkanche & @amoradell what are your thoughts?

@katrinleinweber
Copy link
Contributor

katrinleinweber commented Oct 5, 2018

Having never encountered this issue myself, I'm unsure about this. It's probably best to proceed with a small PR to a late exercise for us to test.

I want to find out what the consequences are. Does solution code has to be bloated to handle this idiomatically? Should we consider that last test as advanced and skip it by default (see exercism/exercism#856)?

xarxziux added a commit to xarxziux/exercism-r that referenced this issue Oct 6, 2018
As a demonstration of the point I raised in issue exercism#109, here is an update for the non-core Collatz Conjecture exercise.  The current version of this does not test for vector parameters and the example would fail this test.

The following changes have been made:
- An extra test has been added to the unit tests that send a four-element vector to the function.
- In the example, he original function `collatz_step_counter` has been renamed to `collatz_scalar`.
- Also in the example, the line `collatz_step_counter <- Vectorize(collatz_scalar)` has been added.

This is a very minor change to the example, but makes the function a lot more powerful by taking full advantage of R's vectorisation capabilities.  R developers really shouldn't be thinking in single values the way most mainstream programming languages do.
xarxziux added a commit to xarxziux/exercism-r that referenced this issue Oct 6, 2018
As another demonstration of issue exercism#109, I've added a vectorised unit test to the Pangram exercise:
- Add vector test to unit tests.
- Rename `is_pangram` function as `is_pangram_scalar`.
- Create new `is_pangram` function by calling `Vectorize` on `is_pangram_scalar` function.
@katrinleinweber
Copy link
Contributor

katrinleinweber commented Oct 13, 2018

OK, I played around with #111 and Vectorize()a bit and realise that I have probably been limbo-ing around well below idiomatic R myself :-/

Often newer users of R are not necessarily aiming to be R developers (although that may change over time).

Agreed.

Maybe for the audience of Exercism users learning/practicing R they will typically tend to be more towards the developer end of the spectrum (especially if they know other programming languages already, but this is not always the case).

I'm recommending it to novices all the time, mostly to experience TDD and learn the syntax by heart.

maybe even just making these optional extra tests to pass (maybe with motivations in one or two key exercises as to why this concept is important/valuable for R developers).

This is the conclusion I came to as well. However, we'll probably learn very little from learners about they how master that test if we make it optional. Thus, I think we should accept the code of #111 as it is, but update the PR with some more explanations, see https://github.com/exercism/r/pull/111/files#r224958701.

@xarxziux
Copy link
Contributor Author

Having given this some thought, I'm coming to the conclusion that automatic vectorisation is a fundamental concept in R that shows up in many different ways. I've only recently had this epiphany, but it's help me clear up some confusion about how R works internally. For example, I could never quite understand why strsplit() returned a list instead of a character vector, but this makes total sense when you consider that it presumes that the first parameter is a string vector and not a string.

Rather than treating it as an advanced concept to be learned later in the track, I think there's a case to be made for considering it a fundamental concept that needs to be emphasised from the beginning. However I take your point that maybe not everyone on the track is planning to be an R developer and this may be a little too advanced for them.

As a compromise, I would like to suggest adding this concept to the unit tests for all exercises, but disabling it by default (which I think can only be done by commenting it out) and offering it as an optional challenge. Rather than update all the descriptions, perhaps a single explanation on the Leap exercise about the bonus test would suffice, along with a short, standard comment in the unit tests about the last test being a bonus, optional one.

@katrinleinweber
Copy link
Contributor

katrinleinweber commented Oct 14, 2018

I take your point that maybe not everyone on the track is planning to be an R developer and this may be a little too advanced for them.

Above, I meant to agree with the 1st part here ;-) I think that if the concept is fundamental(ly useful), exercism should emphasise it, not label it as a special interest somehow.

… adding this concept to the unit tests for all exercises, but disabling it by default

This would even be in line with some many other tracks have all but the 1st test disabled, so learners need to actually go into the test code and enable them step by step, mimicking the TDD workflow. See exercism/exercism#856 & issues linked therein.

single explanation on the Leap exercise … along with a short, standard comment in the unit tests about the last test being a bonus, optional one.

OK. I'm not opposed to merging #111 right away. We could collect learner feedback there and aligning the files it changes now, with your more general approach later.

@jonmcalder
Copy link
Member

Sorry for not weighing in on this again sooner, I did re-read the thread a while back but since it looked like the conversation was progressing I didn't feel the need to follow up.

I am in agreement with the suggested approach (i.e. aim to add test cases for vectorised inputs to the unit tests for all exercises over time, but starting with some of the later exercises first so as to minimize any negative impact on newer users while we explore the implications of carrying this concept through across the rest of the track (and maybe even collect some feedback via the mentoring process).

As an aside to this, just to illustrate that the concept of vectorised functions in R can in fact surface really early on in the track (if one chooses to acknowledge it rather than just ignoring the potential subtlety), see my suggested mentor notes for leap here:
exercism/website-copy#682

Thanks again for raising this important issue @xarxziux and sorry for the lack of progress on it so far.

jonmcalder pushed a commit that referenced this issue Jan 17, 2019
As a demonstration of the point I raised in issue #109, here is an update for the non-core Collatz Conjecture exercise.  The current version of this does not test for vector parameters and the example would fail this test.

The following changes have been made:
- An extra test has been added to the unit tests that send a four-element vector to the function.
- In the example, he original function `collatz_step_counter` has been renamed to `collatz_scalar`.
- Also in the example, the line `collatz_step_counter <- Vectorize(collatz_scalar)` has been added.

This is a very minor change to the example, but makes the function a lot more powerful by taking full advantage of R's vectorisation capabilities.  R developers really shouldn't be thinking in single values the way most mainstream programming languages do.
@katrinleinweber
Copy link
Contributor

Ooops, sorry! I thought that #110 was finalised, but reading it again, I'm sure if we are waiting for tests still, or not?!

@jonmcalder
Copy link
Member

#110 is not directly related to this. I think @xarxziux was just testing out his Node script which picked up a missing test case for the acronyms exercise. That made me realize that there was another test case missing (presumably it had since been added), but #110 could just be merged as is I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants