"Given a DNA string, compute how many times each nucleotide occurs in the string."
-- nucleotide-count description
This issue is a discussion about what kinds of invalid test input are appropriate to include in the canonical-data for an exercise.
Part of the PR "nucleotide-count: refactoring tests (discussion)" (#895) proposed adding tests for various "error" input cases: null input, strings that contain non-ASCII characters, and strings containing various other non-DNA characters.
In the discussion that arose around that, @Vankog wrote: #895 (comment)
But as far as I experienced it, test suits should check for edge cases. null checks are especially important (at least in almost all languages).
In every educational piece of material the advice is not to only check for usual inputs, but for all possible (equivalent kinds of) inputs. null checks, empty checks and others are always on top of the example lists given. Just take any arbitrary course about QA and this will be part of it.
I think this is especially important, because we are in an educational context. The users should learn from the getgo what it means to write (arguably good) tests (first), because it is still a major problem, even upon seasoned programmers.
From my own business experience I know that new programmers are completely lost when it comes to writing tests. But also experienced programmers fall in the traps of "I'll do it later", "This is a trivial method", "I can't test this", "There are too many dependencies" etc. Just take any talk or tutorial about writing tests or even TDD and you will see the same old counter-arguments or questions again and again from the comments or the audience. This has a reason: Writing good tests is hard! Very hard to be exact and it is a skill that needs exercise. And where better to begin than right from the start?
I agree that writing good tests is hard. One trap that people often fall into is testing too much; the skill is in knowing where to draw the line between what is important to test and what is not.
Part of the challenge of programming is designing clean, self-contained abstractions so that you have to worry about as few things as possible at any one time.
The exercises on Exercism help by taking away a lot of the ambiguity and uncertainty. They are small, self-contained and well-defined, so the 'interesting' part is implementing the algorithm to solve a specific problem, rather than working out what the problem is and splitting it up.
In this case, where the problem is "counting nucleotides in a DNA string", it is OK to assume that all the input to the function will be valid DNA strings, and testing against things that are NOT valid DNA strings is inappropriate.
Think of it as if there were another step before every problem that ensures that the input is valid. In this case:
Unvalidated input -> DNA Parser -> nucleotide-count
We should be able to trust that the DNA Parser is doing its job and turning whatever it gets as input into a valid DNA string before passing it to nucleotide-count.
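To make the split concrete, here is a minimal Python sketch of that pipeline. It is only an illustration: `parse_dna` and `VALID_NUCLEOTIDES` are hypothetical names that are not part of any track's exercise, and I am assuming that the parser rejects bad input with a `ValueError` rather than trying to repair it.

```python
from collections import Counter

# Assumption: a valid DNA string contains only these four uppercase letters.
VALID_NUCLEOTIDES = "ACGT"


def parse_dna(raw):
    """Hypothetical validation step: return `raw` only if it is a valid DNA string."""
    if not isinstance(raw, str):
        raise ValueError("DNA input must be a string")
    invalid = set(raw) - set(VALID_NUCLEOTIDES)
    if invalid:
        raise ValueError(f"invalid nucleotides: {sorted(invalid)}")
    return raw


def nucleotide_count(strand):
    """Count each nucleotide, trusting that `strand` is already a valid DNA string."""
    counts = Counter(strand)
    # Report all four nucleotides, including those that occur zero times.
    return {nucleotide: counts[nucleotide] for nucleotide in VALID_NUCLEOTIDES}


if __name__ == "__main__":
    print(nucleotide_count(parse_dna("GATTACA")))
```

With this split, the null, non-ASCII and non-DNA cases would be tests for `parse_dna`, while the tests for `nucleotide_count` only need the empty strand and a handful of valid strands.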
Input validation and string parsing are legitimate problems on their own, but they should be split out rather than mixed into every problem.
I encourage the creation of a 'string-cleaning' exercise that takes all kinds of wacky input values and ensures that the result is a clean string.
But these tests do not belong in nucleotide-count, or any of the other exercises that happen to take strings as input.
See also #428, where we discussed whether it was appropriate for every test that handled strings to also have to deal with strings that contained non-ASCII characters.