Sustainable Test-Driven Development: January 2012

Monday, January 30, 2012

Testing Best Practices: Test Categories, Part 2


Continued from Test Categories, Part 1

5. Behavior with related boundaries (“and”)

Sometimes a behavior varies based on more than one factor. For example, let’s say that in order for the system to allow someone to be hired (returning a true from the canHire() method, perhaps), the individual in question must both be at least 18 years old and must be a US citizen.

Similar to a boundary within a range, here we need more than one assert to establish the rule. However, here two asserts are not enough.

Let’s further stipulate that there is a constant, System.MINIMUM_HIRING_AGE, and an enumeration named System.Citizenship with members for various countries. The specifying test code would look like this:

class HiringRuleTest {
   public void testOnlyAppropriateCitizensOfSufficientAgeHired() {
       System.Citizenship anyOtherThanUS =
           System.Citizenship.OUTER_MONGOLIA;

       HiringRule testHiringRule = new HiringRule();

       Assert.True(testHiringRule.canHire(System.MINIMUM_HIRING_AGE,
           System.Citizenship.US));
       Assert.False(testHiringRule.canHire(System.MINIMUM_HIRING_AGE - 1,
           System.Citizenship.US));
       Assert.False(testHiringRule.canHire(System.MINIMUM_HIRING_AGE,
           anyOtherThanUS));
   }
}

A typical question that might occur to you on reading this is:

Should we not show that a person under the minimum age and also not a citizen of the US will get a false when we call this method? We have shown that:

1. A US citizen of sufficient age will return a true (hire)
2. A younger US citizen will not be hired (false)
3. A non-US citizen “of age” will not be hired (false)

But should we not show that 2 and 3 combined will also produce a false?

This is a perfect example of the difference between testing and specification. If we were to ask an expert in testing this question, they would likely talk about the four possible conditions here as “quadrants” and would, in fact, say that the test was incomplete without the fourth case.

In TDD, we feel differently. We don’t want to make the test suite any larger than necessary, ever, because we want the tests to run as fast as possible, and also because we don’t want to create excessive maintenance tasks when things change. The ideas of “thoroughness” and “rigor” that naturally accompany the testing thought process become “sufficiency” in TDD. This probably seems like a trivial point, but it becomes less so over the long haul of the development process.

Put another way, we never add a new assert or test unless doing so makes a new distinction about the system that was not already made in the existing assertions and tests. What makes non-US citizens distinct from US citizens is that they will get a false even when the age is sufficient. That a non-US citizen of insufficient age will get a false is no different from what happens to a US citizen of insufficient age; it is the same distinction.
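To make the "no new distinction" argument concrete, here is a minimal sketch of what the production rule might look like. The class name HiringRuleSketch, the value 18 for MINIMUM_HIRING_AGE, the two-member Citizenship enum, and the canHire() signature are all assumptions made for illustration, not the actual system.

```java
// Hypothetical sketch; names, the age value, and the signature are assumed.
enum Citizenship { US, OUTER_MONGOLIA }

class HiringRuleSketch {
    static final int MINIMUM_HIRING_AGE = 18;

    // Both conditions must hold: old enough AND a US citizen.
    static boolean canHire(int age, Citizenship citizenship) {
        return age >= MINIMUM_HIRING_AGE && citizenship == Citizenship.US;
    }

    public static void main(String[] args) {
        Citizenship anyOtherThanUS = Citizenship.OUTER_MONGOLIA;

        // The three distinctions that specify the rule.
        System.out.println(canHire(MINIMUM_HIRING_AGE, Citizenship.US));
        System.out.println(canHire(MINIMUM_HIRING_AGE - 1, Citizenship.US));
        System.out.println(canHire(MINIMUM_HIRING_AGE, anyOtherThanUS));

        // The fourth quadrant gives the same answer as the two refusals
        // above, so it distinguishes nothing new about the rule.
        System.out.println(canHire(MINIMUM_HIRING_AGE - 1, anyOtherThanUS));
    }
}
```

Each of the first three calls pins down one side of one condition; the fourth exercises no part of the rule that the others have not already specified.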

Again, TDD does not replace traditional testing. We still expect that to happen, and still respect the value of testing organizations as much as we ever have.

You may also wonder “why Outer Mongolia to represent ‘some other country’? Why not something else?” We don’t really care what country we choose here, hence we named the temporary method variable “anyOtherThanUS”. Elsewhere in this book, we’ll look at other options for representing “I don’t care” values like this one.

6. Behavior with repeated boundaries (“or”)

Sometimes we have boundaries that are related in a different way. Sometimes it is a matter of selecting some values across a set, and specifying that some of them must be in a given condition. If we don’t address this properly, we can seriously explode the size of our tests.

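Let us say, for example, that we have a system of rules for assigning employees to teams. Each team consists of a Salesperson, a Customer Support Representative (CSR), and an Installation Tech. The rule is that no more than one of these people can be a probationary employee (“probie”). It’s okay if none of them are probies, but if one of them is a probie the other two must not be. If we think of this in terms of the various possible cases, we might envision something like this:

Salesperson   CSR       Tech      Acceptable?
regular       regular   regular   yes
probie        regular   regular   yes
regular       probie    regular   yes
regular       regular   probie    yes
probie        probie    regular   no
probie        regular   probie    no
regular       probie    probie    no
probie        probie    probie    no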

This would indicate eight individual asserts. However, if we take a lesson from the previous section on “and” boundaries we note that zero probies is the same behavior (acceptance) as any one probie; it is not a new distinction and we should leave that one off. Similarly, we don’t want to show that all three being probies is unacceptable, since any two are already not acceptable and so, again, this would not be a new distinction.

That makes for six asserts. Doesn’t that feel wrong somehow? Like it’s a brute-force solution? Yes, and this instinct is something you should listen to. Imagine if the teams had 99 different roles and the rule was that no more than 43 of them could be probies. Would you like to work out the truth table on that? Me neither, but we can do it with math:
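As a hedged sketch of that math, assume we generalize the three-role reasoning: one assert for every team with exactly the maximum number of probies, plus one for every team with exactly one more. The count is then C(n, k) + C(n, k + 1) for n roles and a limit of k. The counting rule and the names AssertCounter, choose, and assertsNeeded are our assumptions for illustration.

```java
import java.math.BigInteger;

class AssertCounter {
    // n choose k, computed exactly; each intermediate value is itself a
    // binomial coefficient, so the division never leaves a remainder.
    static BigInteger choose(int n, int k) {
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            result = result.multiply(BigInteger.valueOf(n - k + i))
                           .divide(BigInteger.valueOf(i));
        }
        return result;
    }

    // Assumed counting rule: teams exactly at the limit plus teams
    // exactly one probie over the limit.
    static BigInteger assertsNeeded(int roles, int maxProbies) {
        return choose(roles, maxProbies).add(choose(roles, maxProbies + 1));
    }

    public static void main(String[] args) {
        System.out.println(assertsNeeded(3, 1));   // the three-role case: 3 + 3 = 6
        System.out.println(assertsNeeded(99, 43)); // far beyond the range of a long
    }
}
```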

That would be a lot of asserts! Painful.

Pain is almost always diagnostic. When something feels wrong, this can be a clue that, perhaps, we are not doing things in the best way. Here the testing pain is suggesting that perhaps we are not thinking of this problem correctly, and that an alternate way of modeling the problem might be more advantageous, and actually more appropriate.

A clue to this lies in the way we might decide to implement the rule in the production code. We could, for instance, do something like this:

class TeamBuilder {
    public static final int MAX_PROBIES = 1;

    public boolean isTeamProperlyConfigured(
        Employee aSalesperson,
        Employee aCSR,
        Employee aTech) {

        int count = 0;

        // Compare string contents with equals(), not ==
        if (aSalesperson.type.equals("probationary")) count++;
        if (aCSR.type.equals("probationary")) count++;
        if (aTech.type.equals("probationary")) count++;

        if (count > MAX_PROBIES) return false;

        return true;
    }
}

This also seems like an inelegant, brute-force approach to the problem, and we might refactor it to be something like this:

class TeamBuilder {
    public static final int MAX_PROBIES = 1;

    public boolean isTeamProperlyConfigured(
        Employee aSalesperson,
        Employee aCSR,
        Employee aTech) {

        int count = 0;
        Employee[] emps = new Employee[]
            {aSalesperson, aCSR, aTech};

        for (Employee e : emps)
            if (e.type.equals("probationary")) count++;

        if (count > MAX_PROBIES) return false;

        return true;
    }
}

In other words, if we stuff all the team members into a collection, we can just scan it for those who are probies and count them. This suggests that perhaps we should be using a collection in the first place, in the API of the object:

class TeamBuilder {
    public static final int MAX_PROBATIONARY = 1;

    public boolean isTeamProperlyConfigured(Employee[] emps) {
        int count = 0;
        for (Employee e : emps)
            if (e.type.equals("probationary")) count++;

        if (count > MAX_PROBATIONARY) return false;

        return true;
    }
}

Now we can write a test with two asserts: one that uses a collection with exactly MAX_PROBATIONARY probies in it, and one with MAX_PROBATIONARY + 1 probies in it. The boundary is easy to define because we’ve changed the abstraction to that of a collection.
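A minimal sketch of that two-assert test might look like the following. The Employee constructor, the "regular" type string, the teamWithProbies() helper, and the hand-rolled check() method (standing in for a real test framework) are assumptions for illustration.

```java
class Employee {
    final String type;
    Employee(String type) { this.type = type; }
}

class TeamBuilder {
    static final int MAX_PROBATIONARY = 1;

    boolean isTeamProperlyConfigured(Employee[] emps) {
        int count = 0;
        for (Employee e : emps) {
            if ("probationary".equals(e.type)) count++;
        }
        return count <= MAX_PROBATIONARY;
    }
}

class TeamBuilderTest {
    // Hypothetical helper: a three-person team in which the first
    // `probies` members are probationary and the rest are regular.
    static Employee[] teamWithProbies(int probies) {
        Employee[] team = new Employee[3];
        for (int i = 0; i < team.length; i++) {
            team[i] = new Employee(i < probies ? "probationary" : "regular");
        }
        return team;
    }

    static void check(boolean condition) {
        if (!condition) throw new AssertionError("specification violated");
    }

    static void testNoMoreThanMaxProbiesAllowed() {
        TeamBuilder testBuilder = new TeamBuilder();
        // One assert on each side of the boundary.
        check(testBuilder.isTeamProperlyConfigured(
            teamWithProbies(TeamBuilder.MAX_PROBATIONARY)));
        check(!testBuilder.isTeamProperlyConfigured(
            teamWithProbies(TeamBuilder.MAX_PROBATIONARY + 1)));
    }

    public static void main(String[] args) {
        testNoMoreThanMaxProbiesAllowed();
        System.out.println("specification holds");
    }
}
```

Because the test is written in terms of MAX_PROBATIONARY rather than a literal count, it would stay at two asserts even if the team grew to 99 roles.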

Here is another example of a test being helpful. The fact that a collection would make the testing easier is the test essentially giving you “design advice”. Making things easier for tests is going to tend to make things easier for client code in general since, in test-first, the test is essentially the first client that a given object ever has.

7. Technically-induced boundaries

Addition is an example of a single-behavior function:

val = op1 + op2

As such, there is little that needs to be specified about it. This, however, is the mathematical view of the problem. In reality there’s another issue that presents itself in different ways -- the bit-limited internal representation of numbers in a computer, or the way a specific library that we are considering is implemented.

Let’s assume, for starters, that op1, op2 and val are 32-bit integers. As such, there are maximal and minimal values that they can take: 2^31 - 1 and -2^31. The tests need to specify the following:

What are the largest positive and negative numbers that can be passed as arguments? The boundary in this case is on the “value of the operand”.
What happens if the sum of the arguments is more or less than these technical limits? For example, (2^31 - 1) + 1 = ? The boundary in this case is on the “sum of the operands”.

When it comes to floating point calculation, the problem is further complicated by limited precision, both in the exponent and in the mantissa. For example:

100000000000.0 + 0.00000000001 = 100000000000.00000000001

In a computer, because of the way floating point processors operate, the calculation may turn out differently:

100000000000.0 + 0.00000000001 = 100000000000.0

Note that the boundary in this case is on the “difference between the operands”
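The technical boundaries described above can be observed directly in Java, as a small sketch. The class and method names here are hypothetical; Integer.MAX_VALUE, Integer.MIN_VALUE, and Math.addExact are standard library members.

```java
class NumericLimits {
    // Plain int addition silently wraps around at the 32-bit limits.
    static int wrappingAdd(int op1, int op2) {
        return op1 + op2;
    }

    // Math.addExact throws ArithmeticException instead of wrapping,
    // which lets us detect a sum that crosses the technical limit.
    static boolean overflows(int op1, int op2) {
        try {
            Math.addExact(op1, op2);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // True when adding `small` to `big` leaves `big` unchanged, i.e. the
    // difference in magnitude exceeds what a double can represent.
    static boolean isAbsorbed(double big, double small) {
        return big + small == big;
    }

    public static void main(String[] args) {
        // Boundary on the sum of the operands: (2^31 - 1) + 1
        System.out.println(wrappingAdd(Integer.MAX_VALUE, 1) == Integer.MIN_VALUE);
        System.out.println(overflows(Integer.MAX_VALUE, 1));

        // Boundary on the difference between the operands
        System.out.println(isAbsorbed(100000000000.0, 0.00000000001));
    }
}
```

All three lines print true: the integer sum wraps to the most negative value, addExact reports the overflow, and the small floating point addend vanishes entirely.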

In many cases the hardware implementation will satisfy the needs of our customer. If it does not, this does not mean we cannot solve the problem. It means that we will not be able to rely on the system’s software or hardware implementation but will need to roll our own. Surely this is a tidbit of information worth knowing prior to embarking on implementation?

Continued in Test Categories, Part 3...

Wednesday, January 11, 2012

Testing Best Practices: Test Categories, Part 1


Successfully adopting and practicing TDD in a sustainable way involves many distinctions, best practices, caveats, and so forth. One way to make such information accessible is to put it into a categorized context. The Design Patterns, for instance, are often categorized into behavioral, structural, and creational.[1] Here we will do a similar thing with the executable specifications (“tests”) we write when doing TDD.

We have identified four categories of unit tests, namely: functional, constant specification, creational, and work-flow. We’ll take them one at a time.

Functional

The first unit test a developer ever writes is often an assertion against the return of a method call. This is because systems often operate by taking in parameters and producing some kind of useful result. For example, we might have a class called InterestCalculator with a method called CalcInterest() that takes some parameters (a value, a rate, a term, and perhaps the month to calculate for) and then returns the proper interest to charge or pay, depending on the application context.

The primary way of creating useful behavior in software is, in fact, in writing such methods. However, how we test them will depend on the nature of the behavior. We can, therefore, further sub-divide the ‘Functional’ category into the following types:

1. Static behavior

This is the simplest. If a method produces a simple, non-variant behavior, then we simply need to pick some parameters at random, call the method, and assert that the result is correct. For example:

// pseudocode
class Calculator {
    public int add(int x, int y) {
        return x + y;
    }
}

// pseudotest
class CalculatorTest {
    public void testAddBehavior() {
        int anyX = 6;
        int anyY = 5;
        int expectedReturn = 11;

        Calculator testCalculator = new Calculator();
        int actualReturn = testCalculator.add(anyX, anyY);
        assertEqual(expectedReturn, actualReturn);
    }
}

Adding two numbers always works the same way, so all we need is a single assertion to demonstrate the behavior in order to specify it. Note that we have named our temporary variables in the test anyX and anyY to make it clear that these particular values (6 and 5, respectively) are not in any way important, that the test is not about these values in particular. The test is about the addition behavior, as implemented by the add() method. We simply needed some input parameters in order to get the method to work, and so we picked arbitrary (any) values for our test. [2]

This is important, because we want it to be very easy for someone reading the test to be able to focus on the important, relevant part of the test and not on the “just had to do this” parts. Here again, thinking of this as a specification leads us to this conclusion.

Static behavior is the same for all values of all parameters passed. For example, f() here takes a single parameter, while g() takes two. But for all values of these parameters, the behavior is the same and so we pick "any" values to demonstrate this.

2. Singularity

If a behavior is always the same (static) except for one particular condition where it changes, we call this condition a singularity.

The classic example is divide-by-zero. In division, the behavior is always the same unless the divisor is zero, in which case we need to report an error condition. Here we’d need two assertions. The first, like the one for static behavior, would pick ‘any’ two numbers, with the second being non-zero, and show the division. The second would show the error report when the divisor is zero.
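As a sketch of those two assertions, consider the following. The Divider class, its choice of IllegalArgumentException to report the error, and the hand-rolled check() method are assumptions for illustration; a real system might report the singularity differently.

```java
class Divider {
    // Hypothetical production code: reports the singularity by throwing.
    static double divide(double dividend, double divisor) {
        if (divisor == 0.0) {
            throw new IllegalArgumentException("divisor must not be zero");
        }
        return dividend / divisor;
    }
}

class DividerTest {
    static void check(boolean condition) {
        if (!condition) throw new AssertionError("specification violated");
    }

    static void testDivisionFailsOnlyForZeroDivisor() {
        double anyDividend = 12.0;
        double anyNonZeroDivisor = 4.0;
        double expectedQuotient = 3.0;

        // The static behavior: 'any' values, as long as the divisor is non-zero.
        check(Divider.divide(anyDividend, anyNonZeroDivisor) == expectedQuotient);

        // The singularity: a zero divisor must be reported as an error.
        boolean errorReported = false;
        try {
            Divider.divide(anyDividend, 0.0);
        } catch (IllegalArgumentException expected) {
            errorReported = true;
        }
        check(errorReported);
    }

    public static void main(String[] args) {
        testDivisionFailsOnlyForZeroDivisor();
        System.out.println("specification holds");
    }
}
```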

It does not, of course, have to be a mathematical issue: it could be a business rule. Let’s say, for example, that we charge a fee of $10 for shipping unless it is the first day of the month, when we ship for free. We’re trying to encourage sales at the beginning of the month. Thus, the first day would be the singularity, and we’d write this test [3]:

public void testShippingIsFreeOnTheFirstDayOfTheMonth() {
    ShippingCalc shippingCalc = new ShippingCalc();
    int anyDateOtherThanTheFirst = 5;
    const int FIRST_DAY_IN_MONTH = 1;
    amount expectedStandardFee = ShippingCalc.STANDARD_FEE;
    const amount FREE = 0.00;

    Assert.AreEqual(expectedStandardFee,
      shippingCalc.getFee(anyDateOtherThanTheFirst));
    Assert.AreEqual(FREE,
      shippingCalc.getFee(FIRST_DAY_IN_MONTH));
}

Note the use of the term “any” for the date we “don’t care about, they’re all the same”, which we call anyDateOtherThanTheFirst, and then the fact that FIRST_DAY_IN_MONTH is clearly special.

Another example would be choosing a specific behavior for one element in a set. For example, some function might be legal only for one type of user, while all other types should get an exception:

…
enum User {REGULAR, ADMIN, GUEST, SENIOR, JUNIOR, PET}
…

public void testOnlyAdminCanGetCoolStuff() {
    StuffGetter getter = new StuffGetter();
    Stuff stuff;

    User anyNonAdmin = User.REGULAR;
    try {
        stuff = getter.getCoolStuff(anyNonAdmin);
        Assert.Fail("Cool stuff should go to ADMIN only");
    } catch (PresumptionException expected) {}
    stuff = getter.getCoolStuff(User.ADMIN);
    Assert.True(stuff.IsCool());
}

Two examples: f(), with its single parameter, provides the same behavior for all values but one, the point indicated. With the two parameters g() takes, the singularity may involve them both, creating a point, or it may only pertain to one, creating a line. For instance, if x is "altitude" and y is "temperature", then a point might indicate "the same behavior for all values except 3000 feet and 121 degrees". The line might indicate "the same behavior for all values except 2000 feet at any temperature".

3. Behavior with a boundary

Sometimes the behavior of a method is not always uniform, but changes based on the specific parameters it is passed. For example, let’s say that we have a method that applies a bonus for a salesperson, but the bonus is only granted if the sale is above a certain minimum value; otherwise it is zero. Further, the customer tells us that pennies don’t count: the sale must be an entire dollar over the minimum sales value.

In this case there exists a special sales amount, which affects the behavior of the applyBonus() function. We need to specify this boundary -- the place where the behavior changes -- and since every boundary has two sides, we need to explicitly specify these values and relate them:

class SalesApplicationTest {
    public void testBonusOnlyAppliesAboveMinimumSalesForBonus() {
        double maxNotEligibleAmount =
            SalesApplication.BONUS_THRESHOLD + .99;
        double minEligibleAmount =
            SalesApplication.BONUS_THRESHOLD + 1.00;
        double expectedBonus = minEligibleAmount *
            SalesApplication.CURRENT_BONUS;
        SalesApplication testSalesApp = new SalesApplication();

        AssertEqual(0.00,
            testSalesApp.applyBonus(maxNotEligibleAmount));
        AssertEqual(expectedBonus,
            testSalesApp.applyBonus(minEligibleAmount));
    }
}

This specifies, to the reader, that the point of change between no bonus and the bonus being applied is at the BONUS_THRESHOLD value, and also (per the customer) that the sale must be a full dollar above the minimum before the bonus will be granted. This is called the epsilon, the atom of the change, and you’ll note that we are clearly demonstrating it as one penny, the penny that takes us from 99 cents over the minimum to 1 full dollar over it.

One might be tempted to assert against other values, like 200 dollars over the minimum, or 32 cents above it, or to loop through all possible values above and below the transition point. Or to pick “any” value above and “any” value below. The point is that 99 cents and 1 dollar over the minimum are significant amounts; they matter to the customer, and so we need to specify them as unique.

We also want our tests to run fast, and so looping through all possible values is not only unnecessary, it is counter-productive.

Two points define the boundary where behavior changes, and we also demonstrate the epsilon (or atom) of change.

4. Behavior within a range

There can be, of course, multiple boundaries that change behavior. If these boundaries are independent of each other, then we call this a range.

For example, let us say that the acceptable temperature of an engine manifold must be between 32.0 and 212.0 degrees Fahrenheit (too cold and the engine freezes; too hot and it overheats). These limits are not related to each other (we could add anti-freeze to make lower temperatures acceptable while the upper limit might not change, or vice versa using coolant), and so each would be specified with two asserts, one at the boundary and one just beyond it in each case.

But let’s not forget the epsilon! How much is “over” or “under”? One degree? One tenth of a degree? Ten degrees? How sensitive should this system be? Here again, this is a problem domain specification, and thus we have to know what the customer wants before we can create the test.
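A sketch of such a range specification follows, under loudly stated assumptions: the ManifoldMonitor class, the inclusive treatment of both limits, and an epsilon of one tenth of a degree are all hypothetical stand-ins for answers that only the customer can give.

```java
class ManifoldMonitor {
    static final double MIN_TEMP = 32.0;
    static final double MAX_TEMP = 212.0;

    // Assumed rule: both limits are themselves acceptable temperatures.
    static boolean isAcceptable(double temperature) {
        return temperature >= MIN_TEMP && temperature <= MAX_TEMP;
    }
}

class ManifoldMonitorTest {
    // Assumed epsilon: the customer cares about tenths of a degree.
    static final double EPSILON = 0.1;

    static void check(boolean condition) {
        if (!condition) throw new AssertionError("specification violated");
    }

    static void testLowerBoundary() {
        check(ManifoldMonitor.isAcceptable(ManifoldMonitor.MIN_TEMP));
        check(!ManifoldMonitor.isAcceptable(ManifoldMonitor.MIN_TEMP - EPSILON));
    }

    static void testUpperBoundary() {
        check(ManifoldMonitor.isAcceptable(ManifoldMonitor.MAX_TEMP));
        check(!ManifoldMonitor.isAcceptable(ManifoldMonitor.MAX_TEMP + EPSILON));
    }

    public static void main(String[] args) {
        testLowerBoundary();
        testUpperBoundary();
        System.out.println("specification holds");
    }
}
```

The two boundaries live in separate tests because they are independent: either limit could change without affecting the specification of the other.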

Also, note that whereas for integers the natural epsilon is 1, for floating point numbers the epsilon value depends on the base number: the larger the base number, the larger the epsilon needs to be. Constants such as Double.Epsilon only indicate the smallest positive number that can be represented, not the smallest discernible difference between two given values.
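This dependence on the base number can be seen in a short sketch. In Java, Double.MIN_VALUE plays the role that Double.Epsilon plays in .NET, and Math.ulp() returns the gap between a double and the next representable one; the class and method names are hypothetical.

```java
class EpsilonDemo {
    // True when adding `delta` to `base` yields a different double.
    static boolean isDiscernible(double base, double delta) {
        return base + delta != base;
    }

    public static void main(String[] args) {
        double small = 1.0;
        double large = 1.0e11;

        // The smallest positive double is far too small to change either value.
        System.out.println(isDiscernible(small, Double.MIN_VALUE));
        System.out.println(isDiscernible(large, Double.MIN_VALUE));

        // The smallest discernible step grows with the magnitude of the base.
        System.out.println(Math.ulp(small));
        System.out.println(Math.ulp(large));

        // A step that registers near 1.0 is lost near 1.0e11.
        System.out.println(isDiscernible(small, Math.ulp(small)));
        System.out.println(isDiscernible(large, Math.ulp(small)));
    }
}
```

When choosing an epsilon for a floating point boundary test, it therefore needs to be at least as large as the spacing of doubles around the boundary value, and in practice is dictated by what the customer considers a meaningful difference.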

Two boundaries, with epsilons for each. Note the boundaries of a simple range are not related to each other.

[1] In point of fact, we don’t actually completely agree with this method of categorizing the Design Patterns, but it does serve as a reasonable example of categorization in general.

[2] There are other ways to do this. In another blog we will discuss the use of an “Any” class to make these “I don’t care” values even more obvious.

Continued in Test Categories, Part 2
