Friday, December 5, 2014

The Four Steps to Attaining Test Writing Nirvana

As you may have noticed, I have written several posts about testing. A big reason for this is that writing good tests is hard. And more importantly, no matter what software project you are working on, tests are a necessity. They are one of the few universal truths of software engineering.

Anyone who has started up a project will know that the biggest hurdle to testing is writing tests. That may sound silly, but writing tests takes time, effort, and practice. Furthermore, until there is a good testing infrastructure in place, writing tests can be painfully slow. As such, the tests that are written at the beginning of a project are usually the most brittle, the most cumbersome to maintain, and the most difficult to understand.

The problem is that regardless of project scope and domain, and irrespective of the testing framework or language that you use, testing of software goes through the following phases: manually created objects, fixtures and helper methods, factories, and finally scenarios.

Can one skip steps and head straight to Nirvana? Well, that depends on how much time you are willing to invest when you first start your project. Unfortunately, most of us will want to churn out "real" features as quickly as possible when we set out on our new adventure. Only once something is up and running do we then go back and flesh out the details of testing. Of course, by the time you have your first set of features complete, you (and your collaborators) probably don't want to perform a feature freeze to fix your testing environment. Instead, you will keep test writing as a secondary concern and gradually migrate new tests to better ways of writing tests.

Is this bad? Not necessarily. As you embark on a project, the requirements and priorities of development shift. As a budding startup project, you will want to ship something as quickly as possible. This means cranking out as many gizmos, whizbangs, and whositswhatsits as possible. However, as you mature, you will want to provide a more robust offering to your users. By the time your project has netted you enough money to buy a Ferrari (or a beer if it's open source), you'll want to make sure that everything is thoroughly tested. With respect to test writing, the aim of a software development team should be to get through these phases as quickly as possible (regardless of what those non-techies tell you!).

If you look at a graph of the time spent writing and maintaining tests against the maturity of a project, I suspect it will look something like a bell curve. At first, minimal time is spent on testing - you have to push out those features! Then, the first set of real bugs hit and you start investing in better testing (frameworks, testing standards, patterns, infrastructure, etc.). By the end, writing tests will be so easy that, even though you are writing more tests with better coverage, it will take less time to write and maintain them.

Steps to Nirvana

Although these principles apply to any language/framework/etc., I'm going to illustrate the phases with some simple python code that tests the correctness of a function that returns True if two users share a team. The same type of setup can be used to test a REST API or site written with Ruby on Rails, Node.js, PHP, etc.
def is_on_same_team(user1, user2):
    """Return True if and only if the two users share a team."""
    # Some implementation for this function. Details aren't important.
    pass
Note: The python code here is not the most efficient, or always the most pythonic, but was written with clarity in mind for those who haven't used python before. I'm also going to assume that we have some global variables (like db) that we can use to put things into some database.

Phase 1: Manually creating objects

At first, we do what is simplest: we create the models directly and insert them into the database manually. We've all been there. We've all done it. There is no shame in it... well, not too much.
def test_is_on_same_team():

    user1 = User(
        first_name="Sheldon",
        last_name="Cooper",
    )
    db.put(user1)

    user2 = User(
        first_name="Leonard",
        last_name="Hofstadter",
    )
    db.put(user2)

    user3 = User(
        first_name="Tony",
        last_name="Stark",
    )
    db.put(user3)

    team = Team(name="Big Bang Theory Cast")
    db.put(team)

    team_member1 = TeamMember(team_id=team.id, user_id=user1.id)
    team_member2 = TeamMember(team_id=team.id, user_id=user2.id)
    db.put(team_member1)
    db.put(team_member2)

    assertTrue(is_on_same_team(user1, user2))
    assertFalse(is_on_same_team(user1, user3))
Obviously we do this because it's simple, but why is this so bad? There are many reasons why this is a poorly written test, but here are the main points:
  1. We had to write about 20 lines of code just to set up our test. This is a waste of time, and it makes it hard for someone to glance at the test and quickly figure out what is going on.
  2. As the models require values for things like first_name and last_name, we are forced to pick "random" values that have no bearing on the outcome of the test. In other words, if I'm not testing something dealing with a user's name, I shouldn't have to provide it.
  3. What happens if later we decide that middle_name is also a required parameter? Once that happens, we'll have to go through every test that creates a User object and add in a random middle_name. No fun. Trust me.
  4. We have to manually create the team association with the TeamMember object. What happens if in the future we decide that there first needs to be an invitation step that requires some other models to be created? Again, we'll have to find all such tests and update them.
  5. Each of the models is manually inserted into the database. (In this example, I am assuming that there is some magic db.put() method that does this for us for any object type.) Since we are going to want to add an object to the database 95% of the time, why should I have to explicitly write this line of code for each object that is created? Furthermore, if there are foreign key constraints, you have to make sure that you insert the objects in the right order.
  6. We are testing both the positive and the negative case (assertTrue and assertFalse) in one test case. For a real test suite, this should be split into multiple tests, but we're going to leave it like this for the sake of motivating these examples.

Phase 2: Helper Functions & Fixtures

The first time that you have to refactor your test(s) because one of your models changes, you will quickly create some helper functions that hide the details of the model creation. For example, here we now assume that we have created three helper functions that help us create users, teams, and team-user associations (a sketch of these helpers follows the test below). This is definitely a step up from before as we have addressed points 2 and 3, but there is still much to be desired.
def test_is_on_same_team():

    user1 = create_user()
    user2 = create_user()
    user3 = create_user()
    db.put(user1)
    db.put(user2)
    db.put(user3)

    team = create_team()
    db.put(team)

    team_member1 = create_team_membership(team, user1)
    team_member2 = create_team_membership(team, user2)
    db.put(team_member1)
    db.put(team_member2)

    assertTrue(is_on_same_team(user1, user2))
    assertFalse(is_on_same_team(user1, user3))
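For completeness, here is a minimal sketch of what those helper functions might look like. The default values (and the running counter used to make them unique) are just illustrative assumptions:
import itertools

# A running counter so that generated values are unique across a test run
_counter = itertools.count(1)

def create_user(first_name=None, last_name=None):
    """Create a User, filling in "random" values for anything not provided."""
    n = next(_counter)
    return User(
        first_name=first_name or "First{}".format(n),
        last_name=last_name or "Last{}".format(n),
    )

def create_team(name=None):
    """Create a Team with a default name if none is given."""
    return Team(name=name or "Team{}".format(next(_counter)))

def create_team_membership(team, user):
    """Create the association between a team and a user."""
    return TeamMember(team_id=team.id, user_id=user.id)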

Phase 3: Factories

After having written many helper functions to create simple objects, you'll undoubtedly move on to the next phase where you realize that in most cases there are dependent objects that also need to be created. For example, it could be the case that a user always has a profile image (e.g. you need one to sign up for the service). However, since most of your tests didn't need the profile image, you never created the model for it. This means that when you do need it (e.g. if you want to do something like user.profile_image.size()), you first need to use a helper function to create an image object, and then associate it with the user. Although you can put this kind of logic in your helper functions, many people move on to using object factories (take a look at factory_girl for Ruby and factory_boy for python).

Although this addresses the 5th issue noted above, on the surface we're not much better off than in Phase 2, as we still have to specify exactly how the objects should be created (via the use of UserFactory, TeamFactory, etc.). On the up-side, we have decoupled the generation of objects for testing from their usage in tests. The factories do all of the work of creating complete profiles, teams, etc. Assuming each user needed a profile image, the UserFactory would also create a ProfileImage object and associate it with the respective user (a sketch of this follows the test below). Furthermore, the use of factories sets us up for the next set of improvements that we can make. And look, we've already shrunk our test setup code down to 6 lines!
def test_is_on_same_team():

    user1 = UserFactory(db).create()
    user2 = UserFactory(db).create()
    user3 = UserFactory(db).create()

    team = TeamFactory(db).create()

    TeamMembershipFactory(db).create(team_id=team.id, user_id=user1.id)
    TeamMembershipFactory(db).create(team_id=team.id, user_id=user2.id)

    assertTrue(is_on_same_team(user1, user2))
    assertFalse(is_on_same_team(user1, user3))
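As a very rough sketch of the idea (this is not the factory_boy API, just the gist), a factory might look something like the following. Note that it both persists the object and creates the dependent objects, such as the profile image mentioned above:
class UserFactory(object):
    """Sketch of a factory that creates a complete, persisted User."""

    def __init__(self, db):
        self.db = db

    def create(self, **overrides):
        user = create_user(**overrides)  # reuse the Phase 2 helper
        self.db.put(user)
        # A "complete" user always has a profile image, so create and
        # associate one automatically.
        image = ProfileImage(user_id=user.id)
        self.db.put(image)
        return user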

Phase 3.1: Customized Factories

The problem with the above code is that we still have to explicitly create those TeamMembership objects. This logic of adding someone to a team shouldn't be part of a test, as it's not integral to what we are testing. So, the natural thing to do is to pass this sort of data to the factory that creates the team. For example, below we have modified the TeamFactory to take a members parameter that will automatically create the associations for us (a sketch of the modified factory follows the test). If the way that we create associations between users and teams ever changes, we only ever have to update the factory. Oh, and it also saves us several lines of code in each test that creates teams.

At this point, we have solved pretty much all of the issues that were raised in Phase 1. The setup code has been reduced to 4 lines of code and all of the complexities of object generation, team associations, etc. have been moved to the factories. You may think that we're done on our path to enlightenment, but we still have a bit farther to go!
def test_is_on_same_team():

    user1 = UserFactory(db).create()
    user2 = UserFactory(db).create()
    user3 = UserFactory(db).create()

    team = TeamFactory(db).create(members=[user1, user2])

    assertTrue(is_on_same_team(user1, user2))
    assertFalse(is_on_same_team(user1, user3))
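A sketch of what the modified TeamFactory might look like, under the same assumptions as the UserFactory sketch above:
class TeamFactory(object):
    """Sketch of a factory that can also set up team memberships."""

    def __init__(self, db):
        self.db = db

    def create(self, members=None, **overrides):
        team = create_team(**overrides)  # reuse the Phase 2 helper
        self.db.put(team)
        # Creating the memberships here means that tests never need to
        # know how users are associated with teams.
        for user in (members or []):
            self.db.put(TeamMember(team_id=team.id, user_id=user.id))
        return team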

Phase 3.2: Factory Factory

You will soon realize that having to create and initialize each of the individual factories every time you want to use them (as we did above) is a waste of time and effort. As such, the next step is to create an object that contains all of the initialized factories. For example, a very simplistic way to achieve this is as follows:
class Factory(object):
    def __init__(self, db):
        self.user = UserFactory(db)
        self.team = TeamFactory(db)
Although for the sake of clarity in the example below we create the Factory object in the actual test, a better way to do this is to create this object in the set-up phase of your testing framework (e.g. unittest or nose in python, RSpec in Ruby). That way, you create the factories once when the testing framework starts up, and you can just use them in all of your tests.
def test_is_on_same_team():

    factories = Factory(db)
    user1 = factories.user.create()
    user2 = factories.user.create()
    user3 = factories.user.create()
    
    team = factories.team.create(members=[user1, user2])
    
    assertTrue(is_on_same_team(user1, user2))
    assertFalse(is_on_same_team(user1, user3))
It is interesting to note that at this point we have decoupled the setup and testing even more. Not only are we delegating the creation of objects to a factory, but we are delegating the creation of the factories as well. This way, not only is it easy to change the creation of a specific object (by updating the factory), but it is easy to make sweeping changes to how the factories are instantiated (by changing the factory-factory).
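For instance, with unittest, a minimal sketch of wiring this into the set-up phase might look like this (assuming the global db from before):
import unittest

class TeamTests(unittest.TestCase):

    def setUp(self):
        # Create the factories before each test; every test can then
        # just use self.factories without any further setup.
        self.factories = Factory(db)

    def test_is_on_same_team(self):
        user1 = self.factories.user.create()
        user2 = self.factories.user.create()
        user3 = self.factories.user.create()
        team = self.factories.team.create(members=[user1, user2])
        self.assertTrue(is_on_same_team(user1, user2))
        self.assertFalse(is_on_same_team(user1, user3))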

Phase 4: Scenarios

After having decoupled your factory creation to a factory-factory, you will soon realize that even calling the factories explicitly isn't clear or easy enough. In other words, you are still working with how to do something rather than the intent of what you want. As such, we move on to abstracting the setup of a test even further via the use of what I like to call "scenarios". The essence of a scenario is to create a description of what you expect to be in the database for your test, and let some underlying magic make it happen.
def test_is_on_same_team():

    d = scenario({
        'users': ['user1', 'user2', 'user3'],
        'teams': [
            ['user1', 'user2']
        ]
    })

    assertTrue(is_on_same_team(d['user1'], d['user2']))
    assertFalse(is_on_same_team(d['user1'], d['user3']))
What you see above is a first pass at setting up such a scenario. We simply pass in the set of users and teams that we want created, and let the scenario generation code take care of calling the appropriate factories to set up the data for us (hence the variable name d, for the data it returns). At this point, we have completely decoupled how we generate data for our tests from the actual writing of the test. Not only is the test now easier to read, it's easier to understand. This is because we have broken the test into two parts -- the setup phase, where we describe what we intend to use to perform the test, and the actual test of the function. The test has moved from a prescriptive setup to a descriptive setup.

I won't go into the details, but here is a potential (very simplistic) implementation of the scenario function.
def scenario(definition):

    factories = Factory(db)

    # Build one flat dictionary so that tests can look up every created
    # object by name (e.g. d['user1']).
    d = {}
    for key in definition['users']:
        d[key] = factories.user.create()

    for i, member_keys in enumerate(definition['teams']):
        members = [d[key] for key in member_keys]
        d['team{}'.format(i + 1)] = factories.team.create(members=members)

    return d
Although I show how to do this by returning a dictionary (aka a hashtable for you non-python folk), this can also be done by returning an actual object, letting you get at the values as attributes instead of looking them up by key in the dictionary. For example, I find it much cleaner to have code that reads:
assertTrue(is_on_same_team(d.user1, d.user2))
assertFalse(is_on_same_team(d.user1, d.user3))
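A minimal sketch of such an object is a thin wrapper that exposes the dictionary keys as attributes (the class name here is mine):
class ScenarioData(object):
    """Expose the objects created by scenario() as attributes."""

    def __init__(self, objects):
        self._objects = objects

    def __getattr__(self, name):
        try:
            return self._objects[name]
        except KeyError:
            raise AttributeError(name)
With this in place, scenario() simply returns ScenarioData(d) instead of the raw dictionary.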

Phase 4.1: Scenarios DSL

On our path to purity, the next improvement is to remove all of the unnecessary "code-like" attributes of the scenario setup. By using a Domain Specific Language (DSL) we can set up a scenario in plain text with the same result. For instance, one can now imagine using something like the following:
def test_is_on_same_team():
    d = scenario("""
        Users: user1, user2, user3
        Teams:
            user1, user2
    """)

    assertTrue(is_on_same_team(d.user1, d.user2))
    assertFalse(is_on_same_team(d.user1, d.user3))
By using such a DSL it becomes very clear what this test is doing, even to someone who knows very little programming and/or next to nothing about the underlying system that you have developed. That is the mark of a good test. Furthermore, since all of the details of setting up the data are relegated to the scenario function, any changes to the underlying system only need to be handled in relatively few places.
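I won't dwell on the parsing either, but a deliberately naive parser for this DSL (just to show there is no magic involved) might turn the text into the dictionary form from the previous section:
def parse_scenario(text):
    """Parse the plain-text scenario DSL into a definition dictionary."""
    definition = {'users': [], 'teams': []}
    in_teams = False
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('Users:'):
            in_teams = False
            definition['users'] = [
                name.strip() for name in line[len('Users:'):].split(',')
            ]
        elif line.startswith('Teams:'):
            in_teams = True
        elif in_teams:
            # Each indented line under "Teams:" lists one team's members
            definition['teams'].append(
                [name.strip() for name in line.split(',')]
            )
    return definition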

Phase 4.2: Scenario Names

If you find that you have a common set of scenarios that you always use, you can even predefine them in some other file for reuse. For example, given that we have many tests dealing with teams and users, the scenario that we have here is a quite common setup. So, let's say we set up a dictionary with all of these predefined scenarios as follows:
SCENARIOS = {
    'three users with user1 and user2 sharing a team': """
        Users: user1, user2, user3
        Teams:
            user1, user2
    """
}
Assuming we also modify our scenario function to check for predefined scenarios, we can now update our test to
def test_is_on_same_team():
    d = scenario('three users with user1 and user2 sharing a team')
    assertTrue(is_on_same_team(d.user1, d.user2))
    assertFalse(is_on_same_team(d.user1, d.user3))
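The modification to scenario itself can be a small sketch like the following, assuming the dictionary-based implementation from Phase 4 has been renamed build_scenario and the parse_scenario helper from the previous section exists:
def scenario(definition):
    """Accept a definition dict, a DSL string, or a predefined scenario name."""
    if isinstance(definition, str):
        # A predefined name maps to its DSL text; any other string is
        # assumed to be the DSL itself.
        definition = parse_scenario(SCENARIOS.get(definition, definition))
    return build_scenario(definition)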

Phase 4.3: Scenario Decorators

This part is a bit specific to python, but I presume something similar can be achieved in other languages. In the case of python, via the "magic" of decorators, we can remove the setup logic from the test function itself and move it outside as a "pre-test" step.
@scenario('three users with user1 and user2 sharing a team')
def test_is_on_same_team(d):
    assertTrue(is_on_same_team(d.user1, d.user2))
    assertFalse(is_on_same_team(d.user1, d.user3))
Although this improvement is basically just some syntactic sugar, the above code is now about as clear as you can get at separating what you are testing (that is_on_same_team works as expected) from what you need to have in place to perform said test.
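A sketch of how such a decorator might be implemented (I've named it with_scenario here to avoid clashing with the scenario function itself, although you could make one function do double duty):
import functools

def with_scenario(name):
    """Build the named scenario and pass it to the decorated test."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            d = scenario(name)
            return fn(d, *args, **kwargs)
        return wrapper
    return decorator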

At this point, you have attained test writing Nirvana.


Isn't this just Behavior Driven Development?

No. This is a methodology for abstracting away what is necessary for a test from how to perform the test. Behavior Driven Development (BDD) is a way to abstract away the implementation of the testing from what it does via the use of a plaintext feature file. For example, we could convert the above test into a BDD version via the following:

Given three users two of whom share a team
Then the users on the team should be considered on the same team
And the users not on the team should not be considered on the same team

Now, each of these steps would then be implemented in some other file along the lines of:
@given("three users two of whom share a team")
def three_users_two_of_whom_share_a_team(context):
    # Store the scenario data on the shared context so that the
    # other steps can get at it.
    context.d = scenario('three users with user1 and user2 sharing a team')

@then("the users on the team should be considered on the same team")
def users_on_same_team_asserts_true(context):
    assertTrue(is_on_same_team(context.d.user1, context.d.user2))

@then("the users not on the team should not be considered on the same team")
def users_on_different_team_asserts_false(context):
    assertFalse(is_on_same_team(context.d.user1, context.d.user3))
As such, the testing methodology described here, which abstracts away what you need from how you test it, works perfectly well with BDD too.

Challenge accepted?

The challenge I put forth to everyone starting a new project is to not skimp on the testing infrastructure early on, but to try to make your way through the phases as quickly as possible. This applies to developers and non-technical people (e.g. CEOs, Marketing Execs, etc.) alike. It takes a little bit of forethought, but it will save you countless hours of refactoring tests (which is a real time-sink!) and make you a much happier test writer... and everyone who has to review your code will love you for having such easy-to-read and easy-to-understand tests.

Tuesday, July 1, 2014

Elementium: Browser testing done right

Browser testing is something that most large web applications have to go through. Most small projects start out with manual testing (i.e. you deploy and then click around the app to make sure nothing is broken) and then progress to automated testing that codifies these tests and automatically runs them on each commit (or, if there is no continuous integration server set up, you can start them manually before each commit/merge). One of the major players in providing the framework necessary to perform automated testing is Selenium. More specifically, in the world of python, this is the Selenium python bindings. Note that this entire post is about python, and elementium is a python library. As such, if you are using a different language, elementium is not for you. Similarly, when I refer to "Selenium" I am referring specifically to the Selenium python bindings.

Everyone's reaction to Selenium is the same. First, there's a sense of awe at how it can navigate around a website, enter text, and click on buttons. This is soon followed by frustration upon finding out that, unlike most of the code written to develop the application, testing in browsers is particularly fickle; widgets don't load in the same number of seconds in different runs, and elements become stale when the DOM updates. This leads to a state of despair and generally results in brittle testing code that is littered with time.sleep() calls and ugly retry logic. This was exactly my progression through the journey of automated browser testing. Fortunately, it didn't end there.

Before I continue, let me be very clear in saying that I am eternally grateful to the folks that created and maintain Selenium - I could not do any of the testing that I do without it. It is a fantastic library that does a fantastic job. It just happens to have been created in an era when dynamic pages littered with AJAX calls were not the norm. Any of this work or potential "criticisms" is meant to be taken with this in mind -- without Selenium, we wouldn't even be this far.

If one takes a look at what people try to accomplish with automated browser testing, it boils down to:
  • Select browser elements, and perform actions based on the selected elements
  • Assert that particular elements exist
That's about 90% (if not more) of the work. Unfortunately, the base Selenium library only does a great job at this if the page is not changing. I.e. as long as you don't have a dynamically loading page that has widgets appear and get populated asynchronously, the core Selenium driver will serve you well. However, if you have such a dynamic page (or a web application that is completely written in JavaScript using something like Backbone.js), you will (or have already) run into two major obstacles:
  1. Waiting for an element to appear on the page
  2. Handling StaleElementReferenceExceptions
The current state-of-the-art solution to this problem (as proposed by the hosted testing service SauceLabs) is to use so-called "spin" methods. For example, if you need to assert that a certain element exists on a page, you should use a spin_assert that spins and waits until the element appears. Here is a modified version of what they present (but the basic idea is exactly the same):
from time import sleep

def spin_assert(element, assertion):
    for i in xrange(60):
        try:
            # Re-get the element from the page via the lambda
            # and assert they are equal
            assert element() == assertion
            return
        except Exception:
            pass
        sleep(1)
    # If we get here, give it one more try, and let it raise an
    # AssertionError if it fails
    assert element() == assertion

# Create a lambda that finds the element
element = lambda: selenium.find_element_by_id('foo').text

# Try the assertion
spin_assert(element, 'FOO')
For full details on their method, check out their post here.

Does the above method work? Of course! But let me ask you this. Which of the following do you think is easier to read, understand, and maintain?
# Option 1: Pure selenium
element = lambda: selenium.find_element_by_id('foo').text
spin_assert(element, 'FOO')

# Option 2: Elementium
elements.find('#foo').insist(lambda e: e.text() == 'FOO')
I'm hoping that you say Option 2. Taking its cues from jQuery, Elementium allows you to chain commands and handles all of the automatic retrying for you. For example, you could do something like this if you really wanted to:
elements.\
    find('.foo').\
    filter(lambda e: e.text().startswith('a')).\
    until(lambda e: len(e) == 2).\
    foreach(lambda e: e.click())
This will find all elements with the CSS class 'foo', filter them down to the elements whose text starts with 'a', wait until there are exactly 2 of them, and then click on each. Yes, all of that in 1 line of code. Oh, and did I mention that this handles all of the retry logic automatically for you? (Btw, you can use the click() method on a list of elements as well: elements.find('.foo').click())

How about if you want to wait until there are exactly 3 elements of this type on the page (because, for example, you have made some AJAX calls that create three notifications)?
elements.find('.notification').until(lambda e: len(e) == 3)
That's it. This will retry (for 20 seconds by default) and wait until there are three elements with the CSS class 'notification'.

"How does it do all this magic," you ask. I won't go into too much detail here, but under the hood, each selector (e.g. '.foo' or '#foo') is stored as a callback function (similar to the lambda: selenium.find_element_by_id('foo') of the first example. This way, when any of the calls to any of the methods of an element has an expected error (StaleElementException, etc.) it will recall this function. If you perform chaining, this will actually propagate that refresh (called update()) up the entire chain to ensure that all parts of the call are valid. Cool! Ok, and now to a "full" example that shows you exactly how to set everything up so that you can start using this today.
from selenium import webdriver
from elementium.drivers.se import SeElements

# Initialize the elements wrapper 
elements = SeElements(webdriver.Firefox())

# Do cool stuff
elements.find('#foo')
It's as simple as that!
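To make the retry mechanism concrete, here is a toy sketch of the core idea (this is not elementium's actual implementation, just the gist): the selector is kept as a callback, so the element can simply be re-fetched whenever it goes stale.
from selenium.common.exceptions import StaleElementReferenceException

class RefreshingElement(object):
    """Toy wrapper that re-runs its selector callback when the element goes stale."""

    def __init__(self, fn):
        self.fn = fn  # the callback that (re)finds the element
        self.element = fn()

    def update(self):
        self.element = self.fn()

    def text(self):
        try:
            return self.element.text
        except StaleElementReferenceException:
            # The DOM changed underneath us -- re-find the element and retry
            self.update()
            return self.element.text

# Usage: the lambda, not the raw element, is what gets stored
el = RefreshingElement(lambda: browser.find_element_by_id('foo'))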

Although it is still under development, you can get the library and make use of it now from this public GitHub repo: https://github.com/actmd/elementium. There you will find more usage examples, the code, etc. We have been using it at ACT.md for the past 3 months and it has reduced our testing code, made it more stable, and made it much more legible. I'd say that's a win, win, win situation!

Don't hesitate to let me know if you have any questions, suggestions, etc. Happy testing.

Tuesday, February 18, 2014

Running background tasks with Fabric

Using Fabric to run remote commands when administering servers is definitely quite a time saver. One of the issues that I keep stumbling across is running background jobs on a remote machine. For example, one of the things we do to test code at work is to create an Amazon AWS instance with the application deployed on it and then run browser integration tests against that instance. If we were to just use the regular run() or sudo() command to run a long-running test, we would have to make sure that we don't lose the connection between our local development machine and the AWS instance (otherwise the tests would just stop running). So, clearly, we want to start the long-running task and then be able to go on our merry way.

As you may know, there are several ways to start background jobs on a *NIX like machine: nohup, screen, etc. The issue is that running things in the background may cause some issues when using Fabric. Just take a look at the Fabric FAQ that covers this topic, along with this nice discussion.

The easiest solution that I found that works in pretty much all cases is the one suggested here using dtach. I slightly extended the suggested solution to be a bit more well-rounded and complete. As you can see, this uses apt-get to install dtach, so if you are not running Ubuntu (or another Debian-based distribution), make sure to update that appropriately.
from fabric.api import run
from fabric.api import sudo
from fabric.contrib.files import exists


def run_bg(cmd, before=None, sockname="dtach", use_sudo=False):
    """Run a command in the background using dtach

    :param cmd: The command to run
    :param before: The command to run before the dtach. E.g. exporting
                   environment variables
    :param sockname: The socket name to use for the temp file
    :param use_sudo: Whether or not to use sudo
    """
    if not exists("/usr/bin/dtach"):
        sudo("apt-get -y install dtach")
    if before:
        cmd = "{}; dtach -n `mktemp -u /tmp/{}.XXXX` {}".format(
            before, sockname, cmd)
    else:
        cmd = "dtach -n `mktemp -u /tmp/{}.XXXX` {}".format(sockname, cmd)
    if use_sudo:
        return sudo(cmd)
    else:
        return run(cmd)
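For example, in a fabfile you might kick off a long-running test suite like this (the task name, command, and environment variable here are all hypothetical):
from fabric.api import task

@task
def start_integration_tests():
    # The tests keep running on the remote machine even after the
    # Fabric connection is closed.
    run_bg(
        "python run_integration_tests.py",
        before="export ENVIRONMENT=test",
    )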

Capturing the output

Although the above snippet works perfectly fine if you just want to run a background task and then forget about it, what happens if you want to capture the output of the command? The problem is that you can't just redirect the output of dtach like you would with nohup. The simplest solution that I could come up with was to make dtach run a bash command and explicitly redirect the output within it. So, I have another function that helps me accomplish this.
def run_bg_bash(
        cmd, output_file=None, before=None, sockname="dtach", use_sudo=False):
    """Run a bash command in the background using dtach

    Although bash commands can be run using the plain :func:`run_bg` function,
    this version will ensure to do the proper thing if the output of the
    command is to be redirected.

    :param cmd: The command to run
    :param output_file: The file to send all of the output to.
    :param before: The command to run before the dtach. E.g. exporting
                   environment variable
    :param sockname: The socket name to use for the temp file
    :param use_sudo: Whether or not to use sudo
    """
    if output_file:
        cmd = "/bin/bash -c '{} > {}'".format(cmd, output_file)
    else:
        cmd = "/bin/bash -c '{}'".format(cmd)
    return run_bg(cmd, before=before, sockname=sockname, use_sudo=use_sudo)
As you can see, all this does is wrap the command with an explicit call to bash, which is then what interprets the output redirection.
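For example, to run a (hypothetical) test suite in the background and capture its output to a log file:
run_bg_bash(
    "python run_integration_tests.py",
    output_file="/tmp/integration_tests.log",
)
That's it! Happy dtaching!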