Friday, June 15, 2007

Python's unittest module ain't that bad

Collin Winter was kind enough to speak to BayPiggies last night about his unittest module replacement, test_harness. The basic premise of the talk is that unittest does not support extensions very well, hence he wrote his own testing framework that does. The same argument is presented in Collin's blog posting titled "Python's unittest module sucks".

However, I take issue with several of Collin's claimed deficiencies in unittest: they simply aren't there. For example, he claimed that extensions cannot easily be composed (i.e. multiple extensions cannot be applied to a single test case). I raised the point in the meeting that Python's decorators are trivially composable, and that the TODO annotation described in his blog is easily implemented as a decorator, usable with unittest, nose, or just about any testing framework. In his presentation, he claimed TODO annotations required over a hundred lines of code across 5 classes to implement using unittest. This simply isn't true: I implemented it in 8 lines while he spoke; with some polish it's up to 11 plus a docstring, but nowhere near 100, and there isn't a class in sight:

def TODO(func):
    """unittest test method decorator that ignores
       exceptions raised by test

    Used to annotate test methods for code that may
    not be written yet. Ignores failures in the
    annotated test method; fails if the test
    unexpectedly succeeds.
    """
    def wrapper(*args, **kw):
        try:
            func(*args, **kw)
            succeeded = True
        except Exception:
            succeeded = False
        assert succeeded is False, \
               "%s marked TODO but passed" % func.__name__
    wrapper.__name__ = func.__name__
    wrapper.__doc__ = func.__doc__
    return wrapper

Collin demonstrated a platform-specific test annotation in his framework. He claimed this would require almost 200 lines of code to implement in unittest, but that too is an overstatement. I had it implemented before he could finish the slide:

import os

def PlatformSpecific(platformList):
    """unittest test method decorator that only
       runs test method if os.name is in the
       given list of platforms
    """
    def decorator(func):
        def wrapper(*args, **kw):
            if os.name in platformList:
                return func(*args, **kw)
        wrapper.__name__ = func.__name__
        wrapper.__doc__ = func.__doc__
        return wrapper
    return decorator

The point is that Python decorators are a language feature that lets you trivially wrap any callable in another callable, which can perform any pre- or post-processing, or even avoid calling the decorated function at all. You get transparent composition for free:

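import unittest

class ExampleTestCase(unittest.TestCase):
    @TODO
    def testToDo(self):
        self.fail("not written yet")  # placeholder test body

    @PlatformSpecific(('mac', ))
    def testMacOnly(self):
        pass  # placeholder: Mac-only checks go here

    # PlatformSpecific outermost, so other platforms skip the test entirely
    @PlatformSpecific(('nt', 'ce'))
    @TODO
    def testComposition(self):
        self.fail("not written yet")  # placeholder test body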

(If you aren't familiar with decorators in Python, IBM has a pretty thorough article on the subject.)

For the record, I also implemented a proof-of-concept of Collin's reference counting example in a similarly succinct decorator. In the example presented at BayPiggies, Collin ran the test case 5 times, checking a reference count after each run. I missed how he was getting reference counts (len(gc.get_referrers(...)) maybe?), so you will need to fill in how to get your objects' reference counts:

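import gc

def CheckReferences(func):
    def wrapper(*args, **kw):
        refCounts = []
        for i in range(5):
            func(*args, **kw)
            # Placeholder: substitute however you count references to
            # your objects; this just records how many objects gc tracks.
            refCounts.append(len(gc.get_objects()))
        assert min(refCounts) == max(refCounts), \
               "reference counts changed"
    wrapper.__name__ = func.__name__
    wrapper.__doc__ = func.__doc__
    return wrapper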

Adding the repetition count as a parameter would be trivial (see the PlatformSpecific decorator above for an example of how). I hard-coded 5 repetitions since that is what Collin used in his presentation.
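A sketch of that parameterized version (Python 3 syntax): repetitions becomes an argument, and the measurement is passed in as countRefs, a hypothetical zero-argument callable, since how to count your objects' references is left to you. The usage below is also hypothetical: a module-level cache list stands in for leaked references.

```python
import functools


def CheckReferences(countRefs, repetitions=5):
    """Run a test `repetitions` times; fail if countRefs() changes.

    countRefs is a hypothetical caller-supplied, zero-argument callable
    returning whatever count you want to keep constant across runs.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kw):
            refCounts = []
            for _ in range(repetitions):
                func(*args, **kw)
                refCounts.append(countRefs())  # sample after each run
            if min(refCounts) != max(refCounts):
                raise AssertionError("reference counts changed: %r" % refCounts)
        return wrapper
    return decorator


cache = []  # hypothetical state that a leaky test keeps growing


@CheckReferences(lambda: len(cache), repetitions=3)
def leakyTest():
    cache.append(object())  # grows on every run


@CheckReferences(lambda: len(cache))
def cleanTest():
    pass


cleanTest()  # the count stays constant, so no error is raised
try:
    leakyTest()
    leakDetected = False
except AssertionError:
    leakDetected = True
print(leakDetected)  # True
```

For a single object, something like lambda: sys.getrefcount(someObject) could serve as countRefs; sys.getrefcount includes the temporary reference from its own argument, which doesn't matter here since only changes between runs are checked.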

In all, it took me about 5 minutes to write all three decorators and test them (admittedly, I already had unittest test cases to try my decorators on). I didn't tinker with unittest at all, nor was there any poking or prodding of it; I didn't subclass anything. You can implement these extensions using standard Python syntax and the standard Python unittest module. To claim otherwise is simply disingenuous.

Of course, Collin's testframework module uses decorators too, so Collin was clearly aware of their existence. That prompted me to question his claims of 100+ lines to implement these features using unittest when simple decorators are sufficient. His response was that his numbers counted the lines of code that would be necessary to implement the TODO and platform-specific annotations using the unittest module without decorators. That seemed inconsistent with his own examples, which used decorators to show how easy it is to use these annotations with test_harness. I wanted to ask him about this contradiction face-to-face after the presentation, but unfortunately he had to catch the Google shuttle home immediately after his talk.

One point that Collin did repeatedly come back to was that logging extensions cannot be implemented using decorators. For example, you cannot have the unittest module log the test run results to a database by wrapping test methods in decorators. In theory you just need to implement your own TestRunner and TestResult subclasses and pass the TestRunner subclass to unittest.TestProgram(). However, if Sebastian Rittau's XML logger TestRunner class for unittest is any indication, changing loggers is non-trivial.

Collin said in his presentation, and I would have to agree, that extending unittest logging is painful, and composing multiple loggers is prohibitively so. Of course, if more TestRunner implementations were included in the Python standard library, half of this argument would be moot, since there would be less need to extend it. Right now, only a plain-text TestRunner (TextTestRunner) is included.

But to be honest, I don't expect most people really need to replace the logging mechanism (which may be why the standard library doesn't include more loggers). Marking tests as TODO or platform-specific or whatever is pretty universal; recording test run results to a database for analysis is probably far outside the realm of what most people (or even companies outside the Fortune 500 for that matter) need from their test framework. Which may be more of a comment on the sad state of testing than anything, but I digress. In any event, Collin's re-implementation adds value by facilitating logger composition, but to say that not facilitating logger composition makes unittest "suck" seems like a gross overstatement to me.

In all, I left BayPiggies last night having thought a lot more about unittest than I ever had before. And I can't help but think that, for the vast majority of us Python hackers down in the trenches, Python's unittest ain't that bad.

Update 2007/06/15 10:05am:
I found the lines-of-code numbers quoted in Collin's presentation in his blog also. My memory was pretty close on the supposed ~100 lines to implement TODO annotations. But it looks like I may have confused his ~200 lines quote for implementing composition of TODO and reference counting with the supposed number of lines to implement platform-specific test annotations. To be clear, though, composition using decorators as I described above requires 0 core classes and 2 lines of code (see my testComposition example above).


musesum said...

Thanks, Kelly, for posting this. I was at the BayPiggies meeting, heard your questions, and came away convinced that decorators are the simplest way to go.

Kelly Yancey said...

Thanks for the comment! Not to disparage Collin's hard work, but I'm more of a KISS guy myself.

JJ mentioned that nose supports test functions, not just methods, which should make things even simpler.

Anonymous said...

I found these code samples very helpful for work I'd like to do with unittest. What license are you publishing them under?

Kelly Yancey said...

Everything on my blog is BSD-licensed; see the link in the sidebar on the right for the license text.

And I just wanted to say that you made my day by being the first person I have ever met who is actually concerned about the licensing of code posted on people's blogs. It warms my heart to know someone is paying attention. Thanks!