Sunday, June 24, 2007

Useless geolocation information?

I just ran across Windows' geographical location information API, hidden amongst the national language support functions. One thing that caught my eye was that you can get latitude and longitude coordinates via the API. For example, in python:

>>> from ctypes import *
>>> GEOCLASS_NATION = 16
>>> nation = windll.kernel32.GetUserGeoID(GEOCLASS_NATION)
>>> buf = create_string_buffer(20)
>>> GEO_LATITUDE=0x0002
>>> GEO_LONGITUDE=0x0003
>>> windll.kernel32.GetGeoInfoA(nation, GEO_LATITUDE,
... byref(buf), sizeof(buf), 0)
>>> buf.value
>>> windll.kernel32.GetGeoInfoA(nation, GEO_LONGITUDE,
... byref(buf), sizeof(buf), 0)
>>> buf.value

At first glance, this looks pretty cool. You can get geospatial coordinates on any Windows XP or Vista box. Except that the API only returns the latitude and longitude of the country selected in the "Regional and Language Options" control panel. That is: the coordinates returned are for the user's selected location (i.e. country).

Sure enough, plugging the returned latitude and longitude into Yahoo Maps, I find that the coordinates fall squarely in the middle of the country, a good thousand miles from where I sit in Palo Alto, California. Of course, since I don't have a GPS receiver attached to my laptop, I didn't really expect the coordinates to be accurate.

For the life of me, I can't think of a single application of this information. But there it is: someone at Microsoft had to populate the database mapping countries to geospatial coordinates, and some engineer had to code the API to retrieve them. For what purpose? What can you do with the latitude and longitude of a country (whatever that is supposed to mean)? Can anyone think of a use for this data?

Wednesday, June 20, 2007

"Web 2.0" - Intelligent Design for the Internet

Personally, I hate the term "web 2.0".

Assigning a discrete version number to the web seems to completely miss the point of the evolutionary, organic growth of the World Wide Web that the "2.0" moniker is ostensibly trying to describe. It presumes that there was a clear-cut switch from the "1.0" web to the "2.0" web. There wasn't. That doesn't even make sense: the web isn't some singular entity that is released by a central authority with new features on a periodic basis. Implying as much just demonstrates you don't know how the web works.

It is the web. There was no 1.0, there is no 2.0, there will never be a 2.1 or 3.0 or whatever. Any more so than there was a life 1.0, life 2.0, life 2.1 or whatever (Second Life jokes aside). Calling it "web 2.0" is the moral equivalent of intelligent design for the Internet. Sure, there are standardization bodies for the protocols the web is built on, but do you really believe the modern web itself can be "best explained by an intelligent cause, not an undirected process such as natural selection"? I don't think so.

This has been out for a few months now, but if you haven't already seen it, a group of Kansas State University students put together a movie that describes the current state of evolution of the web:

Unfortunately, they make the mistake of attaching the 2.0 misnomer to the web. Which is truly a shame since they seem to "get it" in all other respects. The web is us and the last time I checked, there was no us 2.0.

Tuesday, June 19, 2007

Python: property attribute tricks

Here is my twist on the Python Cookbook recipe for class property definition: instead of using a non-standard doc="" argument to pass the documentation string to the property() function, I use a decorator that copies the doc-string from the decorated function and passes it to property() explicitly. First, my decorator:

def prop(func):
    """Function decorator for defining property attributes

    The decorated function is expected to return a dictionary
    containing one or more of the following pairs:
        fget - function for getting attribute value
        fset - function for setting attribute value
        fdel - function for deleting attribute
    This can be conveniently constructed by the locals() builtin
    function; see the example below.
    """
    return property(doc=func.__doc__, **func())

An example of using this decorator to declare a managed attribute:

class MyExampleClass(object):
    @prop
    def foo():
        "The foo property attribute's doc-string"
        def fget(self):
            return self._foo
        def fset(self, value):
            self._foo = value
        return locals()

The only subtle difference between my method and the property attribute definition method described in the Python Cookbook is that my version allows you to document the attribute using python's standard doc-string syntax.
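To see the recipe in action, here's a self-contained sanity check (restating the decorator so the snippet runs on its own):

```python
def prop(func):
    # Build a property() from the fget/fset dict the decorated
    # function returns, copying its doc-string into the property.
    return property(doc=func.__doc__, **func())

class MyExampleClass(object):
    @prop
    def foo():
        "The foo property attribute's doc-string"
        def fget(self):
            return self._foo
        def fset(self, value):
            self._foo = value
        return locals()

obj = MyExampleClass()
obj.foo = 42                # invokes fset
print(obj.foo)              # invokes fget; prints 42
print(MyExampleClass.foo.__doc__)
```

Note that help(MyExampleClass.foo) now shows the doc-string, which is the whole point of the exercise.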

Monday, June 18, 2007

Python 2.5 consumes more memory than 2.4.4

Since python 2.5 contains a new object allocator that is able to return memory to the operating system, I was surprised to find that python 2.5.1 consumes more memory at start-up than 2.4.4 does (at least on Windows). I had been assuming that 2.5.1 could only use the same or less memory than python 2.4.4. However, simply starting each interactive interpreter on my Windows XP machine, the Task Manager reports:
Version         "Mem Usage"   "VM Size"
Python 2.4.4    3440k         1788k
Python 2.5.1    5828k         3868k

The specific builds tested were:
  • Python 2.4.4 (#71, Oct 18 2006, 08:34:43) [MSC v.1310 32 bit (Intel)] on win32
  • Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Both were installed using the Windows installers from the web site as of today.

Of course, depending on your application, I would expect python 2.5.1 to still use less memory overall. But what surprised me was how much of a handicap python 2.5.1 has after just loading the interpreter: it starts out consuming over 2 megabytes more memory.

Friday, June 15, 2007

PowerPoint centimeters

Me, I'm 5'10" tall. That would be approximately 178 centimeters. Or 168 PowerPoint centimeters.

Unfortunately, Google Calculator doesn't recognize PowerPoint centimeters as a unit yet so you'll need to whip out a calculator and do the conversion yourself:
1 inch = 2.4 PowerPoint centimeters
1 PowerPoint centimeter = 1.0583 centimeters
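Sketched in python (my arithmetic, following from 5 picas per PowerPoint centimeter and 12 of Microsoft's picas per inch):

```python
# 1 PowerPoint centimeter = 5 picas = 5/12 inch, hence:
PP_CM_PER_INCH = 12.0 / 5.0        # 2.4 PowerPoint cm per inch

def inches_to_pp_cm(inches):
    """Convert real inches to PowerPoint centimeters."""
    return inches * PP_CM_PER_INCH

def cm_to_pp_cm(cm):
    """Convert real (metric) centimeters to PowerPoint centimeters."""
    return cm / 2.54 * PP_CM_PER_INCH

print(inches_to_pp_cm(70))         # 5'10" = 70 inches -> 168.0
```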

How did this unit come about? From Microsoft's Knowledge Base article 189826:
To improve its usability, PowerPoint slightly misdefines the size of a centimeter to make the invisible gridlines fall at convenient points on the ruler. With this conversion, there are 5 picas per centimeter and the gridlines fall at very convenient points on the ruler. So convenient, in fact, that working in the metric system is really easier than working in the English system.

Apparently, the gridlines in PowerPoint 97 were on pica boundaries, which aligned nicely to the inch ruler since a pica is defined as 1/6 of an inch. But if you were to use PowerPoint 97 in one of those metric-using nations, the pica gridlines didn't fall on convenient points on the centimeter ruler. Solution: redefine centimeters to be an even multiple of picas, which are defined to be an integral fraction of an inch. In other words, Microsoft redefined the meaning of centimeters in PowerPoint such that a centimeter was defined in terms of an inch.

Microsoft had the arrogance to redefine an international standard measurement unit to fit their software's user interface. What makes the arrogance all the more palpable is that their Knowledge Base article acknowledging the existence of PowerPoint centimeters doesn't sound the least bit apologetic for the discrepancy.

Personally, I kind of prefer PowerPoint centimeters to metric centimeters: 168 PowerPoint centimeters is a lot easier for an American like me to remember than 177.8 metric centimeters. <wink>

Update 2007/06/15 03:45pm:
My wife pointed out that apparently Microsoft redefined the size of a pica too: their Knowledge Base claims 12 picas to an inch, whereas there are supposed to be only 6 to an inch. I would like to suggest the name "PowerPicas" for these new pint-sized picas. That would make me 840 PowerPicas tall.

Python's unittest module ain't that bad

Collin Winter was kind enough to speak to BayPiggies last night about his unittest module replacement, test_harness. The basic premise of the talk was that unittest does not support extensions very well, hence he wrote his own testing framework that does. The same argument is presented in Collin's blog posting titled "Python's unittest module sucks".

However, I take issue with several of Collin's claimed deficiencies in unittest: they simply aren't there. For example, he claimed that extensions cannot easily be composed (i.e. multiple extensions cannot be applied to a single test case). I raised the point in the meeting that python's decorators are trivially composable, and that the TODO annotation described in his blog is trivially implementable as a decorator, usable via unittest, nose, or just about any testing framework. In his presentation, he claimed TODO annotations required over a hundred lines of code across 5 classes to implement using unittest. This simply isn't true: I implemented it in 8 lines while he spoke; with some polish it's up to 11 lines plus a doc-string, nowhere near 100, and there isn't a class in sight:

def TODO(func):
    """unittest test method decorator that ignores
    exceptions raised by test

    Used to annotate test methods for code that may
    not be written yet.  Ignores failures in the
    annotated test method; fails if the test
    unexpectedly succeeds.
    """
    def wrapper(*args, **kw):
        try:
            func(*args, **kw)
            succeeded = True
        except:
            succeeded = False
        assert succeeded is False, \
            "%s marked TODO but passed" % func.__name__
    wrapper.__name__ = func.__name__
    wrapper.__doc__ = func.__doc__
    return wrapper

Collin demonstrated a platform-specific test annotation in his framework. He claimed this would require almost 200 lines of code to implement in unittest, but that too is an overstatement. I had it implemented before he could finish the slide:

def PlatformSpecific(platformList):
    """unittest test method decorator that only
    runs the test method if os.name is in the
    given list of platforms
    """
    def decorator(func):
        import os
        def wrapper(*args, **kw):
            if os.name in platformList:
                return func(*args, **kw)
        wrapper.__name__ = func.__name__
        wrapper.__doc__ = func.__doc__
        return wrapper
    return decorator

The point is that python decorators are a language feature that allows you to trivially wrap any callable with another callable, the latter of which can perform any pre- or post-processing, or even avoid calling the decorated function at all. You get transparent composition for free:

class ExampleTestCase(unittest.TestCase):
    @TODO
    def testToDo(self):
        self.fail("this feature hasn't been written yet")

    @PlatformSpecific(('mac', ))
    def testMacOnly(self):
        pass  # exercise Mac-specific functionality here

    @TODO
    @PlatformSpecific(('nt', 'ce'))
    def testComposition(self):
        self.fail("this feature hasn't been written yet")

(If you aren't familiar with decorators in python, IBM has a pretty thorough article on the subject.)

For the record, I also implemented a proof-of-concept of Collin's reference counting example as a similarly-succinct decorator. In the example presented at BayPiggies, Collin ran the test case 5 times, checking a reference count after each run. I missed how he was getting reference counts (len(gc.get_referrers(...)) maybe?) so you need to fill in how to get your object reference counts:

def CheckReferences(func):
    def wrapper(*args, **kw):
        refCounts = []
        for i in range(5):
            func(*args, **kw)
            refCounts.append(GetRefCount())  # fill in your refcount check
        assert min(refCounts) == max(refCounts), \
            "reference counts changed"
    wrapper.__name__ = func.__name__
    wrapper.__doc__ = func.__doc__
    return wrapper

Adding the repetition count as a parameter would be trivial (see the PlatformSpecific decorator above for an example of how). I hard-coded 5 repetitions since that is what Collin used in his presentation.
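For illustration, a parameterized variant might look like this (my sketch; the getcount hook is a hypothetical stand-in for however you obtain your reference counts):

```python
def CheckReferences(repetitions=5, getcount=None):
    # Decorator factory: run the decorated test `repetitions` times
    # and fail if the count reported by `getcount` ever changes.
    def decorator(func):
        def wrapper(*args, **kw):
            refCounts = []
            for i in range(repetitions):
                func(*args, **kw)
                refCounts.append(getcount())
            assert min(refCounts) == max(refCounts), \
                "reference counts changed"
        wrapper.__name__ = func.__name__
        wrapper.__doc__ = func.__doc__
        return wrapper
    return decorator
```

Usage then becomes @CheckReferences(repetitions=10, getcount=...) with whatever counting function suits your test.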

In all, it took me about 5 minutes to write all three decorators and test them (admittedly, I already had unittest test cases to try my decorators on). There was no tinkering, poking, or prodding of unittest; I didn't subclass anything. You can implement these extensions using standard python syntax and the standard python unittest module. To claim otherwise is simply disingenuous.

Of course, Collin's test_harness module uses decorators too, so Collin was clearly aware of their existence. Which prompted me to question Collin's claims of 100+ lines to implement these features using unittest when simple decorators are sufficient. His response was that his numbers were the number of lines of code that would be necessary to implement the TODO and platform-specific annotations using the unittest module without decorators. Which seemed inconsistent with his examples, involving decorators, of how easy it is to use these annotations with test_harness. I wanted to ask him about this contradiction face-to-face after the presentation, but unfortunately he had to catch the Google shuttle home immediately after his talk.

One point that Collin did repeatedly come back to was that logging extensions cannot be implemented using decorators. For example, you cannot have the unittest module log the test run results to a database by wrapping test methods in decorators. In theory you just need to implement your own TestRunner and TestResult subclasses and pass the TestRunner subclass to unittest.TestProgram(). However, if Sebastian Rittau's XML logger TestRunner class for unittest is any indication, changing loggers is non-trivial.

Collin said in his presentation, and I would have to agree, extending unittest logging is painful; composing multiple loggers is prohibitively painful. Of course, if more TestRunner implementations were included in the python standard library, half of this argument would be moot as there would be less need to extend. Right now, only a text logger TestRunner is included.

But to be honest, I don't expect most people really need to replace the logging mechanism (which may be why the standard library doesn't include more loggers). Marking tests as TODO or platform-specific or whatever is pretty universal; recording test run results to a database for analysis is probably far outside the realm of what most people (or even companies outside the Fortune 500 for that matter) need from their test framework. Which may be more of a comment on the sad state of testing than anything, but I digress. In any event, Collin's re-implementation adds value by facilitating logger composition, but to say that not facilitating logger composition makes unittest "suck" seems like a gross overstatement to me.

In all, I left BayPiggies last night having thought a lot more about unittest than I ever have before. And I can't help but think that, for the vast majority of us python hackers down in the trenches, python's unittest ain't that bad.

Update 2007/06/15 10:05am:
I found the lines-of-code numbers quoted in Collin's presentation in his blog also. My memory was pretty close on the supposed ~100 lines to implement TODO annotations. But it looks like I may have confused his ~200 lines quote for implementing composition of TODO and reference counting with the supposed number of lines to implement platform-specific test annotations. To be clear, though, composition using decorators as I described above requires 0 core classes and 2 lines of code (see my testComposition example above).

Sunday, June 10, 2007

Keyword arguments in XML-RPC

This isn't the least bit novel, for example I know I've been using this trick for years, but nonetheless here is a way to simulate named arguments in XML-RPC. XML-RPC only natively supports positional parameters, but by passing a single positional argument that is itself an XML-RPC struct (which is actually a mapping), you can simulate named and/or optional arguments. Rather than reproduce a sample XML-RPC document demonstrating this usage, I'll refer you to one of my earlier posts that utilized this technique; you'll see that the method is called with two named parameters: path and args.

If you are familiar with perl, you may also be aware of the trick perl 5, which likewise natively supports only positional arguments, uses to simulate named parameters. In perl 5, it is common to pass a hash of name/value pairs as arguments. However, what perl actually does behind the scenes, and what differs from this XML-RPC trick, is to serialize the hash into an array of alternating names and values; it then passes this array as the positional argument list for the subroutine being called. The called subroutine then de-serializes the name and value pairs from the argument array, reconstructing the original hash. This flattening of a hash has to be a documented protocol between the subroutine and its callers.

Of course, you could do exactly the same thing using XML-RPC: serialize the argument dictionary into an array of alternating names and values and populate the method's param list with this array's elements. The XML-RPC server method could then reconstruct the original dictionary from the param list.
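That perl-style flattening is easy to sketch in python:

```python
def flatten(kwargs):
    # Serialize a dict of named arguments into an alternating
    # name/value positional parameter list (the perl 5 convention).
    params = []
    for name in sorted(kwargs):
        params.extend([name, kwargs[name]])
    return params

def unflatten(params):
    # Reconstruct the dict on the receiving side.
    return dict(zip(params[0::2], params[1::2]))

print(flatten({'path': '/tmp/x', 'mode': 'r'}))
# ['mode', 'r', 'path', '/tmp/x']
```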

But XML-RPC also natively supports passing dictionaries, via the struct data type. Hence my original suggestion. Since supporting named (or generic optional) arguments requires documenting a protocol between the caller and the method anyway, we might as well make the protocol as straightforward as possible: rather than serializing and deserializing a dictionary of named arguments, just pass the dictionary as-is, as the one and only positional argument.
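Here is that round trip sketched with python's xmlrpc.client module (xmlrpclib back in the python 2 days); the method name and argument names are made up for illustration:

```python
import xmlrpc.client

# Client side: pass one positional param that is a struct of named args.
named_args = {'path': '/tmp/example.txt', 'args': ['-v']}
payload = xmlrpc.client.dumps((named_args,), 'storeFile')

# Server side: the struct arrives as the single positional param.
params, method = xmlrpc.client.loads(payload)
recovered = params[0]
```

The recovered dictionary can then be applied to a python handler as **recovered, giving you real keyword arguments.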

Friday, June 1, 2007

C: Converting struct tm times with timezone to time_t

Both the BSD and GNU standard C libraries have extended struct tm to include a tm_gmtoff member that holds the offset from UTC of the time represented by the structure. This might lead you to believe that mktime(3) would honor the time offset indicated by tm_gmtoff when converting to a time_t representation.


It doesn't: mktime(3) always assumes the "current timezone" defined by the executing environment. Since ISO C and POSIX define the semantics of mktime(3) but neither defines a tm_gmtoff member for the tm structure, it is not surprising that mktime(3) does not honor it.

So, let's say you have a struct tm, complete with a correctly-populated tm_gmtoff field: how do you convert it to a time_t representation?

Many modern C libraries (including glibc and FreeBSD's libc) include a timegm(3) function. No, this function doesn't honor tm_gmtoff either. Instead, timegm(3) converts the struct tm to a time_t just like mktime(3), except that it ignores the timezone of the executing environment and always assumes GMT as the timezone.

However, if your libc implements both tm_gmtoff and timegm(3) you are in luck. You just need to use timegm(3) to get the time_t representing the time in GMT and then subtract the offset stored in tm_gmtoff. The tricky part is that calling timegm(3) will modify the struct tm, clearing the tm_gmtoff field to zero (at least it does on the FreeBSD 4.10 machine I'm testing with). Combined with C's lack of guaranteed left-to-right evaluation, you need to save the tm_gmtoff so it doesn't get clobbered before you can use it. Something like:

time_t
tm2time(const struct tm *src)
{
    struct tm tmp;

    tmp = *src;
    return timegm(&tmp) - src->tm_gmtoff;
}

Note that I copy the entire struct tm into a temporary variable. This prevents timegm(3) from clobbering the tm_gmtoff so that we can use it to accurately compute the seconds since the epoch. The copy in tmp gets clobbered, but the copy in src is left intact. Also, by copying the src struct tm into a temporary, we never modify the argument passed in -- which is just a generally friendly thing to do.
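As a quick sanity check (assuming glibc, which provides both timegm(3) and tm_gmtoff when _GNU_SOURCE is defined), converting noon at UTC-7 should yield the same time_t as 19:00 UTC; the helper is restated so the snippet stands alone:

```c
#define _GNU_SOURCE            /* for timegm(3) and tm_gmtoff on glibc */
#include <time.h>

/* Restated from above: convert a struct tm carrying tm_gmtoff. */
time_t
tm2time(const struct tm *src)
{
    struct tm tmp;

    tmp = *src;                /* keep timegm(3) from clobbering src */
    return timegm(&tmp) - src->tm_gmtoff;
}
```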

All that said, the truly pedantic will point out that neither the ISO C nor the POSIX spec dictates that time_t must represent seconds. However, since we are already depending on two non-standard extensions, it seems reasonable to also depend on the fact that systems implementing timegm(3) and the tm_gmtoff field all implement time_t values in seconds.