Sunday, April 15, 2007

Python: The anti-snippet

This past Thursday was snippets night at the BayPiggies meeting. Unfortunately, my laptop's battery died before I was able to present my snippet. I call it an "anti-snippet" since I start with a terrible, terrible piece of code and show various ways to clean it up (the best I can) using python. In this case, I started with an implementation of the key hiding algorithm used by the Wireless Zero Config service on Windows Mobile 5. Here is a straightforward port of the original C sample code:

def _hideKeyMaterial(str):
# This gem comes directly from the wzctool source code.
# I like how they pointlessly obfuscated their own code
# by storing the XOR values out-of-order in the
# chXORData array. Nobody does security like
# Microsoft.
if str is None:
str = ''
assert len(str) <= 32
# Pad out to WZCCTL_MAX_WEPK_MATERIAL (32) bytes.
str = str.ljust(32, '\0')
chXORData = [0x56, 0x09, 0x08, 0x98, 0x4D, 0x08, 0x11,
0x66, 0x42, 0x03, 0x01, 0x67, 0x66]
return ''.join([ chr(ord(c) ^ chXORData[(i * 7) % 13])
for (i, c) in enumerate(str) ])

First, I'd like to point out how silly this hiding logic is. As the comments say, the bytes of the chXORData array are pointlessly out-of-order; this adds nothing to security and just obfuscates the logic. Speaking of security, the perported purpose for hiding the key data in first place is to prevent memory scans from extracting the key material. However, since the routine pads the key material out of 32 bytes before XORing it with known values, this "hiding" algorithm actually makes identifying the key data in memory trivial: you would only need to search for a string of values from the chFakeKeyMaterial array (in their real order). For example, finding the values 0x03, 0x98, 0x01, 0x4D, 0x67, 0x08, 0x66, 0x11 in memory would give you a pretty good chance of having found the key data. Then you just need to apply the hiding algorithm a second time (since it is a simple XOR-based hiding algorithm) to recover the key material. Ironically, a search for a run of nul values in memory would lead to a lot more false matches so "hiding" the key actually makes it easier to find.


Anyway, the demerits of the algorithm aside, lets look at how we can improve the implementation using python. First, the most obvious improvement we can make is to remove the pointless rearrangement of the bytes in the the XOR data array:

def _hideKeyMaterial2(str):
str = str.ljust(32, '\0')
chXORData = [0x56, 0x66, 0x09, 0x42, 0x08, 0x03, 0x98,
0x01, 0x4D, 0x67, 0x08, 0x66, 0x11]
return ''.join([ chr(ord(c) ^ chXORData[i % 13])
for (i, c) in enumerate(str) ])

Personally I find this to be the easiest-to-read version, but it isn't really much of an improvement. We could have done at much in C; in fact, I have to wonder why the programmers who wrote the original C code in the Microsoft's reference wzctool didn't do as much.


There isn't much we can do about the chr() and ord() calls since python, unlike C, treats integers and characters as separate types and XOR is not defined for characters.


We can eliminate the itermediate list by replacing the list comprehension with a generator expression instead:

def _hideKeyMaterial3(str):
str = str.ljust(32, '\0')
chXORData = [0x56, 0x66, 0x09, 0x42, 0x08, 0x03, 0x98,
0x01, 0x4D, 0x67, 0x08, 0x66, 0x11]
return ''.join(( chr(ord(c) ^ chXORData[i % 13])
for (i, c) in enumerate(str) ))

Using the itertools module, we can re-write the algorithm as:

from itertools import repeat, izip
def _hideKeyMaterial4(str):
str = str.ljust(32, '\0')
chXORData = [0x56, 0x66, 0x09, 0x42, 0x08, 0x03, 0x98,
0x01, 0x4D, 0x67, 0x08, 0x66, 0x11]
return ''.join(( chr(t[0] ^ ord(t[1]))
for t in izip(repeat(chXORData), str) ))

Personally, I think this version isn't particularly easy on the eyes, but it does demonstate how to use some of the new functions in Python 2.4 (which isn't really that new anymore).


Anyway, that is all I had. Nothing particularly exciting, but I thought it would be a fun to poke fun at a terrible piece of logic and then use it to demonstrate different implementations in python. It's too bad my laptop wasn't up to the task.

No comments: