Thursday, December 20, 2007

Less code

I was just reading Steve Yegge's rant against code size and realized that he managed to put into words exactly the feelings that have been drawing me to Python in recent years. In particular, I managed to mostly skip the Java step in my journey from Pascal, through assembler, up to C, and then the leap to high-level languages including Perl, and more recently Python. I don't really know why, but Java never felt "right" -- for anything. To this day, I can't think of many applications for which I would say Java was the best tool for the job. On that point, I think Steve hit the nail on the head when he wrote:
Java is like a variant of the game of Tetris in which none of the pieces can fill gaps created by the other pieces, so all you can do is pile them up endlessly.

Hallelujah, brother.

Anyway, I strongly agree with Steve's general points about the merits of small code bases, but I won't go so far as to say that smaller is always better. Python hits a sweet spot for me (at least for now) between compactness and comprehensiveness. Certainly a good number of problems could be expressed more succinctly in a functional language such as Erlang or Haskell, but you lose readability. In fact, as elegantly as many problems can be expressed in a functional language, the code quickly starts to look like line noise once the problems grow beyond textbook examples.

Programming language preferences aside, what I agree with most in Steve's blog post is not so much that more succinct languages are better, but that less code is better. His post is written so as to suggest that Java itself is a problem -- which may certainly be true -- but he doesn't clarify whether he thinks it is Java the language, or Java the set of libraries.

Python, for example, combines a great set of standard libraries with a language syntax that makes it easy to use those libraries. All the lines of code hidden away in libraries are effectively "free" code. You don't have to manage their complexity. Give me a language that makes leveraging as many libraries as possible painless, then I can glue them together to make great programs with low apparent complexity. In reality, the lines of code might be astronomical, but I don't have to manage the complexity of all of it -- just the part I wrote -- so it doesn't matter.
Python does a great job here, whereas Java (and C++'s STL) largely get it wrong.

In particular, I would argue that, in addition to Python's straightforward syntax, the fact that so many of Python's libraries are written in C is a large factor in why they are so easy to use. There may be a huge amount of complexity, and a huge number of lines of code, in the C implementation of a library. However, the API boundary between Python and C acts as a sort of line of demarcation -- no complexity inherent in the implementation of the library can leak out into the Python API without the programmer explicitly allowing it. That is, the complexity of libraries written in C and callable from Python is necessarily encapsulated.

As a personal anecdote, in one project I work on, we use ctypes to make foreign function calls to a number of Windows APIs. One thing that really bothers me about this technique is that I find myself re-implementing a number of data structures in ctypes that are already defined in C header files. If I make a mistake, then I introduce a bug. Ironically, since I could leverage more existing code, often times there would be fewer lines of code and less complexity had I just used C to call the APIs. Of course, other parts of the program would become hugely unwieldy, but the point of this anecdote is that libraries (more specifically, being able to leverage known-good code) can be much more effective in reducing code than the implementation language.

So long as the implementation language isn't Java. Java just sucks. :)
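
To make the ctypes anecdote above a little more concrete, here is a minimal sketch of what re-declaring a C structure by hand looks like. It uses the Win32 SYSTEMTIME layout (eight 16-bit fields) purely as an illustration; it is not taken from the project mentioned above, and BadSystemTime is a hypothetical transcription slip invented for the example. Nothing here calls a Windows API, so it should run on any platform.

```python
import ctypes

class SYSTEMTIME(ctypes.Structure):
    # Hand-written mirror of the C declaration: eight 16-bit WORD fields.
    _fields_ = [
        ("wYear", ctypes.c_uint16),
        ("wMonth", ctypes.c_uint16),
        ("wDayOfWeek", ctypes.c_uint16),
        ("wDay", ctypes.c_uint16),
        ("wHour", ctypes.c_uint16),
        ("wMinute", ctypes.c_uint16),
        ("wSecond", ctypes.c_uint16),
        ("wMilliseconds", ctypes.c_uint16),
    ]

class BadSystemTime(ctypes.Structure):
    # Hypothetical mistake: the first field was typed as 32 bits wide.
    _fields_ = [("wYear", ctypes.c_uint32)] + SYSTEMTIME._fields_[1:]

good_size = ctypes.sizeof(SYSTEMTIME)
bad_size = ctypes.sizeof(BadSystemTime)

# The sizes disagree, so every field after wYear would be read from
# the wrong offset if the bad declaration were passed to the real API.
print(good_size, bad_size)
```

A C program would simply include the header and get the layout for free; in ctypes, nothing checks the hand-written copy against the original.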

Tuesday, December 11, 2007

Lies, Damn Lies, and ... Economics?

Today I'm going to venture out of any field that I have the slightest expertise in and flounder about in the field of basic macro-economics.

But before I demonstrate my utter lack of knowledge, I'm going to touch on a subject that I at least have some familiarity with -- mathematics. I'm going to share with you a math problem that I cannot solve. It looks something like this:


A / B = 111 = A' / B'


B = B' * 1.41

What is the ratio between A and A'?
Elementary algebra would seem to imply the answer should be 1.41:

A / B = A' / B'

A * B' = A' * B

A * B' = A' * B' * 1.41 ==> A = A' * 1.41

By itself, I would have expected this problem would be trivial.
However, what perplexes me is that the observed value for the ratio between A and A' is nowhere near 1.41, but rather approximately 1.0.

And here is where I demonstrate my lack of understanding in the field of economics...

The thing that has been puzzling me for years (even before the recent foreign exchange craze) is that the U.S. dollar has experienced fairly consistent inflation over the past 13 years while the Japanese yen has experienced almost none, yet the exchange rate remains the same.

You see, A' / B' is the average exchange rate, in Japanese yen per U.S. dollar, for November 2007. And A / B is the average exchange rate, in Japanese yen per U.S. dollar, for April 1994. Both ratios just happen to be approximately 111. (Source: Board of Governors of the Federal Reserve System)

The ratio B' / B is the purchasing power of the U.S. dollar in 2007 relative to dollars in 1994. That is, one 1994 dollar (B) is worth 1.41 2007 dollars (B'). (Source: Bureau of Labor Statistics)

The ratio A' / A is the purchasing power of the Japanese Yen in 2007 relative to yen in 1994. It so happens that one Japanese yen buys the same amount in 2007 that it did in 1994. (Source: Japanese Ministry of Internal Affairs and Communications)
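
As a quick sanity check of the arithmetic, the sketch below plugs in the figures quoted above (111 yen per dollar, a factor of 1.41 for the dollar, roughly 1.0 for the yen). The variable names are mine, and the assumption that the exchange rate should move by exactly the difference in inflation is the naive expectation being questioned here, not an established result.

```python
# Figures quoted in the post.
rate_1994 = 111.0        # yen per dollar, April 1994
observed_2007 = 111.0    # yen per dollar, November 2007
usd_inflation = 1.41     # 2007 dollars needed to match one 1994 dollar
jpy_inflation = 1.0      # yen prices roughly unchanged

# Naive expectation: the rate shifts by the ratio of the two inflation factors.
expected_2007 = rate_1994 * jpy_inflation / usd_inflation

# How far the observed rate is from that expectation.
gap = observed_2007 / expected_2007

print("expected: %.1f yen per dollar" % expected_2007)
print("observed: %.1f yen per dollar" % observed_2007)
print("observed / expected: %.2f" % gap)
```

Under that assumption the dollar "should" buy only about 79 yen in 2007, yet it still buys about 111.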

How can this be?
How can two currencies have different inflation rates, and yet maintain the same conversion rate?

I'm no economist, but I cannot help but wonder if the answer lies outside of math and in the realm of human irrationality. Or perhaps the CPI values used to calculate the purchasing power of a currency are inconsistent and/or flawed. Actually, I know that CPI calculation methods differ amongst countries, but for some reason I expected Japan to use the same calculation method that the U.S. does. I admit I haven't investigated that explanation yet.

Does anyone have a better explanation (preferably one that does not violate basic math principles)? I would love to put this puzzle to rest.

Monday, December 3, 2007

Python: asserting code runs in specific thread

My buddy JJ's post to the BayPiggies mailing list reminded me of a little snippet I wrote a while back that others might find useful as well. Personally, I avoid threads like the plague, but if you are forced to use them it is generally handy to keep accurate tabs on how you use them. In particular, as JJ suggested in his post, it is a good idea to assert that code is called in the thread context you expect it to be called in. This can go a long way toward avoiding one of many classes of hard-to-find logic bugs multi-threading invites. Anyway, on to the code...

    import threading

    def assertThread(*threads):
        """Decorator that asserts the wrapped function is only
        run in a given thread
        """
        # If no thread list was supplied, assume the wrapped
        # function should be run in the current thread.
        if not threads:
            threads = (threading.current_thread(), )

        def decorator(func):
            def wrapper(*args, **kw):
                currentThread = threading.current_thread()
                name = currentThread.name
                # Accept a match on either the thread object or its name.
                assert currentThread in threads or name in threads, \
                    "%s erroneously called in %s thread " \
                    "context" % (func.__name__, name)
                return func(*args, **kw)

            # With assertions disabled (python -O), skip the wrapper entirely.
            if __debug__:
                return wrapper
            return func
        return decorator

You can restrict execution to one or more threads, each specified by either the thread object or thread name.

Note the trick at the end to make the decorator effectively a no-op in production. Using this decorator around your functions and methods helps you spot logic errors during development without impacting the performance of your production code. Of course, if you are of the school that assertions should never be disabled, feel free to replace the final if __debug__: check with an unconditional return of wrapper.
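
Here is a minimal usage sketch. It repeats a compact copy of the decorator so that it runs on its own; update_ui and the thread name "worker-1" are hypothetical names chosen for the example. It assumes assertions are enabled (that is, Python was not started with -O).

```python
import threading

def assertThread(*threads):
    """Compact copy of the decorator, repeated so this sketch is self-contained."""
    if not threads:
        threads = (threading.current_thread(),)

    def decorator(func):
        def wrapper(*args, **kw):
            current = threading.current_thread()
            assert current in threads or current.name in threads, \
                "%s erroneously called in %s thread context" % (
                    func.__name__, current.name)
            return func(*args, **kw)
        return wrapper if __debug__ else func
    return decorator

# No arguments: restrict the function to the thread that defines it.
@assertThread()
def update_ui():
    return "updated"

errors = []

def worker():
    # Calling from another thread should trip the assertion.
    try:
        update_ui()
    except AssertionError as exc:
        errors.append(str(exc))

main_result = update_ui()   # same thread as the definition: allowed

t = threading.Thread(target=worker, name="worker-1")
t.start()
t.join()

print(main_result)
print(errors)
```

Passing a name instead, as in @assertThread("worker-1"), would restrict the function to any thread carrying that name.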