Thursday, December 20, 2007

Less code

I was just reading Steve Yegge's rant against code size and realized that he managed to put into words exactly the feelings that have been drawing me to Python in recent years. In particular, I managed to mostly skip the Java step in my journey from Pascal, through assembler, up to C, and then the leap to high-level languages including Perl and, more recently, Python. I don't really know why, but Java never felt "right" -- for anything. To this day, I can't think of many applications for which I would say Java was the best tool for the job. On that point, I think Steve hit the nail on the head when he writes:
Java is like a variant of the game of Tetris in which none of the pieces can fill gaps created by the other pieces, so all you can do is pile them up endlessly.

Hallelujah, brother.

Anyway, I strongly agree with Steve's general points about the merits of small code bases, but I won't go so far as to say that smaller is always better. Python hits a sweet spot for me (at least for now) between compactness and comprehensiveness. Certainly a good number of problems could be expressed more succinctly in a functional language such as Erlang or Haskell, but you lose readability. In fact, as elegantly as many problems can be expressed in a functional language, the code quickly starts to look like line noise once the problems grow beyond textbook examples.

Programming language preferences aside, what I agree with most in Steve's blog post is not so much that more succinct languages are better, but that less code is better. His post is written so as to suggest that Java itself is a problem -- which may certainly be true -- but he doesn't clarify whether he thinks it is Java the language, or Java the set of libraries.

Python, for example, combines a great set of standard libraries with a language syntax that makes it easy to use those libraries. All the lines of code hidden away in libraries are effectively "free" code. You don't have to manage their complexity. Give me a language that makes leveraging as many libraries as possible painless, then I can glue them together to make great programs with low apparent complexity. In reality, the lines of code might be astronomical, but I don't have to manage the complexity of all of it -- just the part I wrote -- so it doesn't matter. Python does a great job here, whereas Java (and C++'s STL) largely get it wrong.
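
To make the "free code" point concrete, here is a minimal sketch. The function name top_words and the sample input are made up for illustration and do not come from any real project. A handful of lines of glue lean on the json and collections modules from the standard library, and all of the parsing and counting machinery stays out of sight:

```python
import json
from collections import Counter


def top_words(json_text, n=3):
    """Return the n most common words in the "text" field of a JSON document."""
    # json.loads hides an entire parser behind a single call.
    document = json.loads(json_text)
    words = document["text"].lower().split()
    # Counter hides the tallying and ranking logic.
    return Counter(words).most_common(n)


if __name__ == "__main__":
    # Hypothetical sample input, used only to exercise the function.
    sample = '{"text": "less code is better code because less code means fewer bugs"}'
    print(top_words(sample, 2))
```

The only complexity left to manage is the three lines inside the function; everything else belongs to the library authors.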

In particular, I would argue that, in addition to Python's straightforward syntax, the fact that so many of Python's libraries are written in C is a large factor in why they are so easy to use. There may be a huge amount of complexity, and a huge number of lines of code, in the C implementation of a library. However, the API boundary between Python and C acts as a sort of line of demarcation -- no complexity inherent in the implementation of the library can leak out into the Python API without the programmer explicitly allowing it. That is, the complexity of libraries written in C and callable from Python is necessarily encapsulated.

As a personal anecdote, in one project I work on, we use ctypes to make foreign function calls to a number of Windows APIs. One thing that really bothers me about this technique is that I find myself re-implementing a number of data structures in ctypes that are already defined in C header files. If I make a mistake, then I introduce a bug. Ironically, since I could leverage more existing code, there would often be fewer lines of code and less complexity had I just used C to call the APIs. Of course, other parts of the program would become hugely unwieldy, but the point of this anecdote is that libraries (more specifically, being able to leverage known-good code) can be much more effective in reducing code than the choice of implementation language.
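
A rough illustration of that duplication, using the Win32 SYSTEMTIME structure as a stand-in rather than anything from the project itself: the C header already declares its eight 16-bit WORD fields, yet the ctypes version has to restate every one of them by hand. The BadSystemTime class is a hypothetical typo, included only to show how quietly a mistake changes the memory layout.

```python
import ctypes


class SystemTime(ctypes.Structure):
    """Hand-written mirror of the Win32 SYSTEMTIME struct (eight 16-bit WORDs)."""
    _fields_ = [
        ("wYear", ctypes.c_uint16),
        ("wMonth", ctypes.c_uint16),
        ("wDayOfWeek", ctypes.c_uint16),
        ("wDay", ctypes.c_uint16),
        ("wHour", ctypes.c_uint16),
        ("wMinute", ctypes.c_uint16),
        ("wSecond", ctypes.c_uint16),
        ("wMilliseconds", ctypes.c_uint16),
    ]


class BadSystemTime(ctypes.Structure):
    """Hypothetical typo: wYear declared as 32 bits instead of 16."""
    _fields_ = [
        ("wYear", ctypes.c_uint32),  # wrong width, yet nothing complains
        ("wMonth", ctypes.c_uint16),
        ("wDayOfWeek", ctypes.c_uint16),
        ("wDay", ctypes.c_uint16),
        ("wHour", ctypes.c_uint16),
        ("wMinute", ctypes.c_uint16),
        ("wSecond", ctypes.c_uint16),
        ("wMilliseconds", ctypes.c_uint16),
    ]


if __name__ == "__main__":
    # Compare total size and the byte offset of wMonth in each layout.
    print(ctypes.sizeof(SystemTime), SystemTime.wMonth.offset)
    print(ctypes.sizeof(BadSystemTime), BadSystemTime.wMonth.offset)
```

Both classes are accepted without error, so only a size or offset check against the real header (or a misbehaving API call) would reveal that the second one is wrong.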

So long as the implementation language isn't Java. Java just sucks. :)

3 comments:

Thomas Guest said...

Kelly, have you tried the ctypes code generator? You'll need to install some dependencies, but once you've done that it generates the Python/C interface directly from the C header files. It's worked well for me.

Here's the documentation

Kelly Yancey said...

Thanks for the reminder! When I first started using ctypes, the code generator was really buggy and then I completely forgot about it.

Some of Microsoft's headers are absolutely huge, though, and I only need a subset. It looks like I can filter out just what I need from the XML before outputting to Python, but then that is more code to write.... :)

Kelly Yancey said...

Never mind. It looks like I can specify all the symbols I need through a massive regex on the ctypes code generator's xml2py.py command line. No code needed; I'll have to give this a try (for the next set of Python wrappers, that is). Thanks again!