Monday, July 27, 2009

Calculating MD5 of binaries without debug symbols

If you compile a binary with gcc with debugging information enabled (-g), the MD5 of the resulting binary depends on the name of the directory you compile it in, because the debugging information records the compilation directory. So if two developers compile the same source code with the same options on the same machine, but each in their own home directory, the MD5s of the resulting binaries may differ.

However, as soon as you strip the binaries, their MD5s will be the same.
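
Here's a quick way to see this for yourself. Treat it as a sketch rather than a definitive test: the directory names alice and bob, the file hello.c, and the helper functions are all made up for the demonstration, and it assumes gcc and strip are on your PATH. The digest helper falls back to md5sum(1) on systems that don't have md5(1).

```shell
#!/bin/sh
# Sketch: build one source file in two differently named directories,
# then compare digests before and after stripping the binaries.
# All directory, file and function names here are hypothetical.

# Print only the MD5 digest of a file, using md5(1) where it exists
# and md5sum(1) otherwise.
digest() {
    if command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"
    else
        md5sum < "$1" | cut -d ' ' -f 1
    fi
}

# Compile a trivial program with -g from inside directory $1.
build_in() {
    mkdir -p "$1" &&
    printf 'int main(void) { return 0; }\n' > "$1/hello.c" &&
    (cd "$1" && gcc -g -o hello hello.c)
}

demo() {
    work=`mktemp -d /tmp/md5demo.XXXXXX` || return 1
    for d in alice bob; do
        build_in "$work/$d" || { rm -rf "$work"; return 1; }
    done

    echo "with debug info:"
    digest "$work/alice/hello"
    digest "$work/bob/hello"

    # Strip both binaries in place and compare again.
    for d in alice bob; do
        strip "$work/$d/hello"
    done
    echo "stripped:"
    digest "$work/alice/hello"
    digest "$work/bob/hello"

    rm -rf "$work"
}

# Run the comparison only when a compiler is available.
if command -v gcc >/dev/null 2>&1; then
    demo
fi
```

The first pair of digests should differ. The second pair should match, though a linker configured to embed a build ID note may be an exception, since that note survives stripping.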

Which leads me to this little tool I whipped up to compare two binaries without stripping them:

#!/bin/sh
# Display the MD5 of a file, ignoring any debugging symbols in
# binaries.

# The strip(1)/objcopy(1) commands for removing debugging
# symbols do not support writing to stdout, so we need to
# allocate a temp file to write the stripped binary to.
tempfoo=`basename "$0"`
TMPFILE=`mktemp -q "/tmp/${tempfoo}.XXXXXX"`
if [ $? -ne 0 ]; then
    echo "$0: Can't create temp file, exiting..." >&2
    exit 1
fi

while [ $# -gt 0 ]; do
    # The following line is a cheesy way to accurately
    # reproduce the same error messages as md5(1) when a
    # specified file is unreadable.
    md5 "$1" > /dev/null
    if [ $? -eq 0 ]; then

        # Try to strip symbols from the file on the
        # assumption it is a binary and, if successful,
        # compute the md5 of the stripped file. Note that
        # objcopy -g does the same job as strip -g. If
        # objcopy failed to parse the file (e.g. because it
        # is not in ELF format), simply compute the md5 of
        # the whole file since there are no debugging symbols
        # to strip.
        m=`(objcopy -g "$1" "$TMPFILE" >/dev/null 2>&1 && \
            md5 -q "$TMPFILE") || md5 -q "$1"`

        # Output the result in a md5(1)-compatible format.
        echo "MD5 ($1) = $m"
    fi
    shift
done

rm -f "$TMPFILE"
