Mac OS X uses dynamic libraries for most common functions, as do most other modern operating systems. However, the way OS X handles dynamic libraries has some special features which can create extra challenges. One of these was triggered yesterday by the MacPorts update of tk to 8.6.0, which led me to find out more about how OS X handles dynamic libraries.
The most immediate symptom of the problem was that on updating the MacPorts port of "tk" to 8.6.0, MacPorts attempted to rebuild the "tk" port repeatedly before eventually giving up with a problem report:
Error: Port tk is still broken after rebuiling it more than 3 times.
Error: Please run port -d -y rev-upgrade and use the output to report a bug.
Port tk still broken after rebuilding 3 time(s)
followed by a stack trace. Then when I tried to run the one program I have which uses TCL/TK (CBB), I found that it wouldn't start due to problems with the TCL/TK interpreter, "wish":
dyld: Library not loaded: /opt/local/lib:/opt/local/lib/libtk8.6.dylib
Referenced from: /opt/local/bin/wish
Reason: image not found
Note the very odd pathname for the library that it is attempting to load: "/opt/local/lib:/opt/local/lib/libtk8.6.dylib". Some hunting for these symptoms turned up MacPorts bug #37395, which was initially reported as a PowerPC/OS X 10.4 problem.
On Linux the first thing one does to diagnose dynamic library linking issues is use "ldd" to check the libraries being referenced and how they were resolved. OS X does not have an "ldd" tool, but some searching turned up "otool", provided as part of the "cctools" package (and Apple Xcode); "otool -L" can be used to list the libraries referenced by an executable (as "ldd" does on Linux):
ewen@bethel:~$ otool -L /opt/local/bin/wish8.6
/opt/local/bin/wish8.6:
/opt/local/lib/libfontconfig.1.dylib (compatibility version 8.0.0, current version 8.2.0)
/opt/local/lib:/opt/local/lib/libtk8.6.dylib (compatibility version 8.6.0, current version 8.6.0)
/opt/local/lib/libtcl8.6.dylib (compatibility version 8.6.0, current version 8.6.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 125.2.11)
/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation ([...])
/opt/local/lib/libXft.2.dylib (compatibility version 6.0.0, current version 6.1.0)
/opt/local/lib/libX11.6.dylib (compatibility version 10.0.0, current version 10.0.0)
/opt/local/lib/libXss.1.dylib (compatibility version 2.0.0, current version 2.0.0)
/opt/local/lib/libXext.6.dylib (compatibility version 11.0.0, current version 11.0.0)
/opt/local/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.7)
ewen@bethel:~$
from which it is obvious that the problematic pathname for the libtk8.6.dylib file is in the "wish8.6" executable image ("wish" is a symlink to that).
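This kind of check can be scripted. The following is only a minimal sketch: the function name "find_bad_install_names" is made up for illustration, and it assumes the "otool -L" output layout shown above (a header line, then one library per line followed by its version details in parentheses). It prints any referenced path containing a colon, which would be unusual in a legitimate library pathname.

```shell
#!/bin/sh
# Hypothetical helper: read `otool -L` output on stdin and print every
# referenced library path that contains a ':' (as the broken tk path did).
# Returns non-zero (from grep) when no such path is found.
find_bad_install_names() {
    # Drop the header line ("<file>:"), strip leading whitespace and the
    # trailing " (compatibility version ...)" details, then keep only the
    # paths that still contain a ':'.
    sed -e '1d' -e 's/^[[:space:]]*//' -e 's/ (.*$//' | grep ':'
}

# Intended use on OS X (not run here):
#   otool -L /opt/local/bin/wish8.6 | find_bad_install_names
```

For the "wish8.6" output above, this would print just the "/opt/local/lib:/opt/local/lib/libtk8.6.dylib" entry.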
Some more searching turned up "install_name_tool" (also part of "cctools") which can be used to change the path name to a dynamic library that is embedded into an executable. For this case, running:
sudo install_name_tool -change \
"/opt/local/lib:/opt/local/lib/libtk8.6.dylib" \
"/opt/local/lib/libtk8.6.dylib" /opt/local/bin/wish8.6
to undo the strange pathname to the libtk8.6.dylib dynamic library resulted in wish working again, and in my being able to run CBB (which I reported on MacPorts ticket #37395).
The remaining mystery was how that strange pathname ended up in the executable, and why it apparently wasn't affecting later versions of Mac OS X. The latter turned out to be due to different flavours of the "tk" port being installed: the problem occurred on all OS X versions tested, but only if the "+x11" flavour was installed; the (default) "+quartz" flavour didn't have the problem. So that left the mystery of how the strange pathname came to be in the executable.
Dynamic libraries on OS X are handled a little differently to other unix platforms: each library has an "install_name" that tells tools the canonical pathname for the library. That canonical pathname (stored in the library) is then embedded into the executable when it is linked, so that it can be referenced directly by the loader. Typically these canonical pathnames are absolute pathnames on the filesystem, but they can also be relative to the executable or loader path. These "install_name"s are put into the library when the library itself is first linked:
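The corrected name can also be derived rather than typed by hand. This is a sketch under a stated assumption: that the part of the broken name after the last colon is the real library path, which happened to be true here but need not be in general. The function name "suggest_install_name_fix" is hypothetical, and it only prints the command so that it can be reviewed before being run with "sudo".

```shell
#!/bin/sh
# Hypothetical helper: given a broken install name of the form
# "<dir>:<dir>/<lib>" and the file that references it, print the
# install_name_tool command that would rewrite the reference to the part
# after the last ':'. It only prints; it does not modify anything.
suggest_install_name_fix() {
    bad_name=$1                 # install name as recorded in the executable
    binary=$2                   # executable (or library) holding the reference
    good_name=${bad_name##*:}   # assumption: the real path follows the last ':'
    printf 'install_name_tool -change "%s" "%s" "%s"\n' \
        "$bad_name" "$good_name" "$binary"
}

suggest_install_name_fix \
    "/opt/local/lib:/opt/local/lib/libtk8.6.dylib" /opt/local/bin/wish8.6
```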
/usr/bin/gcc-4.2 -dynamiclib [....] -install_name ${PATHNAME} [...]
along with things like the current and compatibility versions (used to ensure that the correct version of the dynamic library is being referenced).
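To make the "relative to the executable or loader path" case concrete, here is a deliberately simplified sketch of how such names map to files. It handles only the "@executable_path/" and "@loader_path/" prefixes understood by the OS X loader and passes everything else through unchanged; the real loader does considerably more. The function name "resolve_install_name" and its argument order are invented for this example.

```shell
#!/bin/sh
# Simplified illustration (not the real dyld algorithm).
#   $1: install name recorded in the referencing image
#   $2: directory containing the main executable
#   $3: directory containing the image that holds the reference
resolve_install_name() {
    case "$1" in
        @executable_path/*)
            # Relative to the main executable's directory.
            printf '%s/%s\n' "$2" "${1#"@executable_path/"}" ;;
        @loader_path/*)
            # Relative to the directory of the image doing the loading.
            printf '%s/%s\n' "$3" "${1#"@loader_path/"}" ;;
        *)
            # Absolute (or otherwise unhandled) names are used as they are.
            printf '%s\n' "$1" ;;
    esac
}

resolve_install_name /opt/local/lib/libtk8.6.dylib /opt/local/bin /opt/local/bin
```

With an absolute install name, as in the MacPorts case, the name is used exactly as recorded, which is why a malformed name leads straight to "image not found".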
"otool -D" can be used to inspect the install name that is embedded into a library, and for the problematic libtk8.6.dylib library the problem is clearly visible in the library file:
ewen@bethel:~$ otool -D /opt/local/lib/libtk8.6.dylib
/opt/local/lib/libtk8.6.dylib:
/opt/local/lib:/opt/local/lib/libtk8.6.dylib
ewen@bethel:~$
So clearly something was calculating the wrong pathname for the canonical path to the library. The next step was to get a build log from MacPorts, which proved somewhat troublesome (various things kept overwriting the build log). The answer turned out to be:
sudo port clean tk
sudo port -skn upgrade --force --no-rev-upgrade tk -quartz +x11
together with an activated version of the "tk" port already being installed; anything else resulted either in not building from source, or in a second pass which overwrote the kept build log ("-k") with just an install log. ("--no-rev-upgrade" disables the check that the installed executables all still work, which was what was triggering the rebuilds, since the "wish8.6" executable no longer worked; "-s" insists on a build from source; "-n" leaves the port's dependencies alone; and "--force" forces a rebuild even though the current version is already installed.)
Once obtained, the build log showed a single reference to the problematic path, at the point where it was being stamped into libtk8.6.dylib, which meant that the Makefile was the next obvious place to look.
Tracing backwards, the problem turned out to be that the "unix" configure script (used for the x11 flavour, but not the default quartz flavour) had assembled a multi-part "LIB_RUNTIME_DIR" search path in the Makefile, which was then copied directly into the "DYLIB_INSTALL_DIR" variable, and then used as the canonical pathname for the "install_name". Presumably this approach worked when "LIB_RUNTIME_DIR" only happened to find one library directory; but changes in how the x11 libraries were detected resulted in it finding multiple directories. Underlying this is an apparent confusion over whether "LIB_RUNTIME_DIR" is a single directory or a search path; on most platforms executables can have a library search path included (e.g., "RPATH"), so for many platforms the confusion probably wasn't noticed.
In the end the MacPorts workaround was very simple: it just set "DYLIB_INSTALL_DIR" from a Makefile variable that only contained a single directory pathname ("${libdir}"). The fix was committed less than a day after the problem was discovered, as a one-line patch.
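The effect of the bug and of the workaround can be imitated in a few lines of shell. This is an illustration only: the variable names mirror the Makefile ones, but the value given to "LIB_RUNTIME_DIR" is an assumption inferred from the broken install name seen earlier, and the real Makefile assembles these values differently.

```shell
#!/bin/sh
# Assumed values, inferred from the broken install name; the real
# Makefile variables may have been assembled differently.
LIB_RUNTIME_DIR="/opt/local/lib:/opt/local/lib"   # a search path, not a directory
libdir="/opt/local/lib"                           # a single directory
lib_file="libtk8.6.dylib"

# What the build effectively did: treat the search path as one directory.
DYLIB_INSTALL_DIR="$LIB_RUNTIME_DIR"
broken_install_name="$DYLIB_INSTALL_DIR/$lib_file"

# The workaround: take the directory from a single-directory variable.
DYLIB_INSTALL_DIR="$libdir"
fixed_install_name="$DYLIB_INSTALL_DIR/$lib_file"

echo "broken: $broken_install_name"
echo "fixed:  $fixed_install_name"
```

The first value reproduces the "/opt/local/lib:/opt/local/lib/libtk8.6.dylib" name reported by "otool", and the second is the pathname that the loader can actually find.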
For completeness, I've reported it to the "tk" bug tracker, as tk bug 3598664, referencing the MacPorts ticket. So the build issue may even get fixed upstream.