Reference: Dynamic Linking
A reference to the esoteric details needed for building dynamically linked software at the native layer
Operating Systems: Windows, Linux
Compilers: CL (Microsoft C/C++ Optimizing Compiler), GCC, CLANG
Arcane Arts: in-depth examples of dynamic linking
Incantations: compiler keywords & command line flags
Lexicon: term definitions & usage clarifications
Arcane Arts
If you are already familiar with the principal ideas of dynamic linking, then my dynamic linking example repository can help fill in all the details for putting dynamic linking into practice.
It contains an assortment of examples that cover the depths of the topic. It has concrete examples of dynamic linking on Windows and Linux, with CL, GCC, and CLANG, additional concrete examples for solving the "initialization problem", and an example abstraction that creates a portable, dynamically linkable, base layer.
At the root of the repository you will find a GUIDE.txt that will help you get your bearings in the repository. While you are learning, take the time to read the code and the DETAILS.txt in each example. The code and build scripts are designed to be as clear and minimal as possible, so you can run and modify them to see them working. The DETAILS.txt files, on the other hand, are written to cover each example's ideas thoroughly.
Once you've understood these concepts and tools, the rest of the reference is organized so that you can use it to quickly refresh your memory with the keywords, command invocations, and terminology.
Incantations
Incantations with CLANG
CLANG's incantations are often based on the incantations in CL and GCC. Anywhere in this section where CLANG is not mentioned, you can assume that it supports the CL incantations when building on Windows and the GCC incantations when building on Linux.
Mark a function as an exported symbol
CL: __declspec(dllexport)
GCC: __attribute__ ((visibility ("default")))
Mark a function as a load-time imported symbol
CL: __declspec(dllimport)
GCC: no mark; declare the function and give it no definition
Mark a function as a local symbol
CL: no mark
GCC: __attribute__ ((visibility ("hidden")))
GCC symbol visibility defaults
By default in the GCC linker (ld), all symbols that are not marked as local symbols and have a definition are exported symbols. However, this default can be changed to make the language for communicating with the linker more similar in structure to the language for communicating with the CL linker.
GCC: -fvisibility=hidden
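The marks above can be folded into one set of macros so the same source builds with CL, GCC, and CLANG. This is a minimal sketch, not the repository's own abstraction: MYLIB_BUILD, MYLIB_API, MYLIB_LOCAL, and the mylib_* functions are hypothetical names, and the GCC branch assumes the library is compiled with -fvisibility=hidden.

```c
/* Pretend this translation unit is part of the library being built.
 * MYLIB_BUILD, MYLIB_API, MYLIB_LOCAL and the mylib_* functions are
 * hypothetical names used only for this sketch. */
#define MYLIB_BUILD 1

#if defined(_WIN32)
  /* CL (and CLANG on Windows): export while building the library,
   * import while consuming it. */
  #if defined(MYLIB_BUILD)
    #define MYLIB_API __declspec(dllexport)
  #else
    #define MYLIB_API __declspec(dllimport)
  #endif
  #define MYLIB_LOCAL /* no mark: symbols are local unless exported */
#else
  /* GCC (and CLANG on Linux): intended to be paired with -fvisibility=hidden. */
  #define MYLIB_API   __attribute__((visibility("default")))
  #define MYLIB_LOCAL __attribute__((visibility("hidden")))
#endif

/* Local symbol: usable inside this binary, not visible to the loader. */
MYLIB_LOCAL int mylib_clamp(int value, int lo, int hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

/* Exported symbol: available to load-time and run-time linking. */
MYLIB_API int mylib_add_clamped(int a, int b)
{
    return mylib_clamp(a + b, 0, 100);
}
```

A consumer of the library would include the same macro definitions without defining MYLIB_BUILD, so that on Windows the declarations pick up __declspec(dllimport).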
Generating shareable binaries (.dll/.so)
CL: /LD
LINK: /DLL
If you use cl to handle both compiling and linking, then the /DLL option is passed automatically for you when you use /LD. You only need the separate linker option /DLL if you invoke the linker yourself.
GCC: -fPIC -shared
The flag -fPIC is for compilation and the flag -shared is for linking. You can use both in one line if you are using a single command for both compilation and linking.
CLANG: on Windows, -shared for linking
Generating binaries with load-time imported symbols
Windows: Whenever a binary file is generated, if it has exported symbols, then the linker should also generate a .lib file. Passing this .lib to the linker provides the information necessary to prepare the load-time imported symbols.
cl [my_program].c [load_time_dependency].lib
link [my_object1].obj [load_time_dependency].lib
Linux: The binary files themselves can be passed to the linker.
gcc [my_program].c [load_time_dependency].so
ld [my_object1].o [load_time_dependency].so
However, the default search rules used by Linux's loader will usually fail to find the external binary file even if you link it correctly (see Custom loader paths).
Custom loader paths
GCC: -Wl,-rpath,PATH
NOTE: For the specific case of adding a binary relative path, set PATH to \$ORIGIN/ (the backslash keeps the shell from expanding $ORIGIN before the linker sees it).
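To make the Linux side of load-time importing concrete, here is a small sketch of the two halves shown in one listing. The file names greet.c and program.c, the library name libgreet.so, and the functions greet_length and program_entry are all hypothetical, and the build commands in the comments are one plausible way to apply the flags above with GCC or CLANG. In a real project the two halves would be separate files, and the program half would call the import from main.

```c
#include <stddef.h>
#include <string.h>

/* ---- Would live in greet.c (the shareable binary) -------------------
 * Hypothetical build command:
 *   gcc -fPIC -shared -fvisibility=hidden greet.c -o libgreet.so
 */
__attribute__((visibility("default")))
size_t greet_length(const char *name)
{
    /* Length of the greeting "hello, <name>". */
    return strlen("hello, ") + strlen(name);
}

/* ---- Would live in program.c (the executable) -----------------------
 * Hypothetical build command, run next to libgreet.so:
 *   gcc program.c libgreet.so -Wl,-rpath,'$ORIGIN' -o program
 * The single quotes do the same job as the backslash in \$ORIGIN.
 */

/* No mark and no definition: in program.c this is a load-time import. */
size_t greet_length(const char *name);

size_t program_entry(void)
{
    return greet_length("world");
}
```

Without the -rpath option the program would still link, but the loader would typically fail to find libgreet.so at startup unless it sits in one of the default search directories.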
Includes for run-time loading & linking
Windows: #include <Windows.h>
Linux: #include <dlfcn.h>
API for run-time loading
IMPORTANT: these APIs follow different rules for how they use the input file name to find a matching binary file. I recommend using full paths or binary relative paths.
Windows: LoadLibraryA or LoadLibraryW
Linux: dlopen with RTLD_NOW for the flag. To achieve a binary relative path, start the path with $ORIGIN/
API for run-time linking
Windows: GetProcAddress
Linux: dlsym
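As a rough illustration of how the loading and linking APIs fit together, the sketch below loads a C math library at run time and looks up cos in it. The library names are assumptions: libm.so.6 is the usual glibc name on Linux and msvcrt.dll is assumed to be present on Windows. A bare file name is used only to keep the sketch short, so it relies on the default search rules rather than the full or binary relative paths recommended above. load_cos and unary_math_fn are hypothetical names.

```c
#include <stddef.h>

#if defined(_WIN32)
  #include <Windows.h>
#else
  #include <dlfcn.h>   /* older glibc versions also need -ldl when linking */
#endif

typedef double (*unary_math_fn)(double);

/* Run-time loading followed by run-time linking.
 * Returns NULL if either step fails. */
unary_math_fn load_cos(void)
{
#if defined(_WIN32)
    HMODULE lib = LoadLibraryA("msvcrt.dll");         /* assumed library name */
    if (lib == NULL) return NULL;
    return (unary_math_fn)GetProcAddress(lib, "cos");
#else
    void *lib = dlopen("libm.so.6", RTLD_NOW);        /* assumed library name */
    if (lib == NULL) return NULL;
    return (unary_math_fn)dlsym(lib, "cos");
#endif
}
```

The caller checks the returned pointer before using it, for example by falling back to another implementation when load_cos returns NULL.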
Execution on load before main
MSVC CRT:
#pragma section(".CRT$XCU", read) per compilation unit
__declspec(allocate(".CRT$XCU"))
__pragma(comment(linker,"/include:unique-name"))
void(*unique-name)(void) = your-function;
GCC:
__attribute__((constructor)) your-function-definition
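The two incantations above can be combined in one file along these lines. This is a sketch under assumptions, not a tested recipe for every toolchain: run_before_main, run_before_main_ptr, g_initialized, and is_initialized are hypothetical names, and the leading underscore in the 32-bit x86 /include name reflects the C name decoration used on that target.

```c
/* Set to 1 by code that runs before main. */
static int g_initialized = 0;

#if defined(_MSC_VER)
  /* MSVC CRT: store a function pointer in the .CRT$XCU section so the
   * CRT calls it during startup. */
  static void run_before_main(void) { g_initialized = 1; }

  #pragma section(".CRT$XCU", read)
  __declspec(allocate(".CRT$XCU"))
  void (*run_before_main_ptr)(void) = run_before_main;

  /* Keep the linker from discarding the otherwise unreferenced pointer. */
  #if defined(_M_IX86)
    #pragma comment(linker, "/include:_run_before_main_ptr")
  #else
    #pragma comment(linker, "/include:run_before_main_ptr")
  #endif
#else
  /* GCC and CLANG: the attribute is all that is needed. */
  __attribute__((constructor))
  static void run_before_main(void) { g_initialized = 1; }
#endif

/* Lets other code check that the early initialization ran. */
int is_initialized(void)
{
    return g_initialized;
}
```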
Lexicon
The terminology in this topic is not exactly standardized. I think the way I use terms makes sense and lines up pretty well with common usages I've seen. I use these terms throughout the reference so I include them here for clarity.
I use dynamic linking to mean any architecture in which a process works by loading more than one binary. Put concretely, linking to 'dynamic link libraries' and 'shared objects' (.dll files and .so files).
I use binary file to mean a file that contains executable data. These generally come in two 'flavors'. An executable is a binary file which a user can execute directly to start a new process. The other 'flavor' is a little harder to name. On Windows it is the .dll or dynamic link library. On Linux it is the .so or shared object. When I want to be generic I'll just call it a shareable binary.
I use process to mean the operating system construct that organizes the ongoing activity of a single program. A process can be pieced together out of multiple binary files, usually one executable and zero or more shareable binaries. However, we could load multiple executables if we wanted to. The only real rule is that to launch a process the initial binary file has to be an executable, and this limitation is just a feature of the command line interface and the file explorer interface.
I use linker to mean a program which generates binary files.
I use loader to mean the part of the operating system which reads data out of binary files and places that data into a process. On Windows this part of the operating system is totally opaque to us. On Linux, for better or worse, this part of the operating system is extremely flexible, and can be redefined in a number of ways.
I use linking to mean correlating usage sites with definition sites. In this context, usage sites are usually function calls, and definition sites are function bodies. We can also generate a usage site in this context by using a function name as a function pointer. Linking is not performed only in the linker. It happens at many stages of the translation from source code to a binary file, in the loader, and even in our own code as we build our software.
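A tiny sketch of the two kinds of usage site described above, with hypothetical function names. Both usage sites refer to the same definition site, and something (here the compiler and linker) has to correlate them.

```c
/* Definition site: the function body. */
int square(int x)
{
    return x * x;
}

/* Usage site: a direct function call. */
int call_directly(int x)
{
    return square(x);
}

/* Usage site: the function name used as a function pointer. */
int call_through_pointer(int x)
{
    int (*fn)(int) = square;
    return fn(x);
}
```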
I use link-time linking to mean the linking that occurs inside of a linker. Linking that occurs here is baked into the binary file that the linker produces.
I use load-time linking to mean the linking that occurs inside of a loader when a process is first being initialized. This occurs when the initial executable references other binaries as load-time dependencies. Each binary file can include a list of its own unresolved symbols that need to be resolved by load-time linking. If everything works as it's supposed to, the list of load-time dependencies in each binary file will ensure that all the symbols in this unresolved list get resolved. If there are any load-time dependencies the operating system can't locate, it produces an error message and kills the process.
I use run-time loading to mean the loading of binary files inside our own code. This goes through APIs like LoadLibraryA and dlopen. When a binary file is loaded at run-time the operating system still tries to resolve its load-time dependencies. If the operating system fails to find the binary file, or the file fails to load for any reason, the process continues and our code gets the opportunity to handle the failure in its own way.
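The difference from load-time failure can be sketched like this for Linux: a failed dlopen simply returns NULL, and the process carries on. load_optional_plugin is a hypothetical name, and what "handling the failure" means is up to the program; here it only logs the reason reported by dlerror.

```c
#include <stdio.h>
#include <dlfcn.h>   /* older glibc versions also need -ldl when linking */

/* Tries to load an optional shareable binary at run time.
 * Returns the handle on success, or NULL after reporting why it failed.
 * Unlike a missing load-time dependency, failure here does not kill
 * the process. */
void *load_optional_plugin(const char *path)
{
    void *plugin = dlopen(path, RTLD_NOW);
    if (plugin == NULL) {
        fprintf(stderr, "plugin not loaded, continuing without it: %s\n",
                dlerror());
    }
    return plugin;
}
```

A caller can treat a NULL result as "feature unavailable" and keep running with reduced functionality.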
I use run-time linking to mean the linking that occurs inside our own code, after run-time loading is successful. This goes through APIs like GetProcAddress and dlsym.
I use load-time imported symbol to mean a symbol that is marked as unresolved inside a binary file. These exist in a list in the binary file. All load-time imported symbols must be linked by the loader itself, or the load process fails.
I use exported symbol to mean a symbol that is defined within a binary file and marked as visible to the outside world by the binary file. These exist in a list in the binary file. An exported symbol can be used by the loader during load-time linking or it can be used by our own code during run-time linking.
I use local symbol to mean a symbol that is defined within a binary file and that is not visible to the outside world. These symbols may be available in link-time linking but they are not preserved in any data structure in the final binary file. The artifacts of these symbols which remain are the machine instructions and other data generated by the compiler and linker, but no trace of the symbol name remains, so they cannot be located and used by later linking processes.