What's behind the C Compilation Process

Pierre Forcioli
4 min read · Sep 16, 2020


The GNU Compiler Collection (GCC) is a compiler system produced by the GNU Project supporting various programming languages.

But first, what is a compilation process?

The compilation process is how a program turns the source code in the file you give it (in our case, "main.c") into code the computer can execute. This process is divided into four steps. And that's perfect, because gcc lets us look at each of these four steps separately.

According to the gcc man page (man gcc), when you invoke GCC, it normally does preprocessing, compilation, assembly and linking.

Compilation Process Diagram

1st step: the Preprocessing

This step is carried out by the preprocessor (awesome, isn't it?). It takes the source code (main.c) as input and removes all the comments from it. Then it replaces each #include directive with the contents of the header file it names. Finally, it performs macro expansion: every macro name in our code is replaced by the value it was defined with.

Here is an example program that simply prints “Hello Holberton” and returns 0. Let’s see what the preprocessor changes in this code. I ask gcc to run only the preprocessor on my .c file with gcc -E main.c. And here is what the expanded code looks like on my standard output.

First, the headers you pulled in with #include <mylibrary.h> are expanded. Then you can see that BETTY, which I defined with #define BETTY Holberton, is now replaced by its value. And finally, all the comments before my main function no longer exist.

2nd step: the Compilation

This step is carried out by the compiler (you didn't expect that). It converts the expanded code into assembly code. You can stop at this point with gcc -S main.c, which runs only the preprocessor and the compiler. You thought C was too low-level a programming language? What about this?

So, as you can see, this code is still human-readable, but a little complicated, isn't it? You can try to decode it, but good luck…

Note also that gcc -S main.c does not write to the standard output. Instead, it creates a file with the same name as the source file, but with a .s extension instead of .c.

3rd step: the Assembly

The assembler does this step (woaaaah). Here, the assembly code is converted into object code. I can't tell you much more about this step. You want to know why? Let's have a look:

^ELF^B^A^A^@^@^@^@^@^@^@^@^@^A^@>^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@@^A^@^@^@^@^@^@^@^@^@^@@^@^@^@^@^@@^@^M^@^@UHM-^IM-eM >^@^@^@^@M-?^@^@^@^@M-8^@^@^@^@M-h^@^@^@^@M-

The code looks like that, and you can see that it is much harder to decode than the assembly. This code can be obtained with gcc -c main.c. Like the previous commands, this one stops early: it runs only the preprocessing, the compilation and the assembly, then writes the result to a main.o file, following the same pattern as the previous step and keeping the name of the source file.

4th step: the Linking

This task is accomplished by the… [drums rolling] …the linker (I promise, I will stop with this joke…). This step consists of linking together all of our code, the required external libraries, and any other object files, to create an executable file, called a.out by default. And we achieve all of this with just one command: gcc main.c.

We can give it another name with the gcc -o "myname" main.c command, but that's not today's topic.

And finally…

We have our program working perfectly! You don't trust me? Just see:

Who's the boss? Now you know how to display a really fun text on your terminal, in C! Obviously, there is not much point in writing this kind of program; it was just an illustration… In my next post, I will try to use more concrete examples to illustrate my explanations. I WILL TRY!
