By yours truly
int main() { return 57; }
Let's compile this and execute it
$ g++ -o hello hello.cpp $ ./hello $ echo $? # shows return value of last command 57
$ g++ -o hello hello.cpp $ ./hello $ echo $? # shows return value of last command 57
How did our code in "English" turn into 0s and 1s?
What happened in between?
What does it mean to compile?
In simple terms, a Compiler is a translater, that takes code written in a high-level language (return blablabla) and converts it into a low-level language that a computer can understand (01010101...)
This is a one-time process, and the output is a file that can be executed multiple times
But this isn't the only translator in the market
Enter the Interpreter
Instead of translating the entire code at once, an Interpreter translates the code line-by-line, executing it as it goes
with some trade-offs...
Compiler | Interpreter |
---|---|
Compiles the whole code before execution | Executes code line-by-line |
Faster execution | Slower execution |
Displays all errors after compilation | Stops at the first error |
No source code needed after compilation | Requires source code every time |
Used in production (C, C++, C#) | Used in development (Python, Ruby, Perl) |
Compiler converts the code into assembly language, which is then converted into machine code by the assembler
What does the assembly code for our C++ program look like?
We can run the following command to get the assembly code
g++ -S -o hello.s hello.cpp
int main() { return 57; }
.file "hello.cpp" .text .globl main .type main, @function main: .LFB0: .cfi_startproc endbr64 ; and so on...
(Go Down ↓)
Assembly language is a low-level programming language that is a human-readable version of machine code
It uses mnemonics to represent machine instructions (like MOV, ADD, SUB)
It requires manual handling of CPU registers and memory addresses
x86-64 for Intel
ARM for Apple M-series
g++ will give a.out in Linux, a.exe in Windows
Each OS has different syscalls(?)
Headers change, from <windows.h> to <unistd.h>
and so do file system conventions, from "C:\\Users\\" to "/home/user/"
Data can be stored in memory and can be accessed using memory addresses
Reading and storing data in this way is slow
A much faster alternative are registers which are internal storage locations of the processor
Registers are the processor's personal notepad for storing information temporarily, though few in number
Examples - RAX, ESP, DS
System Calls are APIs for the interface between the user space and the kernel space
Every system call has a unique number associated with it
This number is stored in the EAX or RAX with arguments passed to other general purpose registers and stack as needed
The syscall instruction is used to trigger a system call
Let's take a look at the (simplified) assembly equivalent of our C++ snippet
global _start _start: mov rax, 60 mov rdi, 57 syscall
This tells the assembler to expose _start as the global label so that the linker can recognise it as the entry point of the program
global _start _start: mov rax, 60 mov rdi, 57 syscall
This defines the _start label, which is the entry point of our program
global _start _start: mov rax, 60 mov rdi, 57 syscall
Moves the value 60 into the register rax
In Linux x86-64, 60 is the syscall number for exit
global _start _start: mov rax, 60 mov rdi, 57 syscall
Moves the value 57 into the register rdi
The exit syscall uses rdi as its first argument, which represent the exit status of the program
global _start _start: mov rax, 60 mov rdi, 57 syscall
Triggers a system call. Since rax contains 60, this executes exit(57) syscall, terminating the program with exit code 57
global _start _start: mov rax, 60 mov rdi, 57 syscall
Let's compile and run this assembly code
The assembler, nasm, translated the assembly code into machine code and generates and object file
nasm -f elf64 -o hello.o hello.s
The linker, ld takes the object file and links it with the necessary library to create an executable file
ld -o hello hello.o
During Lexing, entries for identifiers are added to the symbol table
During Parsing, information about the type of the identifier is added
During Semantic analysis, the symbol table is used to check for undeclared variables, type mismatches, etc.
During Optimization, the symbol table is used to track the usage of variables and functions
During Code Generation, the symbol table is used to generate assembly code using the addresses of the variables and functions
Types of Tokens in a Compiler
if
, else
, while
)variableName
, functionName
)+
, -
, *
, /
)42
, "Hello"
, true
);
, ,
, {}
, ()
)
int main()
{
int x, a=5, b=7;
x = 4*a + b;
printf("x is %d", x);
}
Keywords: int
, int
Identifiers: main
, x
, a
, b
, x
, a
, b
, printf
, x
Operators: =
, =
, =
, *
, +
Literals: 5
, 7
, 4
, "x is %d"
Punctuation: (
, )
, {
, ,
, ,
, ;
, ;
, (
, ,
, )
, ;
, }
Parsing is the process of converting the tokens into a tree-like structure called the Abstract Syntax Tree (AST)
The AST is a representation of the code that is easier to work with
It is used to check the correctness of the code and generate the intermediate representation
The grammar defines the syntax of the language
The grammar consists of Terminals and Non-terminals, Production Rules, and Start Symbol
return(57);
return(57);
stmt
│
returnStmt
┌────┴────┐
RETURN expr
│ │
LPAREN NUMBER
│ │
( 57
│
RPAREN
│
SEMI
ReturnStmt
│
57
x = a + b * 7;
x = a + b * 7;
stmt
│
assignmentStmt
┌────┴────┐
ID expr
│ │
x ┌─┴─┐
expr expr
│ │
ID ┌──┴──┐
│ expr NUMBER
a │ │
ID 7
│
b
Assign(x)
│
+
/ \
a *
/ \
b 7
It does type checking, scope checking, name resolution, control flow analysis and type inference
It uses the AST and the symbol table to perform these checks
It allows for machine-independent optimizations
x = (a + b) * c;
t1 = a + b
t2 = t1 * c
x = t2
Each name has a pointer to an entry in the symbol table
TAC greatly simplifies nested loops and if-while statements
While a < b
If c < d Then
x = (y + z) * w
Else
x = (y - z) * w
End If
End While
L1: IfFalse a < b goto L3
IfFalse c < d goto L2
t1 = y + z
t2 = w * t1
x = t2
goto L1
L2: t2 = y - z
x = t2
t1 = y - z
t2 = w * t1
x = t2
goto L1
L3: ...
A basic block is a sequence of instructions with a single entry point and a single exit point and they are always executed together
This allows for better optimization
L1: IfFalse a < b goto L3
IfFalse c < d goto L2
t1 = y + z
t2 = w * t1
x = t2
goto L1
L2: t2 = y - z
x = t2
t1 = y - z
t2 = w * t1
x = t2
goto L1
L3: ...
B1 IfFalse a < b goto B7
B2 IfFalse c < d goto B5
B3 t1 = y + z
t2 = w * t1
x = t2
B4 goto B1
B5 t2 = y - z
x = t2
B6 t1 = y - z
t2 = w * t1
x = t2
B6 goto B1
B7 ...
Note that flow between different blocks is stored in a graph called the Control Flow Graph
Each node is a basic block and each edge is a possible flow of control
Instructions inside each node are stored in a linked list, order can thus be changed with just manipulation of pointers when we optimise this IR
This involves removing redundant instructions, dead code, and constant folding
The observable behaviour of the program should not change after these optimizations
... and it needs to be quick
This is done by using a directed acyclic graph (DAG) to store the results of the calculations
t1 = a + b t2 = a + b # becomes t1 = a + b t2 = t1
t1 = a + b t2 = a + b # becomes t1 = a + b t2 = t1
If it does, then that part of the tree can be reused and we can copy the value of the pre-existing node
a = b + c
b = a - d
d = a - d
#becomes
a = b + c
b = a - d
d = b
d b
\ /
( - )
/ \
a d
\ /
( + )
/ \
b c
p = 5 p = 5 q = p + 3 q = 8 r = q + 2 r = 10 p = q * r p = 80
p = 5 p = 5 q = p + 3 q = 8 r = q + 2 r = 10 p = q * r p = 80
p = 5 p = 5 q = p + 3 q = 8 r = q + 2 r = 10 p = q * r p = 80
p = 5 p = 5 q = p + 3 q = 8 r = q + 2 r = 10 p = q * r p = 80
p = 5 p = 5 q = p + 3 q = 8 r = q + 2 r = 10 p = q * r p = 80
This is done by checking if the value of a variable is used after it is assigned
int foo(void) { int a = 24; int b = 25; /* Assignment to dead variable */ int c; c = a * 4; return c; b = 24; /* Unreachable code */ return 0; }
int foo(void) { int a = 24; int b = 25; /* Assignment to dead variable */ int c; c = a * 4; return c; b = 24; /* Unreachable code */ return 0; }
Removing dead code can make the program smaller and faster
a = b + 0 a = b b = c * 1 b = c c = d - 0 c = d t = x**2 t = x * x t = x * 2 t = x + x t = x * 8 t = x << 3
a = b + 0 a = b b = c * 1 b = c c = d - 0 c = d t = x**2 t = x * x t = x * 2 t = x + x t = x * 8 t = x << 3
a = b + 0 a = b b = c * 1 b = c c = d - 0 c = d t = x**2 t = x * x t = x * 2 t = x + x t = x * 8 t = x << 3
a = b + 0 a = b b = c * 1 b = c c = d - 0 c = d t = x**2 t = x * x t = x * 2 t = x + x t = x * 8 t = x << 3
a = b + 0 a = b b = c * 1 b = c c = d - 0 c = d t = x**2 t = x * x t = x * 2 t = x + x t = x * 8 t = x << 3
a = b + 0 a = b b = c * 1 b = c c = d - 0 c = d t = x**2 t = x * x t = x * 2 t = x + x t = x * 8 t = x << 3
a = b + 0 a = b b = c * 1 b = c c = d - 0 c = d t = x**2 t = x * x t = x * 2 t = x + x t = x * 8 t = x << 3
a = b + 0 a = b b = c * 1 b = c c = d - 0 c = d t = x**2 t = x * x t = x * 2 t = x + x t = x * 8 t = x << 3
This is the process of conversion of IR to assembly code, it involves :
Basic arithmetic and bitwise operations are usually direct (provided the registers are loaded properly)
For example, the IR instruction t1 = a + b
can be mapped to ADD a, b, t1
in x86-64
Complex operations like division are split into multiple instructions (as we will soon see...)
Registers are faster than memory, so the goal is to keep as many variables in registers as possible
If there are more variables than registers, some variables will be stored in memory, this is called spilling
Go down to see some examples
This is done to take advantage of pipelining and out-of-order execution
mov eax, [mem] ; Load from memory add eax, 1 ; Add 1 sub ebx, 6 ; Subtract 6 mov [mem], eax ; Store to memory
mov eax, [mem] ; Load from memory add eax, 1 ; Add 1 sub ebx, 6 ; Subtract 6 mov [mem], eax ; Store to memory
mov eax, [mem] ; Load from memory add eax, 1 ; Add 1 sub ebx, 6 ; Subtract 6 mov [mem], eax ; Store to memory
mov eax, [mem] ; Load from memory add eax, 1 ; Add 1 sub ebx, 6 ; Subtract 6 mov [mem], eax ; Store to memory
mov eax, [mem] ; Load from memory add eax, 1 ; Add 1 sub ebx, 6 ; Subtract 6 mov [mem], eax ; Store to memory
mov eax, [mem] ; Load from memory add eax, 1 ; Add 1 sub ebx, 6 ; Subtract 6 mov [mem], eax ; Store to memory
The above code can be reordered to take advantage of pipelining
mov eax, [mem] ; Load from memory sub ebx, 6 ; Subtract 6, perform this while eax loads add eax, 1 ; Add 1 mov [mem], eax ; Store to memory
mov eax, [mem] ; Load from memory sub ebx, 6 ; Subtract 6, perform this while eax loads add eax, 1 ; Add 1 mov [mem], eax ; Store to memory
mov eax, [mem] ; Load from memory sub ebx, 6 ; Subtract 6, perform this while eax loads add eax, 1 ; Add 1 mov [mem], eax ; Store to memory
mov eax, [mem] ; Load from memory sub ebx, 6 ; Subtract 6, perform this while eax loads add eax, 1 ; Add 1 mov [mem], eax ; Store to memory
mov eax, [mem] ; Load from memory sub ebx, 6 ; Subtract 6, perform this while eax loads add eax, 1 ; Add 1 mov [mem], eax ; Store to memory
An independent instruction can be executed while the command executes
mov eax, [arr1] ; Load from memory mov ebx, [arr2] ; Load from memory add eax, ebx ; Add mov [arr3], eax ; Store to memory
The operations are repeated for each element in the array
The above code can be vectorised to take advantage of SIMD(Single Instruction, Multiple Data) instructions
movdqu xmm0, [arr1] ; Load 4 packed integers from arr1 movdqu xmm1, [arr2] ; Load 4 packed integers from arr2 paddd xmm0, xmm1 ; Add packed 32-bit integers movdqu [arr3], xmm0 ; Store result to arr3
This is done to reduce the number of branches and improve instruction cache usage
The loop body is duplicated and the loop counter is incremented by the number of unrolled iterations
Inlining is the process of replacing a function call with the function body
This is done to reduce the overhead of the function call
Division is a very computationally expensive operation
So, GCC does black magic (math) to convert division into multiplication and bit shifts
A simple function which divides by 2006 looks something like this
divideby2006(unsigned int): mov eax, edi imul rax, rax, 1096222959 shr rax, 41 ret
n/2006 = ( n * 1096222959 ) >> 41 (How?)