Source code to machine instruction
Tracing the ubiquitous “Hello World!” as far as we can
Suppose you compile the following code:
[code language="c"]
#include <stdio.h>
int main() { printf("Hello World!"); return 0; }
[/code]
The compiler turns this source text into machine code. For now, simply treat the compiler as a program that takes the source file above and emits it in another form, one that the specific computer architecture you’re running on can execute. The output is an executable: a sequence of ones and zeros in which particular bit patterns stand, by the convention of that architecture, for specific CPU instructions. An excerpt of the “Hello World!” executable, viewed as a hexdump, looks like this:
[code]
0000620 ffff e8ff ff48 ffff 05c6 09e1 0020 5d01
0000630 0fc3 801f 0000 0000 c3f3 0f66 441f 0000
0000640 4855 e589 e95d ff66 ffff 4855 e589 8d48
0000650 9f3d 0000 b800 0000 0000 c1e8 fffe b8ff
0000660 0000 0000 c35d 2e66 1f0f 0084 0000 0000
0000670 5741 5641 8949 41d7 4155 4c54 258d 0736
[/code]
The first column holds offsets into the file, and the remaining columns are the file’s contents as hexadecimal numbers. For instance, the byte 0x55, when the CPU reads it as an instruction, means “push rbp”. The excerpt above was carefully selected from the executable’s hexdump because it contains the actual instructions that make the CPU print the string “Hello World!” to standard output!
radare2’s disassembly of the same executable shows:
[code]
0x0000064a 55 push rbp
0x0000064b 4889e5 mov rbp, rsp
0x0000064e 488d3d9f0000. lea rdi, str.Hello_World ; 0x6f4 ; "Hello World!"
0x00000655 b800000000 mov eax, 0
0x0000065a e8c1feffff call sym.imp.printf ; int printf(const char *format)
0x0000065f b800000000 mov eax, 0
0x00000664 5d pop rbp
0x00000665 c3 ret
[/code]
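Note that the bytes in the middle column, from 55 through c3, also appear in the hexdump on lines 0000640 to 0000660. Within each four-digit group the two bytes appear swapped: the hexdump groups the file into 16-bit words, and on a little-endian machine the byte stored first is the low-order half of the word. So the bytes 55 48 show up as 4855.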
The symbols on the right, such as “push rbp”, are symbolic representations of the binary values. Together they form assembly language, which gives the human reader a way to understand the instructions.
Next, we look at how the CPU is able to perform tasks such as arithmetic, or reading and writing memory, when presented with instructions such as “55 48 89 e5 48 8d … 5d c3”.
Any computer algorithm can be performed by a Turing-complete machine. Almost all modern deterministic computers adhere to the Von Neumann architecture, which is a Turing-complete design.
A typical CPU of the Von Neumann architecture contains the components in the diagram below:
Image: Von Neumann architecture
When the CPU is presented with machine instructions such as “55 48 89 … 5d c3”, they trigger various patterns of electrical signals inside it. As with assembly, these signal patterns follow agreed-upon conventions.
In the diagram above, the control unit (CU) within the CPU contains a program counter, which allows the instructions to be processed sequentially. The control unit decodes each instruction and, for arithmetic or logic operations, hands the operands to the arithmetic logic unit (ALU), which returns certain electrical signals as output. The ALU is built from combinations of logic gates, themselves made of transistors, and that is what makes its response to a given pattern of electrical signals deterministic. The electrical signals are physical representations of binary values.