Intro to CPU Registers

HTB Academy | 2024-07-30 | 2025-08-07

Disclaimer ⚠️

The material in this article comes from Stack-Based Buffer Overflows on Linux x86, and the article is intended only for personal review. Consult the original resource for more detailed information.

CPU Registers

Registers are essential parts of a CPU. Almost every register offers a small amount of storage space where data can be held temporarily. Registers can be classified as General registers, Control registers, and Segment registers. The ones we care about most are the General registers, which can be subdivided into Data registers, Pointer registers, and Index registers.

Data registers

32-bit Register	64-bit Register	Description
`EAX`	`RAX`	Accumulator is used in Input/Output and for arithmetic operations
`EBX`	`RBX`	Base is used in indexed addressing
`ECX`	`RCX`	Counter is used as the count for shift/rotate instructions and for counting loops
`EDX`	`RDX`	Data is used for Input/Output and in arithmetic operations for multiply and divide operations involving large values

Pointer registers

32-bit Register	64-bit Register	Description
`EIP`	`RIP`	Instruction Pointer (IP) stores the offset address of the next instruction to be executed
`ESP`	`RSP`	Stack Pointer (SP) points to the top of the stack
`EBP`	`RBP`	Base Pointer (BP) points to the base of the current stack frame

Index registers

32-bit Register	64-bit Register	Description
`ESI`	`RSI`	Source Index is used as a pointer from a source for string operations
`EDI`	`RDI`	Destination Index is used as a pointer to a destination for string operations

Stack Frames

Since the stack starts at a high address and grows down toward lower memory addresses as values are added, the Base Pointer points to the beginning (base) of the current stack frame, while the Stack Pointer points to the top of the stack.

As the stack grows, it is logically divided into regions called Stack Frames, which allocate the required memory in the stack for the corresponding function. A stack frame defines a frame of data with the beginning (EBP) and the end (ESP) that is pushed onto the stack when a function is called.
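The downward growth described above can be sketched with a toy model. This is only an illustration, not a CPU emulator: the `ToyStack` class and the starting address `0xFFFF0000` are hypothetical names and values chosen for the example.

```python
# Toy model of a 32-bit stack that grows toward lower addresses.
# ToyStack and the starting address are hypothetical, for illustration only.
WORD = 4  # bytes moved by one push/pop on 32-bit x86


class ToyStack:
    def __init__(self, top=0xFFFF0000):
        self.esp = top      # stack pointer: address of the current top
        self.memory = {}    # address -> stored 32-bit value

    def push(self, value):
        self.esp -= WORD    # growing the stack lowers ESP
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory.pop(self.esp)
        self.esp += WORD    # shrinking the stack raises ESP
        return value


stack = ToyStack()
stack.push(0x11111111)
stack.push(0x22222222)
print(hex(stack.esp))    # 0xfffefff8
print(hex(stack.pop()))  # 0x22222222 (last in, first out)
print(hex(stack.esp))    # 0xfffefffc
```

Each push lowers ESP by four bytes and each pop raises it again, which is why the "top" of the stack sits at the lowest address in use.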

Since stack memory is organized as a Last-In-First-Out (LIFO) data structure, the first step is to store the previous EBP value on the stack so that it can be restored after the function completes. The bowfunc function looks like the following in GDB:

(gdb) disas bowfunc 

Dump of assembler code for function bowfunc:
   0x0000054d <+0>:	    push   ebp       # <---- 1. Stores previous EBP
   0x0000054e <+1>:	    mov    ebp,esp
   0x00000550 <+3>:	    push   ebx
   0x00000551 <+4>:	    sub    esp,0x404
   <...SNIP...>
   0x00000580 <+51>:	leave  
   0x00000581 <+52>:	ret

The first instruction of the function pushes EBP, so the new stack frame begins with the EBP of the previous stack frame. Next, the value of ESP is copied to EBP, creating a new stack frame.

(gdb) disas bowfunc 

Dump of assembler code for function bowfunc:
   0x0000054d <+0>:	    push   ebp       # <---- 1. Stores previous EBP
   0x0000054e <+1>:	    mov    ebp,esp   # <---- 2. Creates new Stack Frame
   0x00000550 <+3>:	    push   ebx
   0x00000551 <+4>:	    sub    esp,0x404 
   <...SNIP...>
   0x00000580 <+51>:	leave  
   0x00000581 <+52>:	ret

Then space is reserved on the stack by subtracting from ESP (here 0x404 bytes), which moves ESP to the new top of the stack and leaves room for the variables and operations the function needs.

Function Prologue

(gdb) disas bowfunc 

Dump of assembler code for function bowfunc:
   0x0000054d <+0>:	    push   ebp       # <---- 1. Stores previous EBP
   0x0000054e <+1>:	    mov    ebp,esp   # <---- 2. Creates new Stack Frame
   0x00000550 <+3>:	    push   ebx
   0x00000551 <+4>:	    sub    esp,0x404 # <---- 3. Moves ESP to the top
   <...SNIP...>
   0x00000580 <+51>:	leave  
   0x00000581 <+52>:	ret

The three annotated instructions represent the so-called Prologue.
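To make the effect on the registers concrete, the prologue can be traced by hand. The sketch below replays the four instructions from the listing in order; the starting register values are hypothetical, and only the instruction sequence and the `0x404` offset come from the disassembly.

```python
# Toy trace of the bowfunc prologue. Starting values are hypothetical.
WORD = 4

regs = {"esp": 0xFFFFD000, "ebp": 0xFFFFD028, "ebx": 0x0}
memory = {}  # address -> stored 32-bit value


def push(value):
    regs["esp"] -= WORD
    memory[regs["esp"]] = value


caller_ebp = regs["ebp"]

push(regs["ebp"])          # push ebp       -> 1. stores previous EBP
regs["ebp"] = regs["esp"]  # mov ebp,esp    -> 2. creates new stack frame
push(regs["ebx"])          # push ebx       -> saves EBX on the stack
regs["esp"] -= 0x404       # sub esp,0x404  -> 3. moves ESP to the top

print(hex(regs["ebp"]))                   # 0xffffcffc
print(hex(regs["esp"]))                   # 0xffffcbf4
print(memory[regs["ebp"]] == caller_ebp)  # True
```

After the trace, EBP points at the slot holding the caller's saved EBP, and ESP sits 0x408 bytes below it (4 bytes for EBX plus the 0x404 bytes reserved).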

To leave the stack frame, the opposite is done in the Epilogue. During the epilogue, ESP is set to the current EBP, and EBP is restored to the value it had before the prologue. The epilogue is relatively short. There are other ways to perform it, but in our example it consists of two instructions:

Function Epilogue

(gdb) disas bowfunc 

Dump of assembler code for function bowfunc:
   0x0000054d <+0>:	    push   ebp       
   0x0000054e <+1>:	    mov    ebp,esp   
   0x00000550 <+3>:	    push   ebx
   0x00000551 <+4>:	    sub    esp,0x404 
   <...SNIP...>
   0x00000580 <+51>:	leave  # <----------------------
   0x00000581 <+52>:	ret    # <--- Leave stack frame
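As a rough sketch of what the two epilogue instructions do: `leave` behaves like `mov esp,ebp` followed by `pop ebp`, and `ret` pops the return address into EIP. The register values, the saved EBP, and the return address below are all hypothetical and chosen only so the trace is easy to follow.

```python
# Toy trace of the epilogue (leave; ret). All values are hypothetical.
WORD = 4

memory = {
    0xFFFFCFFC: 0xFFFFD028,  # caller's EBP saved by the prologue
    0xFFFFD000: 0x565555AA,  # return address pushed by the call (made up)
}
regs = {"esp": 0xFFFFCBF4, "ebp": 0xFFFFCFFC, "eip": 0x00000580}


def pop():
    value = memory[regs["esp"]]
    regs["esp"] += WORD
    return value


# leave: equivalent to mov esp,ebp followed by pop ebp
regs["esp"] = regs["ebp"]
regs["ebp"] = pop()

# ret: pops the return address into EIP
regs["eip"] = pop()

print(hex(regs["ebp"]))  # 0xffffd028
print(hex(regs["eip"]))  # 0x565555aa
print(hex(regs["esp"]))  # 0xffffd004
```

The saved EBP sits directly below the return address, so once `leave` has restored EBP, `ret` reads the next slot up and execution resumes in the caller.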

Endianness

During load and store operations between registers and memory, the bytes of a value can be read in different orders. This byte order is called endianness. The two formats are little-endian and big-endian.

Big-endian and little-endian describe the order of significance of the bytes. In big-endian, the bytes with the highest significance come first. In little-endian, the bytes with the lowest significance come first. Mainframe processors, some RISC architectures, and minicomputers use the big-endian format, and the byte order in TCP/IP networks is also big-endian.

Now, let us look at an example with the following values:

Address: 0xffff0000
Word: \xAA\xBB\xCC\xDD

Memory Address	0xffff0000	0xffff0001	0xffff0002	0xffff0003
Big-Endian	AA	BB	CC	DD
Little-Endian	DD	CC	BB	AA
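The two rows of the table can be reproduced with Python's standard `struct` module, shown here as a quick check rather than anything specific to the course material. The format codes `>I` and `<I` pack an unsigned 32-bit integer in big-endian and little-endian order respectively.

```python
import struct

word = 0xAABBCCDD

big = struct.pack(">I", word)     # most significant byte at the lowest address
little = struct.pack("<I", word)  # least significant byte at the lowest address

print(big.hex())     # aabbccdd
print(little.hex())  # ddccbbaa
```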

This matters later: when we have to tell the CPU which address to point to, we must enter the bytes of that address in the right order. Since x86 is little-endian, the least significant byte of the address comes first.
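A minimal sketch of that byte reordering, using the example address from above. The helper name `to_little_endian_escape` is hypothetical; it only formats a 32-bit value as `\x`-escaped bytes in little-endian order.

```python
def to_little_endian_escape(address: int) -> str:
    """Format a 32-bit address as \\x-escaped bytes, least significant first."""
    raw = address.to_bytes(4, byteorder="little")
    return "".join(f"\\x{b:02x}" for b in raw)


print(to_little_endian_escape(0xFFFF0000))  # \x00\x00\xff\xff
print(to_little_endian_escape(0xAABBCCDD))  # \xdd\xcc\xbb\xaa
```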