Chapter 2: Understanding ARM64 Registers
2.1 Overview of ARM64 Registers
The ARM64 (AArch64) architecture provides 31 general-purpose registers, along with a few special-purpose registers. They are used in operations such as arithmetic, memory access, and passing arguments to functions and system calls.
Registers
X0 to X7: Used for passing arguments to functions and system calls, and for holding return values.
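X8: Holds the system call number on Linux; in ordinary function calls it serves as the indirect result register.
X9 to X15: Caller-saved temporary registers, free for any use within a function.
X16 and X17: Intra-procedure-call scratch registers, sometimes used as temporaries by the linker.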
X18: Platform register, reserved for platform-specific use (e.g., thread register in some environments).
X19 to X28: Callee-saved registers. Functions using these must save and restore their values.
X29 (FP): Frame pointer, used to maintain the stack frame.
X30 (LR): Link register, stores the return address for function calls.
SP: Stack pointer, points to the top of the stack.
PC: Program counter, holds the address of the next instruction to be executed.
PSTATE: Processor state register, holds status and control flags.
Detailed Explanation
X0 to X7
Accessible to All Instructions: Can be used as operands by any instruction that takes a general-purpose register.
Passing Arguments and Return Values: The first eight integer or pointer arguments of a function call are passed in these registers. The return value of a function is typically stored in X0.
X8
System Call Number: Holds the syscall number during a system call.
X9 to X15
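General Purpose: Used for temporary data storage during program execution. They are caller-saved, so a called function may overwrite them.
X16 and X17
Temporary Registers: Intra-procedure-call scratch registers (also called IP0 and IP1). Linker-generated code such as veneers and PLT stubs may overwrite them between a call and the start of the called function.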
X18
Platform Register: Reserved for platform-specific use, such as a thread-specific register in some operating systems.
X19 to X28
Callee-Saved Registers: Must be preserved across function calls. Functions using these registers must save their original values and restore them before returning.
X29 (FP)
Frame Pointer: Points to the start of the stack frame of a function. Helps in accessing function parameters and local variables.
X30 (LR)
Link Register: Holds the return address when a function call is made. Used to return to the correct location in the code after function execution.
SP (Stack Pointer)
Stack Pointer: Points to the top of the current stack. The stack itself is used for storing function parameters, local variables, and return addresses.
PC (Program Counter)
Program Counter: Holds the address of the next instruction to be executed by the CPU.
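To make the X0 to X7 rule concrete, here is a minimal Python sketch (a model, not real ARM64 code) of how integer arguments could be assigned to registers. The helper name assign_arguments is hypothetical, and the sketch assumes the standard AArch64 calling convention with integer or pointer arguments only; arguments beyond the eighth are passed on the stack.

```python
# Hypothetical helper: models only integer/pointer arguments.
# Floating-point arguments use separate registers and are not modelled.
def assign_arguments(args):
    """Return (register_map, stack_args) for a list of integer arguments."""
    registers = {f"x{i}": value for i, value in enumerate(args[:8])}
    stack_args = list(args[8:])  # the ninth argument onward goes on the stack
    return registers, stack_args


regs, spilled = assign_arguments([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
print(regs)     # x0..x7 hold the first eight arguments
print(spilled)  # [90, 100]
```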
Context Switching and Register Usage
When several applications run on a system, the operating system uses context switching to give each one the appearance of having its own set of registers. Here's how it works and which registers are involved:
Context Switching
Context Switching: When the CPU switches from one application to another, it saves the current application's register state to memory and loads the next application's register state from memory. This ensures independent execution of each application without interference.
Register Usage in ARM64
31 General-Purpose Registers: X0 to X30.
Special-Purpose Registers: SP (Stack Pointer) and PC (Program Counter).
Isolation Between Applications
Process Context: Each application has its own set of register values saved in its process control block (PCB) by the OS.
Virtual Memory: Each application runs in its own virtual memory space, ensuring isolation.
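The save-and-restore idea behind context switching can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class names Process and Cpu are hypothetical, only X0 to X30, SP and PC are tracked, and a real kernel saves more state than this.

```python
REGISTER_NAMES = [f"x{i}" for i in range(31)] + ["sp", "pc"]


class Process:
    def __init__(self, name):
        self.name = name
        # Saved register state, as an OS would keep it in the PCB.
        self.saved = {reg: 0 for reg in REGISTER_NAMES}


class Cpu:
    def __init__(self):
        self.regs = {reg: 0 for reg in REGISTER_NAMES}
        self.current = None

    def switch_to(self, process):
        if self.current is not None:
            self.current.saved = dict(self.regs)  # save the outgoing state
        self.regs = dict(process.saved)           # load the incoming state
        self.current = process


cpu = Cpu()
a, b = Process("A"), Process("B")
cpu.switch_to(a)
cpu.regs["x0"] = 42      # process A works with x0
cpu.switch_to(b)
cpu.regs["x0"] = 7       # process B uses the same physical register
cpu.switch_to(a)
print(cpu.regs["x0"])    # 42: A sees its own value again
```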
Understanding W and X Registers
In AArch64, each general-purpose register has two names:
W Registers: Access only the lower 32 bits of the corresponding 64-bit registers X0-X30.
X Registers: Access the full 64 bits.
Example
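As a small illustration, reading a register through its W name can be modelled in Python as masking off everything except the low 32 bits. The function name read_w and the sample value are made up for this sketch.

```python
MASK32 = 0xFFFF_FFFF


def read_w(x_value):
    """Value seen through the W name: the low 32 bits of the X register."""
    return x_value & MASK32


x0 = 0x1122_3344_5566_7788
print(hex(x0))          # 0x1122334455667788 (the full X0 view)
print(hex(read_w(x0)))  # 0x55667788 (the W0 view)
```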
PSTATE (Processor State) Register
The PSTATE register holds various status and control flags, crucial for managing and understanding the processor's state.
Understanding PSTATE
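Condition Flags: The N (negative), Z (zero), C (carry), and V (overflow) flags indicate the results of arithmetic operations.
Exception Level: Indicates the current privilege level of the processor (EL0 to EL3), which is the AArch64 counterpart of a processor mode.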
Interrupt Mask Bits: Control the handling of interrupts.
Execution State: Indicates the current execution state (AArch64 or AArch32).
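To see what the condition flags record, the following Python sketch models a 64-bit flag-setting addition (the adds instruction; a plain add leaves the flags alone). The function names are hypothetical and only the N, Z, C and V flags are modelled.

```python
MASK64 = (1 << 64) - 1


def sign_bit(value):
    return (value >> 63) & 1


def adds_flags(a, b):
    """Return (result, flags) for a 64-bit flag-setting add of a and b."""
    a &= MASK64
    b &= MASK64
    full = a + b
    result = full & MASK64
    flags = {
        "N": sign_bit(result),        # bit 63 of the result is set
        "Z": int(result == 0),        # the result is zero
        "C": int(full > MASK64),      # unsigned carry out of bit 63
        # Signed overflow: both inputs share a sign the result lacks.
        "V": int(sign_bit(a) == sign_bit(b) and sign_bit(result) != sign_bit(a)),
    }
    return result, flags


print(adds_flags(5, 3))                      # no flags set
print(adds_flags(MASK64, 1))                 # wraps to 0: Z and C set
print(adds_flags(0x7FFF_FFFF_FFFF_FFFF, 1))  # signed overflow: N and V set
```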
Diagram: Registers
2.2 Playing with Registers
In ARM64, we have several types of registers, but we'll focus on the general-purpose registers for now. These are named X0 through X30, giving us 31 registers to work with. Each of these can hold a 64-bit value, which means it can store a very large integer, a memory address, or up to eight bytes of character data.
Now, let's write our first program to play with registers!
In this section, we'll explore ARM64 registers hands-on, using a debugging approach that mimics remote debugging techniques. This method will give us a deep look into how registers work in a real ARM64 system.
Step 1: Let's start by creating a simple assembly program. This program will demonstrate basic register operations.
Open your terminal and create a new file named register_play.s:
In the nano editor, type the following code:
Let's break this down:
.global _start: This makes the _start symbol visible to the linker. It's our program's entry point.
.section .text: This indicates that the following code should be in the text section of the executable, which is where code typically goes.
mov x0, #42: This moves the immediate value 42 into the 64-bit register x0.
mov x1, x0: This copies the value from x0 into x1.
The last three lines set up and execute the exit syscall, which is how our program will cleanly terminate: two mov instructions load the syscall number 93 into x8 and the exit status 0 into x0, and svc makes the call.
Step 2: Assembling and Linking
Now we'll turn our assembly code into an executable. This is a two-step process:
Assemble the code:
This creates an object file, register_play.o. The assembler translates our human-readable assembly into machine code.
Link the object file:
This creates our final executable, register_play. The linker sets up the proper memory layout for our program.
Step 3: Setting Up the Debugger Listener
Now we'll use gdbserver to run our program in a way that allows a debugger to connect to it. This mimics debugging a program running on a remote machine.
The command, gdbserver :1234 ./register_play, does the following:
gdbserver: This is the program that allows GDB to connect to a running process.
:1234: This tells gdbserver to listen on port 1234.
./register_play: This is the program we want to debug.
The program is now waiting for a debugger to connect before it starts running.
Step 4: Connecting with GDB
Open a new terminal window. We'll use this to run GDB and connect to our waiting program.
Start GDB:
Once in GDB, tell it about our program and connect to the listener:
file ./register_play: This loads the symbol table from our executable, helping GDB understand the structure of our program.
target remote localhost:1234: This connects GDB to the waiting gdbserver.
You should see output indicating a successful connection, and that the program is stopped at the entry point.
Step 5: Debugging Our Program
Now we're ready to examine our program in detail:
View the current instruction:
This should show the first mov instruction.
Step to the next instruction with stepi:
This executes the current instruction and moves to the next one.
View register contents with info registers:
After the first stepi, you should see 42 in x0.
View the next few instructions:
This shows the next 5 instructions from the current point.
Let's step through the entire program:
stepi (first mov instruction): After this, info registers x0 should show 42.
stepi (second mov instruction): Now info registers x1 should also show 42.
stepi (mov for syscall number): info registers x8 should show 93.
stepi (mov for exit status): info registers x0 should now show 0.
stepi (svc instruction): This will execute the syscall, and your program will exit.
At each step, use info registers to see how the register values change. This hands-on process lets you see exactly how each instruction affects the ARM64 registers.
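If you don't have an ARM64 machine at hand, the step-through above can be mirrored with a tiny Python model. The program text below is reconstructed from the walkthrough and may differ from the original file (the svc operand #0 in particular is an assumption), and the interpreter is a hypothetical sketch that understands only mov and svc.

```python
# Listing reconstructed from the walkthrough; treat it as an assumption.
PROGRAM = """
mov x0, #42
mov x1, x0
mov x8, #93
mov x0, #0
svc #0
"""


def run(program):
    """Execute the listing and return (instruction, register snapshot) pairs."""
    regs = {}
    trace = []
    for line in program.strip().splitlines():
        instruction = line.strip()
        op, _, rest = instruction.partition(" ")
        operands = [part.strip() for part in rest.split(",")]
        if op == "mov":
            dst, src = operands
            # '#' marks an immediate; otherwise copy from another register.
            regs[dst] = int(src[1:], 0) if src.startswith("#") else regs[src]
        trace.append((instruction, dict(regs)))
        if op == "svc":
            break  # with x8 = 93 the kernel would end the program here
    return trace


# Each printed line plays the role of one stepi plus info registers.
for instruction, snapshot in run(PROGRAM):
    print(f"{instruction:<14} {snapshot}")
```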
Key Points to Remember:
The x registers are 64-bit general-purpose registers.
The mov instruction can move immediate values into registers or copy between registers.
System calls use specific registers for passing information (x8 for syscall number, x0 for first argument).
The svc instruction is used to make system calls.
Practice:
Modify the program to use different numbers or different registers.
Try adding more instructions and observe how they affect the registers.
2.3 Simple Arithmetic with Registers
In this section, we'll expand on our previous example by introducing a simple addition operation. This will help us understand how registers can be used for basic arithmetic.
Step 1: Writing Our Program
Create a new file named simple_add.s:
Enter the following code:
Let's break this down:
We use mov to put two numbers (5 and 3) into registers x0 and x1.
We introduce a new instruction: add. This adds the values in x0 and x1, and puts the result in x2.
We keep the same exit syscall as before.
Step 2: Assemble and Link
Assemble and link the program:
Step 3: Debug with GDB
Now, let's use GDB to see how this program works:
Start gdbserver:
In another terminal, start GDB and connect:
Let's step through the program:
After the first mov, x0 should contain 5.
After the second mov, x1 should contain 3.
After the add instruction, x2 should contain 8 (5 + 3).
Step 4: Understanding What Happened
We used mov to put values directly into registers x0 and x1.
The add instruction took the values from x0 and x1, added them together, and put the result in x2.
We can see the result of the addition by looking at the contents of x2.
Other arithmetic operations in assembly follow the same pattern: the operands are read from registers and the result is written to a destination register.
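A quick way to predict what GDB will show is to model the arithmetic in Python. The sketch below assumes 64-bit registers that wrap around on overflow; the function names simply mirror the ARM64 add, sub and mul instructions and are not part of any library.

```python
MASK64 = (1 << 64) - 1  # keep every result within 64 bits


def add(xn, xm):
    return (xn + xm) & MASK64


def sub(xn, xm):
    return (xn - xm) & MASK64


def mul(xn, xm):
    return (xn * xm) & MASK64


x0, x1 = 5, 3
print(add(x0, x1))       # 8, matching the value of x2 seen in GDB
print(sub(x0, x1))       # 2
print(mul(x0, x1))       # 15
print(hex(sub(x1, x0)))  # 0xfffffffffffffffe: 3 - 5 wraps around
```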
2.4 Understanding Bigger Numbers in Registers
In ARM64 assembly programming, working with larger numbers is a crucial skill. This section will explore how ARM64 registers handle bigger numbers and introduce you to the 'movz' instruction, which is essential for this purpose.
The Nature of ARM64 Registers
ARM64 architecture provides 31 general-purpose registers, labeled x0 through x30. Each of these registers is 64 bits wide. To understand what this means, let's break it down:
A bit is the smallest unit of data in computing, representing either 0 or 1.
8 bits make a byte, which can represent numbers from 0 to 255.
64 bits give us an enormous range: from 0 to 18,446,744,073,709,551,615.
Visualizing a 64-bit Register:
[xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx]
Each 'x' represents a bit that can be either 0 or 1.
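The ranges quoted above are easy to verify. This short Python check uses made-up variable names and also prints the signed range, which applies when the same 64 bits are interpreted as a two's-complement number.

```python
BITS = 64

largest_byte = (1 << 8) - 1
largest_unsigned = (1 << BITS) - 1
largest_signed = (1 << (BITS - 1)) - 1
smallest_signed = -(1 << (BITS - 1))

print(largest_byte)                # 255
print(f"{largest_unsigned:,}")     # 18,446,744,073,709,551,615
print(f"{smallest_signed:,} to {largest_signed:,}")
```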
The Limitation of 'mov'
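In our earlier programs, we used the 'mov' instruction to put numbers into registers. However, 'mov' with an immediate value has a significant limitation:
The whole instruction must fit in 32 bits, so a single 'mov' can directly load a 16-bit immediate (0 to 65,535). The assembler also accepts a few special larger patterns, such as a 16-bit value shifted into a higher position, but not arbitrary constants.
This means an arbitrary number larger than 65,535 (2^16 - 1), such as 0x12345678, cannot be loaded with one 'mov'.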
For example:
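As a rough check, a constant fits in one 16-bit move only if at most one of its four 16-bit slices is non-zero. The Python sketch below tests that condition. Both function names are hypothetical, and the model covers only the plain 16-bit form, not the other immediate encodings an assembler may choose.

```python
def nonzero_chunks(value):
    """List (shift, chunk) for each non-zero 16-bit slice of a 64-bit value."""
    return [(shift, (value >> shift) & 0xFFFF)
            for shift in (0, 16, 32, 48)
            if (value >> shift) & 0xFFFF]


def fits_single_movz(value):
    """True if one 16-bit move is enough to produce the value."""
    return len(nonzero_chunks(value)) <= 1


print(fits_single_movz(65535))       # True
print(fits_single_movz(0x10001))     # False: two slices are non-zero
print(fits_single_movz(0x12345678))  # False
print([(s, hex(c)) for s, c in nonzero_chunks(0x12345678)])
```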
Introducing 'movz'
To work with bigger numbers, ARM64 provides the 'movz' instruction:
'movz' stands for "move wide with zero", often read as "move and zero".
It allows us to load a 16-bit value into a specific part of the 64-bit register.
The rest of the register bits are set to zero.
The syntax of 'movz' is movz xd, #imm, lsl #shift, where:
xd is the destination register
imm is a 16-bit immediate value (0-65535)
lsl #shift is optional and can be 0, 16, 32, or 48
How 'movz' Works
To understand 'movz', let's visualize a 64-bit register as four 16-bit sections:
[16 bits][16 bits][16 bits][16 bits]
'movz' allows us to load a value into one of these sections and set the rest to zero. The 'lsl' (logical shift left) option determines which section we're filling:
No lsl or lsl #0: Fills the rightmost 16 bits
lsl #16: Fills the second section from the right
lsl #32: Fills the third section from the right
lsl #48: Fills the leftmost 16 bits
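The effect of each shift can be modelled in Python. This is a minimal sketch: the function movz below is a stand-in for the instruction, returning the value the destination register would hold, and the sample immediate 0xABCD is arbitrary.

```python
def movz(imm16, shift=0):
    """Value left in the destination register by movz xd, #imm16, lsl #shift."""
    if not 0 <= imm16 <= 0xFFFF:
        raise ValueError("immediate must fit in 16 bits")
    if shift not in (0, 16, 32, 48):
        raise ValueError("shift must be 0, 16, 32 or 48")
    return imm16 << shift  # every other bit of the register is zero


for shift in (0, 16, 32, 48):
    print(f"lsl #{shift:<2} -> 0x{movz(0xABCD, shift):016X}")
# lsl #0  -> 0x000000000000ABCD
# lsl #16 -> 0x00000000ABCD0000
# lsl #32 -> 0x0000ABCD00000000
# lsl #48 -> 0xABCD000000000000
```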
Example: Using 'movz'
Let's load the number 1234 (0x04D2 in hexadecimal) into register x0:
What happens in the register:
Before: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
After: 0000000000000000 0000000000000000 0000000000000000 0000010011010010 (the low 16 bits hold 1234 in binary)
In hexadecimal:
Before: 0x0000000000000000
After: 0x00000000000004D2
Practical Application
Let's use 'movz' in a small program:
.global _start: This makes the _start symbol visible to the linker. It's our program's entry point.
.section .text: This indicates that the following code should be in the text section of the executable, which is where code typically goes.
_start: This is the label for our entry point.
Now, let's focus on the movz instructions and how they work at the bit level:
movz x0, #1234
This instruction loads the value 1234 into register x0. Let's see how this looks in binary:
1234 in binary is: 0000 0100 1101 0010
The movz instruction will place this in the least significant 16 bits of the 64-bit register x0:
x0 before: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
x0 after: 0000000000000000 0000000000000000 0000000000000000 0000010011010010
In hexadecimal, this is: 0x00000000000004D2
movz x1, #5678
Similarly, this loads 5678 into x1. 5678 in binary is: 0001 0110 0010 1110
x1 before: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
x1 after: 0000000000000000 0000000000000000 0000000000000000 0001011000101110
In hexadecimal, this is: 0x000000000000162E
Now, let's debug this program and observe these changes:
Explanation of the movz operation:
movz takes a 16-bit immediate value and places it in the least significant 16 bits of the destination register.
It simultaneously sets all other bits in the register to zero.
This is why it's called "move and zero" - it moves the immediate value and zeros out the rest.
The add instruction then performs a 64-bit addition of these values, even though we only used the lower 16 bits of each register.
The exit syscall at the end (using mov, svc) is a standard way to end the program cleanly in Linux ARM64 assembly.
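The bit patterns and the sum described above can be reproduced with a short Python sketch. The helper as_bits is hypothetical, and using x2 as the destination of the add is an assumption carried over from the earlier addition example.

```python
def as_bits(value, width=64, group=16):
    """Format value as binary, split into 16-bit groups for readability."""
    bits = f"{value:0{width}b}"
    return " ".join(bits[i:i + group] for i in range(0, width, group))


x0 = 1234                          # movz x0, #1234
x1 = 5678                          # movz x1, #5678
x2 = (x0 + x1) & ((1 << 64) - 1)   # add x2, x0, x1 (destination assumed)

for name, value in (("x0", x0), ("x1", x1), ("x2", x2)):
    print(f"{name} = {value} = 0x{value:016X}")
    print("     " + as_bits(value))
# x0 is 0x00000000000004D2, x1 is 0x000000000000162E,
# and x2 is 6912, or 0x0000000000001B00.
```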
Let's create a program, store_64bit.s, that demonstrates how to build a full 64-bit value in a register:
Let's break this down and debug it step-by-step:
Assemble and link the program:
Start debugging:
In GDB
Now, let's examine each instruction:
movz x0, #0xCDEF, lsl #0
You should see x0 equal to 0x000000000000CDEF.
Explanation:
This loads 0xCDEF into the least significant 16 bits of x0.
lsl #0 means no shift, so it goes into the rightmost position.
Binary: 0000000000000000 0000000000000000 0000000000000000 1100110111101111
movk x0, #0x90AB, lsl #16
You should see x0 equal to 0x0000000090ABCDEF.
Explanation:
movk keeps the existing bits in x0 and only modifies the specified 16-bit section.
lsl #16 shifts the value 16 bits to the left before inserting.
Binary: 0000000000000000 0000000000000000 1001000010101011 1100110111101111
movk x0, #0x5678, lsl #32
You should see x0 equal to 0x0000567890ABCDEF.
Explanation:
This inserts 0x5678 into the third 16-bit section from the right.
Binary: 0000000000000000 0101011001111000 1001000010101011 1100110111101111
movk x0, #0x1234, lsl #48
You should see x0 equal to 0x1234567890ABCDEF.
Explanation:
This completes our 64-bit value by inserting 0x1234 into the leftmost 16 bits.
Final binary: 0001001000110100 0101011001111000 1001000010101011 1100110111101111
Key Points:
movz is used for the first operation because it zeroes out the entire register before inserting the value.
Subsequent operations use movk (move and keep) to preserve the bits we've already set.
The lsl parameter determines which 16-bit section of the 64-bit register we're modifying:
lsl #0: Bits 0-15
lsl #16: Bits 16-31
lsl #32: Bits 32-47
lsl #48: Bits 48-63
This technique allows us to construct any 64-bit value in a register, piece by piece. It's particularly useful when working with large constants or memory addresses that can't be loaded with a single instruction.
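The piece-by-piece construction can be checked with a small Python model. The functions movz and movk below are stand-ins written for this sketch, not real bindings to the instructions; each returns the value the register would hold afterwards.

```python
MASK64 = (1 << 64) - 1


def movz(imm16, shift=0):
    """Zero the whole register, then place imm16 at the given shift."""
    return (imm16 & 0xFFFF) << shift


def movk(reg, imm16, shift=0):
    """Replace one 16-bit section of reg and keep all the other bits."""
    cleared = reg & ~(0xFFFF << shift) & MASK64
    return cleared | ((imm16 & 0xFFFF) << shift)


x0 = movz(0xCDEF, 0)       # movz x0, #0xCDEF, lsl #0
steps = [x0]
for imm, shift in ((0x90AB, 16), (0x5678, 32), (0x1234, 48)):
    x0 = movk(x0, imm, shift)  # movk x0, #imm, lsl #shift
    steps.append(x0)

for value in steps:
    print(f"0x{value:016X}")
# 0x000000000000CDEF
# 0x0000000090ABCDEF
# 0x0000567890ABCDEF
# 0x1234567890ABCDEF
```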
2.5 W Registers and X Registers
Let's represent a 64-bit X register as a series of 64 bits:
X0: [63 62 61 60 59 58 57 56|55 54 53 52 51 50 49 48|47 46 45 44 43 42 41 40|39 38 37 36 35 34 33 32|31 30 29 28 27 26 25 24|23 22 21 20 19 18 17 16|15 14 13 12 11 10 9 8|7 6 5 4 3 2 1 0]
Now, the corresponding W register (W0) is actually just the lower 32 bits of X0:
W0: [31 30 29 28 27 26 25 24|23 22 21 20 19 18 17 16|15 14 13 12 11 10 9 8|7 6 5 4 3 2 1 0]
To illustrate this relationship:
When you write to W0, the lower 32 bits of X0 take the new value and the upper 32 bits of X0 are set to zero.
When you read from W0, you're only reading the lower 32 bits of X0.
Reading W0 leaves all 64 bits of X0 unchanged; only a write through W0 clears the upper 32 bits.
Let's demonstrate this with a practical example:
After the first instruction:
X0: [1010 1011 1100 1101|0000 0000 0000 0000|0000 0000 0000 0000|0000 0000 0000 0000]
W0: [0000 0000 0000 0000|0000 0000 0000 0000]
After the second instruction:
X0: [0000 0000 0000 0000|0000 0000 0000 0000|0001 0010 0011 0100|0000 0000 0000 0000]
W0: [0001 0010 0011 0100|0000 0000 0000 0000]
Notice how modifying W0 changed the lower 32 bits of X0, but the upper 32 bits were set to zero.
Here's another example to illustrate:
After these instructions:
X0: [0000 0000 0000 0000|0000 0000 0000 0000|0001 0010 0011 0100|0000 0000 0000 0000]
W0: [0001 0010 0011 0100|0000 0000 0000 0000]
In this case, modifying W0 not only changed the lower 32 bits but also zeroed out the upper 32 bits of X0.
This relationship between X and W registers allows for efficient 32-bit operations when 64-bit precision isn't needed, while still providing access to full 64-bit functionality when required.
Let's create a simple program to demonstrate:
Now, let's debug this step by step:
Assemble and link:
Start GDB:
Set breakpoint and run:
Now, let's examine each step:
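After movz x0, #0xABCD, lsl #48: x0 holds 0xABCD000000000000, so w0 (the lower 32 bits) reads as 0x00000000.
After movz w0, #0x1234: x0 holds 0x0000000000001234 and w0 reads as 0x00001234.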
What's happening here:
When we use x0, we're working with all 64 bits of the register.
When we use w0, we're only working with the lower 32 bits of the same register.
When we modify w0, it affects the lower 32 bits of x0, and zeros out the upper 32 bits.
The key points:
x0 and w0 are not separate registers. They're two ways of accessing the same physical register.
x0 gives you access to all 64 bits.
w0 gives you access to only the lower 32 bits.
When you write to w0, you're automatically setting the upper 32 bits of x0 to zero.
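The zeroing behaviour is easy to model in Python. In this sketch, write_x and write_w are hypothetical helpers that return the new 64-bit contents of the register after a write through the X or W name; the two steps follow the instructions examined above.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = (1 << 64) - 1


def write_x(value):
    """A write through the X name replaces all 64 bits."""
    return value & MASK64


def write_w(value):
    """A write through the W name sets the low 32 bits and zeroes the rest."""
    return value & MASK32


x0 = write_x(0xABCD << 48)   # movz x0, #0xABCD, lsl #48
print(f"x0 = 0x{x0:016X}  w0 = 0x{x0 & MASK32:08X}")
# x0 = 0xABCD000000000000  w0 = 0x00000000

x0 = write_w(0x1234)         # movz w0, #0x1234
print(f"x0 = 0x{x0:016X}  w0 = 0x{x0 & MASK32:08X}")
# x0 = 0x0000000000001234  w0 = 0x00001234
```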
Think of it like this:
x0 is a 64-bit box: [-------------------- 64 bits --------------------]
w0 is just the right half of that box: [32 bits][---- 32 bits ----]
When you use w0, you're only looking at and modifying that right half. The left half gets set to zero automatically when you modify w0.
This setup allows for efficient 32-bit operations when you don't need the full 64-bit range, while still providing access to the full 64-bit register when needed.