ARM64 Assembly: A Practical Introduction for Beginners

Chapter 2: Understanding ARM64 Registers

2.1 Overview of ARM64 Registers

The ARM64 (AArch64) architecture provides 31 general-purpose registers, along with a few special-purpose registers. They are used for arithmetic, memory access, and passing arguments to functions and system calls.

Registers

  • X0 to X7: Used for passing arguments to functions and system calls, and for holding return values.

  • X8: Often used for system call numbers.

  • X9 to X15: General-purpose registers for any use.

  • X16 and X17: Sometimes used as temporary registers by the linker.

  • X18: Platform register, reserved for platform-specific use (e.g., thread register in some environments).

  • X19 to X28: Callee-saved registers. Functions using these must save and restore their values.

  • X29 (FP): Frame pointer, used to maintain the stack frame.

  • X30 (LR): Link register, stores the return address for function calls.

  • SP: Stack pointer, points to the top of the stack.

  • PC: Program counter, holds the address of the next instruction to be executed.

  • PSTATE: Processor state register, holds status and control flags.

Detailed Explanation

X0 to X7

  • Accessible to All Instructions: Universally accessible for any operation.

  • Passing Arguments and Return Values: First eight arguments of a function call are passed in these registers. The return value of a function is typically stored in X0.

X8

  • System Call Number: Holds the syscall number during a system call.

X9 to X15

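  • General Purpose: Used for temporary data storage during program execution. These are caller-saved: a called function may overwrite them, so the caller must save any values it still needs.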

X16 and X17

  • Temporary Registers: Scratch registers that the linker may use in the small pieces of code it inserts between a caller and a callee, so their values may change across a function call.

X18

  • Platform Register: Reserved for platform-specific use, such as a thread-specific register in some operating systems.

X19 to X28

  • Callee-Saved Registers: Must be preserved across function calls. Functions using these registers must save their original values and restore them before returning.

X29 (FP)

  • Frame Pointer: Points to the start of the stack frame of a function. Helps in accessing function parameters and local variables.

X30 (LR)

  • Link Register: Holds the return address when a function call is made. Used to return to the correct location in the code after function execution.

SP (Stack Pointer)

  • Stack Pointer: Points to the top of the current stack, used for storing function parameters, local variables, and return addresses.

PC (Program Counter)

  • Program Counter: Holds the address of the next instruction to be executed by the CPU.

Context Switching and Register Usage

When different applications run on a system, each is given its own set of registers by the operating system through context switching. Here's how it works and how many registers are involved:

Context Switching

  • Context Switching: When the CPU switches from one application to another, it saves the current application's register state to memory and loads the next application's register state from memory. This ensures independent execution of each application without interference.

Register Usage in ARM64

  • 31 General-Purpose Registers: X0 to X30.

  • Special-Purpose Registers: SP (Stack Pointer) and PC (Program Counter).

Isolation Between Applications

  • Process Context: Each application has its own set of register values saved in its process control block (PCB) by the OS.

  • Virtual Memory: Each application runs in its own virtual memory space, ensuring isolation.
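
The save-and-restore idea behind context switching can be modeled in a few lines of Python. This is a toy sketch under simplifying assumptions, not how a real kernel is written: the `Cpu` and `Task` classes and the `context_switch` function are hypothetical names invented for this illustration.

```python
# Toy model of a context switch. All names here are hypothetical,
# chosen for illustration only; a real OS does this in kernel code.
from dataclasses import dataclass, field

REG_NAMES = [f"x{i}" for i in range(31)] + ["sp", "pc"]


def blank_registers():
    """Return a register file with every register set to zero."""
    return {name: 0 for name in REG_NAMES}


@dataclass
class Cpu:
    regs: dict = field(default_factory=blank_registers)


@dataclass
class Task:
    name: str
    saved: dict = field(default_factory=blank_registers)


def context_switch(cpu, current, nxt):
    """Save the running task's registers, then load the next task's."""
    current.saved = dict(cpu.regs)  # save current state to "memory"
    cpu.regs = dict(nxt.saved)      # restore the next task's state


cpu = Cpu()
task_a, task_b = Task("A"), Task("B")

cpu.regs["x0"] = 42                  # task A is running and sets x0
context_switch(cpu, task_a, task_b)  # switch A -> B
cpu.regs["x0"] = 7                   # task B uses x0 for its own value
context_switch(cpu, task_b, task_a)  # switch B -> A
print(cpu.regs["x0"])                # task A sees its own value again
```

Each task only ever observes the register values it wrote itself, which is the isolation property described above.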

Understanding W and X Registers

In AArch64, each general-purpose register has two names:

  • W Registers: Access only the lower 32 bits of the corresponding 64-bit registers X0-X30.

  • X Registers: Access the full 64 bits.

Example

// Using X registers (64-bit)
LDR X0, =0x1234567890ABCDEF   // Load a 64-bit constant into X0 (too large for a single MOV, so the assembler uses a literal pool)
MOV X1, X0                    // Copy the 64-bit value from X0 to X1

// Using W registers (32-bit)
MOV W2, W0                    // Copy the lower 32 bits of X0 to W2
LDR W3, =0x12345678           // Load a 32-bit constant into W3 (also too large for a single MOV)

PSTATE (Processor State) Register

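PSTATE holds various status and control flags that describe the processor's current state. Unlike X0 to X30, it is not a single register that ordinary instructions name directly; it is a collection of fields that are read and written through dedicated instructions and system registers.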

Understanding PSTATE

  • Condition Flags: Indicate the results of arithmetic operations.

  • Processor Mode: Indicates the current mode of the processor.

  • Interrupt Mask Bits: Control the handling of interrupts.

  • Execution State: Indicates the current execution state (AArch64 or AArch32).
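
To make the condition flags concrete, here is a small Python model of how the four flags (N, Z, C, V) could be derived from a 64-bit addition, in the way a flag-setting add such as ARM64's `adds` does. The function name `adds_flags` is made up for this sketch, and it models only addition.

```python
MASK64 = (1 << 64) - 1


def adds_flags(a, b):
    """Return (result, flags) for a 64-bit add that updates N, Z, C and V."""
    a &= MASK64
    b &= MASK64
    full = a + b
    result = full & MASK64
    n = result >> 63                  # Negative: bit 63 of the result
    z = int(result == 0)              # Zero: the result is exactly 0
    c = int(full > MASK64)            # Carry: unsigned overflow out of bit 63
    # Overflow: both operands have the same sign, but the result's sign differs
    v = ((a ^ result) & (b ^ result)) >> 63
    return result, {"N": n, "Z": z, "C": c, "V": v}


# 1 + 0xFFFFFFFFFFFFFFFF wraps to 0: Zero and Carry are set
print(adds_flags(1, MASK64))
# Largest positive signed value + 1: Negative and Overflow are set
print(adds_flags(0x7FFFFFFFFFFFFFFF, 1))
```

Conditional instructions later test these flags to decide, for example, whether a branch is taken.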

Register Diagram

AArch64 Registers
   _______________________________________________________
  | X0 | General        | W0     |                       |
  | X1 | purpose        | W1     |                       |
  | X2 | registers      | W2     |                       |
  | X3 | used for       | W3     |                       |
  | X4 | passing        | W4     |                       |
  | X5 | arguments      | W5     |                       |
  | X6 | and return     | W6     |                       |
  | X7 | values         | W7     |                       |
  |____|________________|________|_______________________|
  | X8 | Syscall number | W8     |                       |
  |____|________________|________|_______________________|
  | X9 |                | W9     |                       |
  | X10|                | W10    |                       |
  | X11|                | W11    |                       |
  | X12|                | W12    |                       |
  | X13|                | W13    |                       |
  | X14|                | W14    |                       |
  | X15|                | W15    |                       |
  |____|________________|________|_______________________|
  | X16| Temporary      | W16    |                       |
  | X17| registers      | W17    |                       |
  |____|________________|________|_______________________|
  | X18| Platform       | W18    |                       |
  |    | register       |        |                       |
  |____|________________|________|_______________________|
  | X19|                | W19    |                       |
  | X20|                | W20    |                       |
  | X21|                | W21    |                       |
  | X22|                | W22    |                       |
  | X23| Callee-saved   | W23    |                       |
  | X24| registers      | W24    |                       |
  | X25|                | W25    |                       |
  | X26|                | W26    |                       |
  | X27|                | W27    |                       |
  | X28|                | W28    |                       |
  |____|________________|________|_______________________|
  | X29| Frame pointer  | W29    |                       |
  |____|________________|________|_______________________|
  | X30| Link register  | W30    |                       |
  |____|________________|________|_______________________|
  | SP | Stack pointer  |        |                       |
  |____|________________|________|_______________________|
  | PC | Program        |        |                       |
  |    | counter        |        |                       |
  |____|________________|________|_______________________|
  | PSTATE | Processor State Register                   |
  |_____________________________________________________|

2.2 Playing with Registers

In ARM64, we have several types of registers, but we'll focus on the general-purpose registers for now. These are named X0 through X30, giving us 31 registers to work with. Each of these can hold a 64-bit value: a very large integer, a memory address, or up to eight bytes of character data.

Now, let's write our first program to play with registers!

In this section, we'll explore ARM64 registers hands-on, using a debugging approach that mimics remote debugging techniques. This method will give us a deep look into how registers work in a real ARM64 system.

Step 1: Let's start by creating a simple assembly program. This program will demonstrate basic register operations.

Open your terminal and create a new file:

nano register_play.s

In the nano editor, type the following code:

.global _start
.section .text
_start:
    mov x0, #42        // Move the immediate value 42 into register x0
    mov x1, x0         // Copy the value from x0 to x1
    
    // Exit syscall
    mov x8, #93        // 93 is the syscall number for exit
    mov x0, #0         // 0 is the exit status
    svc #0             // Supervisor call to invoke the syscall

Let's break this down:

  • .global _start: This makes the _start symbol visible to the linker. It's our program's entry point.

  • .section .text: This indicates that the following code should be in the text section of the executable, which is where code typically goes.

  • mov x0, #42: This moves the immediate value 42 into the 64-bit register x0.

  • mov x1, x0: This copies the value from x0 into x1.

  • The last three lines set up and execute the exit syscall, which is how our program will cleanly terminate.

Step 2: Assembling and Linking

Now we'll turn our assembly code into an executable. This is a two-step process:

  1. Assemble the code:

    as register_play.s -o register_play.o

    This creates an object file register_play.o. The assembler translates our human-readable assembly into machine code.

  2. Link the object file:

    ld register_play.o -o register_play

    This creates our final executable register_play. The linker sets up the proper memory layout for our program.

Step 3: Setting Up the Debugger Listener

Now we'll use gdbserver to run our program in a way that allows a debugger to connect to it. This mimics debugging a program running on a remote machine.

gdbserver :1234 ./register_play

This command does the following:

  • gdbserver: This is the program that allows GDB to connect to a running process.

  • :1234: This tells gdbserver to listen on port 1234.

  • ./register_play: This is the program we want to debug.

The program is now waiting for a debugger to connect before it starts running.

Step 4: Connecting with GDB

Open a new terminal window. We'll use this to run GDB and connect to our waiting program.

Start GDB:

gdb

Once in GDB, tell it about our program and connect to the listener:

(gdb) file ./register_play
(gdb) target remote localhost:1234

  • file ./register_play: This loads the symbol table from our executable, helping GDB understand the structure of our program.

  • target remote localhost:1234: This connects GDB to the waiting gdbserver.

You should see output indicating a successful connection, and that the program is stopped at the entry point.

Step 5: Debugging Our Program

Now we're ready to examine our program in detail:

  1. View the current instruction:

    (gdb) x/1i $pc

    This should show the first mov instruction.

  2. Step to the next instruction:

    (gdb) stepi

    This executes the current instruction and moves to the next one.

  3. View register contents:

    (gdb) info registers x0 x1

    After the first stepi, you should see 42 in x0.

  4. View the next few instructions:

    (gdb) x/5i $pc

    This shows the next 5 instructions from the current point.

Let's step through the entire program:

  1. stepi (first mov instruction): After this, info registers x0 should show 42.

  2. stepi (second mov instruction): Now info registers x1 should also show 42.

  3. stepi (mov for syscall number): info registers x8 should show 93.

  4. stepi (mov for exit status): info registers x0 should now show 0.

  5. stepi (svc instruction): This will execute the syscall, and your program will exit.

At each step, use info registers to see how the register values change. This hands-on process lets you see exactly how each instruction affects the ARM64 registers.

Key Points to Remember:

  • The x registers are 64-bit general-purpose registers.

  • The mov instruction can move immediate values into registers or copy between registers.

  • System calls use specific registers for passing information (x8 for syscall number, x0 for first argument).

  • The svc instruction is used to make system calls.

Practice:

  1. Modify the program to use different numbers or different registers.

  2. Try adding more instructions and observe how they affect the registers.

2.3 Simple Arithmetic with Registers

In this section, we'll expand on our previous example by introducing a simple addition operation. This will help us understand how registers can be used for basic arithmetic.

Step 1: Writing Our Program

Create a new file named simple_add.s:

nano simple_add.s

Enter the following code:

.global _start
.section .text
_start:
    // Load values into registers
    mov x0, #5          // Put the number 5 into register x0
    mov x1, #3          // Put the number 3 into register x1

    // Add the values
    add x2, x0, x1      // Add x0 and x1, put the result in x2

    // Exit syscall
    mov x8, #93
    mov x0, #0
    svc #0

Let's break this down:

  1. We use mov to put two numbers (5 and 3) into registers x0 and x1.

  2. We introduce a new instruction: add. This adds the values in x0 and x1, and puts the result in x2.

  3. We keep the same exit syscall as before.

Step 2: Assemble and Link

Assemble and link the program:

as simple_add.s -o simple_add.o
ld simple_add.o -o simple_add

Step 3: Debug with GDB

Now, let's use GDB to see how this program works:

  1. Start gdbserver:

    gdbserver :1234 ./simple_add

  2. In another terminal, start GDB and connect:

    gdb
    (gdb) file ./simple_add
    (gdb) target remote localhost:1234    

  3. Let's step through the program:

    (gdb) stepi
    (gdb) info registers x0

    After the first mov, x0 should contain 5.

    (gdb) stepi
    (gdb) info registers x1

    After the second mov, x1 should contain 3.

    (gdb) stepi
    (gdb) info registers x2

    After the add instruction, x2 should contain 8 (5 + 3).

Step 4: Understanding What Happened

  1. We used mov to put values directly into registers x0 and x1.

  2. The add instruction took the values from x0 and x1, added them together, and put the result in x2.

  3. We can see the result of the addition by looking at the contents of x2.

Other arithmetic operations, such as subtraction (sub) and multiplication (mul), work the same way in assembly.
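
One detail worth keeping in mind is that register arithmetic is done modulo 2^64: a result that does not fit in 64 bits wraps around. The Python sketch below models this; `add64` and `sub64` are hypothetical helper names, not ARM64 instructions.

```python
MASK64 = (1 << 64) - 1  # all 64 bits set


def add64(a, b):
    """Model a 64-bit register add: the result wraps at 2**64."""
    return (a + b) & MASK64


def sub64(a, b):
    """Model a 64-bit register subtract with the same wraparound."""
    return (a - b) & MASK64


x0, x1 = 5, 3
x2 = add64(x0, x1)
print(x2)                     # 8, matching the add example above
print(hex(add64(MASK64, 1)))  # 0x0: the largest value plus one wraps to zero
print(hex(sub64(0, 1)))       # 0xffffffffffffffff: zero minus one wraps around
```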

2.4 Understanding Bigger Numbers in Registers

In ARM64 assembly programming, working with larger numbers is a crucial skill. This section will explore how ARM64 registers handle bigger numbers and introduce you to the 'movz' instruction, which is essential for this purpose.

The Nature of ARM64 Registers

ARM64 architecture provides 31 general-purpose registers, labeled x0 through x30. Each of these registers is 64 bits wide. To understand what this means, let's break it down:

  • A bit is the smallest unit of data in computing, representing either 0 or 1.

  • 8 bits make a byte, which can represent numbers from 0 to 255.

  • 64 bits give us an enormous range: from 0 to 18,446,744,073,709,551,615.
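
These ranges follow from one formula: n bits can represent unsigned values from 0 to 2^n - 1. A quick Python check (the helper name `max_unsigned` is just for this sketch):

```python
def max_unsigned(bits):
    """Largest unsigned integer that fits in the given number of bits."""
    return (1 << bits) - 1


for bits in (8, 16, 32, 64):
    print(f"{bits:2d} bits: 0 to {max_unsigned(bits):,}")
```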

Visualizing a 64-bit Register:

[xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx]

Each 'x' represents a bit that can be either 0 or 1.

The Limitation of 'mov'

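In our earlier programs, we used the 'mov' instruction to put numbers into registers. However, 'mov' with an immediate value has a significant limitation:

  • A single 'mov' encodes only a 16-bit immediate (0 to 65,535), optionally placed in one of the four 16-bit sections of the register.

  • The assembler also accepts a few special patterns, such as inverted values and repeating bit masks, but an arbitrary 32-bit or 64-bit number does not fit in one instruction.

For example:

mov x0, #65535    // This works: the value fits in 16 bits
mov x0, #65536    // This also works: 0x1 placed in the second 16-bit section
mov x0, #0x12345  // This will cause an error: the value spans two 16-bit sections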
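
As a rough way to predict whether a constant fits the basic form, the Python sketch below checks whether a value is a single 16-bit chunk sitting in one of the four 16-bit sections of the register. It is a simplified model: the function name `single_chunk_encoding` is invented here, and it deliberately ignores the other immediate patterns an assembler may accept for 'mov' (inverted values and bit-mask patterns).

```python
def single_chunk_encoding(value):
    """Return (imm16, shift) if value is one 16-bit chunk at shift 0/16/32/48.

    Returns None when the value needs more than one 16-bit chunk.
    Simplification: other immediate forms accepted by 'mov' are not modeled.
    """
    for shift in (0, 16, 32, 48):
        imm16 = (value >> shift) & 0xFFFF
        if value == imm16 << shift:
            return imm16, shift
    return None


for value in (65535, 65536, 0x12345, 0xABCD000000000000):
    print(hex(value), "->", single_chunk_encoding(value))
```

Here 65535 and 65536 each fit in one chunk, while 0x12345 does not because its set bits straddle two sections.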

Introducing 'movz'

To work with bigger numbers, ARM64 provides the 'movz' instruction:

  • 'movz' stands for "move wide with zero", often read as "move and zero".

  • It allows us to load a 16-bit value into a specific part of the 64-bit register.

  • The rest of the register bits are set to zero.

The syntax of 'movz' is:

movz xd, #imm{, lsl #shift}

Where:

  • xd is the destination register

  • imm is a 16-bit immediate value (0-65535)

  • lsl #shift is optional and can be 0, 16, 32, or 48

How 'movz' Works

To understand 'movz', let's visualize a 64-bit register as four 16-bit sections:

[16 bits][16 bits][16 bits][16 bits]

'movz' allows us to load a value into one of these sections and set the rest to zero. The 'lsl' (logical shift left) option determines which section we're filling:

  • No lsl or lsl #0: Fills the rightmost 16 bits

  • lsl #16: Fills the second section from the right

  • lsl #32: Fills the third section from the right

  • lsl #48: Fills the leftmost 16 bits
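
The effect of the shift can be modeled in Python. This is only a sketch of the instruction's result, with a function named `movz` after the instruction for readability; it returns the 64-bit value the destination register would hold.

```python
MASK64 = (1 << 64) - 1


def movz(imm16, shift=0):
    """Model movz: put a 16-bit value in one section, zero everything else."""
    if not 0 <= imm16 <= 0xFFFF:
        raise ValueError("immediate must fit in 16 bits")
    if shift not in (0, 16, 32, 48):
        raise ValueError("shift must be 0, 16, 32 or 48")
    return (imm16 << shift) & MASK64


for shift in (0, 16, 32, 48):
    print(f"lsl #{shift:<2d} -> {movz(0xABCD, shift):#018x}")
```

Each line of output shows 0xABCD moving one section to the left while every other section stays zero.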

Example: Using 'movz'

Let's load the number 1234 (0x04D2 in hexadecimal) into register x0:

movz x0, #1234

What happens in the register:

Before: 0000000000000000000000000000000000000000000000000000000000000000
After:  0000000000000000000000000000000000000000000000000000010011010010

The rightmost bits, 0100 1101 0010, are 1234 in binary.

In hexadecimal:

Before: 0x0000000000000000
After:  0x00000000000004D2

Practical Application

Let's use 'movz' in a small program:

.global _start
.section .text
_start:
    movz x0, #1234        // Load 1234 into x0
    movz x1, #5678        // Load 5678 into x1
    add x2, x0, x1        // Add x0 and x1, store result in x2
    
    // Exit syscall
    mov x8, #93
    mov x0, #0
    svc #0

  1. .global _start: This makes the _start symbol visible to the linker. It's our program's entry point.

  2. .section .text: This indicates that the following code should be in the text section of the executable, which is where code typically goes.

  3. _start:: This is the label for our entry point.

Now, let's focus on the movz instructions and how they work at the bit level:

  1. movz x0, #1234

    This instruction loads the value 1234 into register x0. Let's see how this looks in binary:

    1234 in binary is: 0000 0100 1101 0010

    The movz instruction will place this in the least significant 16 bits of the 64-bit register x0:

    x0 before: 0000000000000000000000000000000000000000000000000000000000000000
    x0 after:  0000000000000000000000000000000000000000000000000000010011010010

    In hexadecimal, this is: 0x00000000000004D2

  2. movz x1, #5678

    Similarly, this loads 5678 into x1.

    5678 in binary is: 0001 0110 0010 1110

    x1 before: 0000000000000000000000000000000000000000000000000000000000000000
    x1 after:  0000000000000000000000000000000000000000000000000001011000101110

    In hexadecimal, this is: 0x000000000000162E

Now, let's debug this program and observe these changes:

(gdb) break _start
(gdb) run
(gdb) info registers x0 x1
x0             0x0                 0
x1             0x0                 0

(gdb) stepi
(gdb) info registers x0
x0             0x4d2               1234

(gdb) stepi
(gdb) info registers x1
x1             0x162e              5678

(gdb) stepi
(gdb) info registers x2
x2             0x1b00              6912

Explanation of movz operation:

  • movz takes a 16-bit immediate value and places it in the least significant 16 bits of the destination register.

  • It simultaneously sets all other bits in the register to zero.

  • This is why it's called "move and zero" - it moves the immediate value and zeros out the rest.

  • The add instruction then performs a 64-bit addition of these values, even though we only used the lower 16 bits of each register.

  • The exit syscall at the end (using mov, svc) is a standard way to end the program cleanly in Linux ARM64 assembly.

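Let's create a program, store_64bit.s, that demonstrates how to build a full 64-bit value in a register. It uses movz for the first 16-bit chunk and movk ("move with keep") for the remaining chunks; movk inserts 16 bits without disturbing the rest of the register: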

.global _start
.section .text
_start:
    // Load 0x1234567890ABCDEF into x0
    movz x0, #0xCDEF, lsl #0
    movk x0, #0x90AB, lsl #16
    movk x0, #0x5678, lsl #32
    movk x0, #0x1234, lsl #48

    // Exit syscall
    mov x8, #93
    mov x0, #0
    svc #0

Let's break this down and debug it step-by-step:

  1. Assemble and link the program:

    as store_64bit.s -o store_64bit.o
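    ld store_64bit.o -o store_64bit

  2. Start debugging:

    gdb ./store_64bit

  3. In GDB, set a breakpoint at _start and run the program (break _start, then run), as in the previous example.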

Now, let's examine each instruction:

  1. movz x0, #0xCDEF, lsl #0

    (gdb) stepi
    (gdb) info registers x0

    You should see:

    x0             0x000000000000cdef                52719
     

    Explanation:

    • This loads 0xCDEF into the least significant 16 bits of x0.

    • lsl #0 means no shift, so it goes into the rightmost position.

    • Binary: 0000000000000000000000000000000000000000000000001100110111101111

  2. movk x0, #0x90AB, lsl #16

    (gdb) stepi
    (gdb) info registers x0

    You should see:

    x0             0x0000000090abcdef                2427178479

    Explanation:

    • movk keeps the existing bits in x0 and only modifies the specified 16-bit section.

    • lsl #16 shifts the value 16 bits to the left before inserting.

    • Binary: 0000000000000000000000000000000010010000101010111100110111101111

  3. movk x0, #0x5678, lsl #32

    (gdb) stepi
    (gdb) info registers x0

    You should see:

    x0             0x0000567890abcdef                95075823242735

    Explanation:

    • This inserts 0x5678 into the third 16-bit section from the right.

    • Binary: 0000000000000000010101100111100010010000101010111100110111101111

  4. movk x0, #0x1234, lsl #48

    (gdb) stepi
    (gdb) info registers x0

    You should see:

    x0             0x1234567890abcdef                1311768467294899695

    Explanation:

    • This completes our 64-bit value by inserting 0x1234 into the leftmost 16 bits.

    • Final binary: 0001001000110100010101100111100010010000101010111100110111101111

Key Points:

  • movz is used for the first operation because it zeroes out the entire register before inserting the value.

  • Subsequent operations use movk (move and keep) to preserve the bits we've already set.

  • The lsl parameter determines which 16-bit section of the 64-bit register we're modifying:

    • lsl #0: Bits 0-15

    • lsl #16: Bits 16-31

    • lsl #32: Bits 32-47

    • lsl #48: Bits 48-63

This technique allows us to construct any 64-bit value in a register, piece by piece. It's particularly useful when working with large constants or memory addresses that can't be loaded with a single instruction.
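
The whole sequence can be replayed in Python as a sanity check on the values GDB should report. This is a model of the instructions' effect on x0, not real assembly; the functions `movz` and `movk` are named after the instructions for readability.

```python
MASK64 = (1 << 64) - 1


def movz(imm16, shift=0):
    """Model movz: one 16-bit section is set, all other bits become zero."""
    return (imm16 << shift) & MASK64


def movk(reg, imm16, shift=0):
    """Model movk: replace one 16-bit section, keep the other 48 bits."""
    cleared = reg & ~(0xFFFF << shift) & MASK64
    return cleared | (imm16 << shift)


x0 = movz(0xCDEF, 0)
steps = [x0]
for imm16, shift in [(0x90AB, 16), (0x5678, 32), (0x1234, 48)]:
    x0 = movk(x0, imm16, shift)
    steps.append(x0)

# Print each intermediate value in hex and decimal, like 'info registers x0'
for value in steps:
    print(f"{value:#018x}  {value}")
```

If the first instruction were movk instead of movz, any stale bits already in the register would survive, which is why the sequence starts with movz.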

2.5 W registers - X registers

Let's represent a 64-bit X register as a series of 64 bits:

X0: [63 62 61 60 59 58 57 56|55 54 53 52 51 50 49 48|47 46 45 44 43 42 41 40|39 38 37 36 35 34 33 32|31 30 29 28 27 26 25 24|23 22 21 20 19 18 17 16|15 14 13 12 11 10 9 8|7 6 5 4 3 2 1 0]

Now, the corresponding W register (W0) is actually just the lower 32 bits of X0:

W0: [31 30 29 28 27 26 25 24|23 22 21 20 19 18 17 16|15 14 13 12 11 10 9 8|7 6 5 4 3 2 1 0]

To illustrate this relationship:

  1. When you write to W0, the result goes into the lower 32 bits of X0 and the upper 32 bits of X0 are set to zero.

  2. When you read from W0, you're only reading the lower 32 bits of X0.

  3. Reading W0 never changes X0; only writes affect the register.

Let's demonstrate this with a practical example:

movz x0, #0xABCD, lsl #48  // X0 = 0xABCD000000000000
movz w0, #0x1234           // X0 = 0x0000000000001234

After the first instruction:

X0: [1010 1011 1100 1101|0000 0000 0000 0000|0000 0000 0000 0000|0000 0000 0000 0000]
W0: [0000 0000 0000 0000|0000 0000 0000 0000]

After the second instruction:

X0: [0000 0000 0000 0000|0000 0000 0000 0000|0000 0000 0000 0000|0001 0010 0011 0100]
W0: [0000 0000 0000 0000|0001 0010 0011 0100]

Notice how writing to W0 changed the lower 32 bits of X0 and also set the upper 32 bits to zero.

Here's another example to illustrate:

movz x0, #0xABCD, lsl #48  // X0 = 0xABCD000000000000
movk w0, #0x1234           // X0 = 0x0000000000001234

After these instructions:

X0: [0000 0000 0000 0000|0000 0000 0000 0000|0000 0000 0000 0000|0001 0010 0011 0100]
W0: [0000 0000 0000 0000|0001 0010 0011 0100]

In this case, movk keeps the bits of W0 that it does not write, but because the destination is a W register, the write still zeroes out the upper 32 bits of X0.

This relationship between X and W registers allows for efficient 32-bit operations when 64-bit precision isn't needed, while still providing access to full 64-bit functionality when required.

Let's create a simple program, register_demo.s, to demonstrate:

.global _start
.section .text
_start:
    movz x0, #0xABCD, lsl #48  // Put 0xABCD in the top 16 bits of x0
    movz w0, #0x1234           // Write 0x1234 to w0; the upper 32 bits of x0 become zero

    // Exit syscall
    mov x8, #93
    mov x0, #0
    svc #0

Now, let's debug this step by step:

  1. Assemble and link:

    as register_demo.s -o register_demo.o
    ld register_demo.o -o register_demo

  2. Start GDB:

    gdb ./register_demo

  3. Set breakpoint and run:

    (gdb) break _start
    (gdb) run

  4. Now, let's examine each step:

    After movz x0, #0xABCD, lsl #48:

    (gdb) stepi
    (gdb) p/x $x0
    $1 = 0xabcd000000000000
    (gdb) p/x $w0
    $2 = 0x0

    After movz w0, #0x1234:

    (gdb) stepi
    (gdb) p/x $x0
    $3 = 0x1234
    (gdb) p/x $w0
    $4 = 0x1234

What's happening here:

  1. When we use x0, we're working with all 64 bits of the register.

  2. When we use w0, we're only working with the lower 32 bits of the same register.

  3. When we modify w0, it affects the lower 32 bits of x0, and zeros out the upper 32 bits.

The key points:

  • x0 and w0 are not separate registers. They're two ways of accessing the same physical register.

  • x0 gives you access to all 64 bits.

  • w0 gives you access to only the lower 32 bits.

  • When you write to w0, you're automatically setting the upper 32 bits of x0 to zero.

Think of it like this:

  • x0 is a 64-bit box: [-------------------- 64 bits --------------------]

  • w0 is just the right half of that box: [32 bits][---- 32 bits ----]

When you use w0, you're only looking at and modifying that right half. The left half gets set to zero automatically when you modify w0.

This setup allows for efficient 32-bit operations when you don't need the full 64-bit range, while still providing access to the full 64-bit register when needed.
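
The W/X relationship can be summarized in a short Python model. The helper names `read_w`, `write_w` and `movk_w` are invented for this sketch; each function takes the current 64-bit value of the X register and returns what the instruction would leave in it.

```python
MASK32 = 0xFFFFFFFF


def read_w(x):
    """Reading Wn returns only the lower 32 bits of Xn."""
    return x & MASK32


def write_w(old_x, value32):
    """Writing Wn stores 32 bits; the old upper 32 bits of Xn are discarded."""
    return value32 & MASK32


def movk_w(x, imm16, shift=0):
    """Model 'movk wN': keeps the other bits of Wn, still zeroes the top half of Xn."""
    w = read_w(x)
    w = (w & ~(0xFFFF << shift) & MASK32) | (imm16 << shift)
    return write_w(x, w)


x0 = 0xABCD000000000000
print(hex(read_w(x0)))           # lower half is zero, as GDB shows for $w0
print(hex(write_w(x0, 0x1234)))  # like 'movz w0, #0x1234': X0 becomes 0x1234
print(hex(movk_w(x0, 0x1234)))   # like 'movk w0, #0x1234': also 0x1234
```

The second and third results match because the lower half of x0 was already zero; with other data in the lower half, `movk_w` would preserve bits 16 to 31 while `write_w` would not.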


Last updated 10 months ago