Setup

  • Install VirtualBox
  • Install Ubuntu 12.04

What is Assembly Language?

  • What is assembly language? -> A human-readable notation for the machine instructions that the processor executes.
  • Assembler + Linker convert assembly to binary.
    • Compiler converts code to ASM.
    • Assembler converts to obj file
    • Linker takes obj and other libs and creates the executable.
  • Install dependencies: sudo apt-get install nasm build-essential
  • nasm assembles ASM code into an object file, not an executable. We must specify the output format using -f elf64 to produce a 64-bit obj file.
  • Example syntax: nasm -f elf64 filename.nasm -o output.o
  • Link the file using the linker ld, syntax -> ld output.o -o output
  • We can disassemble the obj file output.o using objdump -D -M intel output.o

CPU Information

  • Different processors understand different assembly languages.
  • lscpu or cat /proc/cpuinfo lists info about processor.
  • We will use gdb a lot during the course.

CPU Registers

  • 16 general-purpose registers (GPRs): RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP and R8 to R15.
  • Only for RAX, RBX, RCX and RDX can we access the higher 8 bits of the lower 16 bits as well (for example, AH in the case of AX).
  • Another restriction: an instruction cannot reference a legacy high byte like AH together with one of the new byte registers (SIL, DIL, SPL, BPL, R8B to R15B), because those require a REX prefix. However, it may combine them with legacy low-byte registers like AL.
  • Flags indicate various "events" when execution is happening.
  • RIP is the instruction pointer. It supports RIP-relative addressing.
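
The sub-register rules above can be sketched as follows. This is a minimal illustration, not course material: the file name and the chosen constant are assumptions, and the values in the comments are what each register should hold when stepping through in gdb.

```nasm
; regs.nasm - sketch: reading the sub-registers of RAX
; Build: nasm -f elf64 regs.nasm -o regs.o && ld regs.o -o regs
global _start

section .text
_start:
    mov rax, 0x1122334455667788
    mov esi, eax            ; lower 32 bits: rsi = 0x0000000055667788
    mov dx, ax              ; lower 16 bits: dx = 0x7788
    mov cl, ah              ; high byte of AX: cl = 0x77
    mov bl, al              ; low byte of AX: bl = 0x88
    mov r8b, al             ; legacy low byte with a new byte register: allowed
    ; mov r8b, ah           ; rejected by nasm: AH cannot be encoded with a REX prefix

    mov rax, 60             ; exit(0)
    xor rdi, rdi
    syscall
```

Uncommenting the mov r8b, ah line should make nasm refuse to assemble the file, which is the restriction described in the list.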

Hello World ASM

  • .text section has code
  • .data section has strings or data
  • .bss section has uninitialized data
  • global directive exports a symbol (such as _start) so that the linker knows where execution starts
  • syscalls enable userspace programs to execute kernelspace functions
  • Syscall calling convention
    • RAX -> Syscall Number
    • RDI -> 1st Argument
    • RSI -> 2nd Argument
    • RDX -> 3rd Argument
    • R10 -> 4th Argument
    • R8 -> 5th Argument
    • R9 -> 6th Argument
  • assemble the assembly code using nasm -f elf64 filename.nasm -o output.o
  • link it using ld output.o -o output
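
Putting the sections, the global directive and the syscall convention together, a Hello World could look roughly like this. It is a minimal sketch: the file name, the label names message and length, and the message text are assumptions; 1 and 60 are the Linux x86-64 syscall numbers for write and exit.

```nasm
; hello.nasm - minimal sketch of a 64-bit Linux "Hello World"
; Build: nasm -f elf64 hello.nasm -o hello.o && ld hello.o -o hello
global _start               ; export the entry point for the linker

section .text
_start:
    mov rax, 1              ; syscall number 1 = write
    mov rdi, 1              ; 1st argument: file descriptor 1 (stdout)
    mov rsi, message        ; 2nd argument: address of the buffer
    mov rdx, length         ; 3rd argument: number of bytes to write
    syscall

    mov rax, 60             ; syscall number 60 = exit
    xor rdi, rdi            ; 1st argument: exit status 0
    syscall

section .data
message: db "Hello World!", 0x0a
length   equ $ - message    ; assemble-time constant: size of the message
```

Running ./hello after linking should print the message and return exit status 0.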

Reducing Instruction Size and Removing Nulls

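  • Our aim is to build shellcode. Hence, we want the smallest possible shellcode with the same functionality and without null bytes.
  • Refer to legacy low bytes (for example, mov al, 1 instead of mov rax, 1) so that small immediates are not padded with null bytes.
  • XOR a register with itself to set it to 0. This keeps our shellcode safe, as we do not know in what context it will run (the register values are unknown).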
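
As a rough illustration of both techniques, here is exit(0) written without null bytes. This is a sketch under the assumption that the goal is an exit syscall; the file name is hypothetical, and the version with nulls is kept in comments for comparison.

```nasm
; nonull.nasm - sketch: exit(0) without null bytes in the encoding
; Build: nasm -f elf64 nonull.nasm -o nonull.o && ld nonull.o -o nonull
global _start

section .text
_start:
    ; Version with nulls, for comparison only:
    ;   mov rax, 60         ; the immediate is stored as 3c 00 00 00
    ;   mov rdi, 0          ; the immediate is stored as 00 00 00 00

    ; Null-free version:
    xor rax, rax            ; zero RAX whatever it held before, no immediate needed
    mov al, 60              ; encodes as b0 3c: only the low byte is written
    xor rdi, rdi            ; exit status 0, again without an immediate
    syscall
```

Disassembling the object file with objdump -D -M intel shows the opcode bytes, which makes it easy to confirm that no 00 byte remains.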

Data Types

  • 8 bits = 1 byte
  • 16 bits = 2 bytes = Word
  • 32 bits = 4 bytes = Double Word
  • 64 bits = 8 bytes = Quad Word
  • 128 bits = 16 bytes = Double Quad Word
  • Reference memory with []. The content stored at the address is accessed.
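
A small sketch of how these sizes map to NASM data directives, and of the difference between a label and [label]. All label names here are made up for the example.

```nasm
; datatypes.nasm - sketch: defining data of each size and dereferencing with []
; Build: nasm -f elf64 datatypes.nasm -o datatypes.o && ld datatypes.o -o datatypes
global _start

section .data
byte_var:   db 0x11                     ; 1 byte
word_var:   dw 0x2222                   ; 2 bytes (word)
dword_var:  dd 0x33333333               ; 4 bytes (double word)
qword_var:  dq 0x4444444444444444       ; 8 bytes (quad word)

section .bss
scratch:    resb 16                     ; 16 uninitialized bytes

section .text
_start:
    mov rax, qword_var          ; no brackets: rax = address of qword_var
    mov rbx, [qword_var]        ; brackets: rbx = 0x4444444444444444
    mov cx, [word_var]          ; size comes from the register: cx = 0x2222
    mov byte [scratch], 0x55    ; immediate to memory needs an explicit size

    mov rax, 60                 ; exit(0)
    xor rdi, rdi
    syscall
```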

Endianness

  • Big Endian -> Most significant byte first (stored at the lowest address)
  • Little Endian -> Least significant byte first (stored at the lowest address)
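
x86-64 is little endian, which can be observed with a short sketch like the one below. The label value and the constant are assumptions chosen so that every byte is distinct.

```nasm
; endian.nasm - sketch: observing little-endian byte order
; Build: nasm -f elf64 endian.nasm -o endian.o && ld endian.o -o endian
global _start

section .data
value: dq 0x1122334455667788    ; stored in memory as 88 77 66 55 44 33 22 11

section .text
_start:
    mov al, [value]         ; lowest address holds the least significant byte: al = 0x88
    mov bl, [value + 7]     ; highest address holds the most significant byte: bl = 0x11

    movzx edi, al           ; use the first byte as the exit status
    mov rax, 60
    syscall
```

After running ./endian, echo $? should print 136 (0x88), the byte stored at the lowest address.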

GDB TUI MODE

  • gdb -tui
  • layout asm
  • layout regs

Moving Data

  • 64-bit operands generate a 64-bit result.
  • 32-bit operands generate a 32-bit result, zero-extended to 64 bits in the destination GPR.
  • 8/16-bit operands generate an 8/16-bit result. The upper 56/48 bits of the destination register are not modified.
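
These three rules are easiest to see by filling a register with ones and then writing a smaller part of it. A minimal sketch (file name assumed, values in the comments follow from the rules above):

```nasm
; operand_size.nasm - sketch: how the operand size affects the upper bits
; Build: nasm -f elf64 operand_size.nasm -o operand_size.o && ld operand_size.o -o operand_size
global _start

section .text
_start:
    mov rax, 0xffffffffffffffff
    mov eax, 0x11111111     ; 32-bit write: rax = 0x0000000011111111 (upper half zeroed)

    mov rbx, 0xffffffffffffffff
    mov bx, 0x2222          ; 16-bit write: rbx = 0xffffffffffff2222 (upper 48 bits kept)

    mov rcx, 0xffffffffffffffff
    mov cl, 0x33            ; 8-bit write: rcx = 0xffffffffffffff33 (upper 56 bits kept)

    mov rax, 60             ; exit(0)
    xor rdi, rdi
    syscall
```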

MOV is the most common instruction. Allowed Directions:

  • Between Registers
  • Memory to Register and Register to Memory
  • Immediate Data to Register
  • Immediate Data to Memory

LEA is Load Effective Address. For example, LEA RAX, [label] loads the address of label into RAX, not the data stored there.

XCHG swaps values. For example XCHG Reg, Reg or XCHG Reg, Memory
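
The allowed MOV directions, LEA and XCHG can be tried together in one short program. This is only a sketch; the label sample and the constants are assumptions, and the rel keyword asks nasm for RIP-relative addressing.

```nasm
; moving.nasm - sketch: MOV directions, LEA and XCHG
; Build: nasm -f elf64 moving.nasm -o moving.o && ld moving.o -o moving
global _start

section .data
sample: dq 0x1122334455667788

section .text
_start:
    mov rax, [sample]           ; memory -> register: rax = 0x1122334455667788
    lea rbx, [rel sample]       ; rbx = address of sample, no memory is read
    mov rcx, [rbx]              ; dereference that address: rcx = rax
    mov rdx, 0x10               ; immediate -> register
    mov qword [sample], 0x20    ; immediate -> memory (size must be stated)

    xchg rax, rdx               ; rax = 0x10, rdx = 0x1122334455667788
    xchg rax, [sample]          ; rax = 0x20, [sample] = 0x10

    mov rax, 60                 ; exit(0)
    xor rdi, rdi
    syscall
```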

The Stack

  • LIFO
  • RSP Points to top of the stack.
  • It grows from High Memory to Low Memory.
  • So, when we push data to stack, RSP becomes RSP-8
  • So, when we pop data from the stack, RSP becomes RSP+8
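
A minimal sketch of the LIFO behaviour, with arbitrary constants chosen for the example. Watching RSP in gdb while stepping shows it moving by 8 on every push and pop.

```nasm
; stack.nasm - sketch: push and pop order on the stack
; Build: nasm -f elf64 stack.nasm -o stack.o && ld stack.o -o stack
global _start

section .text
_start:
    mov rax, 0x1111111111111111
    mov rbx, 0x2222222222222222

    push rax                ; rsp = rsp - 8, [rsp] = 0x1111111111111111
    push rbx                ; rsp = rsp - 8, [rsp] = 0x2222222222222222

    pop rcx                 ; last in, first out: rcx = 0x2222222222222222, rsp = rsp + 8
    pop rdx                 ; rdx = 0x1111111111111111, rsp back to its starting value

    mov rax, 60             ; exit(0)
    xor rdi, rdi
    syscall
```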

Arithmetic Operations

  • ADD
  • ADC (Add with carry)
  • SUB
  • SBB (Sub with borrow)
  • INC
  • DEC

CLC clears the carry flag. STC sets the carry flag.
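
The following sketch walks through the instructions listed above and shows how the carry flag feeds into ADC and SBB. The constants are arbitrary; note that mov is used to load rbx because it leaves the flags untouched.

```nasm
; arith.nasm - sketch: arithmetic instructions and the carry flag
; Build: nasm -f elf64 arith.nasm -o arith.o && ld arith.o -o arith
global _start

section .text
_start:
    mov rax, 0xffffffffffffffff
    add rax, 1              ; rax = 0, the unsigned overflow sets CF = 1
    mov rbx, 0              ; mov does not touch the flags
    adc rbx, 0              ; rbx = 0 + 0 + CF = 1

    mov rcx, 10
    sub rcx, 3              ; rcx = 7, CF = 0
    stc                     ; force CF = 1
    sbb rcx, 2              ; rcx = 7 - 2 - CF = 4
    clc                     ; force CF = 0
    adc rcx, 2              ; rcx = 4 + 2 + 0 = 6
    inc rcx                 ; rcx = 7
    dec rcx                 ; rcx = 6

    mov rdi, rcx            ; return the result as the exit status
    mov rax, 60
    syscall
```

After running ./arith, echo $? should print 6.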

Logical Operations

  • AND (output 1 if both bits are 1)
  • OR (output 1 if at least one bit is 1)
  • XOR (output 1 if the bits differ, 0 if they are the same)
  • NOT (inverts every bit)
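
A quick sketch of the four operations on the same pair of 4-bit patterns, so the results can be compared against the truth tables. The bit patterns are arbitrary.

```nasm
; logic.nasm - sketch: AND, OR, XOR and NOT on 8-bit registers
; Build: nasm -f elf64 logic.nasm -o logic.o && ld logic.o -o logic
global _start

section .text
_start:
    mov al, 1100b
    and al, 1010b           ; al = 1000b (1 only where both bits are 1)

    mov bl, 1100b
    or  bl, 1010b           ; bl = 1110b (1 where at least one bit is 1)

    mov cl, 1100b
    xor cl, 1010b           ; cl = 0110b (1 where the bits differ)

    mov dl, 00001100b
    not dl                  ; dl = 11110011b (every bit inverted)

    mov rax, 60             ; exit(0)
    xor rdi, rdi
    syscall
```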

Bit Shifting Instructions

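  • SHR -> Logical shift right. Zeros are filled in from the left.
  • SHL -> Logical shift left. Zeros are filled in from the right.
  • SAR -> Arithmetic shift right. The sign bit is preserved, so for a negative operand a 1 is filled in at the leftmost place.
  • SAL -> Arithmetic shift left. Same operation as SHL.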
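
The difference between the logical and arithmetic right shift shows up as soon as the top bit is set. A minimal sketch with an arbitrary bit pattern:

```nasm
; shift.nasm - sketch: SHR versus SAR versus SHL on the same value
; Build: nasm -f elf64 shift.nasm -o shift.o && ld shift.o -o shift
global _start

section .text
_start:
    mov al, 10010110b
    shr al, 1               ; al = 01001011b, a 0 enters from the left, CF = 0

    mov bl, 10010110b
    sar bl, 1               ; bl = 11001011b, the sign bit is copied, CF = 0

    mov cl, 10010110b
    shl cl, 1               ; cl = 00101100b, the old top bit lands in CF = 1

    mov rax, 60             ; exit(0)
    xor rdi, rdi
    syscall
```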

Rotate Instructions

  • ROR -> Rotate Right
  • ROL -> Rotate Left
  • RCL -> Rotate left through carry. The MSB is shifted into CF and the old CF is shifted into the LSB.
  • RCR -> Rotate right through carry. The LSB is shifted into CF and the old CF is shifted into the MSB.
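
To see how the carry flag takes part in RCL and RCR but not in ROL and ROR, the sketch below sets CF explicitly before each rotate-through-carry. The bit pattern is arbitrary.

```nasm
; rotate.nasm - sketch: plain rotates versus rotates through the carry flag
; Build: nasm -f elf64 rotate.nasm -o rotate.o && ld rotate.o -o rotate
global _start

section .text
_start:
    mov al, 10010110b
    ror al, 1               ; al = 01001011b, the old LSB (0) wraps to the MSB

    mov bl, 10010110b
    rol bl, 1               ; bl = 00101101b, the old MSB (1) wraps to the LSB

    mov cl, 10010110b
    clc                     ; CF = 0
    rcl cl, 1               ; cl = 00101100b (old CF enters the LSB), CF = 1 (old MSB)

    mov dl, 10010110b
    stc                     ; CF = 1
    rcr dl, 1               ; dl = 11001011b (old CF enters the MSB), CF = 0 (old LSB)

    mov rax, 60             ; exit(0)
    xor rdi, rdi
    syscall
```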

Control Instructions

  • Make use of flags to make decisions or take branches.
  • Unconditional Jumps -> JMP
  • Conditional Jumps -> Several options (JE/JZ, JNE/JNZ, JC, and so on). Check the Intel Manual.
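
A rough sketch that combines a conditional jump used as a loop, a CMP followed by JE, and an unconditional JMP. The labels and the summed range are assumptions made for the example.

```nasm
; jumps.nasm - sketch: conditional and unconditional jumps driven by flags
; Build: nasm -f elf64 jumps.nasm -o jumps.o && ld jumps.o -o jumps
global _start

section .text
_start:
    xor rax, rax            ; running sum
    mov rcx, 5
add_next:
    add rax, rcx            ; adds 5, 4, 3, 2, 1
    dec rcx                 ; sets ZF when rcx reaches 0
    jnz add_next            ; conditional jump: taken while ZF = 0

    cmp rax, 15             ; sets ZF if rax == 15
    je  sum_ok              ; conditional jump: taken when ZF = 1
    mov rdi, 1              ; unexpected sum: exit status 1
    jmp finish              ; unconditional jump
sum_ok:
    xor rdi, rdi            ; expected sum: exit status 0
finish:
    mov rax, 60
    syscall
```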

LOOPS

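  • Uses the RCX register as the counter in 64-bit mode (ECX in 32-bit mode).
  • LOOP decrements RCX. When RCX == 0, the loop ends.
  • Important to preserve RCX inside the loop body.
  • LOOPE -> Decrements RCX and keeps looping only while RCX is not zero and ZF is set; otherwise the loop ends.
  • LOOPNE -> Decrements RCX and keeps looping only while RCX is not zero and ZF is NOT set; otherwise the loop ends.
  • Syscalls may change the context (the syscall instruction overwrites RCX and R11, and RAX receives the return value). Preserve the state you need.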
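
The points about LOOP and syscalls meet in a loop that prints a line several times: because the syscall instruction overwrites RCX, the counter has to be saved around it. A minimal sketch, with the labels and message text chosen for illustration:

```nasm
; loop.nasm - sketch: LOOP with the counter preserved across a syscall
; Build: nasm -f elf64 loop.nasm -o loop.o && ld loop.o -o loop
global _start

section .text
_start:
    mov rcx, 3              ; loop counter
print_again:
    push rcx                ; syscall overwrites RCX (and R11), so save the counter

    mov rax, 1              ; write(1, message, length)
    mov rdi, 1
    mov rsi, message
    mov rdx, length
    syscall

    pop rcx                 ; restore the counter
    loop print_again        ; rcx = rcx - 1, jump back while rcx != 0

    mov rax, 60             ; exit(0)
    xor rdi, rdi
    syscall

section .data
message: db "Hello", 0x0a
length   equ $ - message
```

Without the push/pop pair the counter would be lost after the first write, and the loop would not run the intended three times.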

Procedure

  • RET signifies the end of a procedure.
  • CALL pushes the address of the next instruction onto the stack and jumps to the procedure.
  • RET pops that address back into RIP.
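
A minimal sketch of a procedure call. The procedure name add_ten and passing the argument in RDI with the result in RAX are choices made for this example.

```nasm
; proc.nasm - sketch: calling a procedure and returning from it
; Build: nasm -f elf64 proc.nasm -o proc.o && ld proc.o -o proc
global _start

section .text
_start:
    mov rdi, 20
    call add_ten            ; pushes the address of the next instruction, then jumps
    mov rdi, rax            ; execution resumes here after ret: rdi = 30
    mov rax, 60
    syscall

add_ten:
    mov rax, rdi
    add rax, 10             ; result in rax
    ret                     ; pops the return address into RIP
```

After running ./proc, echo $? should print 30.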

Stack Frame Procedure

The following instructions are executed at the beginning of a procedure. They save the caller's RBP on the stack and set RBP to the current RSP, which starts a new stack frame.

push rbp
mov rbp,rsp

At the end of the function there is a leave instruction, which is equivalent to the following snippet and restores the previous state of the stack.

mov rsp, rbp
pop rbp
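
Put together, a procedure with a stack frame and one local variable might look like this. It is a sketch only: the procedure name double_it, the 16 bytes of local space and the use of RDI and RAX for argument and result are assumptions.

```nasm
; frame.nasm - sketch: a procedure with a stack frame and a local variable
; Build: nasm -f elf64 frame.nasm -o frame.o && ld frame.o -o frame
global _start

section .text
_start:
    mov rdi, 5
    call double_it
    mov rdi, rax            ; rdi = 10
    mov rax, 60
    syscall

double_it:
    push rbp                ; save the caller's frame pointer
    mov rbp, rsp            ; start a new frame
    sub rsp, 16             ; reserve 16 bytes for local variables
    mov [rbp - 8], rdi      ; keep a local copy of the argument
    mov rax, [rbp - 8]
    add rax, [rbp - 8]      ; rax = 2 * argument
    leave                   ; same as: mov rsp, rbp / pop rbp
    ret
```

After running ./frame, echo $? should print 10.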

Scanning and Comparing Strings in 64bit ASM

  • SCASB/SCASW/SCASD/SCASQ -> Compare AL/AX/EAX/RAX with the memory referenced by RDI. Comparison is register to memory.
  • CMPSB/CMPSW/CMPSD/CMPSQ -> Comparison is memory to memory. The source is referenced by RSI and the destination by RDI.
  • The zero flag (ZF) gets set if the compared values are equal.
  • CLD -> Clears the DF flag in the RFLAGS register. When DF is 0, string operations increment the index registers (RSI and/or RDI).
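
The sketch below compares two 8-byte strings with a single CMPSQ and then scans for a character with SCASB. The labels and the string contents are assumptions; both strings are exactly 8 bytes so that one quad-word comparison covers them.

```nasm
; scan.nasm - sketch: CMPSQ and SCASB with the direction flag cleared
; Build: nasm -f elf64 scan.nasm -o scan.o && ld scan.o -o scan
global _start

section .data
first:  db "assembly"       ; 8 bytes
second: db "assembly"       ; 8 bytes

section .text
_start:
    cld                     ; DF = 0: RSI and RDI move forward
    lea rsi, [rel first]
    lea rdi, [rel second]
    cmpsq                   ; compare 8 bytes at [rsi] with 8 bytes at [rdi]
    jne differ              ; ZF = 0 means the strings differ

    lea rdi, [rel first]
    mov al, 's'
    scasb                   ; AL against 'a': ZF = 0, rdi = rdi + 1
    scasb                   ; AL against 's': ZF = 1, rdi = rdi + 1
    jne differ

    xor rdi, rdi            ; both comparisons matched: exit status 0
    jmp finish
differ:
    mov rdi, 1              ; mismatch: exit status 1
finish:
    mov rax, 60
    syscall
```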

Load Store Move Strings

  • Load (Memory to Register)
    • lodsb/lodsw/lodsd/lodsq
  • Store (Register to Memory)
    • stosb/stosw/stosd/stosq
  • Move (Memory to Memory)
    • movsb/movsw/movsd/movsq

Move: the source is referenced by RSI and the destination by RDI.
The direction flag dictates the direction in which the copy happens. If DF is cleared, the addresses are incremented; if DF is set, they are decremented.
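
A minimal sketch that uses one instruction from each family: MOVSQ copies a string, LODSB loads its first byte, and STOSB writes a modified byte back. The labels and the string are assumptions made for the example.

```nasm
; strings.nasm - sketch: MOVSQ, LODSB and STOSB with DF cleared
; Build: nasm -f elf64 strings.nasm -o strings.o && ld strings.o -o strings
global _start

section .data
source: db "ABCDEFGH"       ; 8 bytes

section .bss
dest:   resb 8

section .text
_start:
    cld                     ; DF = 0: addresses are incremented
    lea rsi, [rel source]
    lea rdi, [rel dest]
    movsq                   ; copy 8 bytes from [rsi] to [rdi], both advance by 8

    lea rsi, [rel source]
    lodsb                   ; al = 'A', rsi = rsi + 1
    add al, 0x20            ; 'A' (0x41) becomes 'a' (0x61)
    lea rdi, [rel dest]
    stosb                   ; dest[0] = 'a', rdi = rdi + 1

    mov rax, 1              ; write(1, dest, 8)
    mov rdi, 1
    mov rsi, dest
    mov rdx, 8
    syscall                 ; prints aBCDEFGH without a trailing newline

    mov rax, 60             ; exit(0)
    xor rdi, rdi
    syscall
```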