Learning Assembly Language Concepts

Assembly language is really cool, but it can also be maddening. I took a course on assembly last semester, and I'd like to introduce a few of its basic concepts here. This isn't a full-blown tutorial, just enough to whet your appetite.

Let's start with the idea of a register. Registers are small, fixed-size storage locations inside the CPU. They hold the values the processor is working with as the program executes.

In programming languages like C, Objective-C, or JavaScript, you define variables with some combination of a keyword and a name which you make up. In assembly, all of the registers have names that are already defined.

Depending on the environment (x86 or ARM), you will have different names for the registers. In 32-bit x86 assembly, you have register names like eax, ebx, ecx, and so on. In ARM assembly, you'll see names like r0, r1, and r2.

There's also a register that holds the address of the top of the stack. In ARM, it's called sp, short for stack pointer. (Get used to those abbreviations, because assembly is all about being terse.)

Which register you should use for what is largely a matter of convention, and knowing the convention helps when debugging (and reverse engineering). Different hardware platforms and operating systems have their own conventions.

Another register you should know about is the flags register. Each flag is a single bit, either 0 or 1, that indicates something about the state of the machine, such as whether the last result was zero. We'll talk about this again in a bit. First, let's talk about assembly instructions.

Instructions in assembly are composed of an opcode followed by its operands, typically one to three registers, addresses, or values. For example, to place the value 5 into the first register, we might use something like mov r0, 5. (ARM assemblers conventionally write the constant as #5.) This is roughly the same as int r0 = 5; in C, except that in C we would likely call r0 something more context-appropriate.

The opcode (short for operation code) is written as a mnemonic, usually three or four characters, and it appears first in the instruction. In the syntax used here, the destination register comes right after the opcode; that is where the result of the operation goes. Then come the value or values being operated on. In this case, the destination is r0 and the value is 5.
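To make the destination-first layout a bit more concrete, here is a minimal sketch in C (not real assembly) that models a tiny register file. The names Cpu, NUM_REGS, mov_imm, and add_imm are made up for illustration, and a real CPU has more registers and many more instructions.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 4  /* hypothetical tiny machine with registers r0..r3 */

/* The register file: a fixed set of fixed-size storage slots. */
typedef struct {
    uint32_t r[NUM_REGS];
} Cpu;

/* Models "mov rd, value": copy a constant into the destination register. */
void mov_imm(Cpu *cpu, int rd, uint32_t value) {
    assert(rd >= 0 && rd < NUM_REGS);
    cpu->r[rd] = value;
}

/* Models "add rd, rn, value": rd = rn + value.
   The destination comes first, then the values being operated on. */
void add_imm(Cpu *cpu, int rd, int rn, uint32_t value) {
    assert(rd >= 0 && rd < NUM_REGS);
    assert(rn >= 0 && rn < NUM_REGS);
    cpu->r[rd] = cpu->r[rn] + value;
}
```

With a zeroed Cpu, calling mov_imm(&cpu, 0, 5) plays the role of mov r0, 5, and add_imm(&cpu, 1, 0, 2) then leaves 7 in r1.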

There are a few kinds of things we can put after the opcode and the destination register.

A value, such as the 5 we used here.
The name of another register, such as r0 or eax, depending on the platform.
A memory address held in a register. We write this by enclosing the register name in square brackets: [r0] means r0 contains the address of a memory location, and the instruction works with the value stored there. This is like pointers in C. (Read up on addressing modes in assembly for more on this.)
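Since the square-bracket form is compared to pointers in C, here is a rough C sketch of the three kinds of operand side by side. The function names are hypothetical, and the assembly in the comments is only an approximate ARM-style equivalent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Immediate operand: the value is written directly in the instruction. */
uint32_t load_immediate(void) {
    return 5;                 /* roughly: mov r0, 5 */
}

/* Register operand: copy whatever another register currently holds. */
uint32_t load_register(uint32_t r1) {
    return r1;                /* roughly: mov r0, r1 */
}

/* Memory operand: r1 holds an address, and we read the value stored there. */
uint32_t load_indirect(const uint32_t *r1) {
    assert(r1 != NULL);       /* an address of 0 would not be valid here */
    return *r1;               /* roughly: ldr r0, [r1] */
}
```

The last function is the interesting one: the register itself never holds the data, only where to find it, which is exactly what dereferencing a pointer does in C.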

In high-level languages, we have functions and methods. In assembly, we use labels to mark where routines and functions begin. (By convention, routines don't return anything, while functions do.) To define a label, type it on its own line, followed by a colon. For example, ifblock: names a block of code that I can jump to later.

Assembly programs execute from the top down, one instruction after another. Because of this, we need some way to implement the common control flow constructs that we are used to from high-level languages.

Remember the flags register we talked about? With the right flags and labels, we can control the flow of our program quite easily. We use the cmp instruction to compare a register to a value; doing so updates bits in the flags register, including the zero flag, which is set when the two are equal. A conditional jump instruction then checks that flag and either jumps to the desired label or falls through to the next instruction.
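C happens to have labels and goto, so the compare-then-jump pattern can be sketched without leaving C. This is a simplified model under my own assumptions: the Flags struct only tracks two bits, and the names cmp, is_five, and self_check are hypothetical. The branch mnemonics in the comments are ARM-style.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A simplified flags register: real hardware tracks more bits than this. */
typedef struct {
    bool zero;      /* set when the two compared values are equal */
    bool negative;  /* set when the register is less than the value */
} Flags;

/* Models "cmp reg, value": subtract, throw the result away, keep the flags. */
Flags cmp(int32_t reg, int32_t value) {
    int64_t diff = (int64_t)reg - (int64_t)value;  /* widened to avoid overflow */
    Flags f = { diff == 0, diff < 0 };
    return f;
}

/* Assembly-style version of: if (r0 == 5) r1 = 1; else r1 = 0; */
int32_t is_five(int32_t r0) {
    int32_t r1;
    Flags flags = cmp(r0, 5);       /* cmp r0, 5 */
    if (flags.zero) goto ifblock;   /* beq ifblock: jump if the zero flag is set */
    r1 = 0;                         /* fall through: the "else" part */
    goto done;                      /* b done: unconditional jump */
ifblock:
    r1 = 1;                         /* the "if" part */
done:
    return r1;
}

/* Small usage check: only the value 5 takes the ifblock path. */
void self_check(void) {
    assert(is_five(5) == 1);
    assert(is_five(4) == 0);
}
```

Notice that the code still runs top down. The only thing the flag changes is whether the jump to ifblock is taken or execution falls through to the next line.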

For examples, check out my assembly homework from last semester. There is also an amazing blog post on Coranac.com that goes into way more detail than I do here. Finally, there are some fantastic books on Amazon.