Recently I decided to learn assembly. I already had a reasonable understanding of how it worked thanks to some university classes that touched on the subject, but I never had the opportunity to actually write assembly code.
Since my everyday computer is an x86-64 machine, it made most sense to learn assembly for this architecture, so I could avoid the need for a VM. I started with only the desire to get my hands dirty with assembly code, and not any particular objective or project.
At first I alternated between trying things out and researching on the web, just to understand enough to get a bare minimum assembly file and the commands to assemble and run it. Eventually I stumbled upon the book that would guide me: x86-64 Assembly Language Programming with Ubuntu.
This book is free and recent, and it had the perfect scope for me: it's aimed at people who already have a good grasp of programming but are new to x86-64 assembly, and it covers some theory and concepts while offering plenty of exercises to learn from practice.
It was pretty fun to work through that book, and it worked well for me to create some familiarity with x86-64 assembly. I'm sure there are still a bunch of things to learn on the subject, since the book only gives a basis, but it was enough to teach me some interesting things.
Signedness and two's complement
The biggest lesson for me was a better understanding of signedness. I'm used to
int and unsigned int in C, and to watching out for using the wrong
signedness, but it wasn't as clear to me how that worked at the assembly level.
The first thing to keep in mind is that the type concept present in higher
level languages like C (such as whether a number is signed or not) is completely absent
in assembly. The computer memory stores only 0s and 1s, and it's up to you, the
programmer, to interpret what they mean: is
01011000 the number 88, the ASCII character X, or the
POP AX instruction? With only that single byte, you
can't even be sure of the size: maybe those are really 8 boolean flags in a
single byte, or part of a 4-byte signed number. Without context it's impossible
to tell.
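To make the "same bits, different meaning" point concrete, here is a minimal C sketch. The helper names (as_unsigned, as_signed, as_char, flag, check_byte_views) are made up for illustration, and the character reading assumes an ASCII character set.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: each one reads the same 8 bits under a different
   interpretation. The bits never change, only the meaning we assign. */

/* Read the byte as an unsigned number (0..255). */
static unsigned as_unsigned(uint8_t bits) { return bits; }

/* Read the byte as a two's complement signed number (-128..127). */
static int as_signed(uint8_t bits) {
    int8_t s;
    memcpy(&s, &bits, sizeof s);  /* copy the bit pattern, no conversion */
    return s;
}

/* Read the byte as a character code (ASCII assumed). */
static char as_char(uint8_t bits) { return (char)bits; }

/* Read the byte as 8 independent boolean flags and return flag number n. */
static int flag(uint8_t bits, unsigned n) {
    assert(n < 8);  /* a byte only has bits 0..7 */
    return (bits >> n) & 1u;
}

/* Worked examples for the byte from the text. */
static void check_byte_views(void) {
    const uint8_t bits = 0x58;          /* 01011000 */
    assert(as_unsigned(bits) == 88);
    assert(as_signed(bits) == 88);      /* top bit clear, so same value */
    assert(as_char(bits) == 'X');       /* assumes an ASCII character set */
    assert(flag(bits, 3) == 1 && flag(bits, 0) == 0);

    /* Set the top bit and the signed and unsigned readings diverge. */
    assert(as_unsigned(0xD8) == 216);
    assert(as_signed(0xD8) == -40);
}
```

Calling check_byte_views() from a main function runs all the interpretations against the same byte. Nothing in the byte itself says which reading is the intended one.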
If the same representation can mean either a signed or an unsigned number depending on the context, then when operating on those numbers you, as the programmer, have to use the right variant of the instruction to give that context to the computer.
While going through the book, the following arithmetic instructions were presented for unsigned numbers:
- add: adds two numbers
- sub: subtracts two numbers
- mul: multiplies two numbers
- div: divides two numbers
And the following instructions were shown for comparison between unsigned numbers:
- ja: jumps if, in the preceding comparison, the first number was above the second
- jb: jumps if, in the preceding comparison, the first number was below the second
And sure enough, shortly after, the signed variants of those instructions were also shown:
- imul: mul's signed variant
- idiv: div's signed variant
- jg: ja's signed variant
- jl: jb's signed variant
But wait, what about
iadd and isub? That's the thing: the way x86-64
represents negative numbers is through the two's complement
system, which has the useful property of allowing addition and subtraction to be
done in exactly the same manner for both signed and unsigned values.
This means that there's only one way to add, independently of the signedness,
and it's using
add. There's no
iadd. Likewise for subtraction.
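A small C sketch of why a single add is enough: if you add the raw bit patterns modulo 2^32 and only afterwards decide whether to read the result as signed or unsigned, both readings come out right. The function names here (bits_as_signed, add_bits, signed_add, check_add) are hypothetical, chosen just for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret 32 bits as a two's complement signed value, unchanged. */
static int32_t bits_as_signed(uint32_t bits) {
    int32_t s;
    memcpy(&s, &bits, sizeof s);
    return s;
}

/* Add two 32-bit patterns modulo 2^32. Unsigned arithmetic in C is
   defined to wrap around, so it models a plain hardware add. */
static uint32_t add_bits(uint32_t a, uint32_t b) { return a + b; }

/* Signed addition performed purely with the "unsigned" adder. */
static int32_t signed_add(int32_t a, int32_t b) {
    uint32_t sum = add_bits((uint32_t)a, (uint32_t)b);
    return bits_as_signed(sum);
}

/* Worked examples: one addition, two valid readings of the result. */
static void check_add(void) {
    /* 0xFFFFFFFB is 4294967291 unsigned, or -5 signed. */
    assert(bits_as_signed(0xFFFFFFFBu) == -5);

    /* Unsigned reading: 4294967291 + 7 wraps around to 2.
       Signed reading:   -5 + 7 is 2. Same bits either way. */
    assert(add_bits(0xFFFFFFFBu, 7u) == 2u);
    assert(signed_add(-5, 7) == 2);

    /* -1 + -1 is -2; as unsigned the same bits are 4294967294. */
    assert(signed_add(-1, -1) == -2);
    assert(add_bits(0xFFFFFFFFu, 0xFFFFFFFFu) == 0xFFFFFFFEu);
}
```

The same reasoning applies to subtraction, which is why there is no separate signed variant for it either.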
So the interesting conclusion is that for addition and subtraction it doesn't
matter if you use
unsigned int or
int for the variables in C. The
unsigned keyword is there for you to tell the compiler to use the right
variant of the instruction in the generated assembly, which is required when
you're comparing numbers (jg/jl), multiplying (imul) or dividing (idiv). But thanks to two's
complement, in addition and subtraction there's no way to get it wrong 🙂.
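By contrast, here is a sketch of where the signedness really does change the answer. The same bit pattern is compared and divided under both readings. The function names are hypothetical, and the comments about which instruction family a compiler would pick are an assumption about typical x86-64 code generation, not something checked here.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned comparison: a compiler would typically use the "below/above"
   condition codes here (assumption about typical codegen). */
static int below_unsigned(uint32_t a, uint32_t b) { return a < b; }

/* Signed comparison: typically the "less/greater" condition codes. */
static int less_signed(int32_t a, int32_t b) { return a < b; }

/* Unsigned division, the kind div performs. */
static uint32_t div_unsigned(uint32_t a, uint32_t b) {
    assert(b != 0);
    return a / b;
}

/* Signed division, the kind idiv performs. */
static int32_t div_signed(int32_t a, int32_t b) {
    assert(b != 0);
    assert(!(a == INT32_MIN && b == -1));  /* result would not fit */
    return a / b;
}

/* Worked examples: identical bits, different answers. */
static void check_signedness_matters(void) {
    /* 0xFFFFFFFF is 4294967295 unsigned, but -1 signed. */
    assert(below_unsigned(0xFFFFFFFFu, 1u) == 0);  /* 4294967295 < 1 is false */
    assert(less_signed(-1, 1) == 1);               /* -1 < 1 is true */

    /* 0xFFFFFFFE is 4294967294 unsigned, but -2 signed. */
    assert(div_unsigned(0xFFFFFFFEu, 2u) == 0x7FFFFFFFu);  /* 2147483647 */
    assert(div_signed(-2, 2) == -1);
}
```

In C the variable's type carries this choice for you. In assembly, picking the wrong jump or the wrong divide silently gives the other answer.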
Side note: interestingly, while writing this post, I read on the Wikipedia page
that two's complement also works the same for multiplication, but only if you do
a sign extend of the two operands beforehand. Which makes me think that if the
mul instruction always did the sign extend step, no imul
would be required as well, but that would probably increase complexity (and
cost) in the logic circuitry.
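A quick C sketch of that multiplication detail, with hypothetical function names: the low 32 bits of a 32x32 product are the same under either reading, but the full 64-bit product depends on whether the operands were zero-extended (unsigned reading) or sign-extended (signed reading) before multiplying.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of the product: identical for signed and unsigned. */
static uint32_t mul_low(uint32_t a, uint32_t b) { return a * b; }

/* Full product after zero-extending both operands (unsigned reading). */
static uint64_t mul_wide_unsigned(uint32_t a, uint32_t b) {
    return (uint64_t)a * (uint64_t)b;
}

/* Full product after sign-extending both operands (signed reading). */
static int64_t mul_wide_signed(int32_t a, int32_t b) {
    return (int64_t)a * (int64_t)b;
}

/* Worked example with the pattern 0xFFFFFFFF (4294967295 or -1). */
static void check_mul(void) {
    /* Truncated product: the same bits whichever way you read them. */
    assert(mul_low(0xFFFFFFFFu, 2u) == 0xFFFFFFFEu);

    /* Widened products differ in the upper half. */
    assert(mul_wide_unsigned(0xFFFFFFFFu, 2u) == 0x1FFFFFFFEull);
    assert(mul_wide_signed(-1, 2) == -2);

    /* But their low 32 bits still agree. */
    assert((uint32_t)mul_wide_unsigned(0xFFFFFFFFu, 2u)
           == (uint32_t)mul_wide_signed(-1, 2));
}
```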
Other interesting lessons
The other thing that interested me the most was realizing that local variables
are nothing more than extra space reserved on the stack, and that this is done
simply by subtracting the total number of bytes needed for the variables from
the stack pointer register
rsp at the start of a subroutine.
Also interesting was to learn how there are calling conventions to standardize on:
- which registers are used to pass arguments to subroutines and in which order;
- which registers can be overwritten by a subroutine and which must be left unchanged. When a subroutine uses one of the latter, it should first push the register's current value on the stack so that it can be restored before returning.
And what about the magic
main() function that the C compiler expects in
every C program? Assembly isn't compiled, so there's no need for that, but it
turns out the linker expects a different magic label by default: _start.
Some other things that were interesting to do in assembly:
- Making syscalls
- Exploiting a stack buffer overflow
- Calling C code from assembly, and vice versa.
Lack of a good GUI
One thing I missed was a good GUI application when debugging the assembly programs. It would have been really helpful to have one that showed the values of expressions in tooltips when hovering, that was able to follow labels when clicking, and so on.
The book recommends using DDD, which is a GUI, but it felt clunky and really outdated. I went for using GDB together with the peda plugin, and that worked reasonably well, but being a CLI, every inspection required divining the correct command, so it took more time to get oriented.
This was a great experience and I hope to get back to it and further my knowledge past the "basic" level for x86-64 sometime in the future. Seeing what's happening at the assembly level really helps better understand the higher level languages, and value the way they hide complexities below!
I've uploaded the code I wrote for all the book's exercises to this repository. I don't expect it to be useful to anyone since it's simple stuff, but it's there either way.
The only exercise that I couldn't actually finish was the last one. There's very little information in the book about how to do it, and while researching the topic online I eventually got demotivated and started learning about other subjects instead. Maybe one day I'll give it another try. If you do know how to do it, get in touch! 🙂
And even though I couldn't finish that last exercise, while researching it
I ended up learning how to use the
asm syntax for GCC through this
guide, to embed assembly in a C file, and also about the Compiler
Explorer, which seems like a great way to learn about assembly and C by just seeing
what assembly is generated from a given piece of C code, so I'm calling this a win!
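For a taste of that asm syntax, here is a minimal sketch of embedding a single x86-64 add instruction in a C function using GCC's extended asm. The function names are made up for this example, and the plain C fallback is only there so the file still builds on other compilers or architectures.

```c
#include <assert.h>

/* Hypothetical example: add two ints with one embedded instruction. */
static int add_with_asm(int a, int b) {
#if defined(__GNUC__) && defined(__x86_64__)
    /* AT&T syntax: "addl src, dst" computes dst += src.
       %0 is a (read and written, kept in a register: "+r"),
       %1 is b (read only, in a register: "r").
       "cc" tells the compiler that the flags register is modified. */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    /* Fallback for other compilers or architectures. */
    return a + b;
#endif
}

/* Worked examples, including a negative operand: the same add
   instruction handles signed values thanks to two's complement. */
static void check_add_with_asm(void) {
    assert(add_with_asm(2, 3) == 5);
    assert(add_with_asm(-5, 7) == 2);
}
```

Pasting a function like this into the Compiler Explorer is a quick way to see how the compiler weaves the embedded instruction into the surrounding generated code.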