Recently I decided to learn assembly. I already had a reasonable understanding of how it worked thanks to some university classes that touched on the subject, but I never had the opportunity to really write assembly code.
Since my everyday computer is an x86-64 machine, it made most sense to learn assembly for this architecture, so I could avoid the need for a VM. I started with only the desire to get my hands dirty with assembly code, and not any particular objective or project.
At first I was alternating between trying things out and researching on the web, just to understand enough to get a bare minimum assembly file and the commands that would assemble and run it. Eventually I stumbled upon the book that would guide me: x86-64 Assembly Language Programming with Ubuntu.
This book is free and recent, and it had the perfect scope for me: it's aimed at people who already have a good grasp of programming but are new to x86-64 assembly, and it covers some theory and concepts while offering plenty of exercises to learn from practice.
It was pretty fun to work through that book, and it worked well for me to create some familiarity with x86-64 assembly. I'm sure there are still a bunch of things to learn on the subject, since the book only gives a basis, but it was enough to teach me some interesting things.
The biggest lesson to me was a better understanding of signedness. I'm used to seeing int and unsigned int in C, and to watching out for using the wrong signedness, but it wasn't as clear to me how that worked at the assembly level.
The first thing to keep in mind is that the concept of types present in higher level languages like C (such as whether a number is signed or not) is completely absent in assembly. The computer memory stores only 0s and 1s, and it's up to you, the programmer, to interpret what they mean: is 01011000 the number 88, the character X, or the POP AX instruction? With only that single byte, you can't even be sure of the size: maybe those are really 8 boolean flags in a single byte, or part of a 4-byte signed number. Without context it's impossible to tell.
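To make that concrete, here's a minimal sketch in C that reads the byte 01011000 in several of the ways mentioned above. The helper names and the MYSTERY_BYTE constant are made up for this illustration, and the last helper repeats the byte four times only so that the result doesn't depend on the machine's byte order.

```c
#include <stdint.h>
#include <string.h>

/* The byte from the text: 01011000 in binary, 0x58 in hexadecimal. */
#define MYSTERY_BYTE 0x58

/* Interpretation 1: an unsigned number. */
unsigned as_number(uint8_t byte) {
    return byte;
}

/* Interpretation 2: an ASCII character. */
char as_character(uint8_t byte) {
    return (char)byte;
}

/* Interpretation 3: 8 boolean flags; returns flag `bit` (0 = least significant). */
int as_flag(uint8_t byte, int bit) {
    return (byte >> bit) & 1;
}

/* Interpretation 4: one byte of a 4-byte signed number. All 4 bytes are
   set to the same value here (an assumption made for the example) so the
   result is the same on little-endian and big-endian machines. */
int32_t as_repeated_int32(uint8_t byte) {
    uint8_t bytes[4] = { byte, byte, byte, byte };
    int32_t value;
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

Calling as_number(MYSTERY_BYTE) and as_character(MYSTERY_BYTE) returns the same 8 bits both times; only the type the caller chooses to read them through changes.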
If the same representation can mean either a signed or an unsigned number depending on the context, then when operating on those numbers you, as the programmer, have to use the right variant of the instruction to give that context to the computer.
While going through the book, the following arithmetic instructions were presented for unsigned numbers:
- add: adds two numbers
- sub: subtracts two numbers
- mul: multiplies two numbers
- div: divides two numbers
And the following instructions were shown for comparison between unsigned numbers:
- ja: jumps if, in the preceding comparison (cmp), the first number was above the second
- jb: jumps if, in the preceding comparison (cmp), the first number was below the second
And sure enough, shortly after, the signed variants of those instructions were also shown:
- imul is mul's signed variant
- idiv is div's signed variant
- jg is ja's signed variant
- jl is jb's signed variant
But wait, what about iadd and isub? That's the thing: the way x86-64 represents negative numbers is through the two's complement system, which has the useful property of allowing addition and subtraction to be done in exactly the same manner for both signed and unsigned values.
This means that there's only one way to add, independently of the signedness, and it's using add. There's no iadd. Likewise for subtraction.
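Here's a small sketch of that property, written in C rather than assembly so it can be compiled anywhere. The function names are hypothetical; the idea is to add and subtract raw 32-bit patterns with plain wrapping arithmetic and check that the result is right whether the inputs were meant as signed or unsigned.

```c
#include <stdint.h>

/* Returns the raw 32-bit two's complement pattern of a signed value.
   C defines conversion to an unsigned type as reduction modulo 2^32,
   which yields exactly that pattern. */
uint32_t bits_of(int32_t value) {
    return (uint32_t)value;
}

/* Plain binary addition that wraps around on overflow, with no
   knowledge of whether the operands are signed or unsigned. */
uint32_t add_bits(uint32_t a, uint32_t b) {
    return a + b;
}

/* Plain binary subtraction, wrapping around in the same way. */
uint32_t sub_bits(uint32_t a, uint32_t b) {
    return a - b;
}
```

For instance, add_bits(bits_of(-1), bits_of(2)) gives the pattern of 1, and the very same call can be read as 4294967295 + 2 wrapping around to 1 for unsigned operands.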
So the interesting conclusion is that for addition and subtraction the generated instruction is the same whether you use unsigned int or int for the variables in C. The unsigned keyword is there for you to tell the compiler to use the right variant of the instruction in the generated assembly, which is required when you're comparing numbers (ja vs jg, jb vs jl), multiplying (mul vs imul) or dividing (div vs idiv). But thanks to two's complement, in addition and subtraction there's no way to pick the wrong instruction 🙂.
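By contrast, comparison and division do depend on the interpretation. The sketch below (hypothetical helper names, plain C) feeds the same bit patterns to an unsigned and a signed version of each operation; these are the kinds of operations that the ja/jb vs jg/jl and div vs idiv pairs exist for, though a compiler is free to pick other instructions.

```c
#include <stdint.h>
#include <string.h>

/* Reinterprets a 32-bit pattern as a signed number without changing any bit. */
int32_t as_signed(uint32_t bits) {
    int32_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Unsigned comparison: is a below b? */
int is_below(uint32_t a, uint32_t b) {
    return a < b;
}

/* Signed comparison of the same bit patterns: is a less than b? */
int is_less(uint32_t a, uint32_t b) {
    return as_signed(a) < as_signed(b);
}

/* Unsigned division of the bit patterns. */
uint32_t divide_unsigned(uint32_t a, uint32_t b) {
    return a / b;
}

/* Signed division of the same bit patterns; the quotient is returned
   as a raw pattern so both results can be compared bit for bit. */
uint32_t divide_signed(uint32_t a, uint32_t b) {
    return (uint32_t)(as_signed(a) / as_signed(b));
}
```

With the pattern 0xFFFFFFFF, is_below(0xFFFFFFFF, 1) is false (4294967295 is not below 1) while is_less(0xFFFFFFFF, 1) is true (-1 is less than 1), so here the choice of variant changes the answer.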
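Side note: interestingly, while writing this post, I read on the Wikipedia page that two's complement also works the same for multiplication, but only if you sign extend the two operands to the width of the result beforehand. In other words, the lower half of the product is identical for signed and unsigned operands, and only the upper half differs. That made me think at first that if the mul instruction always did the sign extend step, no imul instruction would be required either, but sign extending an unsigned operand changes its value, so the upper half of the product would come out wrong for unsigned numbers. Both variants are needed after all.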
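A short sketch of the multiplication case, using 32-bit operands and a 64-bit product (the function names are made up for the example). It shows that the lower half of the product is the same for both interpretations, that the upper half is not, and that sign extending the operands first and then multiplying with plain wrapping arithmetic reproduces the signed result.

```c
#include <stdint.h>

/* Unsigned widening multiply: the operands are zero extended to 64 bits. */
uint64_t multiply_unsigned(uint32_t a, uint32_t b) {
    return (uint64_t)a * (uint64_t)b;
}

/* Signed widening multiply; the product is returned as a raw 64-bit pattern. */
uint64_t multiply_signed(int32_t a, int32_t b) {
    return (uint64_t)((int64_t)a * (int64_t)b);
}

/* Sign extend both operands to 64 bits, then multiply them as plain
   unsigned 64-bit values (wrapping modulo 2^64). */
uint64_t multiply_sign_extended(int32_t a, int32_t b) {
    uint64_t wide_a = (uint64_t)(int64_t)a;
    uint64_t wide_b = (uint64_t)(int64_t)b;
    return wide_a * wide_b;
}

/* Lower and upper 32 bits of a 64-bit product. */
uint32_t low_half(uint64_t product) {
    return (uint32_t)product;
}

uint32_t high_half(uint64_t product) {
    return (uint32_t)(product >> 32);
}
```

Multiplying the pattern 0xFFFFFFFF by 2 gives a lower half of 0xFFFFFFFE either way, but an upper half of 1 when read as unsigned (4294967295 * 2) and 0xFFFFFFFF when read as signed (-1 * 2 = -2).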
The other thing that interested me the most was realizing that local variables are nothing more than extra space reserved on the stack, and that this is done simply by subtracting the total number of bytes needed for the variables from the stack pointer register rsp at the start of a subroutine.
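The idea can be modeled with a toy stack in C. This is only a sketch under simplifying assumptions: ToyStack, its 256-byte size and the function names are invented for the illustration, rsp is an offset into an array rather than a real register, and there is no bounds checking or alignment handling.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_STACK_SIZE 256

/* Toy model of a stack that grows toward lower addresses. */
typedef struct {
    uint8_t memory[TOY_STACK_SIZE];
    size_t rsp; /* offset of the current top of the stack */
} ToyStack;

/* An empty stack starts with rsp just past the end of the memory. */
void toy_stack_init(ToyStack *stack) {
    stack->rsp = TOY_STACK_SIZE;
}

/* Models "sub rsp, bytes" at the start of a subroutine: reserves room
   for the locals and returns the offset where they begin. */
size_t reserve_locals(ToyStack *stack, size_t bytes) {
    stack->rsp -= bytes;
    return stack->rsp;
}

/* Models "add rsp, bytes" before returning: gives the space back. */
void release_locals(ToyStack *stack, size_t bytes) {
    stack->rsp += bytes;
}
```

After toy_stack_init, a call to reserve_locals(&stack, 16) moves rsp from 256 to 240, and the 16 bytes from offset 240 onward play the role of the subroutine's local variables until release_locals puts rsp back.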
Also interesting was to learn how there are calling conventions to standardize on:
And what about the magic main() function that the C compiler expects in every C program? An assembly program isn't compiled, only assembled and linked, so there's no need for that, but it turns out a different magic label is expected by the linker: _start.
Some other things that were interesting to do in assembly:
One thing I missed was a good GUI application when debugging the assembly programs. It would have been really helpful to have one that showed the values of expressions in tooltips when hovering, that was able to follow labels when clicking, and so on.
The book recommends using DDD, which is a GUI, but it felt clunky and really outdated. I went for using GDB together with the peda plugin, and that worked reasonably well, but being a CLI, every inspection required divining the correct command, so it took more time to get oriented.
This was a great experience and I hope to get back to it and further my knowledge past the "basic" level for x86-64 sometime in the future. Seeing what's happening at the assembly level really helps better understand the higher level languages, and value the way they hide complexities below!
I've uploaded the code I wrote for all the book's exercises to this repository. I don't expect it to be useful to anyone since it's simple stuff, but it's there either way.
The only exercise that I couldn't actually finish was the last one. There's very little information in the book about how to do it, and while researching the topic online I eventually got demotivated and started learning about other subjects instead. Maybe one day I'll give it another try. If you do know how to do it, get in touch! 🙂
And even though I couldn't finish that last exercise, while researching it I ended up learning how to use the asm syntax for GCC through this guide, to embed assembly in a C file, and also about the Compiler Explorer, which seems a great way to learn about assembly and C by just seeing what assembly is generated from a given piece of C code, so I'm calling this a win!
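For a taste of what embedding assembly in C looks like, here's a minimal sketch using GCC's extended asm syntax. The function name add_with_asm is made up, the assembly is written in the AT&T syntax GCC uses by default, and the block is guarded so that it falls back to plain C on non-x86 machines or other compilers.

```c
/* Adds b to a with a single x86 add instruction when possible. */
int add_with_asm(int a, int b) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("addl %1, %0" /* AT&T operand order: %0 = %0 + %1 */
            : "+r"(a)     /* operand 0: a, read and written, kept in a register */
            : "r"(b)      /* operand 1: b, read only, kept in a register */
            : "cc");      /* the instruction modifies the flags */
    return a;
#else
    /* Fallback for other architectures or compilers. */
    return a + b;
#endif
}
```

Pasting a function like this into the Compiler Explorer is a quick way to see how the compiler weaves the embedded instruction into the code it generates around it.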