Learning ARM assembly seems daunting at first. The documentation I used for this is here. Even when focusing on Chapter 13 (A32 and T32 Instructions), the number of mnemonics remains pretty large.
So I adopted a pragmatic approach and studied which mnemonics of A32 and T32 instructions appear in my Raspberry Pi’s libc, namely:
pi@raspwn:~$ /lib/arm-linux-gnueabihf/libc.so.6
GNU C Library (Debian GLIBC 2.19-18+deb8u10) stable release version 2.19, by Roland McGrath et al.
<snip>
Compiled by GNU CC version 4.8.4.
Compiled on a Linux 3.16.7 system on 2017-06-19.
<snip>
I thus focused on the instructions of the aforementioned Chapter 13, putting aside all Advanced SIMD instructions (mostly v* instructions) of Chapter 14.
I extracted the information summed up in the table below, where NOO means Number Of Occurrences and M means Mnemonic (mnemonics can be followed by a condition code, an optional suffix such as s, etc.).
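Since one table entry stands for a mnemonic together with its conditional and flag-setting variants, each disassembled mnemonic has to be reduced to its base form before counting. Below is a minimal sketch of that step, not necessarily how the numbers in this post were produced. It assumes the suffix ordering "base, optional s, optional condition code"; `base_mnemonic`, `known_bases` and `BASES` are hypothetical names chosen for illustration.

```python
import re

# The ARM condition codes (hs/lo are synonyms of cs/cc).
CONDITION_CODES = (
    "eq", "ne", "cs", "hs", "cc", "lo", "mi", "pl",
    "vs", "vc", "hi", "ls", "ge", "lt", "gt", "le", "al",
)
# Assumed suffix shape: optional "s", then an optional condition code.
_SUFFIX = re.compile(r"s?(?:%s)?" % "|".join(CONDITION_CODES))


def base_mnemonic(mnemonic, known_bases):
    """Return the base mnemonic, or the input unchanged if nothing matches."""
    m = mnemonic.lower()
    if m in known_bases:  # already a base mnemonic, nothing to strip
        return m
    # Shortest base first, so that "bls" resolves to b + ls rather than bl + s.
    for base in sorted(known_bases, key=len):
        if m.startswith(base) and _SUFFIX.fullmatch(m[len(base):]):
            return base
    return m


# Illustrative subset of base mnemonics (an assumption, not the full list).
BASES = {"b", "bl", "blx", "bx", "mov", "add", "teq", "svc", "ldr"}
for raw in ("bls", "bleq", "movs", "addseq", "teq", "svc", "ldrne"):
    print(raw, "->", base_mnemonic(raw, BASES))
```

The sketch deliberately ignores other decorations, such as the addressing-mode suffixes of ldm/stm or the older ordering where the condition code precedes the s; those would come back unchanged.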
“Mnemonics” with a * stand for multiple mnemonics, namely:
- b*: b, bl, blx, blxns, bx, bxns, bxj
- ldr*: ldr, ldrd
- cm*: cmp, cmn
- str*: str, strb, strh, strd, strt, strbt, strht
- mcr*: mcr, mcr2
- strex*: strex, strexb, strexh, strexd
- ldrex*: ldrex, ldrexb, ldrexh, ldrexd
- uxt*: uxtab, uxtab16, uxtah, uxtb, uxtb16, uxth
- rev*: rev, rev16, revsh
- pl*: pld, pldw, pli
- sxt*: sxtab, sxtab16, sxtah, sxtb, sxtb16, sxth
- uqsub*: uqsub8, uqsub16
- stc*: stc, stc2
- cdp*: cdp, cdp2
- mrc*: mrc, mrc2
- mcrr*: mcrr, mcrr2
- ldc*: ldc, ldc2
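To make the grouping concrete, here is a hedged sketch of how such families could be tallied from disassembly text. It assumes tab-separated lines in the style of `objdump -d` ("address:", raw bytes, mnemonic, operands); the `SAMPLE` listing and its function names are made up for the example, and `count_families` and its `normalize` hook are hypothetical helpers, not the tooling used for this post.

```python
from collections import Counter

# Families copied from the list above; any other mnemonic counts as itself.
FAMILIES = {
    "b*": ["b", "bl", "blx", "blxns", "bx", "bxns", "bxj"],
    "ldr*": ["ldr", "ldrd"],
    "cm*": ["cmp", "cmn"],
    "str*": ["str", "strb", "strh", "strd", "strt", "strbt", "strht"],
    "mcr*": ["mcr", "mcr2"],
    "strex*": ["strex", "strexb", "strexh", "strexd"],
    "ldrex*": ["ldrex", "ldrexb", "ldrexh", "ldrexd"],
    "uxt*": ["uxtab", "uxtab16", "uxtah", "uxtb", "uxtb16", "uxth"],
    "rev*": ["rev", "rev16", "revsh"],
    "pl*": ["pld", "pldw", "pli"],
    "sxt*": ["sxtab", "sxtab16", "sxtah", "sxtb", "sxtb16", "sxth"],
    "uqsub*": ["uqsub8", "uqsub16"],
    "stc*": ["stc", "stc2"],
    "cdp*": ["cdp", "cdp2"],
    "mrc*": ["mrc", "mrc2"],
    "mcrr*": ["mcrr", "mcrr2"],
    "ldc*": ["ldc", "ldc2"],
}
MEMBER_TO_FAMILY = {m: fam for fam, members in FAMILIES.items() for m in members}


def count_families(disassembly, normalize=lambda m: m):
    """Count mnemonics per family; `normalize` may strip condition codes etc."""
    counts = Counter()
    for line in disassembly.splitlines():
        fields = line.split("\t")
        # Keep only instruction lines: "<addr>:\t<bytes>\t<mnemonic>\t<operands>"
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue
        tokens = fields[2].split()
        if not tokens:
            continue
        mnemonic = normalize(tokens[0].lower())
        # Skip data directives and v* instructions (a prefix test is assumed
        # to be good enough here).
        if mnemonic.startswith((".", "v")):
            continue
        counts[MEMBER_TO_FAMILY.get(mnemonic, mnemonic)] += 1
    return counts


# Made-up sample listing, only to exercise the parser.
SAMPLE = "\n".join([
    "00016d4c <hypothetical_function>:",
    "   16d4c:\te92d4010 \tpush\t{r4, lr}",
    "   16d50:\te1a04000 \tmov\tr4, r0",
    "   16d54:\teb000010 \tbl\t16d9c <hypothetical_helper>",
    "   16d58:\te3500000 \tcmp\tr0, #0",
    "   16d5c:\te5940000 \tldr\tr0, [r4]",
    "   16d60:\te8bd8010 \tpop\t{r4, pc}",
    "   16d64:\t0001e240 \t.word\t0x0001e240",
])
print(count_families(SAMPLE).most_common())
```

On a real library, the same function could be fed the captured output of the disassembler, with `normalize` set to whatever routine strips condition codes and the s suffix.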
NOO | M | NOO | M | NOO | M | NOO | M |
---|---|---|---|---|---|---|---|
51449 | b* | 1402 | lsl | 392 | asr | 31 | rsc |
48731 | ldr* | 1392 | orr | 319 | sbc | 26 | uqsub* |
37910 | mov | 1010 | bic | 316 | adc | 17 | ror |
25452 | cm* | 938 | svc | 255 | nop | 16 | stc* |
23701 | str* | 900 | lsr | 144 | rev* | 11 | cdp* |
22424 | add | 768 | mul | 122 | mla | 6 | mrc* |
8137 | and | 706 | mcr* | 89 | smull | 5 | umlal |
7822 | sub | 678 | strex* | 86 | umull | 4 | movt |
3958 | pop | 678 | ldrex* | 60 | pl* | 3 | mrs |
2653 | rsb | 534 | stm | 57 | clz | 2 | msr |
2463 | push | 479 | eor | 51 | sxt* | 2 | mcrr* |
1923 | mvn | 426 | ldm | 39 | teq | 1 | smlal |
1460 | tst | 403 | uxt* | 38 | smla | 1 | ldc* |
The total number of occurrences in that table is 250490 for 52 groups of mnemonics. Learning the 11 most used ones (from b* to push) should allow one to read 94% of the (non-v*) glibc disassembly. All of a sudden, learning ARM assembly does not look herculean anymore 😄
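The 94% figure can be re-derived from the table alone. A quick check, with the counts copied verbatim from the table (the variable names are only illustrative):

```python
# Occurrence counts copied from the table, one table row per line.
counts = {
    "b*": 51449, "lsl": 1402, "asr": 392, "rsc": 31,
    "ldr*": 48731, "orr": 1392, "sbc": 319, "uqsub*": 26,
    "mov": 37910, "bic": 1010, "adc": 316, "ror": 17,
    "cm*": 25452, "svc": 938, "nop": 255, "stc*": 16,
    "str*": 23701, "lsr": 900, "rev*": 144, "cdp*": 11,
    "add": 22424, "mul": 768, "mla": 122, "mrc*": 6,
    "and": 8137, "mcr*": 706, "smull": 89, "umlal": 5,
    "sub": 7822, "strex*": 678, "umull": 86, "movt": 4,
    "pop": 3958, "ldrex*": 678, "pl*": 60, "mrs": 3,
    "rsb": 2653, "stm": 534, "clz": 57, "msr": 2,
    "push": 2463, "eor": 479, "sxt*": 51, "mcrr*": 2,
    "mvn": 1923, "ldm": 426, "teq": 39, "smlal": 1,
    "tst": 1460, "uxt*": 403, "smla": 38, "ldc*": 1,
}

total = sum(counts.values())
# The 11 most frequent groups, from b* down to push.
top11 = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:11]
top11_sum = sum(n for _, n in top11)
coverage = top11_sum / total

print(len(counts), "groups,", total, "occurrences")
print("top 11:", [name for name, _ in top11])
print(f"coverage: {coverage:.1%}")  # 234700 / 250490, about 93.7%
```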
One should however be aware that this study was done on a single libc, compiled with one gcc version (which in particular does not produce Thumb instructions). So there is no claim of universality whatsoever.
Note: it is pretty comforting, for a load/store architecture such as ARM (to be taken with caution, see this), to have ldr and str appearing in the top 5 of most used mnemonics in the glibc!