ARM assembly - most frequently occurring mnemonics
Jul 23, 2017

Learning ARM assembly seems daunting at first. The documentation I used for this is here. Even when focusing on Chapter 13 (A32 and T32 Instructions), the number of mnemonics remains pretty large.

So I adopted a pragmatic approach and looked at which A32 and T32 mnemonics actually appear in my Raspberry Pi’s libc, namely:

pi@raspwn:~$ /lib/arm-linux-gnueabihf/
GNU C Library (Debian GLIBC 2.19-18+deb8u10) stable release version 2.19, by Roland McGrath et al.
Compiled by GNU CC version 4.8.4.
Compiled on a Linux 3.16.7 system on 2017-06-19.

I thus focused on the instructions of the aforementioned Chapter 13, putting aside all Advanced SIMD instructions (mostly v* instructions) of Chapter 14. The results are summed up in the table below, where NOO means Number Of Occurrences and M means Mnemonic (a mnemonic can be followed by an optional suffix such as s, a condition code, etc.). “Mnemonics” with a * stand for multiple mnemonics, namely:

  • b*: b, bl, blx, blxns, bx, bxns, bxj
  • ldr*: ldr, ldrd
  • cm*: cmp, cmn
  • str*: str, strb, strh, strd, strt, strbt, strht
  • mcr*: mcr, mcr2
  • strex*: strex, strexb, strexh, strexd
  • ldrex*: ldrex, ldrexb, ldrexh, ldrexd
  • uxt*: uxtab, uxtab16, uxtah, uxtb, uxtb16, uxth
  • rev*: rev, rev16, revsh
  • pl*: pld, pldw, pli
  • sxt*: sxtab, sxtab16, sxtah, sxtb, sxtb16, sxth
  • uqsub*: uqsub8, uqsub16
  • stc*: stc, stc2
  • cdp*: cdp, cdp2
  • mrc*: mrc, mrc2
  • mcrr*: mcrr, mcrr2
  • ldc*: ldc, ldc2
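
| NOO   | M    | NOO  | M      | NOO | M     | NOO | M      |
|-------|------|------|--------|-----|-------|-----|--------|
| 51449 | b*   | 1402 | lsl    | 392 | asr   | 31  | rsc    |
| 48731 | ldr* | 1392 | orr    | 319 | sbc   | 26  | uqsub* |
| 37910 | mov  | 1010 | bic    | 316 | adc   | 17  | ror    |
| 25452 | cm*  | 938  | svc    | 255 | nop   | 16  | stc*   |
| 23701 | str* | 900  | lsr    | 144 | rev*  | 11  | cdp*   |
| 22424 | add  | 768  | mul    | 122 | mla   | 6   | mrc*   |
| 8137  | and  | 706  | mcr*   | 89  | smull | 5   | umlal  |
| 7822  | sub  | 678  | strex* | 86  | umull | 4   | movt   |
| 3958  | pop  | 678  | ldrex* | 60  | pl*   | 3   | mrs    |
| 2653  | rsb  | 534  | stm    | 57  | clz   | 2   | msr    |
| 2463  | push | 479  | eor    | 51  | sxt*  | 2   | mcrr*  |
| 1923  | mvn  | 426  | ldm    | 39  | teq   | 1   | smlal  |
| 1460  | tst  | 403  | uxt*   | 38  | smla  | 1   | ldc*   |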
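
The post does not show how these counts were extracted, so the following is only a minimal sketch of one possible approach, not the method actually used. It assumes the text output of `objdump -d` as input, with lines shaped like `<address>:<TAB><hex bytes><TAB><mnemonic><TAB><operands>`. The names `group_of` and `count_groups` are hypothetical, and grouping by longest matching prefix is an assumption that may not reproduce the table exactly (condition codes and suffixes are simply left attached and ignored by the prefix match).

```python
import re
from collections import Counter

# Group prefixes from the table ("cm" stands for cm*, "pl" for pl*, and so on).
# The b* group is handled separately below, because a bare "b" prefix would
# also swallow unrelated mnemonics such as bic.
PREFIXES = [
    "ldr", "mov", "cm", "str", "add", "and", "sub", "pop", "rsb", "push",
    "mvn", "tst", "lsl", "orr", "bic", "svc", "lsr", "mul", "mcr", "strex",
    "ldrex", "stm", "eor", "ldm", "uxt", "asr", "sbc", "adc", "nop", "rev",
    "mla", "smull", "umull", "pl", "clz", "sxt", "teq", "smla", "rsc",
    "uqsub", "ror", "stc", "cdp", "mrc", "umlal", "movt", "mrs", "msr",
    "mcrr", "smlal", "ldc",
]

# Optional ARM condition code, as it appears appended to a mnemonic.
COND = r"(eq|ne|cs|hs|cc|lo|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|al)?"
# b, bl, blx, blxns, bx, bxns, bxj, each with an optional condition code.
BRANCH = re.compile(r"b(lxns|xns|lx|xj|l|x)?" + COND)
# Assumed objdump line shape: "<address>:\t<hex bytes>\t<mnemonic>..."
INSN = re.compile(r"^\s*[0-9a-f]+:\t[0-9a-f ]+\t(\S+)")


def group_of(mnemonic):
    """Map a full mnemonic (e.g. 'ldrbeq') to its group prefix, or None."""
    if BRANCH.fullmatch(mnemonic):
        return "b"
    matches = [p for p in PREFIXES if mnemonic.startswith(p)]
    # Longest prefix wins, so 'strexd' goes to 'strex' rather than 'str'.
    return max(matches, key=len) if matches else None


def count_groups(lines):
    """Count occurrences of each mnemonic group in disassembly lines."""
    counts = Counter()
    for line in lines:
        m = INSN.match(line)
        if not m:
            continue  # section headers, labels, blank lines
        # Drop .w/.n width suffixes; directives like ".word" become "".
        mnemonic = m.group(1).split(".")[0]
        if not mnemonic or mnemonic.startswith("v"):
            continue  # skip data directives and v* (SIMD/VFP) instructions
        group = group_of(mnemonic)
        if group is not None:
            counts[group] += 1
    return counts


# Hand-written sample lines in the assumed format (only the shape matters).
sample = [
    "   16d0c:\te92d4800 \tpush\t{fp, lr}",
    "   16d10:\te59f0010 \tldr\tr0, [pc, #16]",
    "   16d14:\t1a000003 \tbne\t16d28 <foo+0x1c>",
    "   16d18:\te3c00003 \tbic\tr0, r0, #3",
    "   16d1c:\tf2800800 \tvadd.i8\td0, d0, d0",
    "   16d20:\t00016d40 \t.word\t0x00016d40",
]
print(count_groups(sample).most_common())
```

On a real disassembly one would feed the lines of the objdump output to `count_groups` instead of `sample`. Mnemonics outside the 52 groups are silently dropped here, which is a design choice of this sketch.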

The total number of occurrences in that table is 250490 for 52 groups of mnemonics. Learning the 11 most used ones (from b* to push) should allow one to read 94% of the (non-v*) glibc disassembly. All of a sudden, learning ARM assembly does not look herculean anymore 😄 One should however be aware that this study was done on a single libc, compiled with one gcc version (which in particular does not produce Thumb instructions). So there is no claim of universality whatsoever.
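
As a quick sanity check of the 94% figure, the arithmetic can be redone from the table. The snippet below is just a sketch that hard-codes the 11 largest counts and the stated total; the variable names are arbitrary.

```python
# Counts of the 11 most used mnemonic groups, copied from the table.
top11 = {
    "b*": 51449, "ldr*": 48731, "mov": 37910, "cm*": 25452,
    "str*": 23701, "add": 22424, "and": 8137, "sub": 7822,
    "pop": 3958, "rsb": 2653, "push": 2463,
}
TOTAL = 250490  # total number of occurrences stated for all 52 groups

covered = sum(top11.values())
coverage = covered / TOTAL
print(f"{covered} / {TOTAL} = {coverage:.1%}")  # 234700 / 250490 = 93.7%
```

The result, about 93.7%, rounds to the 94% quoted above.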

Note: it is pretty comforting, for a load/store architecture such as ARM (to be taken with caution, see this), to see ldr* and str* among the top 5 most used mnemonics in the glibc!