Here are the results of the PE_STAT program, which calculates the frequencies of opcode usage in PE EXE/DLL files (x86 32-bit code).
Executable files were parsed into instructions using the MISTFALL 1.01 engine. This means that only about half of the executable files on the analyzed hard disk were processed; the rest were filtered out by the engine's restrictions, which guarantee that only real opcodes (not data) are counted.
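The counting step itself can be illustrated with a minimal Python sketch. It assumes the instructions have already been split into byte strings by some disassembler; the function name `opcode_histogram` and the input format are hypothetical and are not PE_STAT's or MISTFALL's actual interface. Truncating the percentage to a whole number is also an assumption about how a % column like this might be produced.

```python
from collections import Counter


def opcode_histogram(instructions):
    """Count how often each first opcode byte occurs.

    `instructions` is an iterable of bytes objects, one per decoded
    instruction (hypothetical input format, not PE_STAT's own).
    Returns (opcode, count, percent) tuples, most frequent first.
    """
    # Only the first byte of each instruction is counted, so prefixes
    # such as 66 or F3 are counted as opcodes in their own right.
    counts = Counter(insn[0] for insn in instructions if insn)
    total = sum(counts.values())
    # Integer division truncates the percentage (an assumption).
    return [(op, n, 100 * n // total) for op, n in counts.most_common()]


if __name__ == "__main__":
    # Four toy instructions: two MOVs (8B), one PUSH EAX (50), one RETN (C3).
    sample = [b"\x8b\xc1", b"\x8b\xd8", b"\x50", b"\xc3"]
    for op, n, pct in opcode_histogram(sample):
        print(f"{op:02X} {n:8d} {pct:3d}%")
```

Counting only the first byte keeps the sketch simple; a real tool would have to decide separately how to treat prefixes and two-byte (0F xx) opcodes.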
The results show that only a few opcodes account for most of the usage, while most of the others are used very seldom or not at all.
The set of these unused opcodes can now be used to improve the quality of parsing executable files into instructions, i.e. to distinguish between code and data.
These "unused" opcodes have low frequency values here, e.g. 1-1000. Non-zero frequencies can be explained by imperfect disassembly.
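One way such a set of rare opcodes could be used is sketched below in Python. This is only an illustration under stated assumptions: the set `RARE_OPCODES` contains the opcodes whose frequency in the table is 7 or less, and that cut-off, the function names and the `max_rare` threshold are all hypothetical choices, not part of PE_STAT or MISTFALL.

```python
# Opcodes with a frequency of 7 or less in the table.
# The cut-off of 7 is an arbitrary assumption made for this sketch.
RARE_OPCODES = frozenset({
    0x06, 0x07, 0x16, 0x17, 0x1D, 0x27, 0x2F, 0x37,
    0x3F, 0x6C, 0x6D, 0xE5, 0xE7, 0xEA, 0xF1,
})


def rare_opcode_count(first_bytes):
    """Count how many of the given first opcode bytes are in RARE_OPCODES."""
    return sum(1 for b in first_bytes if b in RARE_OPCODES)


def looks_like_data(first_bytes, max_rare=0):
    """Guess whether a candidate disassembly is really data.

    `first_bytes` holds the first opcode byte of each instruction in a
    tentatively disassembled block (hypothetical input format).
    The block is flagged when it contains more than `max_rare` rare opcodes.
    """
    return rare_opcode_count(first_bytes) > max_rare


if __name__ == "__main__":
    # mov / push eax / call / retn: nothing unusual.
    print(looks_like_data([0x8B, 0x50, 0xE8, 0xC3]))
    # daa and aas in one block: more likely misparsed data.
    print(looks_like_data([0x8B, 0x27, 0x3F]))
```

A stricter or looser cut-off changes which blocks are flagged, so the threshold would need tuning against files whose code/data layout is already known.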
NOTE: The frequencies of the first opcodes in the following table may in some cases be inapplicable, because RTL (run-time library) code is present in almost all of the analyzed files.

Total files processed:   ~1700
Total opcodes processed: ~41000000

op  frequency    %    ---mostly used as:---
8B    6588971  15%    mov modr/m
FF    2736426   6%    push modr/m
E8    2509099   6%    call
83    2240885   5%    cmp/add modr/m (including add esp, xx after call)
89    2045133   4%    mov modr/m
8D    1573296   3%    lea modr/m
50    1423289   3%    push eax
74    1269798   3%    jz
6A    1064820   2%    push xx
85    1001107   2%    test r,r
0F     939376   2%    0F xx
56     882376   2%    push esi
75     845429   2%    jnz
33     781974   1%    xor r,r
53     740703   1%    push ebx
66     738157   1%    operand-size modifier prefix (-->16-bit)
EB     734922   1%    jmp xx
68     705038   1%    push imm32
57     679402   1%    push edi
C7     639613   1%    mov modr/m, imm
E9     616969   1%    jmp
C3     518251   1%    retn
5E     515151   1%    pop esi
3B     503023   1%    cmp r,r
55     467792   1%    push ebp
51     465043   1%    push ecx
59     454977   1%    pop ecx (after call)
C2     423134   1%    retn n
5B     388365   0%    pop ebx
5F     378583   0%    pop edi
B8     361314   0%    mov eax, c
5D     357410   0%    pop ebp
52     303136   0%    push edx
81     242215   0%
03     241530   0%
8A     219404   0%
39     214276   0%
64     208496   0%
80     201614   0%
C6     201273   0%
C1     190927   0%
A1     177274   0%
2B     173151   0%
F6     166445   0%
C9     146955   0%
F7     135687   0%
88     125771   0%
F3     103929   0%
A5     101174   0%
7C      99153   0%
B9      83369   0%
84      79363   0%
D9      73042   0%
72      68183   0%
40      67919   0%
7E      67620   0%
A3      67196   0%
48      66434   0%
7D      66015   0%
76      62449   0%
58      59073   0%
3D      55021   0%
BF      52910   0%
BE      52893   0%
DD      50018   0%
1B      47972   0%
73      45725   0%
01      43249   0%
D1      41029   0%
23      40509   0%
7F      40459   0%
BB      40403   0%
BA      40217   0%
AB      39515   0%
46      39420   0%
0B      39209   0%
77      34632   0%
25      34612   0%
D8      33722   0%
43      33402   0%
3C      29601   0%
05      28960   0%
47      28381   0%
A4      27395   0%
49      27157   0%
5A      27086   0%
99      24504   0%
DB      24317   0%
F2      23894   0%
AE      23725   0%
41      21745   0%
A8      20662   0%
42      20038   0%
DC      19108   0%
B0      18301   0%
3A      17726   0%
...
A9      17323   0%    test eax, c
4A      16252   0%    dec edx
24      16162   0%    and al, nn
6B      15040   0%    imul modr/m, imm8
DF      14601   0%    fpu
38      14428   0%    cmp modr/m (8-bit)
4E      13731   0%    dec esi
4F      12994   0%    dec edi
D3      12952   0%    shift modr/m, cl
29      12266   0%    sub modr/m
4B      11811   0%    dec ebx
DE      11689   0%    fpu
B2      11646   0%    mov dl, nn
A6      10319   0%    cmpsb
69       9156   0%    imul modr/m, c
32       8539   0%    xor modr/m (8-bit)
AA       8469   0%    stosb
FE       8463   0%
2D       8450   0%    sub eax, c
79       8017   0%    jns
0C       7954   0%    or al, nn
09       7362   0%    or modr/m
BD       6953   0%    mov ebp, c
21       6680   0%    and modr/m
9E       6556   0%    sahf
0A       6409   0%    or modr/m (8-bit)
0D       6277   0%    or eax, c
31       5936   0%    xor modr/m
9B       4925   0%    fwait
A0       4764   0%    mov al, [addr]
90       4757   0%    nop
13       4490   0%    adc modr/m
B3       4484   0%    mov bl, nn
2C       4093   0%    sub al, nn
45       4083   0%    inc ebp
FC       3769   0%    cld
78       3744   0%    js xx
87       3329   0%    xchg modr/m
B1       3247   0%    mov cl, nn
A2       3034   0%    mov [addr], al
67       2995   0%    address-modifier prefix (-->16-bit)
A7       2809   0%    cmpsd
54       2754   0%    push esp
C0       2723   0%    shift modr/m, nn
04       2649   0%    add al, nn
8F       2287   0%    pop modr/m
02       2268   0%    add modr/m (8-bit)
4D       2177   0%  * dec ebp
C8       2108   0%  * enter
E3       1787   0%  * jecxz xx
22       1762   0%    and modr/m (8-bit)
08       1704   0%    or modr/m (8-bit)
AC       1665   0%  * lodsb
20       1643   0%    and modr/m (8-bit)
2A       1563   0%    sub modr/m (8-bit)
DA       1325   0%    fpu
92       1288   0%  * xchg edx, eax
F0       1106   0%    lock
D0       1092   0%    shift, 1
D2       1057   0%    shift, cl
00        988   0%    add modr/m
CC        985   0%  * int3
9C        908   0%  * pushfd
9D        883   0%  * popfd
F8        872   0%  * clc
11        857   0%  * adc modr/m
1A        847   0%  * sbb modr/m (8-bit)
E2        730   0%  * loop xx
86        707   0%    xchg modr/m
F9        652   0%  * stc
30        615   0%  * xor modr/m
7A        562   0%    jp xx
FD        540   0%  * std
91        535   0%  * xchg ecx, eax
B5        512   0%  * mov ch, nn
19        456   0%  * sbb modr/m
34        425   0%  * xor al, cc
B4        393   0%  * mov ah, cc
2E        391   0%  * cs:
28        386   0%  * sub modr/m
CD        362   0%  * int nn
35        281   0%  * xor eax, c
AF        279   0%  * scasd
B7        275   0%  * mov bh, nn
98        273   0%  * cwde
D7        271   0%    xlat
96        185   0%  * xchg esi, eax
F5        178   0%  * cmc
AD        176   0%  * lodsd
CB        168   0%  * retf
E6        158   0%    out port, al
7B        133   0%    jnp xx
44        120   0%    inc esp
B6        116   0%  * mov dh, nn
93        110   0%  * xchg ebx, eax
CA        104   0%    retf n
61         83   0%  * popad
60         75   0%  * pushad
65         72   0%  * gs:
8E         72   0%    mov sr, modr/m
26         71   0%  * es:
1C         68   0%  * sbb al, nn
97         60   0%  * xchg edi, eax
E4         60   0%    in al,port
4C         59   0%    dec esp
5C         56   0%    pop esp
8C         50   0%  * mov r,sr
EC         48   0%    in al,dx
EF         48   0%    out dx, eax
FA         45   0%    cli
1E         43   0%  * push ds
EE         41   0%    out dx,al
BC         40   0%    mov esp, c
10         39   0%    adc modr/m,r8
70         35   0%    jo xx
C4         35   0%    les
C5         34   0%    lds
E0         32   0%  * loopne xx
ED         32   0%    in eax,dx
14         31   0%  * adc al, nn
CE         29   0%    into
18         28   0%    sbb modr/m,r8
36         26   0%    ss:
63         25   0%    arpl
6E         22   0%    outsb
94         20   0%    xchg esp, eax
9F         20   0%    lahf
9A         19   0%  * call seg:offs
E1         19   0%  * loope xx
15         18   0%    adc eax, c
D4         17   0%  * aam nn
FB         17   0%    sti
95         16   0%  * xchg ebp, eax
1F         14   0%    pop ds
82         13   0%  * cmd byte modr/m, imm8
0E         12   0%    push cs
62         12   0%    bound
71         11   0%    jno
D6         10   0%  * setalc
12          9   0%  * adc modr/m
3E          9   0%    ds:
6F          8   0%    outsd
CF          8   0%  * iretd
D5          8   0%    aad nn
F4          8   0%    hlt
06          7   0%  * push es
37          6   0%    aaa
E5          6   0%    in eax, port
E7          5   0%    out port, eax
EA          5   0%  * jmp seg:offs
F1          5   0%    break
6C          4   0%    insb
6D          4   0%    insd
1D          3   0%  * sbb eax, c
27          3   0%    daa
2F          3   0%    das
16          2   0%  * push ss
17          2   0%  * pop ss
07          1   0%  * pop es
3F          1   0%    aas

Opcodes marked with (*) are sometimes used in viruses but, as you can see, are not used often enough in ordinary executables.