AUTOMATED REVERSE ENGINEERING: MISTFALL ENGINE ---------------------------------------------- (x) 2000 Z0MBiE xlated from russian in 2001 Our efforts are directed to develop such method of executable program modification, that finding changes will require maximal amount of time. Modification means addition of the viral code to some specified program, given in the PE format. It is obvious, that main viral body should be encrypted, and metamorphic (generated) virus decryptor should be integrated with program's code. Hence, our task is divided into 3 parts, or 3 global questions: WHAT?, WHERE? and HOW? WHAT? - is a question about instructions, those will be inserted into modifying program. It is explained in the article about metamorphism and shown in the CODEGEN metamorphic engine. WHERE? - is a question of finding places in the modifying program, our instructions should be inserted into. This is a simple task; engine's caller will resolve it in the different ways. This article shows HOW to insert our own instruction between two arbitrary instructions of the modifying program. In other words, how to decompile, modify and compile the whole program. THEORY So, task is to insert own instructions between PE file's ones. But, because of instructions are linked between each other, changing instructions involves change of links. But change of links involves change of other instructions, and so on, until significant part of code will be changed. For example: when inserting some instruction into block of code, offsets of all the instructions coming after insertion are increased. This means that mostly all offsets - both relative and absolute, should be fixed. While fixing relative offsets, some jxx can grow, that means that all this shit should be repeated from start. As you can see, there are two kinds of offsets: absolute (offsets itself) and relative (jmp/call arguments). And we should find all these offsets. Absolute offsets may be found by means of analyzing PE structure, including fixup table. Finding relative offsets requires partial disassembly. Hence, we must disassemble file into some easy-modifiable substance, change this substance, and assemble file back. Algorithm will be formalized as an universal engine, working with PE files. So, task is divided into 5 general steps: 1. Load PE file into virtual addresses; allocate flag table (DWORD for each byte), and initialize it with special PE structure - related bits; in other words, perform initial PE structure analysis. 2. Disassemble file (instruction by instruction), at the same time filling flag table with new information about instruction offsets. At this step we can not make a mistake, i.e. to recognize code as data or vise versa, because such fault is fatal; so, dual situations should be excluded, and some files may not be processed at all. 3. Convert all known information into list of: instructions, datablocks, labels and pointers. I.e. to make our information a bit more high-level, a bit nearer to source. Such list is an easy-modifiable substance we was speaking about; it is generated because of easy manipulation with its elements. 4. Call (external relative to engine), which will modify instruction list; for example insert additional instructions or remove existing ones. 5. Assemble file back: recalculate all offsets; fix PE structure; rebuild fixup table and so on. DISASSEMBLING PROBLEMS Main problem, of course, is disassembling. We have no such resources, as IDA, for example, - neither memory, neither time, nor signatures; moreover, our virus is limited to some kilobytes. And main disassembling problem is duality of some labels: they can precede both code and data. I.e. if we have some data fixup that points to some label, and there is no code instruction pointing to the same label, then we can not correctly decide if this label precedes code or data. Such misrecognizing causes the following: code, interpreted as data, will not be fixed, and will fail program when executed, commonly on such instruction as jmp, jxx or call. Vise versa, incorrectly fixed data (i.e. data interpreted as code) will with high probability cause fault too. This means that we must always correctly disitinguish code and data. Some library functions may be found by singatures, but we can not insert signature base into virus; also, jmptable-analysis can help, but only partially. We can work with only special easy-decompileable files, containing only code in the CODE section, but such files are real rarity. So, we will a bit reduce set of processing files (exclude packed ones and etc.), improve our disassembler as possible, and hope for luck. IMPLEMENTATION Really, there are much more little steps we should perform; and each step have an influence to the result. Here is a complete list of steps required for PE file reversing: - check if given PE file is valid; if fixup table is present & etc. - allocate memory for virtual program image and for flag table - load into virtual addresses: 1. dos stub & pe header 2. sections - analyze PE header; process pointers; mark phsyical/virtual start/end of each section & etc. - process imports - process exports - process fixups - process resources - search for signatures (such as push ebp/mov ebp,esp) and mark 'em as for-next-analysis - mark entry point - disassemble file, instruction by instruction. algorithm is described in the article about permutation, but there are some differences: when there are no more bytes marked as for-next-analysis, we must search for labels-pointed-by-data-fixups, and check if they precedes code instructions. (see below) - build list of opcodes, labels, pointers & etc. note, that such list will contain zero-length pointers (labels), i.e. transfer is performed to higher level of abstraction, nearer to source. After converting linear arrays into list, linear arrays (memory & flag table) may be deallocated. - modify list; while debugging, i was inserting NOPs between instructions. - recalculate new virtual/physical addresses for each list entry - recalculate fixup table - recalculate pointers (i.e. rva's, fixups & relative aguments of the conditional jmps); if some jmps are grown, repeat from recalculating virtual addresses. Note, that here possible some iterations, it is normal. - assemble list (collect all stuff into single file) - write file to disk - copy overlay, if present - recalculate checksum, if non-zero Phrases, such as "analyzing PE header" or "process imports" means that we analyzing these structures and marking labels, pointers and other special objects in the flag table with special bits. For example byte at the pe_header+28h (28h=EntryPointRva) will be marked as FLAG_DWORD | FLAG_RVA, and byte it points to will be marked as FLAG_LABEL | FLAG_CREF. Which special objects will be present in the instruction list? 1. Label, i.e. zero-length substance pointed by RVAs and FIXUPs. 2. RVA, or DWORD pointing to some label. 3. FIXUP, or DWORD, same as RVA, but increased by IMAGEBASE, requires to be inserted into fixup table on the final steps. 4. so-called DELTA, or difference between virtual addresses of two labels. 5. instruction 6. datablock Now, about distinguishing code and data when only existing reference is data-reference. Algorithm: take instruction by instruction; check if it is not 00 00, FF FF, F4 (hlt), CD (int), and so on -- i.e. fail if such instruction is never present in the standard PE file. Also fail, if one of the middle bytes of the instruction is marked as label, data, or other special object. Also, if instruction is jmp,call,jxx,jecxz and so on, then it should not point into the middle of other instruction or to data. Analysis is repeated, until marked-as-code instruction, RET or JMP is found. But, there are another kind of dual situation. You can think that such object as label can not exists in the middle of some instruction, generated by the HLL compiler. But it can. Lets examine typical situation: AVPBASE.DLL: 100050D3 83E904 sub ecx, 4 100050D6 720C jb 100050E4 100050D8 83E003 and eax, 3 100050DB 03C8 add ecx,eax 100050DD FF2485F0500010 jmp dword ptr [100050F0+eax*4] (1) 100050E4 FF248DE8510010 jmp dword ptr [100051E8+ecx*4] 100050EB 90 nop 100050EC FF248D6C510010 jmp dword ptr [1000516C+ecx*4] (2) 100050F3 90 nop 100050F4 00510010 dd 10005100 100050F8 2C510010 dd 1000512C As you can see, 100050F0-address, that is XREF-ed by (1), really is part of the instruction (2). Its not so hard to understand why it is so. As a result, we can not decide, if (2) is code or data. I.e. it is impossible to determine if (10050F0 == 100050F4 - 4) or (10050F0 == 100050EC + 4). So, pointer at (1) can not be fixed and file can not be processed. Or, another one example: FAR.EXE: 004474D8 B8E1C24200 mov eax,0042C2E1 ; ==42C350-6Fh ; (1) 004474DD 6A00 push 00 004474DF 6800000100 push 00010000 004474E4 83C06F add eax, 6F ; ==111 004474E7 50 push eax 004474E8 E8AF180100 call 00458D9C ... 0042C2DB E8B0450200 call 00450890 0042C2E0 83C408 add esp, 08 ; (2) 0042C2E3 8D9500FFFFFF lea edx,[ebp][0FFFFFF00] ... 0042C350 55 push ebp 0042C351 8BEC mov ebp, esp 0042C353 833D5054460003 cmp d,[000465450], 03 And, in addition, another bad thing: parts of 16-bit code, inserted into 32-bit applications, such as antiviruses, formatters and so on. The ENGINE Though, it works. Engine is called MISTFALL, there is G.Martin's story named "With morning comes Mistfall". Engine is written in Borland C++, but without classes or other shit. Main engine() subroutine with fucking lots of parameters comes first, followed by some internal additional subroutines. All this stuff is called kernel, and it is located in the engine.cpp & .hpp; code and constants correspondingly. Kernel calls so-called user's mutator (mutate.cpp), which must be written by engine's user. Mutator works only with instruction list. List entries describes labels, opcodes, data blocks and so on. Moreover, all the engine interface technology is taken from RPME engine. The only difference is that RPME works with blocks of viral code, and MISTFALL works with PE files. As a result, simplest file infection is the following: 1. Find any two non-jmp instructions; insert JMP after first instruction to the second one; insert encrypted viral body after JMP. 2. The same for decryptor. 3. The same for commands calling the decryptor. This was only one of lots of possible variants; the more such methods we will use, the harder it will be to analyze the virus. USAGE Engine uses lots of memory. It can fail on incorrect file; and on normal one too. Memory allocation, same as in RPME, is provided by engine's caller. Before calling engine, you must allocate about 32 MB of memory, and write own malloc() subroutine which will provide to engine pointers to little parts of this big block. It is just kind of own heap. Engine's allocation strategy is specific: it will only allocate memory and never deallocate. But, its ok, 'coz the same big memory block will be used for each file. Really, engine uses (17*SizeOfImage) bytes of memory for linear arrays and 40 bytes of memory for each list entry. Engine's code is permutable, i.e. it satisfies most of demands described in the "Demands to engines" article. Engine can work in ring-0, but, because of lots of required memory & time, dont call it there. Time, used by engine is unpredictable; minimal time for processing one file is some minutes in realtime priority. Obvious, that such engine cant be invoked on system events, such as openfile. You should work with file queue, passing file by file to engine in the individual thread or process. SPECIAL FEATURES Engine works only with standard files. I.e. all the section names must be known: such as .text, .data and so on. Otherwise file is packed or whatever, and will not be processed. For all the standard files it is accepted that code exists only in the first section; this allows us to increase disassembly quality. SIGNATURE LIBRARY This is another external subroutine, performing 2 tasks: 1. on start: search for known codeblocks, and 2. add existing codeblocks into library I.e. before disassembling signature manager will mark known codeblocks as CODE, for-next-analysis, and, on exit, it will update signature library with new information. When applied to viruses, signature library will be created per each local machine, performing kind of self-education, with each new file. Signature here means some constant code bytes, not containing fixups and relative offsets. ENGINE INTERFACE // return: 0==success, non-zero error codes listed in ENGINE.HPP int __cdecl engine( DWORD user_arg, // user's parameter (whatever) BYTE* buf, // PE file DWORD inbufsize, // initial size of file DWORD* outbufsize, // ptr to resulting size of file DWORD maxbufsize, // maximal allocated buf size int __cdecl user_disasm(DWORD,BYTE*), // length-disassembler void* __cdecl user_malloc(DWORD,DWORD), // memory allocator DWORD __cdecl user_random(DWORD,DWORD), // randomer int __cdecl user_mutate( // mutator DWORD user_arg, // (described below) PE_HEADER* pe, PE_OBJENTRY* oe, hooy* root, int __cdecl (*)(DWORD user_arg,BYTE*), void* __cdecl (*)(DWORD user_arg,DWORD), DWORD __cdecl (*)(DWORD user_arg,DWORD) ), int __cdecl user_sigman( // signature manager or NULL DWORD user_arg, PE_HEADER* pe, PE_OBJENTRY* oe, BYTE* memb, DWORD* flag, DWORD action) // 0/1 ); //engine int __cdecl mutate( //mutator DWORD user_arg, // user's parameter PE_HEADER* pe, // pe header PE_OBJENTRY* oe, // pe objtable hooy* root, // list root int __cdecl user_disasm(DWORD user_arg,BYTE*), void* __cdecl user_malloc(DWORD user_arg,DWORD), DWORD __cdecl user_random(DWORD user_arg,DWORD) ); //mutate int __cdecl sigman( //signature manager DWORD user_arg, // user parameter PE_HEADER* pe, // pe header PE_OBJENTRY* oe, // pe objtable BYTE* memb, // virtual memory block to work with DWORD* flag, // flag table DWORD action // 0==before(mark), 1==after(update) ); //sigman This shit shown above... of course it doesnt means, that you should program in C++ to use engine. Engine may be called both from asm & cpp. EXAMPLES Mistfall engine was used in the Z-10 virus, all sources are published in the TZ #1 e-zine. * * *