2. Background
To understand the purpose of this paper, the reader needs a grasp of three topics: forward and
reverse engineering, the Windows portable executable (PE) file format, and assembly language.
The discussion of forward and reverse engineering explains the phases that software passes
through during compilation and decompilation. The section on the Windows portable executable
file format describes the structure and sections of a PE file. Finally, the assembly language
section covers the x86 assembler, the format of assembly instructions, and the different
categories of assembly instructions.
2.1 Fundamentals of forward and reverse engineering
In forward engineering, source code passes through four phases: compiling, assembling, linking,
and execution [32]. Reverse engineering deals with these four phases in reverse order: execution,
linking, disassembling, and then decompiling. A considerable amount of information is lost in the
forward transition through these phases and cannot be recovered in the reverse transition [1].
Reverse engineering encompasses many practices, such as reverse assembling from native machine
code (disassembling), reverse compiling from assembly code (decompiling), reverse
programming from the source code itself (debugging), reverse programming of legacy code, and
software reusability [3]. This paper deals with the implementation of reverse compiling from
assembly code, the most difficult of these techniques. Most compilers generate assembly code
from the source code, which the assembler then parses to generate object code [32]. In scripting
languages such as Perl, PHP, and JavaScript, an interpreter replaces these two phases, since the
interpreter generates object code directly from the source code [23]. Virtual compilers (for
example, the Java compiler) produce bytecode, which is the counterpart of object code.
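The information loss described above can be illustrated with a deliberately tiny example. The sketch below is written in Python (3.9 or later) only because this paper contains no code of its own. It assumes a hypothetical stack-machine instruction set (`LOAD`, `PUSH`, `ADD`, `SUB`, `MUL`) rather than real x86 assembly, and the functions `compile_expr` and `decompile` are illustrative names, not part of any tool discussed in this paper. The forward direction translates an arithmetic expression into instructions; the reverse direction rebuilds an expression by symbolically executing those instructions.

```python
import ast

# Hypothetical toy instruction set for a stack machine (NOT x86):
#   LOAD name   -> push a variable
#   PUSH int    -> push an integer constant
#   ADD/SUB/MUL -> pop two operands, push the combined result
OPCODES = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
SYMBOLS = {"ADD": "+", "SUB": "-", "MUL": "*"}
PRECEDENCE = {"ADD": 1, "SUB": 1, "MUL": 2}
ATOM = 3  # precedence of a bare name or constant: never needs parentheses

Instruction = tuple[str, object]


def compile_expr(source: str) -> list[Instruction]:
    """Forward direction: translate an arithmetic expression into stack code."""
    tree = ast.parse(source, mode="eval")
    code: list[Instruction] = []

    def emit(node: ast.AST) -> None:
        if isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
            emit(node.left)   # operands first, operator last (postfix order)
            emit(node.right)
            code.append((OPCODES[type(node.op)], None))
        elif isinstance(node, ast.Name):
            code.append(("LOAD", node.id))
        elif isinstance(node, ast.Constant) and isinstance(node.value, int):
            code.append(("PUSH", node.value))
        else:
            raise ValueError(f"unsupported construct: {ast.dump(node)}")

    emit(tree.body)
    return code


def decompile(code: list[Instruction]) -> str:
    """Reverse direction: rebuild source text by symbolically executing the code."""
    stack: list[tuple[str, int]] = []  # (expression text, precedence of its top operator)
    for opcode, operand in code:
        if opcode in ("LOAD", "PUSH"):
            stack.append((str(operand), ATOM))
            continue
        (right, rprec), (left, lprec) = stack.pop(), stack.pop()
        prec = PRECEDENCE[opcode]
        # Insert parentheses only where needed to preserve evaluation order;
        # the programmer's original (possibly redundant) parentheses are gone.
        if lprec < prec:
            left = f"({left})"
        if rprec <= prec:
            right = f"({right})"
        stack.append((f"{left} {SYMBOLS[opcode]} {right}", prec))
    if len(stack) != 1:
        raise ValueError("malformed instruction sequence")
    return stack[0][0]


if __name__ == "__main__":
    code = compile_expr("((a+b))*(2)")
    print(code)
    print(decompile(code))  # (a + b) * 2
```

Running the sketch on `((a+b))*(2)` yields `(a + b) * 2`: the meaning survives the round trip, but the original spacing and redundant parentheses do not, because they never reached the instruction stream. A real decompiler working from machine code faces the same problem on a much larger scale, since comments, local variable names (in the absence of debug information), and high-level control structures are typically discarded during compilation.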
Figure 1 shows the relationships between the various tools used in forward and reverse
engineering [1]. The rightmost side of the figure lists the tools that take the software from one
phase to the next. The nodes indicate the language or level of the code at each phase. Arrows
running from top to bottom trace the software life cycle in forward engineering, and arrows
running from bottom to top trace it in reverse engineering. This paper deals with the phase
indicated by the purple arrow.