Analysing Unknown Binaries
The "right" method to analyze an unknown binary would be to perform
reverse engineering (RE) on the code. RE is, however, time-consuming
and convoluted. As such, it may not be "cost-effective" to RE every
binary that needs to be analyzed. We performed RE on selected code
(e.g. login from bigwar.tgz) and code fragments (libproc.so.2.0.6 from
bigwar.tgz). We analyzed most of the other binaries using various
methods, which include
- ELF file format analysis,
- strings analysis, and
- runtime analysis.
We explain each of these techniques in the following sections.
ELF File Format Analysis
The main purpose of ELF file format analysis is to detect the presence
of parasite code, which is often a virus. To parse the ELF file, we
used the utility readelf (part of the GNU binutils package). readelf
displays information about an ELF object file, including the file
header, program headers, section headers, symbol tables, dynamic
information, relocation information, notes, and version information. One
of the methods that parasite code uses to reside in an ELF executable is
the technique known as segment padding. With segment padding, the code
segment is extended by one page, and the parasite code resides in this
padded region. To have the parasite code executed, the entry point
address (part of the file header information) is modified to point to
the padded region. However, with most compilers, the entry point of an
executable is the start of the .text section (which is often near the
start of the code segment). Thus, by inspecting the program entry point
address with respect to the code segment and the .text section, parasite
code injected using the segment padding technique can be discovered. For
the binaries
provided, we were able to detect two different types of parasite code,
the Linux.OSF.8759
virus and the Linux/Rst-A
virus. These viruses infected chattr
(and a list of other binaries) and tools/sniffer/kde
from hax.tgz respectively. Anti-virus software is also helpful in
identifying virus signatures.
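
The entry point check described above can be approximated in a few lines.
The following is a minimal sketch, not the procedure we actually ran (we
used readelf): it parses the ELF header and program headers of a
little-endian file and reports where the entry point falls inside the
executable load segment. The function names and the "within one page of
the segment end" threshold are assumptions made for illustration, and
the comparison against the .text section is left out for brevity.

```python
import struct

PT_LOAD = 1   # program header type: loadable segment
PF_X = 0x1    # program header flag: segment is executable


def locate_entry(data):
    """Return (entry offset within segment, segment file size) for the
    executable PT_LOAD segment that contains the entry point, or None."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[5] != 1:
        raise ValueError("only little-endian ELF is handled in this sketch")
    is64 = data[4] == 2  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    ehdr_fmt = "<16sHHIQQQIHHHHHH" if is64 else "<16sHHIIIIIHHHHHH"
    ehdr = struct.unpack_from(ehdr_fmt, data, 0)
    e_entry, e_phoff = ehdr[4], ehdr[5]
    e_phentsize, e_phnum = ehdr[9], ehdr[10]

    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        if is64:
            (p_type, p_flags, _, p_vaddr, _,
             p_filesz, _, _) = struct.unpack_from("<IIQQQQQQ", data, off)
        else:
            (p_type, _, p_vaddr, _,
             p_filesz, _, p_flags, _) = struct.unpack_from("<IIIIIIII", data, off)
        inside = p_vaddr <= e_entry < p_vaddr + p_filesz
        if p_type == PT_LOAD and (p_flags & PF_X) and inside:
            return e_entry - p_vaddr, p_filesz
    return None


def entry_near_segment_end(data, page=4096):
    """Heuristic (an assumption): flag the file when the entry point lies
    in the last `page` bytes of the code segment, or in no executable
    segment at all. Small binaries will produce false positives."""
    loc = locate_entry(data)
    if loc is None:
        return True
    offset, filesz = loc
    return filesz - offset <= page
```

A flagged file is only a candidate for closer inspection; the result
still has to be compared with the .text section address by hand.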
Another use of ELF file format analysis is to study the dependencies
between executable and shared libraries. This information is captured in
the dynamic section of the executable. The trojanised function of an
executable does not necessarily reside in the executable itself; it can
reside in a shared library that the executable is linked to instead. The
hack_procinit function of libproc.so.2.0.6 is an example. Both ps and
top link against the libproc.so.2.0.6 shared library.
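
To list the shared libraries recorded in the dynamic section of many
binaries at once, the output of readelf -d can be filtered for NEEDED
entries. The sketch below assumes readelf is on the PATH and that its
output uses the usual "(NEEDED) Shared library: [name]" layout; the
function names are our own.

```python
import re
import subprocess

# Matches lines such as:
#  0x00000001 (NEEDED)      Shared library: [libc.so.6]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")


def parse_needed(readelf_output):
    """Extract library names from the text printed by `readelf -d`."""
    return NEEDED_RE.findall(readelf_output)


def needed_libraries(path):
    """Run `readelf -d` on a file and return its NEEDED libraries."""
    result = subprocess.run(
        ["readelf", "-d", path],
        capture_output=True, text=True, check=True,
    )
    return parse_needed(result.stdout)
```

Any library that appears here and was shipped with the suspect binaries
deserves the same scrutiny as the executables themselves.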
Strings Analysis
Strings analysis refers to using the command strings to extract
sequences of printable characters from a file, and then picking out
"interesting" strings from the command output. This method, though
primitive, is often sufficient to detect a large class of trojans in
the wild. Knowing how some of the trojans operate, we often look for
configuration filenames. For example, Linux Rootkit 5 (lrk5) and its
derivatives make use of configuration files to store lists of filenames,
directory names, process names, IP addresses and port numbers to hide
from command output. The filename /usr/lib/locale/ro_RO/uboot/etc/procrc
found in curatare/.Clean/pstree (in bigwar.tgz) certainly aroused our
suspicion. The filename /dev/mounnt and the string cocacola found in the
trojanised login are equally suspicious.
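
The filtering step can be sketched as follows. This is an approximation
of what the strings command does (runs of at least four printable ASCII
characters), followed by a filter for absolute paths under a few
directories. The directory list is an assumption chosen for
illustration; in practice the filter is whatever the investigator finds
"interesting".

```python
import re


def printable_strings(data, min_len=4):
    """Return runs of printable ASCII characters of at least min_len
    bytes, roughly what the strings command prints by default."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Assumed filter: absolute paths under directories where rootkit
# configuration files are commonly hidden.
PATH_RE = re.compile(r"^/(dev|usr|etc|tmp|var)/")


def interesting(strings):
    """Keep only the strings that look like paths worth a second look."""
    return [s for s in strings if PATH_RE.search(s)]
```

A path filter alone would pass over a string such as cocacola, so the
unfiltered list is still worth skimming.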
However, strings analysis can be easily circumvented. One method is to
use filenames that do not arouse the investigator's suspicion. The
filename /usr/include/file.h in the trojanised ls is an example. In this
case, the series of filenames /usr/include/{hosts.h, proc.h, file.h} in
the various trojanised binaries only came to our attention when we
looked into the shell script remove. In that script, the files .c, .d
and .p, which resemble typical trojan configuration files, are moved
(mv) to /usr/include/{hosts.h, proc.h, file.h} respectively. Another
method used to circumvent strings analysis is to encode the filenames
used. The suspicious strings are then stored as a different sequence of
byte values, and may not appear in the strings command output at all.
Even if they do, they are not likely to arouse suspicion. Binaries such
as libproc.so.2.0.6 and dir, for example, employ such a technique.
In any case, the trojanised nature of these unknown binaries will still
surface once runtime analysis is performed.
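
To illustrate why encoded filenames escape strings analysis, the sketch
below assumes the simplest possible encoding, a single-byte XOR, and
tries every key to see whether a path-like string appears. This is an
assumption made purely for illustration: it is not necessarily the
encoding used by libproc.so.2.0.6 or dir, and the path pattern is our
own choice.

```python
import re

# Assumed pattern: an absolute path under a few common directories.
PATH_RE = re.compile(rb"/(?:usr|dev|etc|tmp)/[\x21-\x7e]{3,}")


def xor_bytes(data, key):
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


def find_xor_paths(data):
    """Try all non-zero single-byte XOR keys and return a dict mapping
    each key to the path-like strings that it reveals."""
    hits = {}
    for key in range(1, 256):
        found = PATH_RE.findall(xor_bytes(data, key))
        if found:
            hits[key] = [f.decode("ascii") for f in found]
    return hits
```

Short matches under a wrong key are possible, so every hit has to be
judged by eye before it is treated as a real filename.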
When runtime analysis is not available (perhaps due to the lack of a
"controlled" environment), the output of the strings command can also
provide pointers to the main function of an unknown binary. This is
possible through the help/usage/error messages that are embedded in the
binary. The binary nscd, for example, is, unlike what its name suggests,
a trojanised sshd. However, the author could easily circumvent such
analysis with false or encoded help/usage/error messages.
Yet another use of strings analysis is to learn more about the system on
which the binary was compiled. This information shows up in the form of
compiler strings, such as " GCC: (GNU) egcs-2.91.66 19990314/Linux
(egcs-1.1.2 release)", which was found in numerous binaries.
Binaries such as lsof embed more than compiler information: user name,
system name, system time, and compiler flags were also included.
When a binary is not stripped, the names of variables, source files or
header files also show up in the binary. The strings "
/xL/lrk5/fileutils-3.13/src/" and " ../../rootkit.h", for example,
showed up in du and ls. However, no respectable attacker should leave
unstripped binaries behind.
Runtime Analysis
Runtime analysis should only be performed in a controlled environment,
i.e. one where damage done by malicious code or a spreading worm can be
contained. We used Redhat 6.2 running within VMware for this purpose.
VMware is configured to use an undoable disk so that any damage done by
the malicious code can be undone easily. In addition, the networking
mode is set up as host-only networking, and as a precaution the other
network interfaces of the host machine are turned off to prevent the
spread of a worm, if any. We chose Redhat 6.2 as the guest OS out of
convenience. Ideally, the guest OS should resemble the honeypot,
probably a Redhat 7.2 system, as closely as possible.
For executables that did not yield any useful results with strings
analysis, we used strace to study their behaviour. strace intercepts and
records the system calls made by a process and the signals it receives.
The name of each system call, its arguments and its return value are
printed on standard error or to the file specified with the -o option.
By comparing the system calls made by a known executable with those of
the trojanised executable of the same kind, e.g. a legitimate login
program and a supposedly trojanised login, anomalies can be easily
detected. This, of course, was done on the guest OS. With strace, we
were able to discover configuration files referenced by trojanised
programs that did not show up with static strings analysis. The
programs md5sum and netstat are examples. However, an attacker can
thwart the investigator's attempt to strace a program by using
anti-debugging techniques. In fact, this method is employed by the
parasite code of various binaries found in hax.tgz and hax-small.tgz.
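
The comparison between a known and a suspect executable can be partly
automated once both traces have been saved with strace -o. The sketch
below collects the paths passed to a handful of file-related system
calls in each trace and reports those that only the suspect touches. The
list of system calls and the function names are assumptions; the regular
expression expects the default strace line format, optionally prefixed
by a process ID.

```python
import re

# Matches e.g.  open("/etc/passwd", O_RDONLY) = 3
#          or   1234 openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
CALL_RE = re.compile(
    r'^(?:\d+\s+)?(?:open|openat|stat|access)\((?:[A-Z_]+,\s*)?"([^"]*)"'
)


def referenced_paths(strace_text):
    """Return the set of paths passed to the file-related calls above."""
    paths = set()
    for line in strace_text.splitlines():
        match = CALL_RE.match(line)
        if match:
            paths.add(match.group(1))
    return paths


def extra_paths(known_trace, suspect_trace):
    """Paths referenced by the suspect binary but not by the known one."""
    return sorted(referenced_paths(suspect_trace) - referenced_paths(known_trace))
```

Failed calls are included on purpose: a trojan that looks for a
configuration file which is not present on the guest OS still reveals
the filename in its trace.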
Whenever we came up with a hypothesis of how a trojan operates, we
tested the hypothesis by actually running the trojanised executable
within the guest OS. The required configuration files were set up in
their respective locations, and the command output was observed. We are
aware that doing so does not expose all functionality of the trojans
(only proper RE can do so). The main purpose, instead, is to verify our
hypothesis. Thus, there may well be other backdoors or trojanised
behaviour that remain undiscovered by our analysis.