You cannot have a science without measurement.
R. W. Hamming
However, this has drastic effects:
Since we directly copy binary code, the virus is restricted to a particular hardware architecture.
Code cannot use shared libraries; not even the C runtime library.
There are ways to circumvent these limitations. But they are complicated and make the virus more likely to fail.
Another natural limitation of viruses is their rigid dependency on the file format of target executables. These formats differ a lot, even on the same hardware architecture and under the same operating system. Furthermore, executables are not designed with post-link-time modifications in mind. It is rare for a virus to support more than one infection method. This document is about the format used on recent versions of Linux, FreeBSD and Solaris. [1]
This format is well documented. Some public resources:
Source code of Linux and FreeBSD. Admittedly not for the faint of heart. [2]
/usr/include/elf.h [3]
Portable Formats Specification, Version 1.1. [4]
Linux Standard Base Specification [5]
NetBSD ELF FAQ [6]
Creating Really Teensy ELF Executables for Linux [7]
A quote from the Portable Formats Specification:
The Executable and Linking Format was originally developed and published by UNIX System Laboratories (USL) as part of the Application Binary Interface (ABI). The Tool Interface Standards committee (TIS) has selected the evolving ELF standard as a portable object file format that works on 32-bit Intel Architecture environments for a variety of operating systems.
Actually ELF covers object files (.o), shared libraries (.so) and executable files. The Linux kernel [8] is also a valid ELF file.
GNU binutils provides two utilities to view ELF headers, objdump and readelf. [9] The functionality of both tools overlaps, but I think the output of readelf is nicer. On Solaris the native tools for this purpose are called dump and avdp.
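The first fields that these tools report can also be decoded by hand. The following is a minimal sketch in Python (a language not otherwise used in this document), assuming the field layout given in the Portable Formats Specification and elf.h. The function name describe_elf_header and the keys of the returned dictionary are made up for illustration.

```python
import struct

# Values as defined in the ELF specification and /usr/include/elf.h.
ELF_MAGIC = b"\x7fELF"
ELF_CLASS = {1: "ELF32", 2: "ELF64"}
ELF_DATA = {1: "little-endian", 2: "big-endian"}
ELF_TYPE = {1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}


def describe_elf_header(header):
    """Decode e_ident, e_type and e_machine from the start of an ELF file.

    Hypothetical helper for illustration; 'header' must hold at least
    the first 20 bytes of the file.
    """
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    if ei_class not in ELF_CLASS or ei_data not in ELF_DATA:
        raise ValueError("unknown ELF class or data encoding")
    byte_order = "<" if ei_data == 1 else ">"
    # e_type and e_machine are 16-bit fields directly after the 16 bytes
    # of e_ident, in both the 32-bit and the 64-bit layout.
    e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    return {
        "class": ELF_CLASS[ei_class],
        "data": ELF_DATA[ei_data],
        "type": ELF_TYPE.get(e_type, "unknown (%d)" % e_type),
        "machine": e_machine,
    }
```

Calling describe_elf_header(open(path, "rb").read(20)) on an object file, a shared library and an executable should show the three kinds of ELF file mentioned above as ET_REL, ET_DYN and ET_EXEC. The machine number is the value that ties a piece of binary code to one hardware architecture.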
ELF is used on a variety of both 32-bit and 64-bit architectures. Obviously you need to handle the assembly language of each platform. Good starting points are "Linux Assembly" [10] and "Assembly Language Related Web Sites". [11]
Introduction to Alpha [12]
Alpha Assembly Language Guide [13]
Assembly Language Programmer's Guide [14]
Assembly-HOWTO. [15] Description of tools and sites for Linux.
FAQ of comp.lang.asm.x86 [16]
"Robin Miyagi's Linux Programming" [17] features a tutorial and interesting links.
"Assembly resources" [18] covers advanced topics.
IA-32 Intel Architecture Software Developer's Manual [19]
"The Place on the Net to Learn Assembly Language Programming" [20]
The Art of Assembly Language. 32-bit Linux Edition Featuring HLA. [21]
X86 Architecture, low-level programming, freeware [22]
Dr. Dobb's Microprocessor Resources [23]
FreeBSD Assembly Language Tutorial [24]
SPARC Standards Documents Depository [25]
SPARC Assembly Language Reference Manual [26]
A Laboratory Manual for the SPARC [27]
SPARC technical links [28]
By default all GNU disassembly tools adhere to the syntax of the GNU assembler. Veterans of i386 programming consider this style repulsive, however. gdb provides the command "set disassembly-flavor intel" to lower the contrast, and objdump has the option -Mintel for a similar effect. Still, I prefer ndisasm [29] on i386 and will use it where possible. This tool has absolutely no understanding of ELF (or any other file format). But for the scope of this document this is a feature. The calculations necessary to get at the interesting bytes are interesting themselves.
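To give an idea of the kind of calculation involved, here is a minimal sketch in Python that finds the file offset of the entry point. It is not the method used in the platform-specific parts of this document. It assumes a 32-bit little-endian ELF file (as on i386) and the header layout from the Portable Formats Specification; the function name entry_file_offset is made up for illustration.

```python
import struct

PT_LOAD = 1  # program header type of a loadable segment


def entry_file_offset(image):
    """Return the file offset of the entry point of an ELF32 image.

    Hypothetical helper for illustration; 'image' is the complete file
    content as bytes. Only 32-bit little-endian files are handled.
    """
    if image[:4] != b"\x7fELF" or image[4] != 1 or image[5] != 1:
        raise ValueError("expected a 32-bit little-endian ELF image")
    # Elf32_Ehdr: e_entry at offset 24, e_phoff at offset 28,
    # e_phentsize at offset 42, e_phnum at offset 44.
    e_entry, e_phoff = struct.unpack_from("<II", image, 24)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 42)
    for i in range(e_phnum):
        # Elf32_Phdr starts with p_type, p_offset, p_vaddr, p_paddr, p_filesz.
        p_type, p_offset, p_vaddr, _p_paddr, p_filesz = struct.unpack_from(
            "<5I", image, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD and p_vaddr <= e_entry < p_vaddr + p_filesz:
            # A virtual address maps to the file position at the same
            # distance from the start of its segment.
            return e_entry - p_vaddr + p_offset
    raise ValueError("entry point is not inside any PT_LOAD segment")
```

The returned number is the count of bytes a format-agnostic disassembler has to skip before it reaches the first instruction. With ndisasm that would presumably be the argument of option -e, combined with -u for 32-bit mode.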
In this document input for assemblers (including nasm) is stored in .S files. Traditional cc treats such files as "assembler code which must be preprocessed by cpp". This is required on the Alpha platform, where symbolic names for registers are not part of the assembly language. Output of disassemblers ends up in .asm files.
The document itself is written in DocBook, [30] an XML document type definition. [31] Conversion to HTML is the last step of a Makefile that builds and runs all examples. Because every example is built and run on the machine that produces the HTML, I can't provide one document comparing two platforms. Instead I set up everything for conditional compilation. I then build one consistent variation of the document on a single system.
You are now reading the platform-independent part. The links below lead to actual examples, and to the story of a constantly improving technique. This part continues with general topics and larger chunks of source code. It is a bit like a huge appendix, since the platform parts frequently refer to chapters here.