Malware Analysis and Reverse Engineering
Security is an ever-growing concern in all areas of computer science -- from embedded systems and IoT through mobile devices and personal computers to servers and cloud infrastructures.
This course teaches the machine-level mechanisms of malware with a focus on x86-64 machines and Linux. We take a look at both sides -- how malware attacks and infiltrates a system, and how to detect malware and fend off infections.
MARE comprises thirteen 90-minute lectures and nine practical exercises. The course is intended for sixth-semester bachelor CS students, though I also had a small number of electrical engineering students who attended the lecture last summer. Here you can find PDF versions of the lecture slides (in German):
-
This lecture motivates the topic with a short overview of the history of computer malware, citing some examples and the implications of malware proliferation for users and the economy. It shows a small example of virus-like code written in Python.
-
This lecture covers the details of the compilation and linking process for C programs running on Unix. The ELF executable format is analysed in detail, followed by a description of linker functionality and symbol resolution. A short excursus on endianness is included, since this is rarely part of a standard CS curriculum.
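To make the ELF and endianness discussion concrete, here is a minimal sketch (not taken from the lecture slides) that decodes the first fields of an ELF header. The function name `parse_elf_ident` and the returned dictionary keys are my own choices for illustration; the field layout (16-byte `e_ident`, followed by the 16-bit `e_type` and `e_machine`) follows the ELF specification.

```python
import struct

ELF_MAGIC = b"\x7fELF"

def parse_elf_ident(header: bytes) -> dict:
    """Decode e_ident plus e_type and e_machine from the start of an ELF file."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class = header[4]  # 1 = 32-bit object, 2 = 64-bit object
    ei_data = header[5]   # 1 = little-endian, 2 = big-endian
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unknown ELF class or data encoding")
    # The byte order declared in e_ident decides how all later
    # multi-byte fields have to be read.
    byte_order = "<" if ei_data == 1 else ">"
    e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "endianness": "little" if ei_data == 1 else "big",
        "e_type": e_type,        # e.g. 2 = ET_EXEC, 3 = ET_DYN
        "e_machine": e_machine,  # e.g. 62 = EM_X86_64
    }

# Endianness in one line: the same 32-bit value, two byte layouts.
little = struct.pack("<I", 0x11223344)  # least significant byte first
big = struct.pack(">I", 0x11223344)     # most significant byte first
```

On a Linux machine, something like `parse_elf_ident(open("/bin/ls", "rb").read(20))` should report a 64-bit little-endian object on x86-64, although the exact path and result depend on the system.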
-
The details of loading a program, shared libraries and dynamic linking as well as memory allocation are covered in this lecture. It starts with a quick summary of the x86 virtual memory subsystem, since some of the details of virtual memory tend to be long forgotten in semester 6 :-).
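One way to see the result of loading and dynamic linking on Linux is the per-process memory map in `/proc/<pid>/maps`. The following sketch parses that text format and lists the mapped shared objects. The helper names `parse_maps_line` and `shared_objects` are hypothetical, and the ".so" substring test is a deliberately crude heuristic.

```python
from typing import List, NamedTuple

class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    path: str

def parse_maps_line(line: str) -> Mapping:
    """Parse one line of /proc/<pid>/maps: range, perms, offset, dev, inode, path."""
    fields = line.split(None, 5)
    start_hex, end_hex = fields[0].split("-")
    # Anonymous mappings have no path column.
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(int(start_hex, 16), int(end_hex, 16), fields[1], path)

def shared_objects(maps_text: str) -> List[str]:
    """Return the distinct mapped files that look like shared objects."""
    seen: List[str] = []
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        mapping = parse_maps_line(line)
        if ".so" in mapping.path and mapping.path not in seen:
            seen.append(mapping.path)
    return seen
```

On Linux, `shared_objects(open("/proc/self/maps").read())` lists the libraries mapped into the running Python interpreter, which typically include the C library and the dynamic loader itself.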
-
First, this lecture takes a look at the fork/exec mechanism of process creation in Unix. Next, the low-level program startup is described, from the execution of the dynamic loader (ld.so on Linux) up to the invocation of the main function. The second half of this lecture describes the stack, function calls and stack frame structures.
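The fork/exec pattern can be tried out directly from Python, whose `os` module exposes thin wrappers around the underlying system calls. This is a minimal sketch for POSIX systems; the function name `spawn_and_wait` and the exit code 127 for a failed exec are my own choices (127 mirrors the shell convention).

```python
import os
import sys

def spawn_and_wait(argv):
    """Run argv in a child process via fork + exec and return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: replace the process image. On success execvp never returns.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            # exec failed; leave the child without running parent cleanup code.
            os._exit(127)
    # Parent: block until the child terminates.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # Child was killed by a signal; report it as a negative number.
    return -os.WTERMSIG(status)

if __name__ == "__main__":
    code = spawn_and_wait([sys.executable, "-c", "import sys; sys.exit(3)"])
    print("child exit code:", code)
```

The parent and the child both continue after `os.fork()`; only the return value tells them apart, which is the same idiom used in C.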
-
This lecture starts by giving more details on x64 registers and stack frames, including local variable allocation, function nesting and base pointer handling. Then it discusses problematic C functions such as gets and strcpy and shows how these functions can be abused to modify stack contents. Examples are given of using buffer overflows to deposit shellcode directly on the stack (for executable stacks) and to perform return-to-libc attacks.
The lecture includes material by Bart Coppens (https://www.bartcoppens.be/).
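The following toy model is my own illustration, not part of the lecture material. It simulates a stack frame as a byte array to show why an unchecked copy, as performed by gets or strcpy, can reach the saved return address. The layout (16-byte buffer, saved RBP, return address) and all addresses are made up for the example; real frames depend on the compiler and its options.

```python
import struct

BUF_SIZE = 16  # size of the local buffer in this toy frame

def make_frame(return_address: int) -> bytearray:
    """Toy frame, lowest address first: buffer | saved RBP | return address."""
    saved = struct.pack("<QQ", 0x7FFFFFFFE000, return_address)
    return bytearray(BUF_SIZE) + bytearray(saved)

def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Mimic strcpy/gets: write at the buffer start without checking BUF_SIZE."""
    data = data[:len(frame)]  # the toy frame ends here; real memory would not
    frame[0:len(data)] = data

def saved_return_address(frame: bytearray) -> int:
    """Read the 64-bit little-endian return address stored above saved RBP."""
    return struct.unpack_from("<Q", frame, BUF_SIZE + 8)[0]

frame = make_frame(0x401136)
# 16 bytes fill the buffer, 8 more overwrite saved RBP, the last 8
# replace the return address with a value chosen by the attacker.
payload = b"A" * BUF_SIZE + b"B" * 8 + struct.pack("<Q", 0xDEADBEEF)
unchecked_copy(frame, payload)
```

After the copy, `saved_return_address(frame)` yields the attacker-chosen value instead of the original one, which is the core of both the shellcode and the return-to-libc variants.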
-
The ABI for x86-64 ELF programs requires that function parameters are passed in registers, in contrast to stack-based parameter passing on 32-bit x86 machines. Thus, calling functions by directly pushing parameters and return addresses on the stack is no longer possible. This lecture details return-oriented programming, a method that chains small pieces of existing object code, each ending in a RET machine instruction, into code sequences which can be jumped to from the stack, load registers with the required values, and return to the manipulated stack context.
The lecture includes material by Bart Coppens (https://www.bartcoppens.be/).
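As a rough illustration of how candidate code pieces ending in RET are located (again my own sketch, not from the lecture material), the following function scans a byte string for the RET opcode 0xC3 and returns the short byte runs leading up to it. The name `find_gadget_candidates` is hypothetical. Real tools additionally disassemble each candidate to check that it decodes to useful instructions; this sketch skips that step.

```python
RET = 0xC3  # opcode of the near RET instruction on x86 and x86-64

def find_gadget_candidates(code: bytes, max_len: int = 4):
    """Return (offset, bytes) for each run of 1..max_len bytes followed by RET."""
    candidates = []
    for i, byte in enumerate(code):
        if byte != RET:
            continue
        for n in range(1, max_len + 1):
            start = i - n
            if start < 0:
                break
            candidates.append((start, code[start:i + 1]))
    return candidates

# 0x90 is NOP and 0x5F is POP RDI, so the bytes 5F C3 form the
# classic "pop rdi; ret" sequence that loads the first argument register.
sample_code = b"\x90\x5f\xc3"
candidates = find_gadget_candidates(sample_code)
```

Because x86 instructions have variable length, candidates may start in the middle of an instruction the compiler emitted, which is why every offset before a RET byte is worth considering.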
-
This lecture presents mechanisms such as stack canaries and shadow stacks that are intended to protect against buffer overflows and thus increase the effort an attacker must invest to execute malicious code on a system. It also describes OS- and hardware-level methods that make attacks more difficult, specifically ASLR and W^X/Data Execution Prevention.
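The idea behind a stack canary can be sketched with a simulated frame: a random value sits between the local buffer and the return address and is checked before the function returns. This is a simplified model with an invented layout and address (the saved base pointer is omitted), not a description of how any particular compiler implements the check.

```python
import os
import struct

BUF_SIZE = 16

def call_with_canary(data: bytes) -> str:
    """Simulate a function that copies data into a canary-protected frame."""
    canary = os.urandom(8)               # fresh random value per call
    ret = struct.pack("<Q", 0x401000)    # made-up return address
    # Toy frame, lowest address first: buffer | canary | return address
    frame = bytearray(BUF_SIZE) + canary + ret
    n = min(len(data), len(frame))
    frame[0:n] = data[:n]                # unchecked copy, as with strcpy
    # Function epilogue: verify the canary before using the return address.
    if bytes(frame[BUF_SIZE:BUF_SIZE + 8]) != canary:
        return "abort: stack smashing detected"
    return "return to 0x%x" % struct.unpack_from("<Q", frame, BUF_SIZE + 8)[0]
```

A linear overflow has to cross the canary to reach the return address, so it is detected unless the attacker can read or guess the canary value first.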
-
Static and dynamic methods to detect viruses are discussed in this lecture. It covers the problems of signature analysis, morphing viruses and their obfuscation techniques, and more sophisticated static analysis methods such as abstract interpretation. Next, dynamic semantic analysis, specifically control-flow analysis, is detailed.
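A small sketch shows why plain signature matching is fragile. The signature bytes and the XOR key below are invented for the example, and single-byte XOR stands in for the far more elaborate transformations that real morphing code uses.

```python
def matches_signature(sample: bytes, signature: bytes) -> bool:
    """Naive static detection: look for a fixed byte pattern."""
    return signature in sample

def xor_encode(data: bytes, key: int) -> bytes:
    """Encode (or decode) data with a single-byte XOR key."""
    return bytes(b ^ key for b in data)

SIGNATURE = b"EVIL_PAYLOAD"  # hypothetical signature, not a real one
original = b"\x90\x90" + SIGNATURE + b"\x90"
variant = xor_encode(original, 0x5A)

detected_original = matches_signature(original, SIGNATURE)  # True
detected_variant = matches_signature(variant, SIGNATURE)    # False
# Applying the same key again restores the original bytes, which is
# what a small decoder stub would do at run time.
restored = xor_encode(variant, 0x5A)
```

The encoded variant carries the same behaviour once decoded but shares no byte pattern with the original, so a scanner has to match the decoder stub or analyse behaviour instead.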
Polymorphic viruses and self-modifying code
This lecture discusses advanced methods that virus writers use to disguise virus code and obfuscate malware operation, including confusing disassemblers as well as self-decrypting and self-modifying code.
-
So far, this course has not discussed kernel-mode operation and interaction. However, some advanced dynamic virus behaviour analyses use system call traces in order to detect unusual and suspicious behaviour. Thus, this lecture discusses system call mechanisms, parameter passing and syscall tracing using strace and ptrace.
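To work with system call traces programmatically, the text output of strace first has to be turned into structured records. The sketch below parses the common `name(args) = result` line shape. The regular expression and the `Syscall` tuple are my own simplification and do not cover every strace output variant (for example unfinished or resumed calls).

```python
import re
from typing import NamedTuple, Optional

# Hex results (e.g. from mmap) are tried before decimal ones so that
# "0x7f..." is not cut off after the leading "0".
_LINE = re.compile(
    r"^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>0x[0-9a-fA-F]+|-?\d+|\?)"
)

class Syscall(NamedTuple):
    name: str
    args: str
    ret: str

def parse_strace_line(line: str) -> Optional[Syscall]:
    """Parse one strace output line; return None for non-syscall lines."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    return Syscall(match["name"], match["args"], match["ret"])

def syscall_names(trace_text: str):
    """Return the sequence of system call names found in a trace."""
    parsed = (parse_strace_line(line) for line in trace_text.splitlines())
    return [call.name for call in parsed if call is not None]
```

A trace file written with `strace -o trace.txt <program>` can then be reduced to a name sequence with `syscall_names(open("trace.txt").read())`.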
-
Following up on the topic of lecture 10, this lecture discusses different approaches using system call tracing in order to detect malicious behaviour.
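The summary above does not say which approaches the lecture covers. As an illustration of one well-known family, the sketch below compares short sliding windows (n-grams) of system call names against windows observed during normal runs. The function names and the scoring rule are assumptions made for this example.

```python
def ngrams(trace, n=3):
    """Return the set of all length-n windows over a syscall name sequence."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}

def anomaly_score(normal_traces, observed, n=3):
    """Fraction of windows in `observed` never seen in any normal trace."""
    known = set()
    for trace in normal_traces:
        known |= ngrams(trace, n)
    seen = ngrams(observed, n)
    if not seen:
        return 0.0  # trace shorter than n: nothing to judge
    return len(seen - known) / len(seen)

# Hypothetical traces: a file-processing loop versus a run that
# suddenly starts a program and opens a network socket.
normal = [["open", "read", "close", "open", "read", "close"]]
benign = ["open", "read", "close", "open", "read", "close"]
suspicious = ["open", "read", "execve", "socket"]
```

A score near 0 means the observed behaviour stays within known patterns, and a score near 1 means most windows are new. Where to put the alarm threshold is a tuning decision that trades false positives against missed detections.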
-
This lecture continues with a discussion of kernel-mode malware, especially rootkits. It covers x86 protection rings and their use in OS- and VM-based environments, along with some details on code checksumming. As an example, details of the Blue Pill rootkit are given.
-
In this lecture, the most important topics of the course are reviewed and the relationships between the topics are discussed. This also serves as a preparation session for the written exam.
Tags: malware, reverse_engineering, MARE, course