Core Concepts and Techniques in Reverse Engineering

In this article we look at reverse engineering, a central skill for both software and cybersecurity professionals.

Reverse engineering skills can be applied in many ways, such as understanding poorly documented systems in depth, discovering vulnerabilities, auditing security and exploiting systems.

We will cover the concepts, tasks and activities involved, and give an overview of the tools available to carry them out.

Concepts In Reverse Engineering

Reverse engineering is the act of dismantling an object to see how it works. It is done mainly to analyze and understand the object, often in order to duplicate or enhance it.

While the term usually refers to the reversing and exploitation of binary ELF files compiled for the x86-64 and arm64 instruction sets, the concepts apply to a wide range of technological domains, from machines and electronic devices to software and even biological systems found in nature, such as viruses.

It can also be useful for refactoring or maintaining legacy applications once their architects are long gone.

For any kind of reverse engineering procedure we can identify three steps: information extraction, the development of a conceptual model, and its review.

Information extraction involves analyzing the object being reverse-engineered: information about its design is extracted and examined to determine how the pieces fit together.

In software reverse engineering this might require gathering source code and related design documentation for study, and may also involve tools such as a disassembler to break the program apart into its constituent parts.

Modeling involves the abstraction of the collected information into a conceptual model, with each piece of the model explaining its function in the overall structure.

The purpose of this step is to take information specific to the original and abstract it into a general model that can be used to guide the design of new objects or systems.

In software reverse engineering this might take the form of a data flow diagram, a structure chart, or any kind of descriptive tool.

Review involves proving that the derived conceptual model is the right one and testing it in various scenarios to ensure it is a realistic abstraction of the original object or system.

In software engineering, this might take the form of software testing against a set of derived technical requirements.

Once it has successfully passed tests, the conceptual model can be implemented to re-engineer the original object.

Reversing For Vulnerability Discovery

A reverse engineer of a software artifact may have multiple motivations, such as integrating with a proprietary piece of technology or developing countermeasures against a specific class of malware.

We will focus on the objective of exploitation analysis, which refers to the process of examining software or systems with the intent of identifying vulnerabilities and understanding how these vulnerabilities can be exploited by malicious actors.

Information Extraction

A reverse engineer working on a software system should first focus on the information extraction phase of the process.

This can be achieved in a variety of ways depending on the kind of artifact being reverse engineered. For example, a JavaScript file can be analyzed with a text editor and a web browser without many complications, while compiled code will require additional tools and processes.
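As a first information-extraction step on compiled code, the ELF header alone already reveals the target architecture. The following is a minimal sketch, assuming a well-formed header. The function name `parse_elf_header` and the handcrafted sample bytes are illustrative only, and only the two machine types mentioned earlier in this article are decoded.

```python
import struct

# e_machine values from the ELF specification for the two architectures
# discussed in this article; anything else is reported as unknown.
MACHINES = {62: "x86-64", 183: "arm64 (AArch64)"}


def parse_elf_header(data: bytes) -> dict:
    """Decode the class, byte order, type and machine from an ELF header."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_data not in (1, 2):
        raise ValueError("invalid byte-order field")
    endian = "<" if ei_data == 1 else ">"
    # e_type and e_machine are two 16-bit fields right after the
    # 16-byte e_ident array.
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    return {
        "bits": {1: 32, 2: 64}.get(ei_class),
        "little_endian": ei_data == 1,
        "type": e_type,
        "machine": MACHINES.get(e_machine, f"unknown ({e_machine})"),
    }


# Hypothetical, handcrafted header: 64-bit, little-endian, ET_EXEC, x86-64.
sample = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9) + struct.pack("<HH", 2, 62)
info = parse_elf_header(sample)
```

On a real binary, the same function could be fed the first bytes of the file read in binary mode.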

There are two general approaches to reverse engineering code in order to discover vulnerabilities: static code analysis and dynamic code analysis.

Static code analysis generally involves two main techniques, pattern-based analysis and flow-based analysis.

Pattern-based static code analysis is used to identify code patterns that contravene established coding rules.

For example, when analyzing ELF binaries, some C standard library functions and operating system calls are known to be insecure. A well-known case is the gets() function in the standard C library, which writes to memory without any bounds checking.
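The pattern-based idea can be sketched as a lookup of imported symbol names against a list of risky functions. This is a minimal sketch: the list is short and illustrative, and the `imports` list is a hypothetical stand-in for what a symbol-listing tool would report for a real binary.

```python
# Functions commonly flagged in C code audits, with the reason. This short
# list is illustrative, not exhaustive.
RISKY_FUNCTIONS = {
    "gets": "no way to bound the write to the destination buffer",
    "strcpy": "copies until the NUL terminator, ignoring destination size",
    "strcat": "appends without checking the remaining destination space",
    "sprintf": "formats into a buffer without a size limit",
}


def flag_risky_imports(symbols):
    """Return (name, reason) pairs for imported symbols that match the list."""
    findings = []
    for symbol in symbols:
        # Drop a symbol-version suffix such as "@GLIBC_2.2.5" if present.
        name = symbol.split("@", 1)[0]
        if name in RISKY_FUNCTIONS:
            findings.append((name, RISKY_FUNCTIONS[name]))
    return findings


# Hypothetical import list of a binary under analysis.
imports = ["puts", "gets@GLIBC_2.2.5", "malloc", "strcpy", "snprintf"]
findings = flag_risky_imports(imports)
```

A match is only a lead for manual review: whether a flagged call is actually exploitable depends on how its arguments are controlled.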

Useful tools for this task are the GNU Binutils, a basic set of tools for binary management tasks, and Ghidra, which provides strong static code analysis capabilities.

Other tools that can be used for the same purposes are radare2, IDA Pro and Binary Ninja.

Flow-based static code analysis involves identifying and examining the different routes that can be taken through the code by observing the lines of code that are being executed or the order in which variables are being created, modified, used and destroyed through their lifecycle.

This can be useful for discovering bugs such as memory access violations, null pointer dereferences, race conditions or deadlocks.
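To make the flow-based idea concrete, the sketch below walks every path of a toy control-flow graph and tracks the lifecycle of one variable along each path. The graph, the block names and the three operations (`alloc`, `use`, `free`) are all hypothetical, and real tools work on far richer program representations.

```python
# A toy control-flow graph: each basic block lists its operations and its
# successor blocks. All block names and operations are hypothetical.
CFG = {
    "entry":   {"ops": [("alloc", "buf")], "next": ["check"]},
    "check":   {"ops": [], "next": ["error", "process"]},
    "error":   {"ops": [("free", "buf")], "next": ["cleanup"]},
    "process": {"ops": [("use", "buf")], "next": ["cleanup"]},
    "cleanup": {"ops": [("use", "buf"), ("free", "buf")], "next": []},
}


def enumerate_paths(cfg, block="entry", path=()):
    """Yield every acyclic path from `block` until no unvisited successor is left."""
    path = path + (block,)
    successors = [s for s in cfg[block]["next"] if s not in path]
    if not successors:
        yield path
    for successor in successors:
        yield from enumerate_paths(cfg, successor, path)


def find_use_after_free(cfg):
    """Report (path, block, variable) for each use or free of an already-freed variable."""
    issues = []
    for path in enumerate_paths(cfg):
        freed = set()
        for block in path:
            for op, var in cfg[block]["ops"]:
                if op == "alloc":
                    freed.discard(var)
                elif var in freed:
                    issues.append((path, block, var))
                elif op == "free":
                    freed.add(var)
    return issues


issues = find_use_after_free(CFG)
```

In this toy graph only the path through the `error` block is reported, because `cleanup` touches `buf` after `error` has already freed it. The path through `process` is clean, which shows why each route has to be examined separately.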

Dynamic code analysis or instrumentation involves the examination of the internal workings and structure of an application during runtime, as well as detecting and reporting internal failures as they occur.

For dynamic analysis, a debugger such as GDB is used to examine the execution flow of an application precisely as it occurs, as well as tools built on the ptrace system call, such as strace.
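As a small analogy to following execution flow in a debugger, the sketch below uses Python's `sys.settrace` hook to record which Python functions run during a call. It is not how GDB works internally, and the three toy functions are hypothetical.

```python
import sys


def trace_calls(func, *args):
    """Run func(*args) and return its result plus the Python functions entered."""
    called = []

    def tracer(frame, event, arg):
        if event == "call":
            called.append(frame.f_code.co_name)
        return None  # no per-line tracing inside the called frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the previous trace hook
    return result, called


# Hypothetical program under observation.
def normalize(text):
    return text.strip().lower()


def check_password(candidate):
    return normalize(candidate) == "secret"


def login(candidate):
    return "granted" if check_password(candidate) else "denied"


result, called = trace_calls(login, " Secret ")
```

The recorded call order gives a first picture of the control flow without reading the code, which is the same kind of evidence a debugger session provides for a native binary.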

Fuzzers such as AFL++ can also be used to explore whether any of the application's flows results in an indeterminate program state reachable via a concrete set of inputs.
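The core loop of a fuzzer can be sketched in a few lines. This is a naive mutation fuzzer with no coverage feedback, so it is far simpler than AFL++. The target `parse_record` is a hypothetical parser with a deliberately planted bug.

```python
import random


def parse_record(data: bytes) -> int:
    """Toy target: the first byte declares the payload length (deliberately buggy)."""
    declared_length = data[0]
    payload = data[1:]
    # Bug: the declared length is trusted without checking len(payload).
    return payload[declared_length - 1]


def mutate(seed_input: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte with a random value."""
    mutated = bytearray(seed_input)
    mutated[rng.randrange(len(mutated))] = rng.randrange(256)
    return bytes(mutated)


def fuzz(target, seed_input: bytes, iterations: int = 500, rng_seed: int = 0):
    """Run the target on mutated inputs and collect those that raise."""
    rng = random.Random(rng_seed)  # fixed seed so a run can be reproduced
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed_input, rng)
        try:
            target(candidate)
        except Exception as error:
            crashes.append((candidate, type(error).__name__))
    return crashes


crashes = fuzz(parse_record, b"\x03abc")
```

Each entry in `crashes` is a concrete input that reproduces the failure, which is what the analyst then takes into a debugger.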

For dynamic instrumentation, it is possible to inject behavior using a tool like Frida.
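The idea of injecting behavior can be illustrated without any external tool by wrapping a function at runtime. The sketch below is plain Python and does not use Frida's API. `LicenseChecker` and `install_hook` are hypothetical names chosen for the example.

```python
import functools


class LicenseChecker:
    """Hypothetical target whose behavior we want to observe and alter."""

    def is_valid(self, key: str) -> bool:
        return key == "AAAA-BBBB"


def install_hook(owner, name, on_enter=None, replace_result=None):
    """Wrap owner.name so calls are reported and optionally rewritten."""
    original = getattr(owner, name)

    @functools.wraps(original)
    def hooked(*args, **kwargs):
        if on_enter:
            on_enter(args, kwargs)
        result = original(*args, **kwargs)
        return result if replace_result is None else replace_result(result)

    setattr(owner, name, hooked)
    return original  # returned so the caller can restore it later


observed = []
original = install_hook(
    LicenseChecker,
    "is_valid",
    on_enter=lambda args, kwargs: observed.append(args[1:]),  # skip self
    replace_result=lambda result: True,
)
patched_answer = LicenseChecker().is_valid("guess")
LicenseChecker.is_valid = original  # remove the hook
restored_answer = LicenseChecker().is_valid("guess")
```

The hook both records the arguments seen at the call site and changes the return value, which are the two typical uses of dynamic instrumentation during analysis.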

Together, these tools form a binary reverse engineering stack that can be used to extract information about the inner workings of the application, which in turn can be used to produce software documentation or the knowledge needed to build a model.

Modeling

Over decades, the software engineering field has developed multiple techniques for modeling the specification, architecture or design of applications.

On [this blog](https://www.synkops.dev/posts/how-to-write-technical-docs/) we have discussed how to write technical proposals with the C4 model using Mermaid.js, which can be an excellent framework for documenting the behavior observed during information gathering.

Another tool that can be used for this task is Structurizr.
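Part of the model can be generated directly from the extracted information. As a minimal sketch, the function below turns call relationships into the text of a Mermaid flowchart. It produces a plain flowchart rather than a full C4 diagram, and the `observed_calls` edges are hypothetical.

```python
def to_mermaid(edges):
    """Render (caller, callee) pairs as a Mermaid top-down flowchart."""
    lines = ["flowchart TD"]
    # dict.fromkeys removes duplicate edges while keeping the first-seen order.
    for caller, callee in dict.fromkeys(edges):
        lines.append(f"    {caller} --> {callee}")
    return "\n".join(lines)


# Hypothetical call edges recovered during information extraction.
observed_calls = [
    ("main", "parse_args"),
    ("main", "load_config"),
    ("load_config", "read_file"),
    ("main", "load_config"),  # seen twice at runtime, drawn once
]
diagram = to_mermaid(observed_calls)
```

The resulting text can be pasted into any Mermaid renderer, so the diagram stays in sync with the evidence it was generated from.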

Review

Modeling is necessarily bound to testing, because any scientist knows that a model not supported by evidence is just a hypothesis.

A model becomes credible when it matches the results of experimental tests or formal proofs.

Usually a reverse engineer will validate the model by directly watching what the program does and documenting its activities, via dynamic analysis or with a sandbox such as CAPE or Noriben.
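One simple way to review a model is to run it side by side with the observed behavior and look for disagreements. In the sketch below, `observed_checksum` is a hypothetical stand-in for the original program (in practice, outputs recorded from the real binary) and `model_checksum` is the analyst's hypothesis about the algorithm.

```python
def observed_checksum(data: bytes) -> int:
    """Stand-in for the original program's observed behavior."""
    total = 0
    for byte in data:
        total = (total + byte) & 0xFF
    return total ^ 0x5A


def model_checksum(data: bytes) -> int:
    """Our hypothesis about the algorithm, derived from analysis."""
    return (sum(data) % 256) ^ 0x5A


def review_model(model, reference, test_inputs):
    """Return the inputs on which the model disagrees with the reference."""
    return [data for data in test_inputs if model(data) != reference(data)]


test_inputs = [b"", b"A", b"hello", bytes(range(256)), b"\xff" * 1000]
mismatches = review_model(model_checksum, observed_checksum, test_inputs)
```

An empty list of mismatches supports the model but does not prove it. It only means that none of the tested inputs contradicted it, so the choice of test inputs matters.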

However, the observability of software, as well as its monitoring, is still a hard problem in this area.

Countermeasures Against Reversing

Code Obfuscation

The main countermeasure against unauthorized analysis of distributed code artifacts is code obfuscation, which transforms the code or binaries to make them more difficult to understand, for example by renaming variables and functions, adding dummy code and restructuring the code.
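The renaming technique can be sketched on Python source with the standard `ast` module. This is a minimal sketch covering identifier renaming only, assuming Python 3.9 or later for `ast.unparse`. It does not rename attributes or keyword arguments at call sites, and the `compute_discount` function is a hypothetical example.

```python
import ast


def obfuscate(source: str):
    """Rename functions, parameters and assigned variables to opaque names."""
    tree = ast.parse(source)

    # First pass: collect every name that the source itself defines, so that
    # builtins such as min() or round() are left untouched.
    mapping = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            name = node.name
        elif isinstance(node, ast.arg):
            name = node.arg
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            name = node.id
        else:
            continue
        mapping.setdefault(name, f"_{len(mapping):04x}")

    # Second pass: rewrite definitions and every reference to them.
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            node.name = mapping[node.name]
        elif isinstance(node, ast.arg):
            node.arg = mapping[node.arg]
        elif isinstance(node, ast.Name) and node.id in mapping:
            node.id = mapping[node.id]

    return ast.unparse(tree), mapping


SOURCE = '''
def compute_discount(price, loyalty_years):
    rate = min(loyalty_years, 10) * 0.02
    return round(price * (1 - rate), 2)
'''

obfuscated, mapping = obfuscate(SOURCE)
```

The behavior is unchanged while the names that documented the intent are gone. Renaming alone is a weak protection, since the structure of the code remains fully visible.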

Tamper Detection

Tamper detection protects against the distribution of tampered software artifacts. It consists of verifying the digital signature of the application against the public key of the developer.

This is commonly used in application stores or software repositories to protect against tampered binaries.
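The verification step can be illustrated with a simplified stand-in. The sketch below compares a SHA-256 digest against a trusted value instead of verifying a real digital signature, because the Python standard library has no public-key signature primitives. In a real deployment the trusted digest would itself be covered by a signature checked with the developer's public key. The artifact bytes are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(artifact: bytes) -> str:
    """Return the SHA-256 digest of the artifact as a hex string."""
    return hashlib.sha256(artifact).hexdigest()


def is_untampered(artifact: bytes, trusted_digest: str) -> bool:
    """Compare the artifact's digest with one obtained from a trusted source."""
    # compare_digest runs in constant time, so the comparison does not leak
    # where the first differing character is.
    return hmac.compare_digest(sha256_hex(artifact), trusted_digest)


# Hypothetical release artifact and the digest published alongside it.
release = b"example application bytes"
published_digest = sha256_hex(release)

# The same artifact after an attacker has modified part of it.
tampered = release.replace(b"application", b"backdoored!")

release_ok = is_untampered(release, published_digest)
tampered_ok = is_untampered(tampered, published_digest)
```

A digest only protects integrity if it arrives over a channel the attacker cannot modify, which is the problem that signatures solve.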

Hardware Security

Hardware security is particularly relevant for the protection of embedded systems, integrated circuits, mobile devices and cryptographic modules against unauthorized access and tampering.

The main logical countermeasure at this level is to implement secure boot mechanisms, so that only digitally signed and authorized firmware is run on the device.

Regarding physical security, additional countermeasures against tampering and reverse engineering of hardware involve restricting physical access to the device through secure enclosures, tamper-evident seals and physical locks.

Conclusion

This post is a first step on the very long road to mastering reverse engineering as a process for cybersecurity.

In this document we reviewed the general steps for designing a reverse engineering project and the activities involved, proposed a set of tools for carrying out the individual tasks, and covered the main countermeasures that a defender can use against it.

There is still much to talk about on this topic as reverse engineering can be applied to a variety of technologies and undocumented systems. We will hopefully explore and document some of these topics in future articles.

whoami

Jaime Romero is a software engineer and application security expert operating in Western Europe.