Obfuscation is the process by which information is hidden or made difficult to identify. In application security, this is done by introducing additional complexity to the flow of execution, adding operations to otherwise simple computations, or even inserting totally useless code. You can use these techniques to hide sensitive application data, critical code, or user information within a particular piece of software.
Using obfuscation to protect data in apps
Obfuscation distracts and confuses attackers, making it much more difficult for them to discover data stored in your applications or find vulnerabilities to exploit. Veracode’s State of Software Security Report found that 76% of software applications have at least one security flaw, with 66% containing an OWASP Top 10 vulnerability. Our own recent research uncovered serious security vulnerabilities in 77% of mobile finance apps.
There are multiple techniques for obfuscating code, and the most effective protection strategies layer several together. Generally, the value of the application code and the data it contains will dictate the sophistication and extent of the code obfuscation you should apply.
Control flow obfuscation
Control flow obfuscation focuses on adding complexity to the flow of execution, which slows down both manual and automated reverse engineering. It works by creating new execution paths and splitting existing paths into several pieces. For example, you can inline functions (replace method calls with the actual method body) or replace calls to subroutines with computed jumps.
One of the most effective forms of control flow obfuscation is control flow flattening. In this technique, an ordinary function or collection of functions in an application is converted into a state machine. The function is first divided into many smaller parts, and a dispatcher is then created to manage the execution of each piece. The resulting form can make the function's behavior look unusual or even nonsensical.
Obscuring the critical code paths that operate on sensitive data makes it harder for bad actors to identify where that data is processed or managed.
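As a rough illustration of flattening, the sketch below shows a small function next to a hand-flattened equivalent. The function names, the state numbers, and the checksum logic are all invented for this example; real obfuscators typically apply the transformation automatically to compiled code rather than to source.

```python
def checksum_plain(values):
    """Readable original: sum the even values, then double the total."""
    total = 0
    for v in values:
        if v % 2 == 0:
            total += v
    return total * 2


def checksum_flattened(values):
    """Same behavior, rewritten as a state machine driven by a dispatcher loop."""
    state = 3          # entry state; the numbering is deliberately unordered
    total = 0
    i = 0
    result = None
    while state != 0:  # dispatcher: selects the next block from the state variable
        if state == 3:      # block: loop condition
            state = 7 if i < len(values) else 5
        elif state == 7:    # block: parity test
            state = 2 if values[i] % 2 == 0 else 9
        elif state == 2:    # block: accumulate
            total += values[i]
            state = 9
        elif state == 9:    # block: advance the index
            i += 1
            state = 3
        elif state == 5:    # block: finalize and exit
            result = total * 2
            state = 0
    return result
```

Both functions return the same value for any input, but in the flattened version the loop and the branch are no longer visible as such: every block sits at the same nesting level, and the order of execution can only be recovered by tracing how the state variable changes.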
Junk code adds extra instructions to the program that will never be executed. This unused code increases the number of potential targets attackers have to analyze, thereby slowing down their progress. Everything from whole functions to single instructions can be leveraged for this type of defense.
Junk code, because it never runs, can take any form you like. You can even make modified copies of legitimate code to further complicate the analysis. In more advanced cases, garbage instructions can be generated in such a way that traditional reverse engineering tools will have trouble reading them. When junk code is interwoven with legitimate code, automatic analysis is massively hindered.
Data security can be greatly improved by this type of protection. Instructions that are guaranteed to never execute can appear to make any number of transformations of legitimate data. Automated tools will be misled and think that data is being referenced, used, and modified in various ways when, in reality, none of this is true. During manual analysis an attacker might be convinced there is some algorithm or decoding applied to data that you don’t really care about.
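One common way to guarantee that junk code never runs is to guard it with a condition that is always false but not obviously so. The sketch below assumes that approach; the function name, the key value, and the arithmetic inside the dead branch are made up for illustration.

```python
def apply_discount(price_cents, percent):
    """Return the price after a percentage discount, with a junk branch added."""
    # The product of two consecutive integers is always even, so this
    # condition is never true. An analysis tool has to prove that first.
    if (price_cents * (price_cents + 1)) % 2 == 1:
        # Junk: never executes, but appears to decode the price with a key.
        key = 0x5A
        price_cents = (price_cents ^ key) + percent * 31
        percent = (percent << 3) & 0xFF
    # Legitimate computation.
    return price_cents - (price_cents * percent) // 100
```

A tool that cannot resolve the condition has to assume the price may be XORed with a key before use, which is exactly the kind of false lead described above.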
Similar to junk code, decoy code is additional code inserted into an application. Unlike junk code, it is executed, but it has no effect on the program's results. The primary goal of decoy code is to hide the intended purpose or result of legitimate code. For example, it might compute a value that is never used, or execute an entire block of code only to discard the result afterwards.
Zero-sum operations like these are an effective way to obfuscate the meaning or value of legitimate code. For example, an elegant and simple proprietary data processing algorithm can be hidden in a sea of useless computations.
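A minimal sketch of decoy code, using hypothetical names and constants: the function below only sums a list of amounts, but it also runs a mixing loop whose result is discarded and applies a mask that is later removed.

```python
def total_cents(items):
    """Sum a list of integer amounts, surrounded by decoy work."""
    mask = 0x3F7A
    acc = mask        # decoy: start from a meaningless constant
    shadow = 0x1234   # decoy: looks like a running hash or checksum
    for cents in items:
        acc += cents  # the only legitimate work in the loop
        shadow = ((shadow * 33) ^ cents) & 0xFFFF  # executed, never used
    # Zero-sum step: subtracting the mask undoes the decoy starting value.
    return acc - mask
```

Every line here really executes, so techniques that only filter out unreachable code will not remove it. An analyst has to follow the data to see that `shadow` never influences the return value and that `mask` cancels out.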
String literal obfuscation
Strings in an application are of high value to an attacker because they are easily found and typically provide great insight into the code they want to target. They often contain messages and information that users see when using the application, making it trivial for malicious actors to map features to code.
String literal obfuscation protects strings by transforming them so they are unrecognizable. Whether this is done through encoding or encryption, the goal is to hide the data from automated scans and manual inspection at all times, both on disk and while the application is running.
There are a couple of common ways to achieve this. The first is to fully encrypt every string in the application and decrypt them all once the app starts up. Alternatively, strings can be left in obfuscated form and decoded or decrypted on the fly as they are used. Either way, because the data does eventually need to be used, you should minimize the amount of time it spends in the clear.
Applying advanced code obfuscation
Numerous free obfuscators are available for applying simple obfuscation to your code. These provide very limited protection, however. Developers of mission-critical applications that handle sensitive personal information, financial information, or patient data should use more advanced code obfuscation methods. whiteCryption Code Protection uses the techniques described above and more to provide strong, layered obfuscation that prevents reverse engineering and helps protect application data. Code obfuscation is just one of the in-app defense tools Code Protection deploys to help businesses protect their investments in their software and users.
About Jake VanAdrighem
Jake VanAdrighem is Technical Product Manager at Intertrust Technologies, responsible for the product vision of Intertrust's whiteCryption Code Protection application security solution and its white-box cryptography library, Secure Key Box. Jake has a user-focused background in systems and compiler engineering.