Malware Analysis Series — Part 2: Behavioral Analysis

You could consider signature-based analysis to be like a policeman running the plates of every car in a parking lot against the police department’s database of stolen vehicles. While this may be an effective method for finding stolen vehicles, if the license plate on the car has been changed or obscured, the car will most likely be overlooked. Keeping with this analogy, behavioral analysis would be the detective.

July 13, 2016

The detective pays no attention to the license plate, instead looking for clues to a crime, such as signs of forced entry, a car parked in a suspicious location, or an obscured vehicle identification number (VIN). Likewise, behavioral analysis pays less attention to what a file appears to be and more to how it behaves, looking for suspicious actions and adding each observation to a profile of the code in question. Ultimately, behavioral analysis determines whether code is malicious using a score-based model. Code that falls into a gray area, i.e., a medium to high threat score, may be passed on to a human researcher for review. Code with a very high threat score may be automatically turned into a signature for future use in signature-based analysis, depending on how much trust is placed in the behavioral analysis mechanism.
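To make the score-based model concrete, here is a minimal sketch in Python. The behavior names, weights, and thresholds are all hypothetical, chosen only for illustration; a real scanner would tune them, and the "no_action" outcome for low scores is an assumption as well.

```python
# Hypothetical weights for suspicious behaviors (illustrative values only).
BEHAVIOR_WEIGHTS = {
    "outbound_connection_untrusted_host": 40,
    "obfuscated_code": 30,
    "unexpected_location": 20,
    "writes_to_core_files": 35,
}

# Hypothetical thresholds on a 0-100 scale.
REVIEW_THRESHOLD = 50          # medium to high score: gray area
AUTO_SIGNATURE_THRESHOLD = 90  # very high score


def threat_score(observed_behaviors):
    """Sum the weights of the distinct observed behaviors, capped at 100."""
    total = sum(BEHAVIOR_WEIGHTS.get(b, 0) for b in set(observed_behaviors))
    return min(100, total)


def triage(score):
    """Map a threat score to the next step in the workflow."""
    if score >= AUTO_SIGNATURE_THRESHOLD:
        return "auto_signature"  # classified into a signature automatically
    if score >= REVIEW_THRESHOLD:
        return "human_review"    # passed on to a human researcher
    return "no_action"           # assumption: low scores are left alone


if __name__ == "__main__":
    observed = ["obfuscated_code", "unexpected_location"]
    score = threat_score(observed)
    print(score, triage(score))
```

Where the auto-signature threshold sits reflects how much trust is placed in the behavioral analysis itself: the less trusted the mechanism, the more code should go to a human first.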

Obfuscated Code

With behavioral analysis, a scanner might look for things like a script opening outbound connections to an untrusted remote machine, or opening them from a location inside WordPress where you wouldn’t normally expect outbound connections to originate. Behavioral analysis is exceptionally useful in a modular web application like WordPress because scripts in certain areas can typically be expected to behave in certain ways. For example, a CAPTCHA plugin in /wp-content/plugins/* could be expected to grab remote image content at regular intervals coinciding with page requests, while scripts in other directories like /wp-includes/ may not necessarily be expected to do so. In most cases, it would also be safe to assume that WordPress core files shouldn’t contain obfuscated code. Because a typical WordPress website has a fairly uniform installation, its expected behavior gives you a relatively firm baseline for deciding which behavior does not belong and likely poses a greater threat.
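The location-based baseline described above could be sketched roughly as follows. This is a simplified illustration, not how any particular scanner works: the path prefixes, the obfuscation pattern, and the `baseline_findings` function are all hypothetical, and a real baseline would be far more detailed.

```python
import re

# Hypothetical baseline: locations where outbound connections are plausible.
OUTBOUND_EXPECTED_PREFIXES = ("wp-content/plugins/",)

# Hypothetical list of core locations that should not hold obfuscated code.
CORE_PREFIXES = ("wp-includes/", "wp-admin/")

# Crude, illustrative pattern for one common PHP obfuscation idiom,
# e.g. eval(base64_decode(...)). Real obfuscation takes many other forms.
OBFUSCATION_PATTERN = re.compile(
    r"eval\s*\(\s*(?:base64_decode|gzinflate|str_rot13)\s*\(",
    re.IGNORECASE,
)


def baseline_findings(path, source, opens_outbound):
    """Return the ways a script deviates from the assumed baseline."""
    relative = path.lstrip("/")
    findings = []
    if opens_outbound and not relative.startswith(OUTBOUND_EXPECTED_PREFIXES):
        findings.append("outbound connection from unexpected location")
    if relative.startswith(CORE_PREFIXES) and OBFUSCATION_PATTERN.search(source):
        findings.append("obfuscated code in core file")
    return findings


if __name__ == "__main__":
    print(baseline_findings(
        "/wp-includes/load.php",
        "<?php eval(base64_decode('...'));",
        opens_outbound=True,
    ))
```

Each finding would feed into the file's behavioral profile rather than decide the verdict on its own, since a deviation from the baseline is a clue, not proof.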

In the case of SiteLock® SMART™, we’ve integrated both signature-based analysis and behavioral analysis through machine learning. This means that virtually every scan draws on more data and understanding than the last, having logged behavioral data from every file scanned across the millions of websites that SiteLock® protects. Machine learning means that the mechanism is always learning new patterns and behaviors, greatly increasing its capability to discover new and exotic malware in the wild.

Have a question for our security professionals or a topic that you would like us to write about? Message @SiteLock and use the #AskSecPro tag!