How to avoid falling down the rabbit hole while analyzing malware

Original image made by Isabel Talsma

So you have your first sample, either from an ongoing incident response or because you are just studying (you can use this automated tool to collect in-the-wild samples). What are your next steps? Load the sample into IDA and give it a try? Or maybe upload it to Hybrid Analysis to get some insight into its behavior?

In this article I would like to take a step back and get a bird’s-eye view of the malware analysis process. In general, when you are dissecting a sample you are looking for answers to questions like “Is this file benign?” or “How does it communicate with its Command and Control server?”. If you don’t know what you are looking for, you can easily fall down the rabbit hole and find yourself in the following situation: it is 10 pm, all your teammates are already watching Netflix, but you are still single-stepping with F7 in your favorite debugger and have no clue what is going on.

I have been in this situation multiple times. Because I keep running into this issue, I decided to share some thoughts on where to start your malware analysis. I also thought it would be a great idea to ask people on Twitter what they look for during malware analysis and what information they expect to get from a malware analysis report.

I will break the malware analysis process into three stages:

  • Pre-Analysis, or what you need to do before you get your hands dirty
  • Analysis
  • Post-Analysis: all the hardcore analytics happens here

“If you know the enemy and know yourself, you need not fear the result of a hundred battles.” ― Sun Tzu, The Art of War

At the Pre-Analysis stage you should ask yourself more general questions, for example:

What is my goal? This could be, for example, extracting the malware configuration or decrypting Command & Control communication. There is a great comment on this one from @LibraAnalysis:

“First off, I’d suggest you ensure that the goal you have is clear. If you’re only checking if a sample is malicious or not, you can perform vastly different checks compared to “what is the domain name generation algorithm in this sample?”. When you have a clear goal, it becomes easier to work efficiently towards it. In the end, the analysis is always a race against the clock” — @LibraAnalysis

What am I dealing with? There is a difference in how you approach ransomware and a keylogger. @0verfl0w_ recommends gathering as much information as possible about the malware family even before diving into strings.exe output.

“Do I know what malware family this is? Knowing what malware family I am looking at allows me to instantly gather information on it without starting static analysis yet, thanks to posts and write-ups on them that are already public. If I am unable to find any public reports, it’s either a high-risk malware being tracked (possible APT, high level ECrime) by a company that doesn’t want to publish any details as it’s part of an ongoing investigation, or its a new malware family that has emerged. Or maybe people weren’t interested in writing a blog post on it as it’s a basic keylogger for example.” — @0verfl0w_

What tools do I need to analyze this sample? It is not the best idea to analyze a .NET assembly with IDA. Trust me, I tried.
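One cheap way to pick the right tool up front is to check whether a PE file carries a CLR header, which is how managed (.NET) assemblies are marked. The sketch below is only an illustration: the function name `is_dotnet_assembly` is my own, it does no validation beyond the header fields it reads, and a malformed or deliberately tampered file can fool it.

```python
import struct

CLR_DIRECTORY_INDEX = 14  # IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR in the PE format


def is_dotnet_assembly(data: bytes) -> bool:
    """Return True if the PE data directory points at a CLR (.NET) header."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return False
    optional_header = pe_offset + 24  # 4-byte signature + 20-byte COFF header
    if len(data) < optional_header + 2:
        return False
    (magic,) = struct.unpack_from("<H", data, optional_header)
    if magic == 0x10B:      # PE32
        directories = optional_header + 96
    elif magic == 0x20B:    # PE32+
        directories = optional_header + 112
    else:
        return False
    clr_entry = directories + CLR_DIRECTORY_INDEX * 8
    if len(data) < clr_entry + 8:
        return False
    # NumberOfRvaAndSizes sits right before the data directory array.
    (directory_count,) = struct.unpack_from("<I", data, directories - 4)
    if directory_count <= CLR_DIRECTORY_INDEX:
        return False
    rva, size = struct.unpack_from("<II", data, clr_entry)
    return rva != 0 and size != 0
```

If this returns True, a .NET decompiler is probably a better first stop than a native disassembler.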

Quick tip from @JAMESWT_MHT: don’t be misled by what a sample claims to be. For example, even if it is a signed Microsoft binary, you may still need to take a look. The same goes for a “0” detection score on VirusTotal.

“Our actions, not our words, define us”

In the Analysis stage I would recommend starting by running the sample in a sandbox: you can pick up low-hanging fruit like Command & Control IPs or the process injection technique used by the malware. There are free sandboxes like any.run or hybrid-analysis, or you can set up your own with Cuckoo Sandbox if you have reasons not to share the sample. @hasherezade gives her recommendation on where to start malware analysis:

“First try to gather as much information as possible via behavioral analysis, and then look at the implementation of each observed technique (and those that for some reason were not detected during behavioral analysis). Are the used implementations typical or atypical for this category of the malware (this requires previous experience and comparative analysis).” — @hasherezade

There is also great advice from @security_craig on how to speed up your analysis by skipping the known or boring parts.

“For me trying to rule stuff out as known or boring is really about speed. So focus on the fast or automatic tools and how you can use those capabilities to find things that are new or heavily modified like strings, files touched, c&c from sandbox reports.” — @security_craig

It is very hard to overstate the power of a simple strings review, which can uncover URLs, resources, embedded files and commands. @struppigel starts each analysis with a hex editor.

“Always put the file in a hex editor and check it there → It seems so easy that lots of people underestimate how good it is to avoid rabbit holes and just don’t do it. I had students being stuck looking for a config with IDA although it was there in plain text in the file.” — @struppigel
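To make the strings review concrete, here is a minimal sketch of what a strings-style tool does: it pulls out runs of printable ASCII characters, plus the UTF-16LE ("wide") form that Windows binaries often use. The function name and the default minimum length of 4 are my own choices for illustration.

```python
import re
from typing import List


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return printable ASCII and UTF-16LE strings of at least min_len characters."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return found


# Synthetic bytes standing in for a real sample.
blob = b"\x00\x01MZ\x90\x00http://example.test/gate.php\x00\xff\xfe" + "cmd.exe".encode("utf-16-le")
for s in extract_strings(blob):
    print(s)
```

A config sitting in plain text, like the one in @struppigel’s story, would show up in this output straight away.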

From an Incident Response perspective, @ItsReallyNick recommends collecting metadata about the attacker’s environment, like PDB paths or reliable timestamps, that reveals information about the malware developer.
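As a rough illustration of that kind of metadata, the sketch below reads the compile timestamp from the PE COFF header and searches the raw bytes for anything that looks like a PDB path. The function names are mine, and the PDB search is a plain regex over the file rather than a proper parse of the debug directory. Keep in mind that the timestamp can be forged or zeroed, so treat it as a hint, not as proof.

```python
import re
import struct
from datetime import datetime, timezone
from typing import List, Optional


def pe_compile_time(data: bytes) -> Optional[datetime]:
    """Return the COFF TimeDateStamp as a UTC datetime, or None if this is not a PE."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return None
    if len(data) < pe_offset + 12:
        return None
    # Signature (4) + Machine (2) + NumberOfSections (2) precede the timestamp.
    (stamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    return datetime.fromtimestamp(stamp, tz=timezone.utc)


def pdb_paths(data: bytes) -> List[str]:
    """Return printable byte runs that end in .pdb (naive search, may miss or over-match)."""
    pattern = re.compile(rb"[\x20-\x7e]{1,260}\.pdb", re.IGNORECASE)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

A path such as `C:\Users\<name>\...\bot.pdb` can leak a user name or project name from the developer’s machine, which is exactly the kind of detail worth writing down.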

Another layer of defense commonly used by malicious actors is obfuscation, such as string encryption and packers. That’s why @0verfl0w_ checks whether a binary is packed before starting to reverse engineer it:

“Obviously before starting to RE, we should check if the sample is packed. If it is, we should unpack it, otherwise we can start analyzing it. Unless you want to reverse engineer the packer, this is an important step as it slows down analysis a lot if you try to manually reverse engineer a packer statically.” — @0verfl0w_
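One common heuristic for spotting a packed sample is byte entropy: compressed or encrypted data looks close to random, so its Shannon entropy approaches 8 bits per byte. The sketch below is only that heuristic, not a packer detector. The `looks_packed` helper and its threshold of 7.0 are assumptions of mine, and in practice you would look at individual sections rather than the whole file.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def looks_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Flag data whose entropy exceeds the threshold (assumed value, tune it yourself)."""
    return shannon_entropy(data) > threshold


print(round(shannon_entropy(b"hello world" * 100), 2))
print(looks_packed(bytes(range(256)) * 4))
```

High entropy alone does not prove packing (embedded images or archives raise it too), so use it to decide where to look next, not as a verdict.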

A quick tip from @ItsReallyNick is to document the whole de-obfuscation process so others can rip it apart.

Once you are done with all the obfuscation techniques, you are ready to hunt for some juicy stuff, for example the sample configuration, as @0verfl0w_ usually does:

“Does this sample have a configuration? How can I extract it both manually and automatically? This is another important factor of malware analysis, as being able to process several files at once and get automated indicators of compromise immediately is extremely helpful for threat intelligence, as well as for setting up a tracker for the malware family. Plus, knowing what the config looks like can give you an idea of what the malware is meant to do.” — @0verfl0w_

Or maybe you see a suspicious IP address that looks like a Command & Control server. Here is another great tip from @0verfl0w_:

“Does this sample communicate out to a C2? If so, can I emulate the communications? Are the commands? This is the next step I follow when analyzing a malware family. Being able to interact with the C2 is also extremely important, as it allows you to grab additional modules, web injections, commands, and many more! Web injections give you an idea of where the threat actors are targeting, modules gives you extra information on the functionality of the malware, as well as if they are collaborating with other groups by dropping their malware as well.” — @0verfl0w_

Of course, don’t forget to collect Indicators of Compromise. Here is what @ItsReallyNick expects to get after a successful malware analysis:

“Host and network indicator extraction. Optimally, get me packets — process arguments launched and the method by which code execution is done.” — @ItsReallyNick
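For the network side of indicator extraction, a first pass can be as simple as running a couple of regular expressions over strings output or a sandbox report. The sketch below assumes the input is already plain text. The patterns are deliberately simple and will produce false positives (version numbers that happen to look like IP addresses, for instance), so review the results by hand.

```python
import re
from typing import Dict, List

# Dotted-quad IPv4 address with each octet limited to 0-255.
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)
# http/https URL up to the next whitespace, quote or angle bracket.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_iocs(text: str) -> Dict[str, List[str]]:
    """Return de-duplicated, sorted IPv4 addresses and URLs found in text."""
    return {
        "ipv4": sorted(set(IPV4_RE.findall(text))),
        "urls": sorted(set(URL_RE.findall(text))),
    }


# Made-up report text using documentation-only addresses.
report = "POST http://203.0.113.7/gate.php from 10.0.0.5; version 1.2.3"
print(extract_iocs(report))
```

Host indicators such as process arguments, file paths and mutex names need their own patterns and are usually easier to take from sandbox or ProcMon output.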

In general, @mykill follows these steps during malware analysis, with suggested tools:

  • Strings
  • FakeNet-NG + ProcMon
  • Disassemble, e.g. in IDA
  • Debug, e.g. WinDbg or flare-qdb
  • Repeat the above with any unpacked files

To be honest, I can’t always force any particular order of discovery, so it’s more like a big fat “depends” — @mykill

My highlight of the Analysis stage comes from @LibraAnalysis. I would have been so thankful if someone had said this to me a few years ago.

“Another pitfall is to assume that one needs to understand everything. If you understand the gist of it, you can already continue in most cases. Of course there are parts that you need to understand completely, but if there are a few not so important parts that remain unknown or vague, that is fine, especially if you’re still beginning. You can always return to those parts later on, and quite often you can use the new insights of the later analysis to discover what the missing parts are about.” — @LibraAnalysis

“How do you know I’m mad?” said Alice.
“You must be,” said the Cat,
“Or you wouldn’t have come here.”

At the Post-Analysis stage you should start analyzing the capabilities of the malware. Here is some analysis from @0verfl0w_ regarding Command & Control communication:

“What functionality does this malware have? Can it download malware, upload files, etc. Although when analyzing the communications inside the malware I only typically scratch the surface getting a basic emulator up and running, before analyzing further looking for commands inside the binary, that help me understand the functionality. Upon finding a new command that is accepted by the bot, I can add this into the emulator I have developed, and immediately start handling that response if I ever receive it from the C2 server.” — @0verfl0w_

@IdoNaor1 recommends defining the purpose of the malware as well as the motivation of the malicious actor.

Both @IdoNaor1 and @hasherezade would try to draw links between the currently analyzed malware and other known families. For example, @ItsReallyNick suggests using code similarity comparison techniques to find out which malware family the sample is related to.
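To give a feel for what a similarity comparison measures, here is a deliberately naive sketch: it treats each file as a set of overlapping byte n-grams and computes the Jaccard similarity between two sets. This is my own stand-in, not the technique any of the people quoted here use. Real tooling usually compares at the function or basic-block level, or relies on fuzzy hashing.

```python
from typing import Set


def byte_ngrams(data: bytes, n: int = 4) -> Set[bytes]:
    """Return the set of all overlapping n-byte sequences in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard_similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Return |A & B| / |A | B| over byte n-grams, from 0.0 to 1.0."""
    grams_a, grams_b = byte_ngrams(a, n), byte_ngrams(b, n)
    if not grams_a and not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Two toy byte strings sharing a common prefix.
print(jaccard_similarity(b"abcdefgh", b"abcdefXY"))
```

A score like this is sensitive to packing and recompilation, so it is only meaningful on unpacked samples and should be read as a lead to follow up, not as attribution.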

At this point @JAMESWT_MHT will start working on a YARA rule based on the knowledge gained in the Analysis stage. @JAMESWT_MHT also recommends implementing prevention mechanisms to stay protected in the future, for example writing an IPS (Intrusion Prevention System) rule to block malicious traffic.
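If you collected distinctive strings during the Analysis stage, turning them into a first-draft YARA rule is mostly text formatting. The sketch below generates such a draft. The rule name, the example strings and the “at least two matches” condition are all hypothetical, and a rule built this way still needs tuning against clean files before you deploy it.

```python
from typing import List


def build_yara_rule(name: str, strings: List[str], min_matches: int = 2) -> str:
    """Return YARA rule text that fires when min_matches of the strings are present."""
    if not strings:
        raise ValueError("at least one string is required")

    def escape(value: str) -> str:
        # Backslashes and double quotes must be escaped inside YARA text strings.
        return value.replace("\\", "\\\\").replace('"', '\\"')

    min_matches = min(min_matches, len(strings))
    lines = [f"rule {name}", "{", "    strings:"]
    for index, value in enumerate(strings):
        lines.append(f'        $s{index} = "{escape(value)}" ascii wide')
    lines += ["    condition:", f"        {min_matches} of them", "}"]
    return "\n".join(lines)


# Hypothetical rule name and strings, for illustration only.
print(build_yara_rule("Hypothetical_Bot", ["gate.php", "C:\\build\\bot.pdb"]))
```

Requiring more than one string to match is a simple way to cut down on false positives from a single generic string.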

Don’t forget to prepare a report. It is as important as the malware analysis process itself. From my experience, I recommend including information for both executive and technical readers: for executives, focus mostly on possible impact, remediation steps and so on; for technical readers, provide analysis details and triage information. Here is a great comment from @mykill on what information to include for triage:

  • IPs, domains, protocols
  • Persistence, files
  • Mutexes, processes

“This is all off the top of my head and omits things that I think might not be strictly necessary for triage for most analysts. If I were to sit at my desk and look at any of my malware reports, I bet I would regret sending this as-is. But indeed my time is limited, and I think this is more helpful than not replying.” — @mykill
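If you want your reports to be consistent from sample to sample, it can help to capture the triage items from the list above in a small structure that serializes to JSON. This is just one possible layout: the field names mirror the list, and `sample_sha256` is an extra field I added as an assumption so a report can be tied back to a file.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class TriageReport:
    """Triage indicators for one sample; every list is empty until filled in."""
    sample_sha256: str  # assumed identifier, not part of the original list
    ips: List[str] = field(default_factory=list)
    domains: List[str] = field(default_factory=list)
    protocols: List[str] = field(default_factory=list)
    persistence: List[str] = field(default_factory=list)
    files: List[str] = field(default_factory=list)
    mutexes: List[str] = field(default_factory=list)
    processes: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the report with stable key order for easy diffing."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values, for illustration only.
report = TriageReport(
    sample_sha256="0" * 64,
    ips=["203.0.113.7"],
    mutexes=["Global\\hypothetical_mutex"],
)
print(report.to_json())
```

Empty lists are kept in the output on purpose, so the reader can tell “nothing found” apart from “not checked”.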

Hope you enjoyed the read! We talked about what you can look for during malware analysis, as well as some tips and tricks for avoiding those tricky rabbit holes.

A huge shout-out to all the wonderful people who were kind enough to share the thoughts used in this article. Please go follow them on Twitter!

@mykill, @JAMESWT_MHT, @IdoNaor1, @0verfl0w_, @hasherezade, @struppigel, @ItsReallyNick, @LibraAnalysis, @security_craig
