Thursday, February 26, 2015

Extracting RAW pictures from memory dumps

Introduction

Earlier today, while reading my Twitter timeline, I saw some Infosec folks discussing about scripts/tools to identify RAW pictures in memory dumps. I decided, then, to write this blog post and share a small hack that I use to visualize data (including memory dumps).




A few months ago, I wrote a post detailing how to Scan the Internet & Screenshot All the Things, now it's time to Dump the Memory & Screenshot All the Things.




Memory Dumps

The first thing you will want to do is to narrow the analysis to the process containing interesting images/pictures. I'm going to use three different memory dumps here:

Remote Desktop Client - Windows 7 x64 (mstsc.exe)

Let's use the Windows built-in RDP client to connect to an external server and dump the process
memory using procdump:


procdump.exe -ma mstsc.exe mstsc.dmp



Microsoft Paint - Windows 7 x64 (mspaint.exe)

Let's load/save a simple image file on Paint and run procdump again:



procdump.exe -ma mspaint.exe mspaint.dmp



9447 2014 CTF Challenge: coor coor - Windows XP (VirtualBox.exe)
There's an awesome write-up for this CTF challenge here, go read it now if you haven't yet. We are going to use volatility to isolate the VirtualBox memory dump:

python vol.py -f challenge.vmem pslist


python vol.py -f challenge.vmem memdump -p 1568 --dump-dir=dump/



RAW Image Data

Rename the file extensions from *.dmp to *.data, download/install GIMP and open them as "RAW Image Data":


That's it, now you can use GIMP to navigate within the memory dump and analyse the rendered pixels/bitmaps on their corresponding offsets. It's worth mentioning that different images will be rendered using different Image types and variable widths: you may need to adjust these values accordingly.

So what can we spot here?

  • On the RDP memory dump, we can retrieve the tiles and Windows displayed during the connection, including IP's, usernames and commands:
Windows commands
Remote Desktop Client Window
RDP session
  • The Microsoft Paint picture can be easily spotted: they're upside down because that's the way BMP's are stored:
We need upside down backdoors "this big"

  • The most interesting artifacts were collected from the Coor Coor dump. The user was running a TrueCrypt container inside VirtualBox and after some offset adjustment we can see the Pidgin Window, the user account (testicool69@yodawg.9447.plumbing) and a few OTR settings:
While True: width ++ || width--

Notice that the Windows are not perfectly aligned here, but we can see the data by zooming in:

Enhance pls

Looks like our killer is screwed. YEEAAAH.

We can also spot the Window taskbar, just like the volatility screenshot plugin showed us on the previous write-up:


python vol.py -f challenge.vmem screenshot -D screenshot/

It's also possible to spot icons from the running programs, like this one from Virtualbox:

VirtualBox icon


Conclusion

This technique is very common among ROM hackers as they try to find image patterns inside raw game dumps. Check my write-up from Hack.lu 2014 CTF to find more about it. By the way, you can also use Tile Molester instead of GIMP to browse the RAW data.

You may be asking - why not carve the dumps using binwalk and foremost or extract them using the dumpfiles volatility module? If you try it yourself you will notice that they won't find the magic bytes for all those images.

As far as I know, there's no off-the-shelf tool to automagically extract them, but it should't be that hard to write a binwalk/volatility plugin for this based on some heuristics. Binwalk, for example, can find raw deflate/lzma streams by building headers on top of the raw compressed data and writing it back do disk.

I'm no Computer Visualization expert, but here's a few suggestions:

  • Set the image width to common display resolutions. The taskbar from the coor coor memory dump could be displayed by setting the width to 1440 points (1440x900 is a common screen resolution).
  • Use common window background/patterns as a template to find interesting sections.
  • Create a multi-view/side-by-side RAW image browser based on GIMP source code (multiple image types, multiple widths etc).
  • Use Google's artificial brain to find cat videos.
  • Get a bigger monitor (yeah, it helps).

I hope you all use these skills wisely, avoiding any kind of superfishal investigation like our Lenovo friends.




Monday, February 9, 2015

Firmware Forensics: Diffs, Timelines, ELFs and Backdoors

This post covers some common techniques that I use to analyze and reverse firmware images. These techniques are particularly useful to dissect malicious firmwares, spot backdoors and detect unwanted modifications.

Backdooring and re-flashing firmware images is becoming mainstream: malicious guys are infecting embedded devices and inserting trojans in order to achieve persistence. Recent articles covered the increasing number of trojanized android firmwares and routers that are being permanently modified.

Attackers with a privileged network position may MITM your requests and forge fake updates containing malicious firmwares. Writing Evilgrade modules for this is really simple, as most vendors keep failing to deliver updates securely, right ASUS?

All your HTTP packets are belong to us...
Older versions of ASUS firmwares were vulnerable to MITM attacks (CVE-2014-2718) because it transmitted updates over HTTP and there were no security/signature checks. ASUS silently patched the issue on 3.0.0.4.376+ and they're now verifying RSA signatures via /sbin/rsasign_check.:

Valid signature -> nvram_set("rsasign_check", "1")

NoConName 2014 CTF Finals: Vodka

I'll keep my tradition of writing posts based on CTF challenges because everybody upvotes CTF posts on reddit it's cool.

The challenge "Vodka", from NoConName 2014 CTF Finals was created by @MarioVilas, who kindly provided the files here (thanks dude!).

I did not participate on the CTF finals, but I found the challenge really interesting because there were many different ways to solve it, summarizing the actions needed to audit a compromised firmware. In my opinion, the best CTF challenges are the ones that require us to develop/use new techniques and improve existing tools.

NoConName 2014 Finals: Vodka
Challenge Category: Forensics
Description: No hints :( just get the flag.

This challenge description is not very intriguing, so I hired a couple of marketing specialists to design a new logo add some Infosec drama and reformulate it:

A mysterious bug affected one of the core routers at a major Internet service provider in Syria. The failure of this router caused the whole country to suddenly lose all connection to the Internet. The Syrian government recorded a traffic capture right before the crash and hired you to perform a forensic analysis.

Download provided: https://github.com/MarioVilas/write-ups/blob/master/ncn-ctf-2014/Vodka/vodka


Network Forensics

The download provided is a packet capture using the PCAP-NG format. Wireshark is too mainstream, so let's convert the PCAP-NG to PCAP and open it using Network Miner:



Network Miner makes it very easy for us to understand what's going on: there's some sort of file transfer via TFTP and the filename seems to be related to an OpenWRT firmware image.


Firmware structure

We always binwalk all the things but very few people stop to analyze and understand the firmware structure properly. We know that the firmware image was downloaded using TFTP, a common way used by many routers to transfer config files/updates and it is probably based on the OpenWRT project.

So what does binwalk tell us?



The Commom Firmware Environment (CFE) is a firmware interface/bootloader present on Broadcom SOCs. It is analogous to the BIOS on PC platforms and it is responsible for CPU initialization and bootstrap code on embedded processors. The CFE is also referred as PMON and it is generally mapped to mtd0.

The JFFS2/NVRAM filesystem is the non-volatile partition. They store all the configuration parameters, including router settings, passwords and logs.

Bear in mind firmware updates generally do not include the CFE/NVRAM partition. You can access the CFE console using serial and you can also dump them on a live system using DD or via SPI. Let's focus on the firmware sections included on the provided image (openwrt-wrtsl54gs-squasfs.bin):

TRX (Offset 0x20)

The TRX header is just an encapsulation, describing a series of information from the firmware, including the image size, CRC, flags, version information and partition offsets. Binwalk wasn't recognizing the header and the relative offsets properly so I submitted these two pull requests. Creating custom signatures for binwalk is pretty straightforward.


Some firmwares (like the newer ones from ASUS and Netgear) use this TRX structure but don't include a loader: the Linux Kernel and the RootFS may be shifted on this occasion.


If the firmware includes any extra header before the TRX, you have to sum their size with the displayed partition offsets in order to find the real values. Some firmwares for SOHO modems out there won't include it, so these values should be right on most cases. The downloaded OpenWRT image had the following offsets:

  • Loader: 0x20 + 0x1C = 0x3C
  • Kernel:  0x20 + 0x8D8 = 0x8F8
  • RootFS: 0x20 + 0x7E400 = 0x7E420

In this specific case, we have a BinHeader right before the TRX, indicating the board ID, the FW Date and the Hardware Date. The struct is described on cyutils.h:

This extra header appears on a few routers like the WRT54G series: the Web GUI checks for this pattern before actually writing the firmware.



We are particularly interested on the fwdate field (Firmware Date), composed by the hex values 07 02 03. According to addpattern.c, the first byte defines the year, the second one is the month and the third byte refers to the day the firmware was created. The fwdate seems to be 03-February-2007, save that for later, we will need that =)

GZ'd LZMA Loader (Offset 0x3C)

According to OpenWRT Wiki, the boot loader has no concept of filesystems: it assumes that the start of the TRX data section is executable code.

The boot loader boots into an LZMA program which decompresses the kernel into RAM and executes it. It turns out the boot loader does know gzip compression, so we have a gzip-compressed LZMA decompression program at 0x3C.

You can find the source code for this lzma-loader here and here. Note the TEXT_START offset at 0x80001000: we may need to adjust the Loading Address on our Disassembler in order to reverse the compiled loader. Don't forget to decompress it (gunzip) before reversing the file.


Most embedded toolchains would strip the binaries in order to reduce the firmware size. If you want to reverse a friendlier version of the loader, grab the latest OpenWRT ImageBuilder and search for loader.elf:


Woohoo, blue code =)

Note that if we modify the loader to include a backdoor, we would have our very own Router Bootkit, cool isn't it?

LZMA'd Kernel (Offset 0x8F8)

Instead of just putting a kernel directly onto flash, most embedded devices compress the kernel using LZMA. The boot loader boots into an LZMA program which decompresses the kernel into RAM and executes it.

Binwalk has a signature to find Kernel strings in raw Linux Kernels. The identified string lists the toolchain used to compile the Kernel, as well as the compiled date and version information:


And why did binwalk manage to find all these information from the Kernel? The answer can be found on the toolchain's Makefile:

If we follow the steps from my previous post we can build a customized Kernel for OpenWRT. The generated vmlinux is generally an ELF file, but in our case, the object was stripped using objcopy:

Did you notice the compile date was 03-February-2007? Let's save that for later as well.

SquashFS (Offset 0x72420)

The last part is the actual filesystem. Most embedded Linux devices use SquashFS and many vendors hack it in order to get better compression and faster performance. Hopefully we don't have to worry about that as Sasquatch handles different SquashFS header/compression formats.

The filesystem has the standard OpenWRT directories and files, including a banner from the 0.9 build (White Russian).


Both binwalk and sasquatch display the SquashFS superblock information, including the creation/last append time:


Did you spot the date 29-October-2014? There's definitely something going on here =)


Directory Tree Diff & Fuzzy Hashing

Now that we have unpacked & unsquashed the firmware, let's use binwally to compare the directory tree and find the needle in the haystack.

After googling the filename (openwrt-wrtsl54gs-squashfs.bin), we get three possible candidates:

https://downloads.openwrt.org/whiterussian/0.9/default/openwrt-wrtsl54gs-squashfs.bin
https://downloads.openwrt.org/whiterussian/0.9/micro/openwrt-wrtsl54gs-squashfs.bin
https://downloads.openwrt.org/whiterussian/0.9/pptp/openwrt-wrtsl54gs-squashfs.bin

OpenWRT offers different builds for the same device because of constraints like limited flash size. Let's download these three candidates, unpack and compare them:

binwally.py ctf/_openwrt-wrtsl54gs-squashfs.bin.extracted/ micro/_openwrt-wrtsl54gs-squashfs.bin.extracted/


The "micro" build has the highest overall match score (99%), let's spot the differences:

binwally.py ctf/_openwrt-wrtsl54gs-squashfs.bin.extracted/ micro/_openwrt-wrtsl54gs-squashfs.bin.extracted/ | grep -E -v "ignored|matches"


After carefully reviewing these files, we notice that the "/etc/profile" was modified to include a call to the nc backdoor.


The LZMA'd Kernel (offset 0x8F8) is the same on both images, even though binwally reports a difference. This happens because binwalk extraction doesn't know when to stop and both files also contain additional data like the SquashFS partition.

The backdoor located at "/bin/nc" is a simple bash script that checks the MD5 from "/etc/profile" and draws a Nyan Cat along with the challenge key. In order to get the proper key, we simply modify the file location to the relative path "./etc/banner", to avoid overlapping with the file from the original system.


After running the file, we get the key NCNdeadb6adec4c77a40c23e04770924d3c5b18face.


This was just too easy right? But what if we didn't have a known template for comparison?


Timeline Analysis

My tool of choice to perform timeline analysis is Plaso, created by @el_killerdwarf. The tool is python-based, modular and very fast. What I like most about it is the ease to output results to ELK. If you don't know about Plaso and the ELK stack, read this quick tutorial and set up your environment.

Let's use log2timeline to create a dump file, pointing to the extracted SquashFS path:

log2timeline.py output.dump squashfs-root/


Let's fire up psort and include data in the timeline:

psort.py -o elastic output.dump


That's all, Plaso uses the filestat parser to extract metadata from the files, outputting results to Elasticsearch.

We already identified the following dates from the firmware:

  • 03 February 2007 (??:??:??): BinHeader firmware creation date
  • 03 February 2007 (13:16:08): Linux Kernel compile date
  • 29 October  2014 (16:53:25): SquashFS creation or last append time

First let's filter the filesystem attributes: we just want to display the mtime (modified) timestamp, so we are going to perform a micro analysis to include the value. The filter should be something like this: field must | field timestamp_desc | query: "mtime".


The histogram view is very helpful to get a big picture of what's going on:


We can clearly see that the files included/modified on 2014-10-29 had a malicious nature. The state sponsored attacker did not modify other files from the OpenWRT base image.


At this point it is pretty clear that the firmware was modified using the OpenWRT Image Builder, which is a pre-compiled OpenWrt build environment. The BinHeader and the Kernel timestamps were left untouched and the only partition modified was the SquashFS one.

Of course these timestamps, like any kind of metadata, could be tampered by the malicious hacker. However, they are very helpful during the initial phases, speeding up investigations and narrowing the analysis to a smaller set of data.


ELF Structural Information

I always get impressed when AV vendors manage to profile APT and State-sponsored attackers based on PE timestamps. Techniques like the imphash are generally used exclusively on Windows.

PE Imports are the functions that a piece of software calls from other files (typically DLLs). To track these imports, a hash is created based on library/API names and their specific order within the executable. Because of the way a PE’s import table is generated, we can use the imphash value to identify related malware samples, for example.

Everybody does that for Windows binaries but what about Linux? Virustotal recently included detailed ELF information on their engine. We can also use these sections to identify useful information from the binaries, including the toolchain used to compile them.

We generally don't have any timestamp information on the ELF section, but there are many other interesting fields. This quick guide on using strip summarizes some topics:

When an executable is produced from source code, there are two stages - compilation and linking. Compiling takes a source file and produces an object file. Linking concatenates these object files into a single executable. The concatenation occurs by section. For example, the .comment section for the final executable will contain the contents of the .comment section of each object file that was linked into the executable.
If we examine the contents of the .comment section we can see the compiler used, plus the version of the compiler
It's pretty simple to read and parse the .comment sections from ELF files. GNU readelf (part of binutils) and pyelftools include all the necessary functions parse them.

I always try to display information from object files using different toolchains in order to find out which one understands the file structure properly. On this specific case, I'm going to use mipsel-linux-gnu-readelf (part of Emdebian toolchain), but the regular readelf also does the job.

for i in $(find .) ; do echo $i ; mipsel-linux-gnu-readelf -p .comment $i ; done > comment-section.txt
./lib/modules/2.4.30/diag.o

String dump of section '.comment':
  [     1]  GCC: (GNU) 3.4.4 (OpenWrt-1.0)

./lib/modules/2.4.30/switch-adm.o

String dump of section '.comment':
  [     1]  GCC: (GNU) 3.4.4 (OpenWrt-1.0)

./lib/modules/2.4.30/switch-robo.o

String dump of section '.comment':
  [     1]  GCC: (GNU) 3.4.4 (OpenWrt-1.0)

./lib/modules/2.4.30/switch-core.o

String dump of section '.comment':
  [     1]  GCC: (GNU) 3.4.4 (OpenWrt-1.0)

./lib/modules/2.4.30/wlcompat.o

String dump of section '.comment':
  [     1]  GCC: (GNU) 3.4.4 (OpenWrt-1.0)

Just a few ELF files included the comment section, others got stripped during the compilation/linking phase. If we download OpenWRT 0.9 sources we can see that GCC 3.4.4 was indeed used:


TheMoon Worm exploited a command injection to infect Linksys wireless routers with a self-replicating malware. If we analyze its .comment section, we can see that it was probably compiled and linked using GCC 4.2.4 and 3.3.2. If we search for a .comment section on the router E4200, targeted by the worm, we can't find any reference because the toolchain stripped all of them. Having a file compiled with a different toolchain or containing extra ELF sections (that others files don't) is something highly suspicious.


The .comment section for the final executable includes the contents of the .comment section of each object file that was linked into the executable. If we compare the comment section on ASUS RT-AC87U Firmwares v3.0.0.4.378.3885 and v3.0.0.4.376.2769, we can spot an extra line on the newer version from tfat.ko:


If you want to dump all sections from the ELF file you may use this command line (kind of hacky, but works):

for i in $(find .) ; do echo "$i" ; for j in $(readelf -S "$i" | grep \\[ | cut -d"]" -f2 | cut -d " " -f2 | grep -v "Name") ; do mipsel-linux-gnu-readelf -p "$j" "$i" ; done ; done > list.txt

The output will be a bit too verbose, you may want to narrow the analysis to the following sections:

  • .comment - contains version control information
  • .modinfo - displays information from a kernel module
  • .notes - comments put there by the compiler/linker toolchain
  • .debug - contains information for symbol debugging
  • .interp - contains the name of the dynamic loader

For more information regarding the ELF file structure, check the ELF man and the Chapter 5 from Malware Forensics Field Guide for Linux Systems.


Conclusion

Without further clues or context these information may not be relevant, but in conjunction with other data they're helpful to get a big picture of what's going on:

  • Diffing the content from previous firmwares may be useful to find out when backdoors were first installed, modified and/or removed.

  • Artifact timeline creation and analysis also helps to speed up investigations by correlating the vast amount of information found on system.

  • The contents from the ELF section will likely reveal the toolchain and the compiler version used to compile a suspect executable. Clues such as this are attribution identifiers, contributing towards identifying the platform used by the attacker to craft his code.

We can use the timestamps from the kernel partition to correlate different firmwares from the same family, for example. We can also compare the timestamps from each partition to find deviations: a firmware header created on 2007, with a Kernel timestamp from 2007 and a SquashFS partition dated to 2014 is highly suspicious.

The Firmware.RE project is performing a large scale analysis, providing a better understanding of the security issues related to firmwares. A broader view on firmwares is not only beneficial, but necessary to discover new vulnerabilities and backdoors, correlating different device families and showing how vulnerabilities reappear across different products. This is a really cool project to track how firmwares are evolving and getting security fixes.