Tuesday, March 1, 2016

Analysis of a Dridex maldoc pre-Locky

Lately the security threat landscape has been dominated by ransomware distributed via infected websites and by banking Trojans distributed via malicious documents attached to phishing emails. In particular, the Dridex banking Trojan has been one of the most active threats. Last week the two threats merged, and Dridex began distributing Locky ransomware as well.
In this post we go through the analysis of a malicious Office document delivering the Dridex banking Trojan, spread just one day before the switch to Locky, and we look at the similarities between the two that make us believe the same actor is behind both.


On Friday February 12th, we observed a big wave of phishing attempts, over 700, which looked like the following:

Sender: fpo.cc.XX@vosa.gsi.gov.uk
Attachment: Fixed Penalty Receipt.docm
MD5 Checksum: 50e1c94e43f05f593babddb488f1a2f9

Where XX are two random digits. A few days later, on Monday February 15th, we observed a second, bigger wave, this time counting over 1700 phishing attempts, all from a specific sender. The email looked like the following:

Sender: kpegg@responserecruitment.co.uk
Subject: Invoice (w/e 070216)
Attachment: SKM_C3350160212101601.docm
MD5 Checksum: d93f33e2d5a4b3232f824dbd1d897df4 


The .docm is an MS Word file format similar to .docx: it's basically a zip archive containing XML files. It is therefore possible to perform some static analysis using oledump.py, which helps to immediately identify the macro streams within the document:

Going through the different vba streams, we can find the AutoOpen() subroutine, which is the starting point executed as soon as the user opens the .docm attachment.
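Before reaching for oledump.py, the "it's a zip archive" point can be checked with a few lines of Python. This is only a minimal sketch: it assumes the macro container is stored as a member named vbaProject.bin (the usual name in macro-enabled Office files), and the demo archive built here is a harmless stand-in, not the analyzed sample.

```python
import io
import zipfile


def list_macro_parts(docm_bytes):
    """Return archive members that look like VBA project containers."""
    with zipfile.ZipFile(io.BytesIO(docm_bytes)) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith("vbaproject.bin")]


def build_demo_docm():
    """Build a tiny, harmless zip that mimics the layout of a .docm file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<document/>")
        # Placeholder bytes only: a real vbaProject.bin is an OLE file.
        archive.writestr("word/vbaProject.bin", b"placeholder")
    return buffer.getvalue()


if __name__ == "__main__":
    print(list_macro_parts(build_demo_docm()))
```

The member found this way is the OLE file that tools such as oledump.py parse to enumerate the VBA streams.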

By opening stream A7, the biggest one, which contains all the code, we find the AddSensors() subroutine called above. Following the code, we reach the point where the embedded domain address appears to be. Here are the obfuscated URL and the decryption routine:

The array on top is the URL, while the ErrHandler function represents the de-obfuscation routine, confirmed as follows:

Dynamic analysis confirms the previous finding. In particular, when the Word file is opened, it performs a GET request to hxxp://www.profildigital[.]de/09u8h76f/65fg67n (82.197.153[.]120), from which it downloads the executable Ladybi.exe.

This specific sample then starts communicating with the C&C server at the address 5.45.180[.]46. Just one day later, on the 16th of February, Locky ransomware started to spread. Evidence from the analysis of Locky suggests that the actors behind the Dridex sample analyzed here could be the same ones behind Locky. The evidence includes:
  • Similar IOC
  • Similar modus operandi (TTP) and spreading mechanisms
  • Similar C2 infrastructure
You can find more references and indicators about Locky analysis and the correlation with Dridex at the following references [1][2][3][4].

Happy Hunting



Wednesday, January 27, 2016

Windows ReVaulting

Windows Vaults and Credentials allow the user to store sensitive information, such as user names and passwords, that can later be used to log on to websites, services and computers. This post shows how such data is protected and how you can decrypt it offline.

This post is a very late debriefing of the talk I gave at SANS DFIR Summit Prague 2015 and it's the first of two posts. You can download the slides from SANS Summit Archives or from SlideShare.


I've never used the Vault/Credential facility on purpose, even if the system used it without my knowledge: it's worthwhile to know that Windows autonomously uses it almost every day. In any case, we can find sensitive information there, and this is the reason I started this research: to add a few more strings to my ODI (Offensive Digital Investigations) bow.

Windows provides two utilities to manage such credentials, the graphical Credential Manager and the command line vaultcmd: you can see them in the next two pictures.

Credential Manager

When checking for credentials I strongly suggest using both the manager and vaultcmd, since neither of these tools shows all of them.

Microsoft writes here: "Credential Manager allows you to store credentials, such as user names and passwords that you use to log on to websites or other computers on a network. By storing your credentials, Windows can automatically log you on to websites or other computers. Credentials are saved in special folders on your computer called vaults. Windows and programs (such as web browsers) can securely give the credentials in the vaults to other computers and websites.".

From the previous statement we get at least one main lead: credentials are stored in special folders. And this is exactly what I spotted when monitoring and debugging my target systems: from a file system point of view, two to four different paths are used, not counting the system's vaults. Inside the user profile you can find the Credentials and Vault folders under the "<user>\AppData\(Local|Roaming)\Microsoft\" paths, as shown in the following pictures.

Credentials folder
Vault folder

My journey started from these folders, but there is one major requirement before going on, that is DPAPI. As we'll see, the protection of credentials and vaults is based on this well-known Windows technology, so it's advisable to have an overall understanding of it: you can find many references online, and even on this blog (Happy DPAPI!). One key point follows from this: all the security around credentials/vaults rests solely on the user login password.

credentials files format

Inside Credentials folders you can find zero or more files without extensions and with GUIDs as names. Each file contains one "secret", which is our target: the files are independent from each other. As highlighted in the picture, the file is simply a DPAPI blob with a small header in front of it: its decryption does not pose a new challenge.
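To see the "small header plus DPAPI blob" layout without hard-coding the header size, one option is to search for the well-known start of a DPAPI blob: a version DWORD equal to 1 followed by the DPAPI provider GUID df9d8cd0-1501-11d1-8c7a-00c04fc297eb. The sketch below is an illustration under that assumption; the function names are mine and are not taken from dpapilab.

```python
import struct
import uuid

# Version DWORD (1) followed by the DPAPI provider GUID, as stored on disk.
DPAPI_PROVIDER = uuid.UUID("df9d8cd0-1501-11d1-8c7a-00c04fc297eb")
DPAPI_MAGIC = struct.pack("<L", 1) + DPAPI_PROVIDER.bytes_le


def find_dpapi_blob(data):
    """Return the offset of the first DPAPI blob signature, or -1."""
    return data.find(DPAPI_MAGIC)


def split_credential_file(data):
    """Split file content into (header, dpapi_blob) at the blob signature."""
    offset = find_dpapi_blob(data)
    if offset < 0:
        raise ValueError("no DPAPI blob signature found")
    return data[:offset], data[offset:]


if __name__ == "__main__":
    # Synthetic content: a made-up 12-byte header followed by a fake blob start.
    fake = b"\x01\x00\x00\x00" + b"\x00" * 8 + DPAPI_MAGIC + b"payload"
    header, blob = split_credential_file(fake)
    print(len(header), len(blob))
```

The blob part is what a DPAPI implementation such as dpapick would then parse and decrypt with the proper master key.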

Credentials file

vault files formats

Inside Vault folders you can find zero or more GUID-named sub-directories. Inside each sub-directory there could be three types of files: one Policy.vpol, one or more <guid>.vcrd and one or more <guid>.vsch. The vpol file contains the keys needed to decrypt vcrd files whereas vsch files contain the schema needed to correctly print out the decrypted secrets. Compared to credentials folders, there is an additional encryption level. Needless to say, no vcrd file in the folder means that no secret is to be decrypted.
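A quick way to triage a vault sub-directory is to group its files by the three roles just described. This is a hypothetical helper written for illustration (it is not part of dpapilab), and it only looks at file names.

```python
import os
import tempfile


def inventory_vault(vault_dir):
    """Group the files of one vault sub-directory by role."""
    found = {"policy": [], "vcrd": [], "vsch": [], "other": []}
    for name in sorted(os.listdir(vault_dir)):
        lower = name.lower()
        if lower == "policy.vpol":
            found["policy"].append(name)
        elif lower.endswith(".vcrd"):
            found["vcrd"].append(name)
        elif lower.endswith(".vsch"):
            found["vsch"].append(name)
        else:
            found["other"].append(name)
    return found


def has_secrets(inventory):
    """A vault is worth decrypting only if it has a policy and some vcrd."""
    return bool(inventory["policy"]) and bool(inventory["vcrd"])


if __name__ == "__main__":
    # Demo on an empty temporary directory populated with placeholder files.
    with tempfile.TemporaryDirectory() as demo_dir:
        for name in ("Policy.vpol", "AAAA.vcrd", "BBBB.vsch"):
            open(os.path.join(demo_dir, name), "wb").close()
        print(inventory_vault(demo_dir))
```

Running it over every GUID-named sub-directory quickly tells which vaults actually hold something to decrypt.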

The Policy.vpol file is a DPAPI blob with a header containing a textual description and a GUID that should match the Vault folder name. The gold mine is obviously the blob.


Once the blob is decrypted, we get two AES keys to be used in decrypting the vcrd files.

Policy.vpol decrypted

The vcrd files have a slightly more complicated format: the core part is made up of a variable-length array of encrypted attributes. These attributes are encrypted with the previous AES keys and the target secret is usually the last attribute, the one carrying an IV and the only one encrypted with the 256-bit key. The following pictures should help to understand the file format.

vcrd file, 1/2

vcrd file, 2/2

For the sake of completeness here is a brief overview of a vsch file: I didn't go further in reverse engineering schema formats, too boring. The following pictures mainly show the link between the schema GUIDs, the vault directory names and the GUIDs inside the previous files.

vsch file

vaultcmd list schema


The previous sections were a terse description of formats without too many explanations, because I think it's better to introduce some tools to play with. You can start decrypting your credentials/vaults by using the Open Source Python code I wrote and put into the dpapilab project on GitHub. The files related to Credentials/Vaults are four: vaultstruct.py, creddec.py, vaultdec.py and vaultschema.py.

The vaultstruct.py file contains the format definitions, both for credentials and vaults: here you'll find pretty much all the intimate details, since the overall logic is limited. To decrypt credentials files use creddec.py and to decrypt vault files use vaultdec.py: in the next sections I'll provide some examples. The vaultschema.py file is an attempt to provide a nicer output for decrypted secrets and it's based on my limited testbed and not, as I wrote, on a full schema reverse engineering. Something could fail, so feel free to open an issue and/or to modify the code to print out the decrypted hexadecimals. Needless to say, the DPAPI kernel used by the scripts is the awesome dpapick: to use the previous scripts, first install dpapick, manually or with the Python pip utility.

The dpapilab code is not the only (Open Source) tool able to decrypt credentials/vault files: mimikatz can do it too. I'm not referring to the memory cached information, but to this command:

mimikatz # dpapi::vault /cred:4EA276E76F13906751D56C95069FA718349F49EAE.vcrd /policy:Policy.vpol /masterkey:381a398a49c00b29cb1109802acc0fd1d379761

where masterkey is the SHA1 of the decrypted DPAPI master key material (the MK related to the policy file). I must do a digression right now.

<digression> Benjamin aka gentil_kiwi "fried" the novelty of my little research (open source novelty: there is at least one cool commercial tool doing credentials/vaults decryption since years). On 15 July 2015 he wrote the tweet "Full offline Credentials & Vault decryption with #mimikatz and keys (or online,as you want)" producing a bit of frustration in me... which fast disappeared when I reached him and we started to share candy vaults and to fight on decryption flows... we had a couple of funny hangouts, at least. Benjamin is a great security researcher but, above all, he is a really good guy (oh, well, even if evil_kiwi inside...)! Anyway I had a speedup on vrcd file format and moreover something to check results with. So... thank you and kudos! </digression>

Before going on with some examples, there is one strong requirement for decrypting credentials/vaults: knowing the user login password. During the talk I provided a couple of well-known shortcuts that can help in retrieving the password: in any case it is not likely to be an easy task, unless... you have a memory dump. In this case you'll get many, many chances to achieve the goal, thanks again to mimikatz. You could get the plaintext password, or its SHA1, or the cached DPAPI Master Keys, or even the cached credentials/vaults... and so on.
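Two SHA1 values appear in this discussion, and it is easy to mix them up: the SHA1 of the user password (which, for local accounts and to the best of my knowledge, DPAPI computes over the UTF-16LE encoding of the password) and the SHA1 of the decrypted master key material passed to mimikatz as /masterkey. A minimal sketch of both follows; the function names are mine, and the password and 64-byte key in the demo are dummy placeholders.

```python
import hashlib


def password_sha1(password):
    """SHA1 over the UTF-16LE password (assumed local-account DPAPI input)."""
    return hashlib.sha1(password.encode("utf-16-le")).hexdigest()


def masterkey_sha1(master_key):
    """SHA1 of already-decrypted master key bytes, in hex."""
    return hashlib.sha1(master_key).hexdigest()


if __name__ == "__main__":
    # Dummy inputs, for illustration only.
    print(password_sha1("Passw0rd!"))
    print(masterkey_sha1(bytes(range(64))))
```

Both results are 40 hexadecimal characters long, which is a handy sanity check before feeding them to a tool.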

decrypting credentials

Let's start with an example of creddec.py usage. The black bold parameters are related to DPAPI and are needed to get the proper master key material. The orange parameters are the target credential files.

creddec in action

In the next figure we can see the beautified graphical result: creddec will not create such a nice result, just a textual one... well, I'm not an artist despite my surname.

credentials decrypted (some obfuscated...)

Please take into account that the decrypted file format was not a result of reverse engineering: I derived it using my limited testbed. So it could fail dramatically: don't blame me, thank you.

I can testify to the birth of what I called the double cheeseburger credential, starting from Windows 8 at least. Once this type of credential file is decrypted you get a special file format whose pieces you must put back together: then you will obtain... another DPAPI blob, this time protected by a system master key. That's the reason for its name. From creddec.py's point of view you must provide the DPAPI system parameters too, as shown in the next picture in blue.

creddec vs double cheeseburger credential

This kind of credential is usually related to "WindowsLive:target=virtualapp/didlogical" data...

eating the double cheeseburger

decrypting vaults

I provide a single Vault example, but a juicy one. Windows introduced the possibility to login with a pin code or a picture shape, after having set a "normal" password: Windows 8-8.1 have a major issue, since both the pin/shape and the original password are kept in a system vault. This means no security. Credits go to Passcape for having discovered it. Let's see how to exploit this ugly thing.

vaultdec in action

Note that vaultdec requires the Vault directory as a parameter, not the single files as creddec does. The result is shown in the next picture: again, the schema used to print out the results is derived from data.

system vault decrypted


We have briefly seen how Windows Credentials and Vault files are secured and how we can decrypt them, with Open Source software. Given the proper legal authority, this possibility could be exploited to access user's protected data or simply to improve security awareness.

Since the same technology is used in different Windows versions, the research has a broad coverage: for example, in a future post we'll see how to decrypt and use Windows Phone ActiveSync tokens.

Wednesday, September 16, 2015

Rekalling Mimikatz

I'm not really sure that everybody knows that the Rekall memory forensics framework contains a Mimikatz plugin: with this post I want to address this shortcoming, since the plugin has many good features and it can be easily extended.

behind the scenes

The act of rekall-ing Mimikatz started when I met Michael Cohen in Prague (SANS DFIR 2014) and a few months later in Dublin (DFRWS 2015). Besides the fact that I learnt so much by speaking with Michael, he deserves the credit for having pushed this plugin's development: he released a first version in April 2015, based on what I did with Volatility (see et voilà le mimikatz offline). So, by hangout-ing during the night, we co-authored the actual Rekall mimikatz plugin: it was an awesome dive into Windows memory and Rekall internals, guided by Michael, who truly has a talent for explaining complicated things in a simple way.

Before going further credits and thanks must go to the awesome reverse engineering research made by Benjamin Delpy: the plugin is based on the knowledge he shared and currently shares.

If you don’t know how (and why) Windows keeps in RAM system and users passwords or hashes, I provide some references: Cached and Stored Credentials Technical Overview; Credentials Processes in Windows Authentication; and, obviously, Gentil Kiwi blog.

debugging symbols power

When Michael showed me the plugin's first version I was amazed, since it was really short compared with my Volatility version. Granted, my code was a PoC and certainly not an example of well-written Python, but this first version also leveraged one of Rekall's key features: the capability to fetch and use Windows debugging symbols in real time.

In order to extract the right information from memory we need both the locations of global variables and the layouts of structs. In my Volatility plugin I used byte patterns (provided by Benjamin in his code) to find the variables' locations in memory: given the fact that Microsoft provides the core symbols we need to achieve our goals (lsasrv!LogonSessionList; lsasrv!hAesKey; lsasrv!h3DesKey; wdigest!l_LogSessList; and so on...), Rekall simplifies the first half of the job.

This is not the right place to fully address this capability, but once Rekall is instructed to create a profile for a given module (sys, exe or dll), it will extract its GUID from PE debug section and it will fetch from the Windows servers the right symbols by providing the GUID. From this point of view, the Rekall mimikatz plugin behaves like mimilib (the Mimikatz WinDBG extension).
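To give an idea of what "fetching symbols by GUID" means in practice, here is a sketch that builds a download URL using the path layout commonly used by Microsoft's public symbol server (pdb name, then GUID plus age, then pdb name again). It is an assumption that Rekall builds its requests exactly this way, and the GUID and age in the demo are made-up values.

```python
# Public Microsoft symbol server base URL (assumed default).
SYMBOL_SERVER = "https://msdl.microsoft.com/download/symbols"


def symbol_url(pdb_name, guid_hex, age, server=SYMBOL_SERVER):
    """Build the URL for a PDB identified by its GUID and age."""
    if len(guid_hex) != 32:
        raise ValueError("expected a 32-hex-digit GUID without dashes")
    # The identifier is the upper-case GUID followed by the age in hex.
    identifier = "%s%X" % (guid_hex.upper(), age)
    return "/".join([server, pdb_name, identifier, pdb_name])


if __name__ == "__main__":
    # Made-up GUID and age, for illustration only.
    print(symbol_url("lsasrv.pdb", "0123456789abcdef0123456789abcdef", 2))
```

The GUID and age come from the debug directory of the PE file found in memory, which is why the right symbols can be selected for the exact binary version in the image.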

In other words the plugin does not have to search for structures in memory using the byte patterns (or anchors) Benjamin provided: the resulting plugin code is smaller and re-usable, since all supported Windows versions use the same symbol names (more on this later), while anchors based on byte patterns tend to change more frequently. The fact that we don't need to worry about searching for byte patterns makes plugin writing much simpler.

Struct layouts still come from the reversing work done in Mimikatz, but there is no need to hand-code them in the plugin: there is an automated process to pull them from the Mimikatz source code, as explained in the next section.

callable vtypes

Another great capability of the Rekall framework is the vtypes struct definition language, which is not new (see "Rekall Profiles" and "Memory Forensics with Volatility"): vtypes indeed provide a fantastic way to write compact and readable code.  Let me try to show their power with an example in which we used a Mimikatz reversed structure in Rekall.

The wdigest module is well-known for keeping the user password in memory: to get the credentials we used Benjamin's _KIWI_WDIGEST_LIST_ENTRY structure defined in kuhl_m_sekurlsa_wdigest.h.

We could have manually written the struct's vtype, as was done in the previous Volatility plugin. However, it is much simpler to automatically extract the vtype definitions from Mimikatz's own debugging information. Since Rekall can already parse the PDB file format, it's enough to provide the Mimikatz PDB file to Rekall to get all the structures defined and referenced by the code in it: no manual work! Additionally we don’t have to clutter the source code with inline definitions of these structs - we can simply store the vtype definitions in the Rekall profile repository and fetch them on demand.

We compiled the Mimikatz code with PDB debugging symbols, then we used the parse_pdb Rekall command to get the JSON files with the structures we needed. Then we pushed the gzipped JSON file into the profile repository to have it easily available. By doing it for both 32-bit and 64-bit architectures we are able to support both transparently, with very few lines in the plugin! In the next figure the resulting _KIWI_WDIGEST_LIST_ENTRY vtype is shown (64-bit).

The previous definition explicitly defines the Blink and Flink (backward and forward) pointers for the doubly linked list, but the Rekall framework already implements re-usable code to parse this type of list: see the _LIST_ENTRY class and the ListMixIn class defined in "overlays\basic.py". Moreover the kiwi structures do not say anything about credentials. We can extend the vtype by creating an overlay. The overlay "corrects" the generated vtype definition by overriding some fields and adding other fields:

With the previous (and incomplete) overlay, we declare the usage of _LIST_ENTRY (at offset 0) and add the credential structure. The Cred field contains a _KIWI_GENERIC_PRIMARY_CREDENTIAL struct, located at an offset specified by a callable (which will be evaluated on access to the field) at 8 or 12 bytes (see later) following the end of the LocallyUniqueIdentifier field. With this simple action we are able to get all the list elements with these few lines of code:

That is amazing. The final wdigest code in the plugin becomes incredibly short, and  it's reusable for all Windows versions and architectures known so far!
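For readers who have never walked a Windows-style doubly linked list, the idea that Rekall's _LIST_ENTRY support hides can be shown with a toy example in plain Python. This is not Rekall code: the "memory" is a flat buffer in which offsets play the role of addresses, and the entry layout (Flink, Blink, 8-byte payload) is invented for the illustration.

```python
import struct

ENTRY_FORMAT = "<QQ8s"  # Flink, Blink, 8-byte payload (toy layout)


def walk_list(memory, head_offset):
    """Yield (offset, payload) for each entry reachable from the list head."""
    seen = set()
    flink = struct.unpack_from("<Q", memory, head_offset)[0]
    # Stop when the list wraps back to the head (or loops unexpectedly).
    while flink != head_offset and flink not in seen:
        seen.add(flink)
        next_flink, _blink, payload = struct.unpack_from(
            ENTRY_FORMAT, memory, flink)
        yield flink, payload.rstrip(b"\x00")
        flink = next_flink


def build_demo_memory():
    """Lay out a list head plus two entries in a flat buffer."""
    memory = bytearray(0x100)
    head, first, second = 0x10, 0x40, 0x80
    struct.pack_into("<QQ", memory, head, first, second)
    struct.pack_into(ENTRY_FORMAT, memory, first, second, head, b"alice")
    struct.pack_into(ENTRY_FORMAT, memory, second, head, first, b"bob")
    return bytes(memory), head


if __name__ == "__main__":
    memory, head = build_demo_memory()
    for offset, payload in walk_list(memory, head):
        print(hex(offset), payload)
```

In the real plugin the head is the l_LogSessList global and each entry is a _KIWI_WDIGEST_LIST_ENTRY, with the framework doing the pointer chasing.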

Note that Rekall profiles are specific to a binary. So each DLL and executable refers to its own profile. It  is a good idea to avoid mixing multiple and possibly different vtype definitions coming from different PDB files. Another point is that we are not using all structures exported by Mimikatz, but only what we really need (see mimikatz_vtypes). Notice that we can simply get the exact address of the global constant "l_LogSessList" directly from debugging symbols, and therefore we do not need to scan for it:

  logons = self.get_constant_object('l_LogSessList', target=...)

before switching to features...

The previous two sections should not be considered a "how to write a Rekall plugin" but an aid to understanding the plugin code and why I said it's easy to extend it. Obviously there is a bit more to do to achieve a fully functional plugin, but the Rekall capabilities help a lot to take a top-down approach, to write less code and to improve readability.


The plugin implements the lsasrv module, which is mandatory to decrypt credentials, the wdigest and livessp modules: the last two SSPs are known to provide the users' passwords if enabled. So we should have good coverage for Windows 7 and 8. Windows 10 will be supported soon. Regarding Windows XP see below.

The plugin logic is the following:

  • switch to lsass process context
  • get crypto material, to be able to decrypt data
  • get all logons to the system and build a LUID dictionary
  • for each LUID get the primary credentials (aka LM, NTLM and SHA1 hashes)
  • if wdigest is used, for each LUID get the credentials 
  • if livessp is used, for each LUID get the credentials
  • for each LUID, get DPAPI master keys from lsasrv
  • render all the data obtained
  • (note: decryption occurs where needed)

To use it just type rekal.py -f Win7SP1x86.raw mimikatz. If you run into trouble, just add the -v parameter to get some details: they are particularly useful when reporting issues. The plugin's results are shown in the following (small) screenshot.

If wdigest/livessp is disabled you will unfortunately not get any cleartext passwords from the module. Additional problems include the needed memory being paged out, thus preventing the plugin from achieving its result. You could avoid this problem by using Rekall to dump memory together with the pagefile (use the aff4acquire plugin).

Remember that for DPAPI decryption you only need the SHA1 hash (unless using live logins, for which I will write another post in the future) and that you could find the user's passwords or some hints by accessing lsasecrets and/or decrypting system DPAPI secrets (such as WiFi passwords, for example). See my previous post Happy DPAPI.

deprecated xp support

The problem with past XP support was the different type of encryption, DESX, and how the OS was using the DESX key. I spent some time developing a Python decryption class, which is included in Rekall: for the details see my post UnDesXing. So we have added support for Windows XP/2003! Note that I lacked XP x64 RAM images, so in those cases something could go wrong: please report and, even better, provide RAM images.


I used the Rekall mimikatz plugin in different scenarios and I find it really useful, as much as I enjoyed writing it with Michael. The plugin's main goal is to get users' passwords, or SHA1 hashes to decrypt DPAPI at least. But it could be used from a “pure forensics” point of view, since it lists all logons to the system: actually timestamps are not reported, but it's a matter of adding a "print" statement. Or, when the proper kerberos module will be included, to detect an evil Mimikatz usage.

The plugin is included in Rekall, ready to be used! Just download the latest package from GitHub.

Tuesday, July 28, 2015

Windows Phone PIN cracking

Windows Phone 8 and greater allows the user to lock/unlock the phone by using a numeric PIN code: it's even possible to use a complex alphanumeric password. This post addresses how to obtain the simple numeric PIN code by cracking the authenticator kept in the SOFTWARE hive.

a useless quest?

Actually, if you have physical access to a Windows Phone you don't need the user pincode to examine the user data: with the proper hardware you can usually get a whole dump of the un-encrypted device memory. To my current knowledge the pincode is not used for anything other than device locking, so it's almost useless to know it. If the device is under a properly configured MDM, you could face a fully encrypted phone with TPM: in this case you'll have no chance to crack the pincode, even if more testing should be done.

This is exactly what I thought when my colleague Mattia Epifani tried to lure me with the Windows Phone PIN issue: he knows the curious monkey inside me... but I was a reluctant one. He then provided a couple of scenarios where knowing the pincode could be useful: in the end, I traded a couple of beers in return for the pincode cracker.

the starting lead and testbed

Mattia pointed out that the SOFTWARE hive registry key "\Microsoft\Comms\Security\DeviceLock\Object21" was related to the current pincode. This was the start of my journey. I had three physical dumps of three different Windows Phones: two with Windows 8.10 build 341 (label WPB_CXE_R1) and one with Windows 8.00 build 78 (label WP8_CXE_GDR2).

The next figure shows the Object21 registry key content coming from the 1st phone dump.

Indeed the pincode is composed of 5 digits, but the most attractive value is for sure CredentialHash. In the next figure the byte blob is shown.

Usually when I face an unknown blob of bytes I try to figure out, with educated and cautious guesses, whether there is some sort of schema: in other words, whether it represents structured info. In this case it's quite easy to spot how the blob is organized. The first three DWORDs are the lengths of the three byte arrays that follow; the second array is the Unicode (UTF-16) string SHA256, and the last array has exactly the length of a SHA256 hash. Let me provide a better view of the data.

    80 00 00 00 0E 00 00 00 20 00 00 00 87 A6 A5 93
    5B 2D 8C 55 51 A1 20 07 50 3E A6 48 EB 63 5E CA
    36 9B 4D 5C 65 50 0B 5C 1A 1B E9 34 7B 64 A3 CF
    8B E2 A0 45 5E A0 C3 57 FD 3C 91 AE D8 9F 65 9C
    CE 02 B1 9E 75 06 C7 50 D1 A7 93 ED 76 04 FA 2E
    A4 0A 53 20 1B B1 FD 14 36 C2 2A A9 87 7B C9 BC
    C6 7B 7E 34 A1 EB 2F 6B 33 3A 81 51 99 31 B5 3D
    6F D2 1B 58 69 38 1F 45 5D E3 4B 51 18 36 27 2E
    65 36 3F BB 5B 6A 72 FD F0 D3 38 B7 53 00 48 00
    41 00 32 00 35 00 36 00 00 00 3C DA 9F 6D 42 E8
    83 50 83 4B B2 5E 20 73 7A 4D 66 78 95 01 D0 5A
    5D EA 20 BF 6B B5 53 F6 25 85
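The layout just described (three length DWORDs, then the three arrays back to back) can be checked mechanically against the dump above. The following is a minimal sketch, not the code used in winphonepincrk.py; the function and variable names are mine.

```python
import struct

# The CredentialHash value exactly as dumped above.
CREDENTIAL_HASH_HEX = """
80 00 00 00 0E 00 00 00 20 00 00 00 87 A6 A5 93
5B 2D 8C 55 51 A1 20 07 50 3E A6 48 EB 63 5E CA
36 9B 4D 5C 65 50 0B 5C 1A 1B E9 34 7B 64 A3 CF
8B E2 A0 45 5E A0 C3 57 FD 3C 91 AE D8 9F 65 9C
CE 02 B1 9E 75 06 C7 50 D1 A7 93 ED 76 04 FA 2E
A4 0A 53 20 1B B1 FD 14 36 C2 2A A9 87 7B C9 BC
C6 7B 7E 34 A1 EB 2F 6B 33 3A 81 51 99 31 B5 3D
6F D2 1B 58 69 38 1F 45 5D E3 4B 51 18 36 27 2E
65 36 3F BB 5B 6A 72 FD F0 D3 38 B7 53 00 48 00
41 00 32 00 35 00 36 00 00 00 3C DA 9F 6D 42 E8
83 50 83 4B B2 5E 20 73 7A 4D 66 78 95 01 D0 5A
5D EA 20 BF 6B B5 53 F6 25 85
"""
BLOB = bytes.fromhex("".join(CREDENTIAL_HASH_HEX.split()))


def parse_credential_hash(blob):
    """Split the blob into (first array, algorithm name, last array)."""
    len1, len2, len3 = struct.unpack_from("<LLL", blob, 0)
    if 12 + len1 + len2 + len3 != len(blob):
        raise ValueError("lengths do not add up to the blob size")
    first = blob[12:12 + len1]
    algo_raw = blob[12 + len1:12 + len1 + len2]
    last = blob[12 + len1 + len2:]
    # The name is a NUL-terminated UTF-16LE string.
    algo = algo_raw.decode("utf-16-le").rstrip("\x00")
    return first, algo, last


if __name__ == "__main__":
    first, algo, last = parse_credential_hash(BLOB)
    print(len(first), algo, len(last))
```

The three lengths add up exactly to the size of the dumped value, which supports the guess about the structure.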

a bit of reversing

The question became "who is using that data and how?". Using the simplest approach, aka old-school effective string searching, I found a couple of Windows DLLs, SimplePinLap.dll and StrongPinLap.dll, whose names seem self-explanatory. I admit that I'm really bad at reversing ARM code but, to be short, I spotted some truly useful hints inside the SimplePinLap code by looking for the usual BCryptHashData Win API function.

The 0x80 bytes array is a salt, pseudo-randomly generated. The 0x20 bytes array is the target hash, the one that must match if the inserted pincode is correct. I had some issues in understanding from the assembly how the inserted pin was used in the hashing process, but finally I got the following algorithm (easy, indeed).

HashAlgo(UTF-16-LE-NoTrailing0(pincode) + salt)
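The formula above translates almost literally into Python, and brute forcing a numeric PIN of known length is then a matter of trying every digit combination. This is a sketch written for this post, not the winphonepincrk.py code: it assumes SHA256 as the hash algorithm and a known PIN length, and the demo uses a made-up salt and PIN rather than the blob shown earlier.

```python
import hashlib
import itertools


def pin_hash(pin, salt, algo="sha256"):
    """HashAlgo(UTF-16LE(pin) without trailing NUL, followed by the salt)."""
    return hashlib.new(algo, pin.encode("utf-16-le") + salt).digest()


def crack_pin(salt, target, length, algo="sha256"):
    """Try every numeric PIN of the given length; return it or None."""
    for digits in itertools.product("0123456789", repeat=length):
        candidate = "".join(digits)
        if pin_hash(candidate, salt, algo) == target:
            return candidate
    return None


if __name__ == "__main__":
    # Synthetic test case: made-up 0x80-byte salt and made-up PIN.
    demo_salt = bytes(range(0x80))
    demo_target = pin_hash("0420", demo_salt)
    print(crack_pin(demo_salt, demo_target, 4))
```

With only 10^n candidates for an n-digit PIN, the search space is tiny, which is why the salt alone does not protect a short numeric code once the hive is available.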

a bit of cracking

Finally, the pincode does not seem to be stored anywhere inside the system (not considering volatile artifacts): to get the user's pincode you need to grab the SOFTWARE hive. If you have such a file, you can use my cracking script winphonepincrk.py (link). Just provide it with the hive and all the magic will happen.

winphonepincrk.py --software=SOFTWARE

With this script I was able to crack the PIN codes coming out from the devices' dumps I have.

other stuff

What about StrongPinLap? A blind, non-educated guess could state that the same algorithm is used, but the presence of a different library should suggest much more than that. I did not look at the strong version: feel free to reach me with a message or a comment, in case you need extra info. Moreover I found some truly interesting stuff that I will disclose in the near future, so stay tuned.


As I previously said, from my point of view getting the pincode is a useless or unfeasible task: but I had some fun and I won a couple of beers. Mattia will also offer a beer to the first one who cracks the pincode stored in the blog post example, so hurry up (note: physical meeting needed)!

updates [07.august.2015]

  • Adrian Leong aka @Cheeky4n6Monkey correctly pointed out that the PIN hash data could reside in the Object31 key too. Moreover you could find the CredentialHash value in Object736, Object44... so, it's better to manually inspect all the keys in Microsoft\Comms\Security\DeviceLock. The reasons why the pin hashes can be saved in different keys are currently unknown, and, if you want to crack them, consider the fact that the CredentialActualLength value could be missing. In such cases you'll need to try different pin lengths before getting the result. Adrian provided a different script to crack the pin, which is more flexible but requires a (bit of) manual approach: you can find his script on https://github.com/cheeky4n6monkey. If you are interested in Windows Phone artifacts check Adrian's awesome posts on his http://cheeky4n6monkey.blogspot.it/ blog.
  • Someone asked me if this cracking could be seen as a security vulnerability. No, it's not. You have to physically own the device to grab, when possible, a physical dump or the required files. Different cases, aka device pwned, are out of scope here. In any case a different "vulnerability" must be exploited before pin cracking, which is a quite strong authenticator in this context.
  • Finally, if you want to explore Windows Phone from a DFIR point of view, Cindy Murphy, Mattia Epifani and I will be speaking on this topic at SANS DFIR Summit in Prague 2015.

Monday, June 22, 2015

A first look at Windows 10 prefetch files

Windows 10 prefetch files (*.pf) show a different file format compared to previous ones. At first glance you'll spot no textual strings inside, and this was the initial reason that made me try to understand how they changed.

quick&dirty journey

I guess that neither you nor I will run into Windows 10 DFIR cases for a while. That's what I thought when Claudia Meda (@KlodiaMaida) contacted me, showing me a couple of Windows 10 prefetch files. She then provided me some interesting clues that tickled the curious george monkey in me. Officially I do not have spare time, since it's already allocated, so I illegally used the non-existent spare time of spare time: please don't betray me... so I hope you'll tolerate any shortcuts in my quick&dirty journey into the entrails of windows (disgusting, isn't it?).

first lead

First, what does a bare prefetch file have to say? Check the first bytes in the next figure, which shows a prefetch file for calc... sorry, now it's calculator (sad, I'll miss you dear calc!). They should remind you of some other "prefetch folder" related file: can't you find your memo?

All kidding aside, the MAM signature recalls the SuperFetch file format, which on Windows 7 exhibits the very similar MEMO/MEM0 signatures. A great old (gosh, 2011!) post addressing the SuperFetch file format (and so the MEM[0|O] format) is "Windows SuperFetch file format – partial specification" by ReWolf, well worth reading. From there you learn that the MEM files are, first of all, compressed containers and that the Windows API in charge of decompressing them is RtlDecompressBuffer.

an actionable lead

I launched windbg inside a Windows 10 virtualized guest (Pro Insider Preview 10.0.10074) and I put a couple of breakpoints on the RtlDecompressBuffer and RtlDecompressBufferEx functions: the target process is the SuperFetch process, the svchost-ed sysmain.dll. To be short, I landed on the moon, which in this case is the sysmain!SmDecompressBuffer method: in the next picture you can see some green boxes (which I marked while following the decompression branch) and some yellow boxes (checksum check, more on this later).

The routine core is represented in the next picture, where you can (could, if I correctly re-sized the image) easily spot the call to the target method RtlDecompressBufferEx.

When applied to our MAM case, in the end you get three bytes with the magic signature "MAM" (bytes 0x4d 0x41 0x4d), one byte that identifies the compression algorithm used and, possibly, the presence of a checksum; the next 4 bytes are the uncompressed size of the original buffer; then, if a checksum is in place, you get 4 more bytes after the uncompressed size [errata 22.06.2015] that contain the checksum. The remaining data is what must be decompressed with RtlDecompressBufferEx. Which algorithm is used?

The following are the compression package types and procedures as defined in ntifs.h.

//  Compression package types and procedures.
#define COMPRESSION_FORMAT_NONE          (0x0000)   // winnt
#define COMPRESSION_FORMAT_DEFAULT       (0x0001)   // winnt
#define COMPRESSION_FORMAT_LZNT1         (0x0002)   // winnt
#define COMPRESSION_FORMAT_XPRESS        (0x0003)   // winnt
#define COMPRESSION_FORMAT_XPRESS_HUFF   (0x0004)   // winnt

#define COMPRESSION_ENGINE_STANDARD      (0x0000)   // winnt
#define COMPRESSION_ENGINE_MAXIMUM       (0x0100)   // winnt
#define COMPRESSION_ENGINE_HIBER         (0x0200)   // winnt

So, considering that the MAM signature is usually followed by 0x4 (or 0x84), the algorithm is COMPRESSION_FORMAT_XPRESS_HUFF.
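To make the layout concrete, here is a minimal Python sketch that parses the MAM header as described above. The function name parse_mam_header and the returned dictionary keys are mine, made up for illustration. It assumes that the sizes are stored little-endian, that the compression format sits in the low nibble of the algorithm byte, and that the checksum flag is its most significant bit (which is what 0x4 vs 0x84 suggests).

```python
import struct

MAM_MAGIC = b"MAM"
# Subset of the ntifs.h COMPRESSION_FORMAT_* values listed above.
FORMAT_NAMES = {2: "LZNT1", 3: "XPRESS", 4: "XPRESS_HUFF"}
CHECKSUM_FLAG = 0x80  # assumption: most significant bit of the algorithm byte


def parse_mam_header(data):
    """Parse the header of a MAM container (hypothetical helper)."""
    if len(data) < 8 or data[:3] != MAM_MAGIC:
        raise ValueError("not a MAM container")
    algo_byte = data[3]
    has_checksum = bool(algo_byte & CHECKSUM_FLAG)
    fmt = algo_byte & 0x0F  # assumption: format lives in the low nibble
    # Assumption: the uncompressed size is a little-endian 32-bit value.
    (uncompressed_size,) = struct.unpack_from("<I", data, 4)
    checksum = None
    payload_offset = 8
    if has_checksum:
        if len(data) < 12:
            raise ValueError("truncated MAM header")
        # The 4 checksum bytes follow the uncompressed size (bytes 8-11).
        (checksum,) = struct.unpack_from("<I", data, 8)
        payload_offset = 12
    return {
        "format": fmt,
        "format_name": FORMAT_NAMES.get(fmt, "unknown"),
        "has_checksum": has_checksum,
        "uncompressed_size": uncompressed_size,
        "checksum": checksum,
        "payload_offset": payload_offset,
    }
```

Feeding it the first dozen bytes of a file is enough: data[payload_offset:] is what would then be handed to the decompressor.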


To replicate and double-check these findings, I created a small Python Windows native script: by native I mean that you can't use it on OSes other than Windows, since it uses native API calls. Moreover you need Windows 8.1 at least, since RtlDecompressBufferEx was introduced starting from that OS version. You could use the script to decompress prefetch files, if in need: but you'll get better solutions at the end of the post. I tweeted about this script some days ago, pointing to a gist I made, which you can find at w10pfdecomp.py.
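For readers who just want the idea, the following is a rough sketch of that approach using ctypes. It is not the gist itself: decompress_mam is a name chosen here for illustration, and the header handling assumes little-endian sizes, the format in the low nibble of the algorithm byte and the checksum flag in its most significant bit. The two ntdll calls, RtlGetCompressionWorkSpaceSize and RtlDecompressBufferEx, are real native APIs, so the function only works on Windows.

```python
import ctypes
import struct


def decompress_mam(data):
    """Decompress a MAM container through ntdll (Windows only, sketch)."""
    if len(data) < 8 or data[:3] != b"MAM":
        raise ValueError("not a MAM container")
    algo_byte = data[3]
    fmt = algo_byte & 0x0F  # assumption: 4 means XPRESS_HUFF
    (size,) = struct.unpack_from("<I", data, 4)
    # Assumption: when the top bit is set, 4 checksum bytes must be skipped.
    offset = 12 if algo_byte & 0x80 else 8
    compressed = data[offset:]

    ntdll = ctypes.windll.ntdll  # this attribute exists only on Windows

    # Ask ntdll how much scratch memory the chosen format needs.
    ws_size = ctypes.c_ulong(0)
    frag_size = ctypes.c_ulong(0)
    status = ntdll.RtlGetCompressionWorkSpaceSize(
        ctypes.c_ushort(fmt), ctypes.byref(ws_size), ctypes.byref(frag_size))
    if status != 0:
        raise OSError("RtlGetCompressionWorkSpaceSize failed: 0x%08x"
                      % (status & 0xFFFFFFFF))

    workspace = ctypes.create_string_buffer(max(ws_size.value, 1))
    src = ctypes.create_string_buffer(compressed, max(len(compressed), 1))
    dst = ctypes.create_string_buffer(max(size, 1))
    final_size = ctypes.c_ulong(0)

    status = ntdll.RtlDecompressBufferEx(
        ctypes.c_ushort(fmt),
        dst, ctypes.c_ulong(size),
        src, ctypes.c_ulong(len(compressed)),
        ctypes.byref(final_size),
        workspace)
    if status != 0:
        raise OSError("RtlDecompressBufferEx failed: 0x%08x"
                      % (status & 0xFFFFFFFF))
    return dst.raw[:final_size.value]
```

Typical use would be reading a .pf file in binary mode and passing its whole content to decompress_mam; the returned bytes should then start with the classic uncompressed prefetch header.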


In the first instance I ignored the checksum assembly branch, but then I realized that Windows 10 SuperFetch files show the same MAM signature and that, when the previous script is applied to them, decompression fails. Previously I mentioned that a checksum could be present in the prefetch file, when the third byte (algorithm, counting from 0) has its most significant bit set: in those cases you see 0x84 as the byte value.

Reconsidering the checksum branch, here is what happens: the (prefetch|superfetch) file will have 4 more bytes set to the calculated checksum, stored after the decompressed size (so, bytes 8-11, counting from 0): those bytes must be skipped during the decompression phase.

The checksum is a simple CRC32, calculated on the whole file with the current file checksum zeroed out: you can then see why RtlComputeCrc32 is called three times in the disassembly. I've updated my Python script to consider that checksum, both on the gist and on the hotoloti github repository.
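The check can be sketched in a few lines of Python. This is only an illustration under some assumptions: that RtlComputeCrc32 yields the same value as the standard CRC-32 implemented by zlib.crc32, that the stored checksum is little-endian, and that the flag is the most significant bit of the algorithm byte. The name mam_checksum_ok is hypothetical. The three chained calls mirror the three RtlComputeCrc32 calls: the bytes before the checksum field, four zero bytes in place of it, and everything after it.

```python
import struct
import zlib


def mam_checksum_ok(data):
    """Verify the CRC32 stored in a MAM container (hypothetical helper)."""
    if len(data) < 12 or data[:3] != b"MAM":
        raise ValueError("not a MAM container")
    if not data[3] & 0x80:
        raise ValueError("no checksum flag set in the algorithm byte")
    # Stored checksum sits right after the uncompressed size (bytes 8-11).
    (stored,) = struct.unpack_from("<I", data, 8)
    crc = zlib.crc32(data[:8])           # signature, algorithm byte, size
    crc = zlib.crc32(b"\x00" * 4, crc)   # checksum field treated as zeros
    crc = zlib.crc32(data[12:], crc)     # compressed payload
    return (crc & 0xFFFFFFFF) == stored
```

If this returns False on a real SuperFetch file, the first thing to doubt is the assumption about the CRC variant, not the file.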


No, I'm not drunk. yajul (it sounds nice) means Yet Another Joachim Uber Library. Joachim Metz published and currently maintains, among many others, the "Windows Prefetch File (PF) format" document, where he describes the various formats those files use: if you couple it with his "Windows SuperFetch database format", you'll get all the intimate details of Prefetch and SuperFetch files, compression containers included.

Moreover, his libscca (Prefetch files) and libagdb (SuperFetch files) libraries, with the help of libfwnt, are able to correctly handle the decompression and parsing of MAM compression containers (well, the libraries handle all the variants), and that is damned cool!

I want to personally thank Joachim for his prompt support when I reached out to him with my findings: among other things I got very good suggestions and observations on my short research. I want to share with you an interesting link he provided to me, a link to work by Jeff Bush on Microsoft Compression Formats.


In the end we learn that, starting from Windows 10, Prefetch and SuperFetch files are compressed with the XPRESS HUFFMAN algorithm, a.k.a. the MAM format. This is not new: Windows 8.1 uses it to compress SuperFetch files, but not Prefetch files. Moreover, from what I see, the checksum is present only for SuperFetch files and never for Prefetch files.

It remains unclear why Prefetch and SuperFetch files are compressed. Usually compression means space saving (IO reduction?) and computational effort, but it could mean obfuscation too: if you have any clue, I'd be happy to get it.

Anyway, with the excellent work made by Joachim we'll be able to understand and handle those files without any problem. Last but not least, his work is Open Source: not bad, especially in the DFIR world, isn't it?

If you need my Python script you can download it from hotoloti or from the gist.