BlackEnergy V.2 — Full Driver Reverse Engineering
By Daniel Avinoam, Ben Korman and Aviv Shabtay
Introduction
BlackEnergy, a malware family originally built to launch DDoS attacks, became infamous in 2008 when it was used in a cyber-attack against the country of Georgia as part of the Russo-Georgian War that year.
A GRU cyber-military unit named “Sandworm” was associated with the initial variant. As years went by, different versions were uploaded on underground forums. In this article, we will present the driver analysis of the second variation of the malware, released in 2010, starting from a memory image of a compromised system.
Before we dig in:
- During the analysis, obvious actions will be made without being explicitly stated (function or variable name changes, selection of the relevant union, etc.).
- The complete vector is complex, containing a number of stages and components. This analysis focuses solely on the kernel part of the attack, as the memory analysis is well documented online.
- All scripts used, the examined driver itself, and the memory image analyzed can be found in this GitHub repo.
- We used Volatility 2.6 to analyze an infected memory sample that came along with the program, and IDA Pro 7.3 to reverse engineer the suspected modules dumped.
Memory Analysis
We start by executing basic kernel-space plugins like Callbacks, SSDT, and Modscan to see if anything unusual pops out. From Callbacks, we detect that a driver with the suspicious name 00004A2A is registered, through the function PsSetCreateThreadNotifyRoutine, to receive an event from the OS on every thread creation:
Using the SSDT plugin, we revealed another table which the driver is registered to, making it even more suspicious:
In addition, the driver has no DeviceObject attached to it (no mention in the Devicetree plugin’s output) — this removes the ability of a usermode application to communicate with it.
Using the Driverirp plugin, we see another driver named icqogwp which has no corresponding file on disk. Three of the driver’s dispatch functions (Close, Create, and DeviceControl) point to the same address in our suspected driver:
We will locate the base address of the first driver (00004A2A):
And extract it from the memory image:
Static Analysis Preparations
If we load the dumped driver into IDA as is, we will not be able to examine it properly. Because the driver was extracted from memory after it was loaded, IDA will not recognize which API functions it uses, and it will be challenging to understand the driver’s functionality.
Before handling the above issue, let’s start with rebasing the driver’s address space according to the earlier seen driver’s base address:
To fix the imports issue, we can use the Impscan plugin to extract the functions used by the driver during execution. We will create a Python script that converts the plugin’s output to an IDC script — which will then be loaded into IDA to reconstruct the IAT:
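The original script is not reproduced here, so the following is only a minimal sketch of such a converter. It assumes each Impscan row has four whitespace-separated columns (IAT address, call target, module, function) and emits the legacy IDC calls MakeDword and MakeName; newer IDA versions may prefer create_dword and set_name. The name impscan_to_idc is hypothetical.

```python
import re

# Assumed Impscan row layout: IAT address, call target, module, function name.
ROW = re.compile(r"^(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\s+(\S+)\s+(\S+)\s*$")


def impscan_to_idc(impscan_text):
    """Turn Impscan text output into the source of an IDC script."""
    lines = ["#include <idc.idc>", "static main(void) {"]
    for row in impscan_text.splitlines():
        match = ROW.match(row.strip())
        if not match:
            continue  # skips the header, the separator and blank lines
        iat, _call, _module, function = match.groups()
        # Mark the IAT slot as a DWORD and give it the import's name.
        lines.append("    MakeDword(%s);" % iat)
        lines.append('    MakeName(%s, "%s");' % (iat, function))
    lines.append("}")
    return "\n".join(lines)
```

The returned string can be written to a .idc file and run from IDA's script dialog.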
After running the IDC script, the function names appear in IDA as we identified before using the Impscan plugin:
Using the SSDT plugin earlier we saw fake SSDT dispatch function addresses. A similar script can be written to parse its output and update the IDA function names accordingly:
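As a rough sketch of that second converter (again, not the authors' script), the code below assumes SSDT rows of the form "Entry 0x0041: 0x... (NtName) owned by DRIVER" and keeps only entries owned by the suspected driver. The Fake_ prefix and the function name ssdt_hooks_to_idc are illustrative choices.

```python
import re

# Assumed SSDT row layout:
#   Entry 0x0041: 0x<address> (NtDeleteValueKey) owned by 00004A2A
ENTRY = re.compile(
    r"Entry\s+0x[0-9a-fA-F]+:\s+(0x[0-9a-fA-F]+)\s+\((\w+)\)\s+owned by\s+(\S+)"
)


def ssdt_hooks_to_idc(ssdt_text, owner="00004A2A", prefix="Fake_"):
    """Emit IDC that names every SSDT entry owned by the given driver."""
    lines = ["#include <idc.idc>", "static main(void) {"]
    for row in ssdt_text.splitlines():
        match = ENTRY.search(row)
        if match and match.group(3) == owner:
            address, name = match.group(1), match.group(2)
            lines.append('    MakeName(%s, "%s%s");' % (address, prefix, name))
    lines.append("}")
    return "\n".join(lines)
```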
At first glance at IDA’s function window after running the above script, we see no change. This is because IDA could not locate functions in those addresses in the first place. In order to fix this, we need to access each function address and define it manually (by pressing P). We will get the following output:
Now we can begin the driver’s analysis.
DriverDispatch
Ideally we would want to start with the DriverEntry function. However, this function is corrupted in our memory extracted image and thus unparsable by IDA:
We will need to find a different starting point. Earlier, we observed 3 dispatch functions belonging to the icqogwp driver that are pointing to the same memory address (0xFF0D31D4) in our driver. Since it is a DriverDispatch function, we can tell its signature is as follows:
We will jump to it and set the input parameters to match the signature. We can use the Hex-Rays decompiler to ease the analysis (by pressing F5):
The function looks like an ordinary dispatch function that handles multiple request types. Let’s go through it:
In the case of an IRP_MJ_DEVICE_CONTROL request, the IOControlCode is checked (line 20). If the buffer received from the user is larger than 548 bytes, sub_FF0D3075 is called with the user’s buffer address and size (line 23) — otherwise an error value is returned. We will rename this function to DeviceControlDispatcher.
For the IRP_MJ_CREATE request, the driver returns STATUS_SUCCESS, and for IRP_MJ_CLOSE it releases a mutex that is used throughout the code. The first parameter of the KeWaitForSingleObject function should be a KMUTANT (i.e. the mutex), so we will rename it as well.
DeviceControlDispatcher
Let’s go through the DeviceControlDispatcher function and fix its input parameters:
The function is long and includes multiple branches for various input buffers. Initially a call is made to sub_FF0D26D4 — which most likely resolves function addresses:
SUB_FF0D26D4
The function utilizes a helper function (sub_FF0D2797) which returns an object (v3). In order to understand what the object is, we can look at how it is used — a few hardcoded values are used with it, namely 0x3C and 0x78 which resemble known PE format constants (e_lfanew and the data directory array respectively). We can conclude that the function most likely returns a PE file pointer:
The two addresses sent to the helper function reside in the driver’s address space and point to the strings “ntoskrnl.exe” and “hal.dll”:
Using this information, we finally conclude that the function gets a file name and returns its image address. Once we look into the function’s code, it appears our assumption was correct:
The function retrieves a list of all of the loaded modules using the QuerySystemInformation API function (line 12), which returns an RTL_PROCESS_MODULES structure:
This structure contains a collection of RTL_PROCESS_MODULE_INFORMATION structures:
The function checks whether the input string is ntoskrnl.exe (line 15); since this module is always first in the collection, its image base is returned (line 17). For any other input, the collection is traversed in search of the requested module (line 29), and when it is found its image base is returned (line 36). We will rename the function to GetImageBase.
Now that we understand the helper function, let’s return to our original function (sub_FF0D26D4).
The function continues by parsing the PE file returned. As previously mentioned, the 0x3C offset represents the e_lfanew field of the file, which contains the offset of the IMAGE_NT_HEADERS structure from the image base:
The value 0x78 is the sum of two offsets, 0x18 + 0x60, which together point to the DataDirectory array (0x60) inside the IMAGE_OPTIONAL_HEADER structure (0x18):
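To make the offset arithmetic concrete, here is a small sketch that follows e_lfanew and reads the first data directory entry (the export directory) of a 32-bit image. The function and constant names are our own; only the offsets 0x3C, 0x18 and 0x60 come from the PE format.

```python
import struct

E_LFANEW_OFFSET = 0x3C         # IMAGE_DOS_HEADER.e_lfanew
OPTIONAL_HEADER_OFFSET = 0x18  # IMAGE_NT_HEADERS32.OptionalHeader
DATA_DIRECTORY_OFFSET = 0x60   # IMAGE_OPTIONAL_HEADER32.DataDirectory


def export_directory_entry(image):
    """Return (rva, size) of the export directory of a 32-bit PE image."""
    (e_lfanew,) = struct.unpack_from("<I", image, E_LFANEW_OFFSET)
    # DataDirectory[0] describes the export directory: 0x18 + 0x60 = 0x78.
    entry = e_lfanew + OPTIONAL_HEADER_OFFSET + DATA_DIRECTORY_OFFSET
    return struct.unpack_from("<II", image, entry)
```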
At this point we need to pay close attention to whether an address or a value is being used, which can be challenging to do using the decompiler. Therefore, we will switch to IDA’s assembly view and work closely with the PE format and its data structures.
In both cases (ntoskrnl.exe / hal.dll), the EDX register stores the image base of the selected module and uses it for parsing. After going through the code we see the function searches the module’s export table and finds the addresses of the AddressOfNames, AddressOfFunctions and AddressOfNameOrdinals arrays:
Next, the function loops through the AddressOfNames array and compares the hash of each name (calculated via the function sub_FF0D26AD) with the second parameter passed to the function:
As long as the matched hash is not found, the loop continues. If the loop terminates with no success, an exception is raised.
sub_FF0D26AD looks like this:
As per our assumption, the function gets a name and computes its hash. That hash is later compared to a precomputed value, thus implementing the driver’s dynamic hidden function imports. To know the driver’s requested function, we will have to implement the hashing process, creating a hash dictionary of function names and addresses. For that, we execute the following steps:
- Dump ntoskrnl.exe from memory.
- Parse ntoskrnl.exe’s export directory and locate each of the module’s export function names.
- Calculate the hash according to the hash function used by the driver.
- Compare the results with the hardcoded hashes in the driver.
- Repeat the same steps with hal.dll (not shown).
We locate and extract ntoskrnl.exe from memory similar to how we extracted the driver.
The following script parses the export directory and saves each export function name:
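The authors' script is not shown here. A minimal stand-in, using only the standard library, might look like the sketch below. It assumes a 32-bit image dumped from memory, so that every RVA can be used directly as an offset into the dump; the names read_cstring and export_names are hypothetical.

```python
import struct


def read_cstring(image, offset):
    """Read a NUL-terminated ASCII string starting at the given offset."""
    end = image.index(b"\x00", offset)
    return image[offset:end].decode("ascii")


def export_names(image):
    """List the exported names of a memory-mapped 32-bit PE image.

    Assumption: the image was dumped from memory, so RVA == offset.
    """
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    (export_rva,) = struct.unpack_from("<I", image, e_lfanew + 0x78)
    # IMAGE_EXPORT_DIRECTORY: NumberOfNames at +0x18, AddressOfNames at +0x20.
    (number_of_names,) = struct.unpack_from("<I", image, export_rva + 0x18)
    (names_rva,) = struct.unpack_from("<I", image, export_rva + 0x20)
    name_rvas = struct.unpack_from("<%dI" % number_of_names, image, names_rva)
    return [read_cstring(image, rva) for rva in name_rvas]
```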
Next, we will write a script which calculates the hash used by the driver:
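The driver's actual hashing routine (sub_FF0D26AD) is not reproduced in this text, so the sketch below only shows the lookup scaffolding around it. toy_hash is a placeholder and is NOT BlackEnergy's algorithm; to use the idea for real, replace it with a faithful reimplementation of sub_FF0D26AD. All function names here are hypothetical.

```python
def toy_hash(name):
    """Placeholder hash for illustration only, not the driver's algorithm."""
    value = 0
    for byte in name.encode("ascii"):
        value = (value * 31 + byte) & 0xFFFFFFFF
    return value


def build_hash_table(names, hash_func):
    """Map each hash value to the export names that produce it."""
    table = {}
    for name in names:
        table.setdefault(hash_func(name), []).append(name)
    return table


def resolve(table, hardcoded_hash):
    """Return the export names matching a hash found in the driver."""
    return table.get(hardcoded_hash, [])
```

Keeping a list per hash value makes collisions visible instead of silently overwriting an earlier name.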
In conclusion, the function sub_FF0D26D4 is used by the driver as a stealthier GetProcAddress — we will rename it to StealthierGetProcAddress accordingly.
From now on every time StealthierGetProcAddress is called, we will check BEHashCalc’s output to see which function the driver is using.
The following chart summarizes what we have seen so far:
Back to DeviceControlDispatcher
We can now return to the driver’s IRP_MJ_DEVICE_CONTROL handler function. At the beginning of the function ExAllocatePool is used with a hardcoded size as a parameter:
Here we encounter a problem: in most cases a driver and the modules communicating with it agree on the data structures used between them. Since these structures are unknown and assembled by the developer, we do not know which values reside at which offsets, or their sizes, types, and usage. To figure out the layout of the unknown structures, we will begin mapping them.
Back to the code: we see that a value at pBuffer+4 selects the branch of a 9-case switch statement. This value is probably an enumeration, with one value per case. We will start mapping the structure sent to the driver (referred to as “SystemBuffer” from now on):
At this point, we will go over each case.
Case 1
Leads directly to LABEL_4, there we see the following initializations:
It appears that the driver initializes the data in the new memory allocation (PoolAllocation) according to the SystemBuffer structure (pBuffer). The different offsets suggest we have two different data structures. Therefore, we will begin mapping the second structure as well (referred to as “PoolAllocation” from now on).
The decompiler output in this section can be misleading. pPoolAllocation_ + 2 actually corresponds to byte offset 0x8, and when the structure is indexed (pPoolAllocation_[index]), the index is the byte offset divided by 4. Though this seems wrong, the output is correct: the decompiler treats the structure as an array of DWORDs, a 4-byte type, so every index and pointer increment is scaled by 4. This notation appears throughout our analysis.
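The scaling can be checked with a few lines of Python. This is only an illustration of the arithmetic; the helper names are ours.

```python
import struct

DWORD_SIZE = 4


def dword_index_to_byte_offset(index):
    """Convert a decompiler-style DWORD index into a byte offset."""
    return index * DWORD_SIZE


def read_dword(buffer, index):
    """Read buffer[index] the way the decompiler's DWORD array view does."""
    offset = dword_index_to_byte_offset(index)
    return struct.unpack_from("<I", buffer, offset)[0]
```

For example, pPoolAllocation_ + 2 and pPoolAllocation_[2] both refer to byte offset 2 * 4 = 0x8.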
With another look at the function, we infer that the first member of the PoolAllocation structure is the BufferCode (red arrows) and that LABEL_12 frees the allocation and exits the switch. From the code flow, it looks like a cleanup in case of an error:
Returning to Case 1, after the initializations in LABEL_4 there is a jump to LABEL_10 followed by a check to the value at SystemBuffer + 0x8. If the condition is TRUE, sub_FF0D3592 is called. The exit status is then returned to the user at the first value in the SystemBuffer structure.
Diving into sub_FF0D3592, we see the first use of the mutex we saw earlier released in the IRP_MJ_CLOSE handler. After it is acquired, sub_FF0D3329 is called:
Typically a driver will use a mutex to synchronize access to a modifiable shared resource (usually a list) between multiple threads. The two global variables (dword_FF0D53D0 and dword_FF0D53CC) and the red highlighted block appear to use the LIST_ENTRY structure, which links the entries of doubly linked lists in the kernel:
With this assumption in mind, we can infer that the PoolAllocation gets added after the dword_FF0D53D0 variable (yellow-highlighted) in the list — meaning this variable points to the list’s tail.
We also suspect that the second variable (dword_FF0D53CC) points to the head of the list (we again infer this through the sequence of instructions — in case the list does not have a tail defined, define the entry as its head. In any case the entry will be at the head of the list after these instructions finish executing).
In order to validate our suspicion, we will look at sub_FF0D3329:
The function iterates through a collection (that we assumed to be a linked-list that starts with dword_FF0D53CC). In every iteration, it compares the values from each list entry (i) to the input parameter (a1):
With our newfound knowledge, we can infer two important things:
1. The list consists of PoolAllocation structures.
2. The Flink and Blink are found in offsets 0x24 and 0x28 inside the PoolAllocation structure respectively:
Through sub_FF0D3329’s XRefs, we see that it gets called by all of the fake SSDT functions:
Taking a look at those fake functions might help us understand our function’s purpose. It seems the function determines the fake SSDT functions’ return values: if our function returns TRUE, the caller receives an error code from the fake SSDT function. Otherwise, the real function is called with the requested parameters. Notice that every fake SSDT function builds a structure using the received parameters and sends that structure to our function as an argument:
Hooking the SSDT allows a malicious program to control the values returned to the user from core API calls, enabling it to conceal its actions. From the way the linked list is utilized and from the fake SSDT functions’ implementations, we can deduce that the driver maintains a list of metadata on its assets and resources that should be concealed. When a call to an SSDT function is made with input parameters that refer to one of the aforementioned resources, the driver ensures they are kept concealed.
If our assumption is accurate, before the call to sub_FF0D3329, a PoolAllocation structure is assembled and sent to the function as a parameter (since we know our list is built of these structures and the function compares the input to each list entry). In each of the fake SSDT functions we need to follow the structure assembly in order to resolve its layout.
Similar to DeviceControlDispatcher, the first member in each structure instance is a buffer-code varying between each SSDT function type (depending on what the function relates to):
· CODE=1: functions relating to PIDs. When these functions are called, the relating PID is set at offset 0x8 in the structure, so we assume the offset is used to store a PID:
· CODE=2–4: functions relating to string comparisons:
Notice that the parameter passed to sub_FF0D3329 is the address (pointer) of var_4C (v5 in the decompiler):
Since the stack should contain a PoolAllocation structure, by looking at the string location relative to var_4C we can construe its offset:
Var_4C is at offset -4C (hence its name) and DestinationString is at -3C — therefore the string is at offset 0x10 in the structure.
Additionally, IDA detected that DestinationString is a UNICODE_STRING structure:
Note: IDA can address the function’s variables using the EBP or ESP registers as an index. Therefore, the variable values assigned by the disassembler will differ in each case. If EBP is used (like in our case) the values will be negative — otherwise, positive. The signs change according to the location of the variables on the stack relative to the two registers. Either way, the difference between their values will be the same.
· CODE=7: functions that relate to memory reading. The memory address to read from is set at offset 0x18, and the amount of bytes to read is at 0x1C:
In addition, a helper function is used to convert the process handle to a PID. The variable assigned with the PID (v7) is located after BytesToRead on the stack, at offset 0x20:
Although the decompiler does not show the PID getting defined inside v6 (i.e. the PoolAllocation structure), by its initialization and its location on the stack, we suspect that the variable is part of the structure.
· CODE=8: functions relating to threads. Once again, the PID is set at offset 0x8 (as in CODE=1), and 0xC holds the TID:
· CODE=5, 6 and 9 are not used in any fake function.
After mapping out the values discovered in the fake functions, our current PoolAllocation structure looks as such:
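As a cross-check of the offsets collected so far, the layout can be modeled with ctypes. This is a sketch built only from the offsets discussed above; every field name is our own, the 0x4 member is not yet identified at this point, and the real structure may contain further members after Blink.

```python
import ctypes


class UnicodeString32(ctypes.Structure):
    """32-bit UNICODE_STRING: Length, MaximumLength, Buffer pointer."""
    _fields_ = [
        ("Length", ctypes.c_uint16),
        ("MaximumLength", ctypes.c_uint16),
        ("Buffer", ctypes.c_uint32),
    ]


class PoolAllocation(ctypes.Structure):
    """Hypothetical field names; offsets taken from the analysis."""
    _fields_ = [
        ("BufferCode", ctypes.c_uint32),   # 0x00
        ("Unknown04", ctypes.c_uint32),    # 0x04, not identified yet
        ("Pid", ctypes.c_uint32),          # 0x08, CODE=1 and CODE=8
        ("Tid", ctypes.c_uint32),          # 0x0C, CODE=8
        ("String", UnicodeString32),       # 0x10, CODE=2-4
        ("Address", ctypes.c_uint32),      # 0x18, CODE=7
        ("BytesToRead", ctypes.c_uint32),  # 0x1C, CODE=7
        ("MemoryPid", ctypes.c_uint32),    # 0x20, CODE=7
        ("Flink", ctypes.c_uint32),        # 0x24
        ("Blink", ctypes.c_uint32),        # 0x28
    ]


# DestinationString (-0x3C) relative to var_4C (-0x4C) gives offset 0x10.
assert PoolAllocation.String.offset == (-0x3C) - (-0x4C)
```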
sub_FF0D3329
Let’s return to sub_FF0D3329. Using the mapped structure, we can now understand the function’s inner workings better. Renaming all of the variable names according to the structure offsets we figured out, it is clear the function receives a list entry and checks whether it is present in the shared list:
When BufferCode=2–4, the input should be two entries containing strings, which are both sent to a helper function (sub_FF0D329F) as parameters. A quick glance reveals that the function compares them:
When BufferCode=8, the PIDs and TIDs are compared between the two entries:
When BufferCode=7 there’s an initial check whether the PIDs are equal followed by another check whether the requested address space contains the malware’s memory:
To conclude, function sub_FF0D3329 receives a PoolAllocation structure and checks whether it is in the shared list. If it is, the structure is returned. We will reference the function as CheckIfObjectInList from now on:
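The matching logic described above can be modeled roughly as follows. This is a simplified Python model, not a transcription of the driver: the Entry class and its field names are ours, the string comparison is reduced to plain equality (the semantics of sub_FF0D329F are not covered here), and the CODE=7 check is assumed to be a range-overlap test between the requested region and a protected region.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    code: int
    pid: int = 0
    tid: int = 0
    string: str = ""
    address: int = 0
    size: int = 0


def matches(rule, query):
    """Compare a list entry (rule) with the structure built by a hook."""
    if rule.code != query.code:
        return False
    if query.code == 1:
        return rule.pid == query.pid
    if query.code in (2, 3, 4):
        return rule.string == query.string  # assumption: plain equality
    if query.code == 7:
        # Assumption: the request is hidden if it overlaps the rule's range.
        overlaps = (query.address < rule.address + rule.size
                    and rule.address < query.address + query.size)
        return rule.pid == query.pid and overlaps
    if query.code == 8:
        return rule.pid == query.pid and rule.tid == query.tid
    return False


def check_if_object_in_list(rules, query):
    """Return the first matching list entry, or None."""
    for rule in rules:
        if matches(rule, query):
            return rule
    return None
```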
Now we can return to the function sub_FF0D3592 from Case 1 in DeviceControlDispatcher:
Here we can also see a use of the helper function sub_FF0D340A. When we step into it we see a loop that sums up the total size of all the objects in the list:
Next, using the sum, an equally sized memory chunk is allocated and another loop runs through the list — this time each list entry is copied into the new allocation:
At line 62, we see the first and only use of the value in offset 0x4 inside the PoolAllocation structure. The value represents a flag that determines if the list entry gets copied into the new allocation.
When the loop terminates, the function sets the memory allocation as a registry value to a key named “RulesData”:
We found out that the driver saves its shared list in the registry and updates it whenever a new entry gets added. We’ll rename the functions accordingly:
· sub_FF0D3592 — AddObjectToList
· sub_FF0D340A — UpdateListInRegistry
Back to Case 1 in DeviceControlDispatcher:
At line 35, we see an if statement. If the condition is true, the object received from the user gets inserted into the list and an exit status is returned. Otherwise, we enter the function sub_FF0D3620, which looks for the object in the list, removes it and updates the registry value:
In case 1 the driver gets a PoolAllocation structure where CODE=1 and inserts or removes it from the shared list:
Cases 2–4
We already know that the BufferCode received from the user, which determines the switch statement result, gets copied to the first value in the PoolAllocation, meaning this is a string-containing structure as well.
The string gets copied from the SystemBuffer to the new PoolAllocation. The structure is then inserted or removed from the list:
We will update our SystemBuffer struct with the new values we found at their appropriate offsets:
Case 5
This case is very similar to case 1 which uses a PoolAllocation where BufferCode=1 (i.e. a process-related entry) except there is a call to the function sub_FF0D2EE3 prior:
sub_FF0D2EE3:
The function acquires the EPROCESS pointer using the process’s PID (line 13) and then increments it by the dword_FF0D5330 value, saving the result in the variable v2 (line 15).
dword_FF0D5330 equals 0x88:
At EPROCESS + 0x88 we see a LIST_ENTRY structure that connects all the other kernel’s EPROCESS structures:
At lines 16–17 the function removes the EPROCESS from the list.
In Case 5, the driver is given a PoolAllocation structure containing a PID, and in addition to adding it to the shared list, it removes its corresponding EPROCESS structure from the kernel’s process list.
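The unlinking step is the classic doubly linked list removal. The following toy model (plain Python objects rather than kernel memory, with names of our choosing) shows why a process removed this way no longer appears when the list is walked, even though the object itself still exists.

```python
class ListEntry:
    """Toy LIST_ENTRY: a node in a circular doubly linked list."""

    def __init__(self, name):
        self.name = name
        self.flink = self
        self.blink = self


def insert_tail(head, entry):
    """Link entry just before head, i.e. at the tail of the list."""
    entry.blink = head.blink
    entry.flink = head
    head.blink.flink = entry
    head.blink = entry


def unlink(entry):
    """Make the neighbours point past entry, as the driver does."""
    entry.blink.flink = entry.flink
    entry.flink.blink = entry.blink


def walk(head):
    """Enumerate the list the way a process-listing tool would."""
    names = []
    node = head.flink
    while node is not head:
        names.append(node.name)
        node = node.flink
    return names
```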
Cases 7–8
In both cases the relevant values are copied to PoolAllocation which is then inserted or removed from the list:
We will again update our SystemBuffer structure. Notice the 500 undefined bytes between “String Maximum Length” and “Address to Read From” — this is where the raw string will probably reside:
Case 9
Starts with a call to sub_FF0D302B, followed by freeing the memory allocation where PoolAllocation resides:
Stepping into sub_FF0D302B, we first see a call to the parameterless function sub_FF0D36F1 (line 9). Next, the registry key gets deleted and the function sub_FF0D13ED gets called with the global variable dword_FF0D53A4 as input (line 23):
sub_FF0D36F1 frees the shared list:
At function sub_FF0D13ED we see the creation of an ObjectAttributes structure where the function’s argument is assigned as the ObjectName (line 14). Finally, a file is created using the structure:
The function assigns the value 0x240 (OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE, i.e. 0x40 | 0x200) to the Attributes field of the structure (line 13). MSDN describes OBJ_KERNEL_HANDLE as follows:
The bottom line is that the function creates a file through a handle that is accessible only from kernel mode, seemingly to mark the system as infected and prevent a second infection. Case 9 basically removes the malware from the system without leaving any trace:
Case 6
We saved the best for last. At line 56 the driver checks whether another structure exists in memory after SystemBuffer by comparing the SystemBuffer’s size field (at offset 0x220) with the size of the entire user buffer. If the SystemBuffer’s size is smaller (i.e. another structure exists), the mysterious data is then sent to the function sub_FF0D29F4:
sub_FF0D29F4 is a complete mess:
Similar to StealthierGetProcAddress, here we also see the unknown structure is parsed using known offsets in the PE format (0x3C). After checking the rest of the offsets we see that they all match the format as well, meaning this structure is probably a PE file.
Firstly, the function allocates memory with the size equal to the PE file size, and then it copies the PE headers and sections into it:
Next, the functions sub_FF0D2944 and sub_FF0D28B3 are called. Before the two calls there is a check whether the relocation directory table and the import directory table exist, in that order:
By simply glimpsing at both of the functions’ parameters we can assume their purpose. The first function (sub_FF0D2944) runs through the relocation table and updates every pointer in the PE to its new address relative to the allocation base address (we will not go into detail):
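For readers unfamiliar with base relocations, the general technique can be sketched in Python as below. This follows the standard PE relocation format for 32-bit images (blocks of a page RVA, a block size and 16-bit entries, with type 3 meaning IMAGE_REL_BASED_HIGHLOW); it is not a transcription of sub_FF0D2944, and the function name and parameters are hypothetical.

```python
import struct

IMAGE_REL_BASED_HIGHLOW = 3  # patch a full 32-bit address


def apply_relocations(image, reloc_rva, reloc_size, old_base, new_base):
    """Rebase a memory-mapped 32-bit PE held in a bytearray, in place."""
    delta = (new_base - old_base) & 0xFFFFFFFF
    offset, end = reloc_rva, reloc_rva + reloc_size
    while offset < end:
        page_rva, block_size = struct.unpack_from("<II", image, offset)
        if block_size < 8:
            break  # malformed or terminating block
        count = (block_size - 8) // 2
        for entry in struct.unpack_from("<%dH" % count, image, offset + 8):
            # Top 4 bits: relocation type; low 12 bits: offset in the page.
            if entry >> 12 == IMAGE_REL_BASED_HIGHLOW:
                target = page_rva + (entry & 0xFFF)
                (value,) = struct.unpack_from("<I", image, target)
                patched = (value + delta) & 0xFFFFFFFF
                struct.pack_into("<I", image, target, patched)
        offset += block_size
```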
The second function (sub_FF0D28B3) runs through the PE’s import table:
GetImageBase (line 14) receives the name of each module in the table (first value of each entry) and returns its base address:
Each function the PE imports will then be sent to the helper function sub_FF0D2824 alongside the current base address of its module.
The helper function will return the function address relative to its module base address, similar to GetProcAddress (we again will not go into detail):
Finally, the function sub_FF0D28B3 updates the new PE’s import address table with the addresses it gets from the GetProcAddress calls.
After the pointers in both tables are updated, the first function in the PE’s export directory table is called:
Since the function updates the PE pointers relative to a kernel pool allocation address, we know the PE file is a driver. The first function address in a driver’s export directory table points to a DriverEntry function. Its signature looks like this:
After the DriverEntry call, the function looks for the PE’s relocation table and zeros it out:
We discovered that in Case 6 the driver loads another driver reflectively:
According to the DriverEntry signature, the first parameter is a DriverObject pointer. The first parameter sent to the DriverEntry from our memory image (dword_FF0D52B0) points to the address 0xFF366550:
When looking at this address in Volshell we will see the DriverObject of the suspicious driver we saw in memory — icqogwp:
ThreadCreationCallback
Earlier in the analysis, we saw using Volatility’s Callbacks plugin that the driver 00004A2A sets a callback function (sub_FF0D2EA7) to thread creation notifications. This function’s signature should look as such:
At first, the helper function sub_FF0D2E1A is called with the TID and PID:
Using the PID, the helper function gets the process’ EPROCESS. Afterwards, the value at the offset EPROCESS + dword_FF0D5340 is put into v4, where dword_FF0D5340 equals 0x190:
The LIST_ENTRY object located at offset EPROCESS+0x190 connects the process’ threads where every thread is represented by an ETHREAD object:
We can infer that v4 contains the address of the next FLink (the first ETHREAD) and v5 contains the second ETHREAD’s address (the FLink of the first FLink). If more than one thread exists, the driver increments the IRQL by one:
Since we want to work with an ETHREAD pointer and not with a LIST_ENTRY one, we will need to perform a mathematical operation on v5. This is what the CONTAINING_RECORD macro is for:
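The arithmetic behind the macro is a single subtraction: the address of the embedded LIST_ENTRY minus its offset inside the containing structure. In the sketch below the offset 0x22C is only an illustrative value, not one taken from this analysis, and the function name mirrors the macro for readability.

```python
def containing_record(field_address, field_offset):
    """Return the base address of the structure that embeds the field."""
    return field_address - field_offset


# Illustrative values only: a LIST_ENTRY found at 0x8123422C that sits
# 0x22C bytes into its structure belongs to a structure at 0x81234000.
EXAMPLE_BASE = containing_record(0x8123422C, 0x22C)
```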
Next, the function compares the input TID (i.e. the newly created thread ID) with what is located in ETHREAD + 0x1EC + 4 (line 36 after simplification). This offset in the ETHREAD structure stores the thread’s ID:
The function returns the pointer to the created thread’s ETHREAD structure (v8).
Returning to sub_FF0D2EA7, we see a check whether the thread is being created or closed (the Create flag at line 9):
In case the thread is created, the function replaces the address in ETHREAD + 0xE0 which points to the new thread’s ServiceTable (i.e. the pointer to the new thread’s SSDT):
From this it can be assumed that the global variable which the function replaces the ServiceTable pointer with (dword_FF0D5398) points to the malware’s fake SSDT.
Conclusion
A memory image of an operating system is strong evidence: it lets us analyze an attack vector quickly, close to real time. In some cases, however, it is not enough, and we need to reverse engineer the dumped files to get a fuller picture of what is going on.
In this article we tried to show you the basic steps to perform when detecting a suspicious driver in memory: from collecting evidence from the memory file (shown partially), to dumping and rebasing the driver’s address space, detecting data concealment, simplifying the disassembly, and finally fully understanding its main mechanisms.
BlackEnergy used a monitoring driver that kept its activities hidden and served as a reflective loader into kernel memory, allowing the attacker to strengthen their foothold on the system and expand their toolkit with little effort.
Our analysis emphasized the driver’s internal design in order to understand its components. It included function tracking and information cross-referencing, which in turn helped us assemble the structures used by the driver and other malicious modules:
You are welcome to continue the analysis from where we have stopped (icqogwp, etc.) and see how the rest of the attack vector’s components use the driver’s capabilities, what it was meant to hide and how it got onto the system.
Analysis summary chart:
Sources:
· https://www.amazon.com/Windows-Kernel-Programming-Pavel-Yosifovich/dp/1977593372
· https://docs.microsoft.com/en-us/windows-hardware/drivers/
· https://www.vergiliusproject.com/
· https://www.codeproject.com/Articles/800404/Understanding-LIST-ENTRY-Lists-and-Its-Importance
· https://www.onlinewebfonts.com/
· https://www.geoffchappell.com/
Special thanks to Noam Nagar for the help with the English translation.