[0x0] Summary

This week I enjoyed solving a reverse engineering challenge proposed by 0verfl0w as part of the amazing malware analysis course Zero2Auto. The main task was to reverse engineer the string decryption routine used by Gozi. I really liked this challenge because it improved my technical and methodological skills. Without too much drama, here is my write-up ☺️

[0x1] Unpacking the first stage

When we start analyzing malware, one of the first checks is whether the sample is packed. The idea behind this protection method is to hide the “final code” that causes the real damage on the infected device and to reconstruct it at run time, because most antivirus programs are well trained to recognize the strong patterns that characterize each malware family, thanks to the threat intelligence people who spend their time tracking different malware. Trust me, it’s really rare nowadays to find “naked” malware whose code appears in clear text. In practice, the most important indicators of a packer are:

  • High entropy: the bad code resides inside the original PE in an “encrypted” form, which significantly increases the entropy. Figure (1) shows that the entropy of the sample is high (about 7.239 out of a maximum of 8 bits per byte).
[figure 1]
  • List of imported API(s): the unpacker stub, which is responsible for rebuilding the final stage and transferring control to it, mostly uses Win32 API(s) related to memory allocation, memory permissions and resource manipulation, such as VirtualAlloc, VirtualProtect, LoadResource etc. You would expect to see these functions in the IAT, but nowadays malware authors avoid importing them directly because that draws the attention of AV heuristic engines. Instead they fill the import table with whatever they like, as you can see in the second figure.
[figure 2]
  • Abnormal PE sections: examining the state of the existing sections can reveal anomalies. For example, you can see in figure (3) that the section .rdata contains valid assembly code, which is suspicious because this section is reserved for constant data only; that’s why its default permission is “R”. Maybe it’s junk code that will never be called, or we can also suppose that this permission will be changed to “RX” at some stage of execution and that some interesting things will happen later. What I presented here is just a small example; the expression “Abnormal PE sections” covers many other cases: the presence of data sections with large sizes, the presence of a “non-standard” section (for example, the UPX packer creates special sections whose names start with “UPX”) etc.
[figure 3]
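To make the entropy indicator concrete, here is a minimal sketch of how such a value can be computed. It assumes the usual Shannon entropy over byte frequencies; the function name `shannon_entropy` is mine and is not taken from any tool used in this write-up.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of -p * log2(p) over every byte value that actually occurs
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# A buffer where all 256 byte values are equally frequent reaches the maximum
print(shannon_entropy(bytes(range(256))))
```

Running it on the raw bytes of a file (or of a single section) gives a number that can be compared with the value reported by your PE viewer; compressed or encrypted data tends to sit close to 8.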

To unpack malware manually, the classic strategy relies on tracking the memory allocations that are created: it supposes that one of these allocations will be filled with the final stage, and then hunts for patterns like “MZ” that indicate the start of a valid PE. In other words, the unpacker is not disturbed; it runs freely until the malicious code is clearly reconstructed in memory. For that purpose, we have to set software breakpoints on the following functions: VirtualAlloc, VirtualProtect, WriteProcessMemory, CreateThread, CreateProcessA, ResumeThread.

However, this time I chose to work in a different way. We won’t track any memory allocation; instead we will execute the malware and wait for a special event that characterizes the final stage of Gozi, based on the behaviour observed by a trusted sandbox (personally I prefer to work with CAPE Sandbox, which logs all the interesting Win32 API(s) called during the execution of the malware). This event may be a communication with a C2 server, the creation of a special mutex or some sort of process injection. After that, we determine where this event comes from by analyzing the call stack. In the case of Gozi, I found that it establishes an HTTP connection with the C2 185.189.151.28 using functions provided by the internet library Wininet.dll, as you can see in the following figure:

[figure]
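For comparison, the classic strategy mentioned above ends with a hunt for “MZ” inside dumped memory. The following is only a rough sketch of that hunt, assuming the region has already been saved to a byte buffer; `find_pe_headers` is a hypothetical helper, not part of x64dbg or any sandbox.

```python
import struct


def find_pe_headers(buffer: bytes) -> list:
    """Return offsets where 'MZ' is followed by a valid 'PE\\0\\0' signature."""
    hits = []
    offset = buffer.find(b"MZ")
    while offset != -1:
        # e_lfanew lives at offset 0x3C of the IMAGE_DOS_HEADER
        if offset + 0x40 <= len(buffer):
            (e_lfanew,) = struct.unpack_from("<I", buffer, offset + 0x3C)
            pe_offset = offset + e_lfanew
            if buffer[pe_offset:pe_offset + 4] == b"PE\x00\x00":
                hits.append(offset)
        offset = buffer.find(b"MZ", offset + 1)
    return hits
```

Checking the “PE” signature through e_lfanew filters out most of the random “MZ” byte pairs that appear in any large buffer.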

Let’s put a DLL breakpoint on the module Wininet.dll after attaching the malware to the x64dbg debugger and specifying this command line: “rundll32.exe /Path/To/StageZeroGozi.dll,#1”. We will get a hit when this particular DLL is loaded into memory:

[figure]

Now we run the first stage until the previous breakpoint is hit. Right after that, go to the Symbols tab, choose Wininet.dll, put a software breakpoint on HttpOpenRequestA and finally click the “run” button to continue the execution:

[figure]

A few seconds later the debugger catches the start of execution of the HttpOpenRequestA API: the eip register points to wininet.HttpOpenRequestA and the appropriate arguments have been pushed onto the stack. Okay, now we are running code in the context of the “real” Gozi!! Let’s have a look at the call stack to determine the portion of code responsible for calling HttpOpenRequestA: go to the Call Stack tab, which shows the calling order, in other words “who is calling whom”. You can see that call dword ptr ds:[<&HttpOpenRequestA>] was executed at the address 0x1E7BDD:

[figure]

If we follow this address in the memory map, we find that it is located in a memory region of size 0xD000 that starts at 0x001E0000. This region is the mapped format of the Gozi malware (just check the layout of the sections: the size of each one is a multiple of 0x1000). We should now dump it to disk and realign it, since static analysis tools expect the unmapped format:

[figure]
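One common way to do this realignment is to make the raw offset and raw size of every section equal to its virtual address and virtual size, so that a tool reading the dump as a file finds each section where it sits in memory. The sketch below illustrates that idea only; `realign_mapped_dump` is a hypothetical helper, and it is not necessarily what your favourite PE editor does internally.

```python
import struct

SECTION_HEADER_SIZE = 40  # size of one IMAGE_SECTION_HEADER


def realign_mapped_dump(dump: bytes) -> bytes:
    """Make the raw offset/size of every section match the in-memory layout."""
    image = bytearray(dump)
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("not a PE image")
    (section_count,) = struct.unpack_from("<H", image, e_lfanew + 6)
    (optional_size,) = struct.unpack_from("<H", image, e_lfanew + 20)
    table = e_lfanew + 24 + optional_size
    for index in range(section_count):
        header = table + index * SECTION_HEADER_SIZE
        # VirtualSize is at offset 8, VirtualAddress at offset 12
        virtual_size, virtual_address = struct.unpack_from("<II", image, header + 8)
        # SizeOfRawData is at offset 16, PointerToRawData at offset 20
        struct.pack_into("<II", image, header + 16, virtual_size, virtual_address)
    return bytes(image)
```

Depending on the dump, the ImageBase field may also need to be set to the address the region was dumped from (0x001E0000 here), because relocations have already been applied in memory; that step is left out of the sketch.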

[0x2] Hunting for the decryption routine

Now that we have the unpacked Gozi in our hands, it’s time to load it into IDA Pro while keeping an eye on x64dbg for a deep investigation. The important questions we should answer to solve this challenge are: “where exactly does the encrypted data reside?” and “which portion of code processes this encrypted data at run time?” Let’s go back to the step where the malware calls “InternetOpenA” to initialize the use of the WinINet functions before talking to the C2 server (just add a software breakpoint on this function and rerun the program until it is reached). x64dbg shows that the first argument, “lpszAgent”, is a pointer to “Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)”, which resides in a dynamic memory region. OK, hold on for a second: where does this string come from? That looks interesting; maybe this string is part of a list of objects encrypted by the malware:

[figure]

If you go to IDA Pro you will see that “InternetOpenA” is called only once, inside the function sub_10007AF1. By analyzing the code cross-references and the arguments passed (sub_1000375F –> sub_10006954 –> sub_10007AF1), it’s clear that dword_1000A368 points to “Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)”. This variable is initialized with a pointer returned by HeapAlloc inside the function sub_10003D2C; after that, the buffer pointed to by dword_1000A368 is filled with the content of the buffer pointed to by ((*dword_1000A348) + byte_1000B552) using sprintf:

[figure]

If we repeat the same process with other clear strings, we notice a repetitive pattern: all the clear strings associated with the Gozi malware are referenced by a pointer of this form: (*dword_1000A348 + PtrToEncryptedVariableLocatedInBssSegment). The first component points to the start of the memory where the decrypted strings are stored and the second component points to the clear string within that memory. From here I suspect that the malware author probably chose to create a copy of the BSS and decrypt it all at once. Finding where the “base address” of the decrypted area gets its value is the key to finding the decryption routine: the initialization of this variable means that the decryption routine has finished and that the malware can use the variable to locate its strings, so we are not far from the routine. The xrefs to this variable show that it is written only once, in the function sub_00245D2E (the same function appears as sub_10005D2E when the image is based at 0x10000000):

[figure]
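The pointer pattern described above can be mirrored offline once a decrypted copy of the BSS is available. This is a minimal sketch under two assumptions of mine: that offsets inside the decrypted copy match offsets inside the original section, and that the section starts at 0x1000B000 (image base 0x10000000 plus the RVA 0xB000 used by the script later in this write-up). `read_c_string` is a hypothetical helper.

```python
def read_c_string(decrypted_bss: bytes, bss_va: int, variable_va: int) -> bytes:
    """Return the NUL-terminated string stored at variable_va inside the .bss copy."""
    offset = variable_va - bss_va
    if not 0 <= offset < len(decrypted_bss):
        raise ValueError("address is outside the .bss section")
    end = decrypted_bss.find(b"\x00", offset)
    if end == -1:
        end = len(decrypted_bss)
    return decrypted_bss[offset:end]
```

With those assumptions, byte_1000B552 would map to offset 0x552 of the decrypted buffer, e.g. `read_c_string(decrypted, 0x1000B000, 0x1000B552)`.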

Inside the previous function a lot of important work takes place. First the malware gets the address and the size of the BSS segment and stores these values in the local variables [ebp-0xC] and [ebp-0x4] by calling the function sub_100047C8. In summary, this function does the following:

  • Receive a pointer to the IMAGE_DOS_HEADER, then extract the field e_lfanew, which is the file address of the new exe header (1)

  • Get the number of existing sections; this information can be found in IMAGE_FILE_HEADER.NumberOfSections (2)

  • Calculate the address of the section headers table (3)

  • Iterate through the section headers table to locate the entry associated with the segment “BSS” (4)

  • Extract the relative virtual address and the size of the “BSS” segment (the code reads the field at offset 0x10 of the section header, which the PE format names SizeOfRawData) and store these values inside the variables pointed to by a3 and a4 (5)

int __fastcall sub_100047C8(int a1, int IMAGE_DOS_HEADER, _DWORD *a3, _DWORD *a4)
{
  int v4; // ecx
  int v5; // esi
  int result; // eax
  _DWORD *v7; // ecx
  _DWORD *v8; // edx
  int v9; // ecx
  int v10; // edx

  v4 = IMAGE_DOS_HEADER + *(_DWORD *)(IMAGE_DOS_HEADER + 0x3C);// v4 receives the address of the PE header (IMAGE_NT_HEADERS), located through e_lfanew (1)
  v5 = *(unsigned __int16 *)(v4 + 6);// v5 receives the number of existing sections, IMAGE_FILE_HEADER.NumberOfSections (2)
  result = 0;
  v7 = (_DWORD *)(*(unsigned __int16 *)(v4 + 20) + v4 + 24);// v7 points to the section headers array because (v4 + 24) points to the start of the optional header and *(unsigned __int16 *)(v4 + 20) is the size of the optional header (3)
  v8 = 0;
  do
  {
    if ( !v7[1] && *v7 == 'ssb.' ) // compare the "Name" field of the IMAGE_SECTION_HEADER structure with the string ".bss" (4)
      v8 = v7;
    v7 += 0xA; // advance v7 by 0xA DWORDs (40 bytes, the size of one IMAGE_SECTION_HEADER) to process the next entry of the section table
    --v5;
  }
  while ( v5 && !v8 );
  if ( !v8 )
    return 2;
  v9 = v8[3];
  if ( v9 )
  {
    v10 = v8[4];
    if ( v10 )
    {
      *a3 = v9; // store the relative virtual address (v8[3], VirtualAddress) in the variable pointed to by a3 (5)
      *a4 = v10; // store the size of the segment (v8[4], the SizeOfRawData field) in the variable pointed to by a4
      return result;
    }
  }
  return 11;
}
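The same header walk can be reproduced offline on the dumped file. The sketch below is my own Python translation of steps (1) to (5), not code taken from the sample; `find_bss` is a hypothetical name, and it returns None where the original returns the error codes 2 and 11 (it also skips the original's check that both values are non-zero).

```python
import struct


def find_bss(image: bytes):
    """Return the two DWORDs at offsets 12 and 16 of the '.bss' header, or None."""
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)                # step (1)
    (section_count,) = struct.unpack_from("<H", image, e_lfanew + 6)   # step (2)
    (optional_size,) = struct.unpack_from("<H", image, e_lfanew + 20)
    header = e_lfanew + 24 + optional_size                             # step (3)
    for _ in range(section_count):                                     # step (4)
        if image[header:header + 8] == b".bss\x00\x00\x00\x00":
            # Offset 12 is VirtualAddress, offset 16 is SizeOfRawData
            return struct.unpack_from("<II", image, header + 12)       # step (5)
        header += 40  # size of one IMAGE_SECTION_HEADER
    return None
```

The 8-byte comparison is equivalent to the decompiled test, which checks that the first DWORD of the name is “.bss” and that the second DWORD is zero.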

The next step is to create a buffer that has the same size as the BSS segment and to copy the segment’s content into that buffer using memcpy. Right after that, a sort of “key” is constructed by performing simple arithmetic operations on multiple components: first the numerical value equivalent to “Apr ” is added to the numerical value equivalent to “26 2” (1), then the result is added to [ebp - 0xC] (2), which contains the relative virtual address of “BSS”, and the final value is added to (argument2 - 1), where argument2 is the second argument of the function sub_10005D2E. This key is pushed onto the stack along with other parameters to call sub_1000138A. One of these parameters is a pointer to the region that contains the copy of the BSS segment, so we can now assume with high confidence that sub_1000138A is the decryption routine we are looking for. Please note that argument 2 is not a constant; it is a random value that can take any value in [0:0x13-1] because it is based on the result of “GetSystemTimeAsFileTime”. That’s why the malware performs some form of “brute force”, executing the decryption routine several times with different values of argument2 until it finds the right key.

[figure]
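As a worked example of the key arithmetic described above, here is a small sketch. It assumes that the two date fragments are read as little-endian DWORDs, that the sum wraps at 32 bits, and that the BSS RVA is 0xB000 (the value used by the decryption script in this write-up); `build_key` is a name I made up.

```python
import struct


def build_key(bss_rva: int, seed: int, stamp: bytes = b"Apr 26 2") -> int:
    """Add the two date DWORDs, the .bss RVA and (seed - 1), modulo 2**32."""
    first, second = struct.unpack("<II", stamp)
    return (first + second + bss_rva + seed - 1) & 0xFFFFFFFF


# Candidate keys for a few values of the second argument
for seed in (1, 2, 19):
    print(seed, hex(build_key(0xB000, seed)))
# the last line printed is: 19 0x52935685
```

Because the seed only shifts the key by a small amount, consecutive candidates differ by exactly one, which is what makes trying them all so cheap.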

[0x3] Inside the decryption routine

The algorithm used is simple, recursive (each block depends on the previous encrypted block) and custom: it doesn’t rely on a well-known algorithm like RC4. Each block of 4 bytes is decrypted using the formula: DecryptedChunk[n] = EncryptedChunk[n-1] - Key + EncryptedChunk[n], where the arithmetic wraps at 32 bits and EncryptedChunk[-1] is taken as 0. Note that the random value (remember argument 2 of the function sub_00245D2E) used to construct the correct key is 19; I ran a brute force on my local computer to determine it.

[figure]

We can implement an emulation of this routine in Python to decrypt the BSS segment:


import argparse
import struct

import pefile

# The routine is emulated on at most 1024 blocks of 4 bytes (0x1000 bytes)
MAX_BLOCKS = 1024


def main(Path):
    Gozi = pefile.PE(Path)

    # Key = "Apr " + "26 2" + RVA of the BSS section + (argument 2 - 1), with argument 2 = 19
    Key = struct.unpack("<I", b"Apr ")[0] + struct.unpack("<I", b"26 2")[0] + 0x0000B000 + 19 - 1
    print("key is " + hex(Key))

    for section in Gozi.sections:
        # check if it's the BSS section
        if section.Name.rstrip(b"\x00") != b".bss":
            continue

        # read the content of the section
        EncryptedData = section.get_data()

        # never read past the end of the section data
        BlockCount = min(MAX_BLOCKS, len(EncryptedData) // 4)

        # start the decryption process: each iteration decrypts one block of 4 bytes
        DecryptedData = b""
        PreviousEncryptedChunk = 0
        for i in range(0, BlockCount * 4, 4):
            EncryptedChunk = struct.unpack("<I", EncryptedData[i:i + 4])[0]
            DecryptedChunk = (PreviousEncryptedChunk - Key + EncryptedChunk) & 0xFFFFFFFF
            PreviousEncryptedChunk = EncryptedChunk
            DecryptedData += struct.pack("<I", DecryptedChunk)

        print(DecryptedData)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="enter the full path of the Gozi malware to decrypt its strings")
    parser.add_argument("-P", "--Path", help="Gozi malware path", required=True)
    args = vars(parser.parse_args())
    main(args["Path"])

And here is our beautiful list of decrypted strings:

[figure]
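The value 19 was found by brute force, and that search can be sketched too. The idea below is my own illustration, not the check performed by the malware: decrypt with each candidate and keep the one whose output contains a string already seen in the debugger, such as “Mozilla/4.0”. The `encrypt` function is simply the algebraic inverse of the formula, added so the sketch can be tested without the sample; all names and the candidate range are assumptions.

```python
import struct

MASK = 0xFFFFFFFF


def decrypt(blob: bytes, key: int) -> bytes:
    """plain[n] = cipher[n-1] - key + cipher[n], with cipher[-1] = 0."""
    out = bytearray()
    previous = 0
    # Only complete 4-byte blocks are processed
    for (current,) in struct.iter_unpack("<I", blob[: len(blob) & ~3]):
        out += struct.pack("<I", (previous - key + current) & MASK)
        previous = current
    return bytes(out)


def encrypt(blob: bytes, key: int) -> bytes:
    """Inverse of decrypt: cipher[n] = plain[n] + key - cipher[n-1]."""
    out = bytearray()
    previous = 0
    for (plain,) in struct.iter_unpack("<I", blob[: len(blob) & ~3]):
        previous = (plain + key - previous) & MASK
        out += struct.pack("<I", previous)
    return bytes(out)


def brute_force(blob: bytes, base_key: int, known: bytes, candidates):
    """Return the first candidate n for which known appears in the output."""
    for n in candidates:
        # base_key is the key without the (n - 1) term
        if known in decrypt(blob, (base_key + n - 1) & MASK):
            return n
    return None
```

A wrong candidate shifts every decrypted DWORD by a small constant, which corrupts at least one byte of any known string of four bytes or more, so a reasonably long marker makes false positives unlikely.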

[0x4] A Little Bonus

Remember when I said that malware researchers “track” and “monitor” malware families continuously? One of their main goals is to look for specific patterns by studying the anatomy of the malware in depth; you can think of such a pattern as a strong fingerprint or “signature”. In the case of the Gozi family, the algorithm used for decryption is “custom” and “exclusive”, which means that the probability of finding another executable that contains the same code is logically low. The way the malware parses itself to extract the relative virtual address and the size of the BSS section looks custom too, so we can use these findings to build a basic detection/classification rule with the YARA tool:

import "pe"

rule Gozi {

    meta:
        description = "Gozi malware family"

        author = "TidNdader"

        date = "22/10/2022"

        state = "experimental"

    strings:

        $decryption_routine = { 53 C1 E8 02 33 DB 85 C0 74 2F 56 8B 74 24 0C 57 2B F2 83 7C 24 18 00 8B 0C 16 8B F9 74 09 85 C9 75 05 33 C0 40 EB 0D 2B 5C 24 14 03 CB 89 0A 8B DF 83 C2 04 48 75 DB 5F 5E 5B C2 0C 00 }

        $ExtractBssFeatures = { 8B 4A 3C 03 CA 0F B7 51 14 56 0F B7 71 06 33 C0 8D 4C 0A 18 33 D2 39 41 04 75 0A 81 39 2E 62 73 73 75 02 8B D1 83 C1 28 4E 74 04 3B D0 74 E7 3B D0 74 20 8B 4A 0C 3B C8 74 15 8B 52 10 3B D0 74 0E 8B 74 24 08 89 0E 8B 4C 24 0C 89 11 EB 07 6A 0B EB 02 6A 02 58 5E C2 08 00 }

    condition: (pe.is_pe and $decryption_routine and $ExtractBssFeatures)      
}

[0x5] Index

  • MD5 hash of the analyzed sample: f28f39ada498d66c378fd59227e0f215

  • Download link: MalwareBazaar