Hancitor infection mechanics and network traffic decoding

In this blog post I’ll show some analysis I did a couple of months ago of a randomly picked Hancitor sample from the malware-traffic-analysis.net web site. The link to the PCAP can be found here.

This article was originally posted here

The purpose of this analysis is to understand the infection mechanics and make sense of the traffic generated by the malware.

Looking at the PCAP file’s network connection statistics we can see there are 97 TCP and 9 UDP connections in total.

The UDP traffic consists only of DNS lookups for some interesting domains.

The malware and its components appear to have resolved the following domain names:

  • christs-ministries.com
  • api.ipify.org
  • gosandhegly.com
  • mail.voicesinprintpublishing.com
  • neubacher.at
  • waslohidi.ru
  • www.google.com

Now let’s focus on the TCP connections, starting with the first stream in the PCAP file, which shows the traffic generated when the user clicks on the link in the phishing e-mail.

The full HTTP hyperlink is trimmed from the PCAP, but it’s not important for our analysis. We can see that by clicking the link, the victim downloads an MS Word document, “bofa_payment_167492.doc”.

The next stream shows us the malware obtaining the externally visible IP address of the victim through the api.ipify.org web service.

In the next stream we see the malware making an HTTP POST request to “gosandhegly.com/ls5/forum.php”, submitting system information from the sandbox machine in clear text.

In response to that POST request, the server sends Base64 encoded data back to the victim’s machine. As it turns out, the “43c” in the returned data is the hexadecimal length of the Base64 string (0x43c = 1084 bytes), and the “0” at the end marks the end of the data. This is the framing used by HTTP/1.1 chunked transfer encoding, so we can safely ignore these values and extract just the Base64 encoded data for now.
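Because Wireshark’s “Follow TCP Stream” view shows the raw HTTP framing, a small helper that strips the chunk headers is handy when pulling Base64 blobs out of exported streams. The sketch below is a minimal chunked-body decoder, not a full HTTP parser; it assumes the complete response body (everything after the HTTP headers) is available as bytes and ignores any trailers after the final zero-size chunk.

```python
def dechunk(body: bytes) -> bytes:
    """Reassemble an HTTP/1.1 chunked transfer-encoded body."""
    out = bytearray()
    pos = 0
    while True:
        # Each chunk starts with its size in hex, terminated by CRLF
        eol = body.index(b"\r\n", pos)
        size = int(body[pos:eol].split(b";")[0], 16)  # ignore chunk extensions
        pos = eol + 2
        if size == 0:          # the zero-size chunk marks the end of the body
            break
        out += body[pos:pos + size]
        pos += size + 2        # skip the chunk data and its trailing CRLF
    return bytes(out)


print(dechunk(b"3\r\nabc\r\n2\r\nde\r\n0\r\n\r\n"))  # b'abcde'
```

For the response above, the first size line would be parsed as int(b"43c", 16), i.e. 1084 bytes of Base64 data.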

Unfortunately running the data through a Base64 decoding routine doesn’t seem to produce any clear text or meaningful data so we’ll leave it for now.

The next network stream is a GET request to the “mail.voicesinprintpublishing.com” web site, but the data returned by the server is still meaningless, as it’s obfuscated in some way.

The following few streams also seem to contain obfuscated data, so we can’t tell what is being communicated.

In subsequent communication with the gosandhegly.com C2 server, the returned data again looks Base64 encoded, but is rather small in size.

It decodes to something meaningless, so we assume a second layer of obfuscation has been applied to this data.

We won’t be able to decrypt the traffic towards waslohidi.ru, as it is TLS encrypted and we don’t have the server’s private key (which would only help if a non-forward-secret key exchange had been negotiated).

At this point we could not extract much more information from the PCAP file, so we’ll focus on the document file that gets downloaded when the user clicks on the phishing URL.

We open the document in a safe environment and see it’s asking the user to enable its embedded VB Macro code.

Upon enabling macros, Word automatically executes the Document_Open() procedure, which is the entry point to the malicious code.

We see that the code will transfer control to the “eyesonly” function, but let’s look around the code first. We see that there are some interesting alias assignments in the “ahtungs”, “barbarian” and “foxitr” modules of the VB code.

The “tace”, “awakened” and “condole” aliases are assigned to the NtWriteVirtualMemory, NtAllocateVirtualMemory and CreateTimerQueueTimer Windows API functions, respectively.

The CreateTimerQueueTimer (condole) function is interesting because its third parameter is a pointer to a callback function that is executed when the timer expires. Combined with the other two functions, this functionality is a typical building block for code injection. That is why we set breakpoints in the VB code where these functions are called and let the VB macro code run.

The first hit is for tace (NtWriteVirtualMemory), but it doesn’t have much to do with the code injection itself. The next hit is for “awakened” (NtAllocateVirtualMemory), which allocates a memory region within the address space of WINWORD.exe with Read/Write/Execute permissions.

awakened -> NtAllocateVirtualMemory(
    IN HANDLE ProcessHandle,     = -1 (self)
    IN OUT PVOID *BaseAddress,   = 0
    IN ULONG ZeroBits,           = 0
    IN OUT PULONG RegionSize,    = 9352 (0x2488)
    IN ULONG AllocationType,     = 4096 (0x1000) = MEM_COMMIT
    IN ULONG Protect );          = 64 (0x40) = PAGE_EXECUTE_READWRITE

The next hit is for tace (NtWriteVirtualMemory), which writes 5883 bytes of shellcode into this newly allocated region.

tace ->  NtWriteVirtualMemory(
   IN HANDLE    ProcessHandle,                   = -1 (self)
   IN PVOID     BaseAddress,                     = 100139008 (0x5F80000)
   IN PVOID     Buffer,                          = 172940788 (0xA4EDDF4)
   IN ULONG     NumberOfBytesToWrite,            = 5883 (0x16FB)
   OUT PULONG   NumberOfBytesWritten OPTIONAL ); = 0

The next hit is at “condole” (CreateTimerQueueTimer) in the “eyesonly” function. Once the timer fires, it transfers control to the shellcode’s entry point at offset 0x1090 from its base (0x5F81090 - 0x5F80000).

condole -> CreateTimerQueueTimer(
   _Out_      PHANDLE             phNewTimer,
   _In_opt_   HANDLE              TimerQueue,
   _In_       WAITORTIMERCALLBACK Callback,    = 100143248 (0x5F81090)
   _In_opt_   PVOID               Parameter,
   _In_       DWORD               DueTime,
   _In_       DWORD               Period,
   _In_       ULONG               Flags);
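The VB debugger shows these arguments in decimal. As a quick sanity check, the few lines of Python below convert them to hex and confirm that the callback address passed to condole lands 0x1090 bytes into the shellcode written by tace. The argument values are copied from the listings above; MEM_COMMIT and PAGE_EXECUTE_READWRITE are the standard Windows constants.

```python
# Documented Windows constants
MEM_COMMIT = 0x1000
PAGE_EXECUTE_READWRITE = 0x40

# Argument values as shown (in decimal) in the VB debugger
region_size = 9352          # NtAllocateVirtualMemory: RegionSize
allocation_type = 4096      # NtAllocateVirtualMemory: AllocationType
protect = 64                # NtAllocateVirtualMemory: Protect
base_address = 100139008    # NtWriteVirtualMemory: BaseAddress
write_size = 5883           # NtWriteVirtualMemory: NumberOfBytesToWrite
callback = 100143248        # CreateTimerQueueTimer: Callback

assert allocation_type == MEM_COMMIT
assert protect == PAGE_EXECUTE_READWRITE

# The callback points inside the written shellcode, which fits in the allocated region
entry_offset = callback - base_address
assert write_size <= region_size and entry_offset < write_size

print(hex(base_address), hex(callback), hex(entry_offset), hex(write_size))
# 0x5f80000 0x5f81090 0x1090 0x16fb
```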

We attach a debugger (x64dbg) to the WINWORD.exe process, go to address 0x5F81090 and set a breakpoint there, so that when the timer created by CreateTimerQueueTimer transfers control to the shellcode’s entry point, we can continue our analysis from there. We also go to the base of the shellcode, 0x5F80000, and save 5883 (0x16FB) bytes from this address to a file called shellcode.bin for further analysis.

Once we’re done with the static analysis of the shellcode, we can continue with its dynamic analysis in the debugger. All that’s left is to let WINWORD’s VB debugger continue execution so that we hit the breakpoint at the shellcode’s entry point.

When we load shellcode.bin into IDA Pro for static analysis, IDA doesn’t recognize our entry point at offset 0x1090 as a function, so we’ll have to create one manually and name it “Main”.

We turn our attention to the first function call in Main, sub_F01, which takes one interesting, hardcoded parameter. In fact, there are 19 locations within the shellcode where this function is called with similar values, so I’ll rename it to “resoveAPIhash” for now.

To hide their functionality, authors of shellcode and other obfuscated code often avoid storing API function names. Instead, they pre-calculate hash values of the names and, at runtime, compare them against hashes computed over the names of exported functions in order to find their addresses stealthily. Since they don’t always reinvent the wheel, they tend to use known, publicly available algorithms. The FLARE team at Mandiant/FireEye used this to their advantage and created the “shellcode_hash_search.py” plugin for IDA Pro, which searches the shellcode’s body for such known hashes and marks them accordingly (https://github.com/fireeye/flare-ida/tree/master/python/flare). For the majority of shellcode I have seen in the wild, it does a pretty good job; in this particular sample, however, it didn’t find anything. Since the plugin relies on an SQLite database of pre-calculated hash values, the FLARE team also provides the “make_sc_hash_db.py” script to generate one yourself. So we will find the hashing routine, understand its logic, re-implement it in FireEye’s script and generate a new database file to use with the plugin.

The hashing routine is located in function sub_E07 (offset 0x0E07) and consists of some bit shifting and XOR operations, which we re-implement in the make_sc_hash_db.py file by adding the following Python code:

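def customHancitor(inString,fName):
    # fName is unused; it is only there to match the signature
    # make_sc_hash_db.py expects from its hash functions
    val = 0
    for i in inString:
        # combine the shifted previous hash (>> 24 | << 7) with the next character
        val = ord(i) ^ ((val >> 0x18) | (val << 0x7))
        val &= 0xFFFFFFFF  # truncate to 32 bits
    return val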
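Conceptually, the database the plugin queries is just a mapping from hash value to export name. The sketch below shows that idea on its own, outside the FLARE script: a Python 3 version of the same hash plus a dictionary lookup. The API names in the example are illustrative candidates, not names recovered from this shellcode; the real script builds its table from DLL export names.

```python
def hancitor_hash(name: str) -> int:
    """Standalone (Python 3) version of the customHancitor routine."""
    val = 0
    for ch in name:
        val = ord(ch) ^ ((val >> 0x18) | (val << 0x7))
        val &= 0xFFFFFFFF
    return val


def build_lookup(api_names):
    """Map precomputed hashes back to API names, like the plugin's database."""
    return {hancitor_hash(n): n for n in api_names}


# Illustrative candidate list only
lookup = build_lookup(["LoadLibraryA", "GetProcAddress", "VirtualAlloc"])
h = hancitor_hash("VirtualAlloc")
print(hex(h), "->", lookup.get(h))
```

A hardcoded constant passed to resoveAPIhash can then be looked up directly, which is essentially what the plugin does for every candidate constant it finds.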

We also need to add it to the HASH_TYPES list of tuples and regenerate the database file.

Finally, the plugin is able to resolve all the hashes and mark them appropriately in the IDA database:

Now back to the main functionality of the shellcode. After resolving the addresses of some Windows functions, the shellcode searches the document file loaded by Word for the ^YOUHO magic bytes and reads the 107992 bytes that follow them into a newly allocated buffer.

This newly allocated buffer goes through a multi-byte XOR routine and is later Base64 decoded.
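The carve-XOR-decode sequence above can be sketched in Python to pull the payload straight out of the .doc file. This is a minimal sketch under stated assumptions: the multi-byte XOR is assumed to be a simple repeating-key XOR starting at the first byte after the marker, and the key itself is not given here, so `key` is a hypothetical parameter that has to be recovered from the shellcode in the debugger.

```python
import base64

MARKER = b"^YOUHO"
PAYLOAD_LEN = 107992  # bytes read after the marker, as observed in the shellcode


def carve_payload(doc: bytes, length: int = PAYLOAD_LEN) -> bytes:
    """Return `length` bytes following the ^YOUHO marker in the document."""
    start = doc.find(MARKER)
    if start < 0:
        raise ValueError("marker not found")
    start += len(MARKER)
    return doc[start:start + length]


def xor_multibyte(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR (assumed scheme; the real key comes from the shellcode)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def decode_payload(doc: bytes, key: bytes, length: int = PAYLOAD_LEN) -> bytes:
    """Carve the blob, undo the XOR layer, then Base64-decode it."""
    blob = carve_payload(doc, length)
    return base64.b64decode(xor_multibyte(blob, key))
```

If the output of decode_payload does not start with “MZ”, the XOR assumption is the first thing to revisit against the shellcode’s disassembly.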

To extract the executable file, we set a breakpoint in the debugger at the address where the Base64Decode function is called (0x05F812E8). From there we can either grab the Base64 string and decode it ourselves, or let the shellcode decode it for us to obtain the executable file. We’ll name that executable file “injected.exe”.

Continuing the analysis of the shellcode, we can see that it spawns an instance of svchost.exe (x86 or x64, depending on MS Word’s version) in a suspended state and uses process hollowing to inject the newly extracted executable into it.

Since there’s nothing more to look at in the shellcode, we turn our attention to the extracted executable to understand its inner workings. After resolving the addresses of Windows API functions, the “injected.exe” malware generates a unique ID for the system it’s running on, gets the username, the computer name and the external IP address (by sending a request to api.ipify.org), and then decrypts its RC4 encrypted configuration.

The key used to decrypt the embedded configuration is “8D 60 D3 01 CB 12 4F 4D”, which yields the following configuration data blob:
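Decrypting the configuration outside the debugger only needs a plain RC4 implementation. The sketch below assumes that the eight key bytes above are fed to RC4 as-is and that the encrypted configuration blob has already been carved out of the binary (its location is not covered here); if the sample instead derives the actual RC4 key from these bytes (for example through a hash), that step would have to be added first.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4 (KSA + PRGA); encryption and decryption are the same operation."""
    S = list(range(256))
    j = 0
    for i in range(256):                          # key-scheduling algorithm
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out = bytearray()
    i = j = 0
    for byte in data:                             # pseudo-random generation
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)


# Key bytes as recovered from the sample (assumed to be used directly)
CONFIG_KEY = bytes.fromhex("8D60D301CB124F4D")
# config = rc4(CONFIG_KEY, encrypted_config_blob)  # encrypted_config_blob: hypothetical name
```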

The configuration seems to contain the initial C2 servers (separated by the pipe symbol) as well as the version of the Hancitor malware, “25phe01”. In the PCAP file we only saw the malware contacting the first C2 server, but had that failed, the malware would have tried the remaining C2 servers in the list.
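Turning the decrypted C2 field into the ordered list the malware walks through is a one-liner. The exact layout of the decrypted blob is not shown here, so this sketch assumes the C2 part has already been isolated as a string (possibly with NUL padding). In the sample string, only the first URL comes from the PCAP; the other two are placeholders on the reserved .example domain.

```python
def parse_c2_list(config_text: str) -> list:
    """Split a pipe-delimited C2 list, dropping empty entries and NUL padding."""
    return [url.strip() for url in config_text.strip("\x00").split("|") if url.strip()]


# Illustrative only: the first URL is the one seen in the PCAP, the others are placeholders
sample = ("http://gosandhegly.com/ls5/forum.php|"
          "http://c2-2.example/ls5/forum.php|"
          "http://c2-3.example/ls5/forum.php\x00\x00")

for order, url in enumerate(parse_c2_list(sample), start=1):
    print(order, url)  # the malware falls back to the next entry if one fails
```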

After the C2 URL is obtained, the malware builds an HTTP POST request to send to it. It uses a hard-coded user agent and HTTP header fields, but none of them seem unique enough to build an IOC from.

After the POST request is sent, the malware will read the response from the server and check if it is Base64 encoded. If not or no data is returned, the malware will re-attempt to connect with any of the other C2 servers.

As it turns out, the first four characters of that Base64 string are ignored, and whatever is left goes through the de-obfuscation routine: Base64 decoding followed by an XOR with the key 0x7A.
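The de-obfuscation of C2 responses boils down to three steps, so it is easy to reproduce in Python and apply to the Base64 blobs carved from the PCAP. The sketch below follows the steps described above; the sample input is one of the short command responses seen in the traffic.

```python
import base64

XOR_KEY = 0x7A


def decode_c2_response(b64_text: str) -> bytes:
    """Drop the first 4 Base64 characters, Base64-decode the rest, XOR with 0x7A."""
    raw = base64.b64decode(b64_text.strip()[4:])
    return bytes(b ^ XOR_KEY for b in raw)


print(decode_c2_response("QVEJARRABw=="))  # b'{n:}'
```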

Now that we know how to decode this traffic, we turn our attention back to the server’s response to the initial beacon. After decoding, the blob seems to contain a big list of what looks like additional C2 servers.

We immediately recognize the mail.voicesinprintpublishing.com and neubacher.at entries as we have seen them in the PCAP file.

We now need to understand how the malware interprets this data and what actions it takes in response. The switch-table function (address 0x402170) is where the commands from the server are interpreted and acted on. We have only seen the options “l”, “b” and “r” in the decoded C2 server list, so we will focus on those.

Option “l” leads to downloading an executable from the C2 server and running it in the memory space of the malware’s own process, whereas option “b” injects the code into an svchost.exe process. The routine taken when option “r” is passed ultimately downloads an executable from the C2 server into a file on disk and executes it; let’s take a closer look at it.
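For triage, it helps to turn decoded responses into a readable task list. The decoded blob itself is not reproduced here, so the command shape used below is an assumption: each command is taken to look like “{x:arguments}”, extrapolated from the “{n:}” idle response, with multiple URLs in the arguments separated by “|”. The descriptions simply restate the behaviour described above.

```python
import re

# Command letters and their behaviour as described above; "n" is the idle command
COMMANDS = {
    "l": "download executable, run inside the malware's own process",
    "b": "download executable, inject into svchost.exe",
    "r": "download executable, write to %TEMP% and execute",
    "n": "no task, check back later",
}

# Assumed shape: "{x:arguments}", several URLs in the arguments separated by "|"
CMD_RE = re.compile(r"\{([a-z]):([^}]*)\}")


def parse_commands(decoded: str):
    """Yield (letter, description, urls) tuples for each command in a decoded response."""
    for letter, args in CMD_RE.findall(decoded):
        urls = [u for u in args.split("|") if u]
        yield letter, COMMANDS.get(letter, "unknown"), urls


for cmd in parse_commands("{n:}"):
    print(cmd)  # ('n', 'no task, check back later', [])
```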

It mainly downloads and decodes an executable from the C2 channel, writes it to a temp file and runs it.

Since we’re interested in how the data is downloaded and decoded from the C2 server as well as how it’s saved and executed we’ll take a look at both functions.

Looking at the “Download_DecodeExecutableFromC2” routine (address 0x401940), we see that the malware creates an HTTP GET request to the first C2 server in the list and contacts the rest if no appropriate response is received. The returned data is checked for the presence of the magic bytes {80 A8 15 54}, which are most likely the encoded form of the executable’s first bytes, and if the check passes, the buffer is passed on for de-obfuscation.

These magic bytes are exactly what we saw in the PCAP in the response from “mail.voicesinprintpublishing.com”.
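The same check can be used to spot encoded payloads among the HTTP objects exported from a PCAP (for example with Wireshark’s File > Export Objects > HTTP). This sketch assumes the magic bytes sit at the very start of the response body.

```python
import os

MAGIC = bytes([0x80, 0xA8, 0x15, 0x54])


def looks_like_hancitor_payload(data: bytes) -> bool:
    """Return True if a downloaded blob starts with the 80 A8 15 54 magic bytes."""
    return data[:4] == MAGIC


def scan_exported_objects(directory: str) -> list:
    """Return the names of files in `directory` whose contents start with the magic."""
    hits = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                if looks_like_hancitor_payload(f.read(4)):
                    hits.append(name)
    return hits
```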

The decoded executable is then passed to the “WriteTmpFileAndExecute” function (address 0x402EA0), where it is saved to the %TEMP% folder under a randomly generated filename that always starts with the “BN” characters, and then executed.
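On a host suspected of infection, the “BN” prefix gives a cheap triage lead. This is a minimal sketch, assuming the prefix is the upper-case “BN” observed here; the rest of the name and the extension are random or unknown, and legitimate files can start with the same letters, so any hit needs manual review.

```python
import os
import tempfile


def find_bn_files(directory: str = tempfile.gettempdir()) -> list:
    """List files in `directory` (default: the temp folder) whose names start with "BN"."""
    return sorted(
        name for name in os.listdir(directory)
        if name.startswith("BN") and os.path.isfile(os.path.join(directory, name))
    )
```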

Looking at the de-obfuscation routine (address 0x4015B0), we can see that it does some byte mangling before decompressing the result using the LZNT1 format.

Although this routine would be simple to rewrite in Python, we’ll use CPU emulation to reproduce the malware’s decoding functionality instead. To do that, we’ll use the Unicorn Python library, which emulates CPU instructions. Since Python does not natively support LZNT1 decompression, a third-party library was used for that step. Unfortunately, this library produced errors when used in a script but worked just fine when invoked directly from the Python interpreter, so only the byte mangling (XOR) step has been ported to the script. Below is the code used to re-implement the byte mangling functionality:

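# Re-implement the payload de-obfuscation loop by emulating the malware's own x86 code with Unicorn
from unicorn import Uc, UcError, UC_ARCH_X86, UC_MODE_32
from unicorn.x86_const import UC_X86_REG_ECX, UC_X86_REG_EDX, UC_X86_REG_ESI

with open("/path/to/extracted/binary1.raw", "rb") as f:
    obfuscated = f.read()

# code to be emulated (taken from loc_4015D4):
#   mov eax, ecx / and eax, 7 / mov al, [eax+esi] / xor [ecx+esi], al
#   inc ecx / cmp ecx, edx / jb loop
SC = b"\x8B\xC1\x83\xE0\x07\x8A\x04\x30\x30\x04\x31\x41\x3B\xCA\x72\xF0"

# Build final code to emulate: the loop, immediately followed by the obfuscated data
X86_CODE32 = SC + obfuscated

ADDRESS = 0x1000000          # memory address where emulation starts
DATA = ADDRESS + len(SC)     # 0x1000010, where the obfuscated data begins
MAP_SIZE = (len(X86_CODE32) + 0xFFF) & ~0xFFF  # Unicorn needs page-aligned sizes

print("Emulate i386 code")
try:
    mu = Uc(UC_ARCH_X86, UC_MODE_32)
    mu.mem_map(ADDRESS, MAP_SIZE)
    mu.mem_write(ADDRESS, X86_CODE32)
    mu.reg_write(UC_X86_REG_ECX, 0x8)              # index of the first byte to decode
    mu.reg_write(UC_X86_REG_EDX, len(obfuscated))  # loop ends at the end of the data
    mu.reg_write(UC_X86_REG_ESI, DATA)             # base of the data buffer

    # Stop when the loop falls through, instead of executing the data as code
    mu.emu_start(ADDRESS, ADDRESS + len(SC))
    print("Emulation done.")

    # The first 8 bytes are the XOR key; the rest is the LZNT1-compressed executable
    compressed = mu.mem_read(DATA + 0x8, len(obfuscated) - 0x8)
except UcError as e:
    raise SystemExit("ERROR: %s" % e)

with open("/path/to/extracted/compressed1.exe", "wb") as fw:
    fw.write(compressed)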
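Reading the emulated instructions (mov eax, ecx / and eax, 7 / mov al, [eax+esi] / xor [ecx+esi], al / inc ecx / cmp ecx, edx / jb), the loop XORs every byte from offset 8 onwards with one of the first eight bytes, which are never modified themselves. That means it can also be ported directly without Unicorn. This pure-Python equivalent is a sketch derived from those instructions.

```python
def unmangle(obfuscated: bytes) -> bytes:
    """Pure-Python equivalent of the loop at loc_4015D4.

    The first 8 bytes act as the key: byte i (i >= 8) is XOR-ed with byte i & 7.
    The result, with the key stripped, is the LZNT1-compressed executable.
    """
    buf = bytearray(obfuscated)
    for i in range(8, len(buf)):
        buf[i] ^= buf[i & 7]
    return bytes(buf[8:])
```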

The compressed1.exe file is then decompressed manually into the final binary. This would have worked just fine, but only later did I notice that the PCAP file is missing packets, so the binaries could not be properly extracted to verify our analysis results 😞.
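Instead of relying on a third-party library for the last step, LZNT1 is small enough to decode in pure Python. The sketch below follows the publicly documented format (2-byte chunk headers, flag bytes governing 8 tokens, and back-references whose offset/length split depends on the position inside the current chunk). It has only been checked against hand-built inputs, not against this sample; on Windows, RtlDecompressBuffer with COMPRESSION_FORMAT_LZNT1 is the native alternative.

```python
import struct


def lznt1_decompress(data: bytes) -> bytes:
    """Minimal LZNT1 decompressor (the format handled by RtlDecompressBuffer)."""
    out = bytearray()
    pos = 0
    while pos + 2 <= len(data):
        header = struct.unpack_from("<H", data, pos)[0]
        pos += 2
        if header == 0:                      # end-of-stream marker
            break
        size = (header & 0x0FFF) + 1         # bytes of chunk data after the header
        chunk = data[pos:pos + size]
        pos += size
        if not header & 0x8000:              # uncompressed chunk: copy as-is
            out += chunk
            continue
        start = len(out)                     # back-references stay within the chunk
        i = 0
        while i < len(chunk):
            flags = chunk[i]
            i += 1
            for bit in range(8):
                if i >= len(chunk):
                    break
                if not flags & (1 << bit):   # literal byte
                    out.append(chunk[i])
                    i += 1
                    continue
                token = struct.unpack_from("<H", chunk, i)[0]
                i += 2
                # The offset/length split depends on how far into the chunk we are
                shift, mask = 12, 0x0FFF
                n = len(out) - start - 1
                while n >= 0x10:
                    shift -= 1
                    mask >>= 1
                    n >>= 1
                offset = (token >> shift) + 1
                length = (token & mask) + 3
                for _ in range(length):      # byte by byte: copies may overlap
                    out.append(out[-offset])
    return bytes(out)
```

With the output of the script above, this would be used as lznt1_decompress(open("/path/to/extracted/compressed1.exe", "rb").read()).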

Instead, I looked up another Hancitor PCAP (http://www.malware-traffic-analysis.net/2018/05/15/index3.html) that contains all the packets needed to fully extract the executable binaries.

This time, extracting the three binaries from the PCAP succeeds and we can decode them. All three downloaded executables are part of the infection, but only one of them is listed as the final payload on the web site.

  • Decoded-1.exe (MD5: 1FB9E41282CA642E52590BF667C7E7DE)
  • Decoded-2.exe (MD5: 2B7BE498B4E93D993D654BBE2E70742F)
  • Decoded-3.exe (MD5: 836B83895D918F61023CA30361771A5F) – Matches the hash of “2018-05-15-Zeus-Panda-Bancker-caused-by-Hancitor-infection.exe” provided on the http://www.malware-traffic-analysis.net/2018/05/15/index3.html web-site.
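To check decoded files against the hashes listed above, a few lines of hashlib are enough. The file paths are whatever you saved the decoded binaries as; the sketch reads files in blocks so large payloads don’t need to fit in memory at once.

```python
import hashlib

# Expected MD5 hashes of the decoded payloads listed above
EXPECTED = {
    "Decoded-1.exe": "1FB9E41282CA642E52590BF667C7E7DE",
    "Decoded-2.exe": "2B7BE498B4E93D993D654BBE2E70742F",
    "Decoded-3.exe": "836B83895D918F61023CA30361771A5F",
}


def md5_file(path: str) -> str:
    """Return the upper-case hex MD5 of a file, read in 64 KB blocks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest().upper()


def verify(path: str, name: str) -> bool:
    """Compare a decoded file against the expected hash for `name`."""
    return md5_file(path) == EXPECTED[name]
```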

As we saw earlier in the PCAP analysis, the C2 server also returns Base64 encoded commands that look like “QVEJARRABw==”, “GFUTARRABw==”, etc. Since we now know that the first four characters are ignored, all of these responses boil down to the same “ARRABw==” string. This could easily serve as a network traffic IoC, provided Hancitor keeps using the same encoding routines. When Base64 decoded and XOR-ed with 0x7A, the string is de-obfuscated to “{n:}”, which tells the malware to wait and ask the C2 server for instructions later.
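As a body-level check, that IoC can be expressed as a regular expression: any four Base64 characters followed by “ARRABw==”. This sketch is only valid while the encoding and the idle command stay unchanged, and since the body is so short, a match should be treated as a lead rather than proof.

```python
import re

# Four arbitrary Base64 characters followed by the encoded "{n:}" idle command
IDLE_RESPONSE_RE = re.compile(rb"^[A-Za-z0-9+/]{4}ARRABw==$")


def is_idle_response(body: bytes) -> bool:
    """True if an HTTP response body looks like an encoded Hancitor "{n:}" reply."""
    return IDLE_RESPONSE_RE.match(body.strip()) is not None
```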

In summary, we were able to understand how the document’s VB macro code injects shellcode into WINWORD’s address space, extract that shellcode, and enrich our analysis with the “shellcode_hash_search.py” plugin. We then saw how the shellcode extracts an executable file and injects it into an svchost.exe instance. Finally, we analyzed the functionality of the injected executable, decrypted its configuration, understood its network encoding routines and re-implemented them using CPU emulation to decode existing traffic and verify our analysis results.

Happy reversing ;)

Written on August 19, 2018