A malware analysis challenge

From Hackfest 2020

Idea behind the challenge

I wanted to create a challenge focused on malware analysis rather than a "pure" reverse engineering challenge.
I came up with a mini track split into 4 challenges.
We will focus on the last one today.

Challenge background

Challengers were given a pcap file and this description:

I think I was attacked by a sophisticated malware! I'm pretty sure it was highly targeted! Please help me, I need to know what the malware did.

They had to analyze the pcap, understand each step and decrypt the communication with the CnC.
The CnC server is out of scope: don't attack it.

Malware infection process

We will begin at the DLL step.

How to approach this kind of challenge?

Rather than looking for a password validation routine, we want to ask ourselves "What does the malware do?".
We will take a look at the pcap file, since malware usually communicates with a remote Command and Control (CnC) server.
We will try to make assumptions from static analysis before digging into the assembly code.

Don't reverse engineer something if you don't have to.
It is a tedious job, and we can often take shortcuts.

Before we begin...

This malware is safe to run; it won't break your computer.
However, there might be a memory leak and, at worst, it will cause a BSOD.

Better be safe than sorry, use a virtual machine!
https://developer.microsoft.com/en-us/microsoft-edge/tools/vms/

Wireshark

  • Look at the traffic
  • Export the DLL file from the pcap file

Basic static analysis

$ file malware.dll
malware.dll: PE32 executable (console) Intel 80386, for MS Windows

Quick win

$ strings malware.dll | grep -i flag
$ 

What's a DLL?

DLL means Dynamic-Link Library. It exposes functions to be consumed by other binaries. These are called exports.

// --- add.dll: the library side ---
#define DLL_EXPORT
#define DECLDIR __declspec(dllexport)

extern "C" {
   DECLDIR int Add(int a, int b) {
      return (a + b);
   }
}

// --- caller side: load the DLL and call the export ---
typedef int (*addFunc)(int, int); // We define the function signature; takes 2 int, returns 1 int
addFunc _AddFunc;
HINSTANCE hInstLibrary = LoadLibrary("add.dll"); // Locate and load our DLL
_AddFunc = (addFunc)GetProcAddress(hInstLibrary, "Add"); // Get a pointer to the function `Add`
int res = _AddFunc(23, 43);

Ref: https://www.codeguru.com/cpp/cpp/cpp_mfc/tutorials/article.php/c9855/DLL-Tutorial-For-Beginners.htm

VS a Portable Executable

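A DLL is itself a Portable Executable (PE) file, so there's not much difference between an EXE and a DLL. A DLL often exports several functions, while an EXE usually exposes only its entry point.

Both have an entry point. When a DLL is loaded, its entry point (DllMain) is called.

You can call a DLL export using rundll32.exe myDll,MyExport arg1 arg2, or regsvr32.exe myDll, and more. Note that rundll32 passes the arguments to the export as a single command-line string, so this works cleanly only for exports written with the rundll32 entry-point signature.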

rundll32.exe add.dll,Add 1 2
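
Since a DLL and an EXE share the same file format, the difference shows up as a single flag in the PE header. The sketch below reads that flag with the standard library only. The helper names `is_dll` and `fake_pe` are made up for this example, and `fake_pe` builds a synthetic header rather than a real binary.

```python
import struct

IMAGE_FILE_DLL = 0x2000  # Characteristics bit set for DLLs


def is_dll(data):
    """Return True if the PE image in `data` has the IMAGE_FILE_DLL flag set."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    # e_lfanew: offset of the PE signature, stored at 0x3C in the DOS header
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # Characteristics is the last 2-byte field of the 20-byte COFF file header
    (characteristics,) = struct.unpack_from("<H", data, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_DLL)


def fake_pe(characteristics):
    """Build a minimal synthetic header (hypothetical, for demonstration only)."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)  # PE signature right after the DOS header
    coff = struct.pack("<HHIIIHH", 0x14C, 0, 0, 0, 0, 0, characteristics)
    return bytes(dos) + b"PE\x00\x00" + coff
```

On a real file this would be called as is_dll(open("malware.dll", "rb").read()).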

Strings

Can we find interesting strings?
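
Windows binaries often store text as UTF-16LE, which a plain strings call does not show by default. Here is a minimal sketch of a strings-like helper that looks for both encodings. The name `extract_strings` and the default minimum length of 4 are choices made for this example.

```python
import re


def extract_strings(data, min_len=4):
    """Return printable ASCII and UTF-16LE strings of at least `min_len` characters."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # UTF-16LE text in the ASCII range: each printable byte is followed by a NUL byte
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return found
```

ASCII hits are listed first, then the wide ones, so the order does not follow file offsets.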

Imports

A binary relies on functions provided by other libraries; these must be imported.
In C source code, the rough equivalent would be #include <string.h>.
If there's an import called CreateFile or ShellExecute, you can guess what the malware might do.

Malware often tries to hide what it can do by resolving its imports dynamically at runtime.

Dynamic analysis

Sometimes, dynamic analysis will be faster. x64dbg is a free and open-source debugger for Windows (use its x32dbg build for 32-bit targets like this one). You can load and debug a DLL with it.

Useful shortcuts

F7: step into (goes into the function)
F8: step over (if you don't want to debug a function called)
bp Add: set a breakpoint on a specific function
bpc Add: remove that breakpoint

IDA

Useful shortcuts

<x>: to get the cross references of variables/functions
<n>: to rename a function/variable
<y>: change the type of a function/variable
<g>: goto some address
<esc>: go back
</>: add a comment in decompilation view
<;>: add a comment in disassembly view

Your turn! (~30 minutes)

Now that we've seen the basic concepts, it's time to put them into practice.

  • Retrieve the dll from Wireshark (https://niclov.in/montrehack/montrehack.pcap)
  • Find interesting strings
  • Look at the exports
  • Look at the imports: visualize what the binary can do
  • There are anti-disassembly and anti-debug tricks: have fun

Hint: https://niclov.in/montrehack/hint_PzH1.md

Slides so far: https://niclov.in/montrehack/slides_PzH1.html

Ghidra might lag: disable the decompiler

Strings

GET / HTTP/1.1
134.209.233.27

Exports

getFlag
TlsCallback_0
DllEntryPoint

Imports

VirtualAlloc
VirtualProtect
SwitchToFiber
CreateFiber
ConvertThreadToFiber
socket
send
recv
WSAStartup

CreateFiber

LPVOID CreateFiber(
  SIZE_T                dwStackSize,
  LPFIBER_START_ROUTINE lpStartAddress,
  LPVOID                lpParameter
);

Allocates a fiber object, assigns it a stack, and sets up execution to begin at the specified start address, typically the fiber function.

A large chunk of data had been allocated previously and it is passed to this function.
We need to extract this data and analyze it.

Get our first flag

We will use x64dbg (x32dbg for this 32-bit DLL).

Analyzing the function

  • Checks if the CnC responds with "ok"
    • Exits otherwise
  • Pushes 14 KB of something onto the stack
  • Does magic with it
  • Executes it

dat script

import idc

ea = 0x1000126C      # start address of the mov sequence
ea_end = 0x1001C7E5  # end address
instr_len = 7        # mov [ebp+var_FB7], 84h = 7 bytes
shellcode = []
for addr in range(ea, ea_end, instr_len):
    # fetch the instruction (idc.get_bytes is named idc.GetManyBytes before IDA 7.4)
    instr = bytearray(idc.get_bytes(addr, instr_len))
    shellcode.append(instr[6])  # get the byte moved to the buffer
#print(shellcode)

# XOR every 4-byte group with the key; the first key byte is 0, so that byte is unchanged
time = [0x00, 0x01, 0x02, 0x03]
dec = bytearray(b ^ time[i % 4] for i, b in enumerate(shellcode))

with open("blob.data", "wb") as f:  # in place of the undefined writeToFile helper
    f.write(dec)
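
The same two steps can be tried outside IDA on raw bytes dumped from the function. This is a minimal sketch, assuming every instruction uses the 7-byte encoding C6 85 <disp32> <imm8> (mov byte ptr [ebp+disp32], imm8) and that the key is the `time` list from the script above. The function names and the `fake_movs` test helper are made up for this example.

```python
MOV_LEN = 7                            # C6 85 <disp32> <imm8>
KEY = bytes([0x00, 0x01, 0x02, 0x03])  # same values as the `time` list


def extract_immediates(code):
    """Collect the imm8 operand of each consecutive 7-byte mov instruction."""
    out = bytearray()
    for off in range(0, len(code) - MOV_LEN + 1, MOV_LEN):
        instr = code[off:off + MOV_LEN]
        if instr[:2] != b"\xc6\x85":  # stop at the first instruction that is not the expected mov
            break
        out.append(instr[6])
    return bytes(out)


def xor_decode(blob, key=KEY):
    """XOR `blob` with `key`, repeating the key as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(blob))


def fake_movs(payload):
    """Hypothetical test input: encode `payload` as a run of mov instructions."""
    code = bytearray()
    for i, b in enumerate(payload):
        disp = (-0x1000 + i) & 0xFFFFFFFF  # arbitrary stack offset
        code += b"\xc6\x85" + disp.to_bytes(4, "little") + bytes([b])
    return bytes(code)
```

Checking the opcode bytes makes the extraction stop cleanly when the mov run ends, which a fixed address range cannot guarantee.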

Basic static analysis, again

$ strings blob.data
!This program cannot be run in DOS mode.
[...]

This is a whole PE file! How come?

Position independent executable (PIE)

Position-independent code is not tied to a specific address. This independence allows the code to execute efficiently at a different address in each process that uses the code.

So, the machine code jumps to relative addresses, whereas non-PIE code jumps to absolute addresses.

jmp 0x40621 (absolute) vs jmp dword ptr [ebp+0x1337] (relative to a base register)

Your turn! (~20 minutes)

Extract the shellcode and xor it

With the shellcode, do what you did before:

  • Find interesting strings
  • Look at the imports: visualize what the binary can do

Hint: https://niclov.in/montrehack/hint_P3QX.md

Strings

134.209.233.27

Imports

LoadLibraryA
GetProcAddress

rip

Static analysis ftw

Your turn! (~20 minutes)

  • Decrypt the strings

  • Rename variables

  • Understand parameters sent to native Windows functions

To find constants, the ReactOS source might be helpful

https://niclov.in/montrehack/decrypt_string.py

Decrypt the strings

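import idaapi

ea = 0x0403184  # address of the encrypted string
for i in range(0, 12):  # the string is 12 bytes long
    b = idaapi.get_byte(ea + i)  # get the byte at ea + i
    b = (b - 4) & 0xFF  # undo the shift of 4; mask so the value stays in byte range
    idaapi.patch_byte(ea + i, b)  # replace it in the database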
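
The transformation itself can be checked outside IDA. Here is a minimal sketch, assuming every encrypted string uses the same shift of 4. `decode_string` and `encode_string` are names made up for this example, and the sample value in the usage note is illustrative, not taken from the binary.

```python
def decode_string(enc, shift=4):
    """Subtract `shift` from every byte, wrapping around modulo 256."""
    return bytes((b - shift) & 0xFF for b in enc)


def encode_string(plain, shift=4):
    """Inverse operation, handy for building test inputs."""
    return bytes((b + shift) & 0xFF for b in plain)
```

For example, decode_string(b"efg") gives b"abc".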

What the binary can do now

  • Open/Create files
  • Communicate with the CnC

Educated guess: it reads files and sends the content to the CnC

The data is encrypted

  • Reverse the encryption
  • Write a decrypt function (if possible?)
  • Decrypt what was sent to the CnC
  • Profit?

Your turn! (~20 minutes)

Decrypt that pdf

Decryption

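import base64

with open('encrypted', 'rb') as fh:
    f = fh.read()
f = base64.b64decode(f)  # Python 3 replacement for f.decode('base64')
f = rc4(rc4_key, f)      # rc4(), xor() and both keys come from reversing the sample
f = xor(xor_key, f)
f = base64.b64decode(f)
with open('decrypted.pdf', 'wb') as out:  # output name is arbitrary
    out.write(f)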
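
The snippet above relies on rc4 and xor helpers that are not shown. Here is a self-contained sketch of the whole chain, assuming the sample uses standard RC4 and a repeating-key XOR; neither is confirmed here, and the real keys have to be recovered from the binary. The `encrypt` function only exists to build a round-trip test.

```python
import base64


def rc4(key, data):
    """Standard RC4: key scheduling followed by keystream XOR."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def xor(key, data):
    """Repeating-key XOR (an assumption about the sample's XOR layer)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def decrypt(blob, rc4_key, xor_key):
    """base64 -> RC4 -> XOR -> base64, in the order used on the slide."""
    step = base64.b64decode(blob)
    step = rc4(rc4_key, step)
    step = xor(xor_key, step)
    return base64.b64decode(step)


def encrypt(plain, rc4_key, xor_key):
    """Inverse of decrypt(), used only to produce test data."""
    step = base64.b64encode(plain)
    step = xor(xor_key, step)
    step = rc4(rc4_key, step)
    return base64.b64encode(step)
```

RC4 and XOR are their own inverses, so decrypt only has to apply the layers in the reverse order of encrypt.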

The end

The original challenge is on ringzer0team.com
Slides: https://niclov.in/montrehack/slides_4KJM.html

Questions?