Dec 13, 2014

INT in notepad.exe

The Import Name Table (INT) tells the loader how to fill the Import Address Table (IAT) at load time with the appropriate virtual addresses, i.e. the addresses of functions in other modules (but still within the same virtual address space). I will explain where to find the INT in the file on disk (not in memory) and what kind of information it contains. I'm using notepad.exe on 32-bit Windows Vista as the example.

So where do we find the INT?

To find the INT, we must start with the headers in the Portable Executable (PE) file (notepad.exe). Within the optional header, you will find a Directory Table, containing the data directories. In the case of notepad.exe, there is a directory called Import Directory, which we will inspect in more detail. Using PE Insider from Cerbero, we can easily see the Directory Table.

PE Insider - Directory Table

Above you see three directories. The Export Directory is the first one appearing in the PE file, then follows the Import Directory and the Resource Directory. There are more directories in the PE file, but explaining the directories is not within the scope of this post.

There is a lot of good information about the structures in the Import Directory on the net, so I will only discuss them briefly here.

The Import Directory holds a Relative Virtual Address (RVA), which points to an array of IMAGE_IMPORT_DESCRIPTOR structs. Above, you can find the RVA on the Import Directory RVA row in the rightmost column, i.e. 0x00008BF4.
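If you do not have a PE viewer at hand, the same RVA can be read straight from the file bytes. The following is only a minimal sketch, assuming a 32-bit (PE32) image that has been read completely into memory. The helper names (read_u16_le, read_u32_le, import_directory_rva) are my own and not part of winnt.h; the fixed offsets follow the documented PE32 header layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Read little-endian values from a byte buffer. */
uint16_t read_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: returns the Import Directory RVA of a PE32 file
 * held in `file`, or 0 if the headers do not look like a PE32 image.
 */
uint32_t import_directory_rva(const uint8_t *file, size_t size)
{
    /* The DOS header is 64 bytes long; e_lfanew is stored at offset 0x3C. */
    if (size < 0x40 || file[0] != 'M' || file[1] != 'Z')
        return 0;
    uint32_t e_lfanew = read_u32_le(file + 0x3C);

    /* Signature (4) + file header (20) + optional header up to and
       including directory entry 1 (96 + 2 * 8 bytes). */
    const size_t needed = 4 + 20 + 96 + 16;
    if (e_lfanew > size || size - e_lfanew < needed)
        return 0;

    const uint8_t *nt = file + e_lfanew;
    if (read_u32_le(nt) != 0x00004550u)      /* "PE\0\0" */
        return 0;

    const uint8_t *opt = nt + 4 + 20;        /* start of the optional header */
    if (read_u16_le(opt) != 0x010B)          /* PE32 magic; PE32+ is laid out differently */
        return 0;

    /* The data directories start 96 bytes into the PE32 optional header.
       Each entry is 8 bytes (RVA, size); index 1 is the Import Directory. */
    return read_u32_le(opt + 96 + 1 * 8);
}
```

For notepad.exe as shown in PE Insider above, this function should return 0x00008BF4.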

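The IMAGE_IMPORT_DESCRIPTOR is defined as below, according to the header file winnt.h.

typedef struct _IMAGE_IMPORT_DESCRIPTOR {
    union {
        DWORD   Characteristics;            // 0 in the terminating NULL descriptor
        DWORD   OriginalFirstThunk;         // RVA of the INT (array of IMAGE_THUNK_DATA)
    } DUMMYUNIONNAME;
    DWORD   TimeDateStamp;                  // 0 if the imports are not bound
    DWORD   ForwarderChain;                 // -1 if there are no forwarders
    DWORD   Name;                           // RVA of the DLL name (ASCII string)
    DWORD   FirstThunk;                     // RVA of the IAT (array of IMAGE_THUNK_DATA)
} IMAGE_IMPORT_DESCRIPTOR;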

The OriginalFirstThunk holds an RVA, which points to an array of IMAGE_THUNK_DATA structs. So does the FirstThunk.

It can also be mentioned that the Name member in the IMAGE_IMPORT_DESCRIPTOR holds an RVA, which points to an ASCII string, which is the name of the DLL, e.g. kernel32.dll.

The IMAGE_THUNK_DATA is defined as below, according to the header file winnt.h.

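typedef struct _IMAGE_THUNK_DATA32 {
    union {
        DWORD ForwarderString;      // RVA of a forwarder string
        DWORD Function;             // address of the imported function (in the IAT after loading)
        DWORD Ordinal;              // ordinal value, if the most significant bit is set
        DWORD AddressOfData;        // RVA of an IMAGE_IMPORT_BY_NAME struct
    } u1;
} IMAGE_THUNK_DATA32;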

The IMAGE_THUNK_DATA struct represents one function imported by the PE file. In the file on disk, the IMAGE_THUNK_DATA is either an Ordinal value or an AddressOfData value. The latter is an RVA, which points to an IMAGE_IMPORT_BY_NAME struct. How does the loader know whether the value in the IMAGE_THUNK_DATA is an Ordinal value or an AddressOfData value? The loader checks the most significant bit of the value. If the bit is 0, the value is an AddressOfData value; if the bit is 1, the value is an Ordinal value.
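The check on the most significant bit can be sketched in a few lines of C. The function names below are hypothetical. winnt.h provides comparable macros (IMAGE_ORDINAL_FLAG32, IMAGE_SNAP_BY_ORDINAL32 and IMAGE_ORDINAL32), but I use plain stdint types here so the sketch compiles without the Windows headers.

```c
#include <stdint.h>

/* Same value as IMAGE_ORDINAL_FLAG32 in winnt.h. */
#define ORDINAL_FLAG32 0x80000000u

/* Returns 1 if the thunk value imports by ordinal, 0 if it imports by name. */
int thunk_is_ordinal(uint32_t thunk)
{
    return (thunk & ORDINAL_FLAG32) != 0;
}

/* For an ordinal import, the ordinal is stored in the low 16 bits. */
uint16_t thunk_ordinal(uint32_t thunk)
{
    return (uint16_t)(thunk & 0xFFFFu);
}

/* For a name import, the low 31 bits are the RVA of IMAGE_IMPORT_BY_NAME. */
uint32_t thunk_name_rva(uint32_t thunk)
{
    return thunk & 0x7FFFFFFFu;
}
```

For a value such as 0x00009138, thunk_is_ordinal returns 0 and thunk_name_rva returns 0x00009138.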

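The IMAGE_IMPORT_BY_NAME is defined as below, according to the header file winnt.h.

typedef struct _IMAGE_IMPORT_BY_NAME {
    WORD    Hint;       // suggested index into the DLL's export name table
    BYTE    Name[1];    // NULL-terminated ASCII name (variable length)
} IMAGE_IMPORT_BY_NAME, *PIMAGE_IMPORT_BY_NAME;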

We are now ready to find the INT in the file on disk, but first we must find the array of IMAGE_IMPORT_DESCRIPTOR structs. We can't use the RVA 0x00008BF4 directly, since an RVA is only valid once the PE file is loaded into memory. We must in some way translate the RVA to an Offset in the PE file (notepad.exe). I will present a concept here that I call RVA-to-Offset translation.

First we check where the sections will be loaded in memory. More specifically, let's check the section table in the PE file (notepad.exe). See below.

PE Insider - Section Table

We can see that the .text section will be loaded at RVA 0x00001000 with section size 0x00008F40 bytes, and the next section (.data section) will be loaded at RVA 0x0000A000. The array of IMAGE_IMPORT_DESCRIPTOR structs starts at RVA 0x00008BF4, i.e. within the .text section. This means that the array is 0x00008BF4 - 0x00001000 = 0x00007BF4 bytes from the start of the .text section. On disk, the .text section starts at Offset 0x400 according to the PointerToRawData column. So the array of IMAGE_IMPORT_DESCRIPTOR structs can be found on disk at Offset 0x400 + 0x00007BF4 = 0x7FF4.

Now that we have translated the RVA 0x00008BF4 to Offset 0x7FF4 in the file on disk, we are going to check out what's going on at this location.
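The RVA-to-Offset translation can be written as a small C function. This is a sketch under my own assumptions: the section_info struct is a hypothetical, trimmed-down stand-in for IMAGE_SECTION_HEADER that keeps only the four fields needed, and I bound the lookup by SizeOfRawData because bytes beyond that size do not exist in the file.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the fields found in each section table entry. */
typedef struct {
    uint32_t virtual_address;      /* RVA where the section is loaded */
    uint32_t virtual_size;         /* size of the section in memory */
    uint32_t pointer_to_raw_data;  /* file offset of the section data */
    uint32_t size_of_raw_data;     /* size of the section data on disk */
} section_info;

/*
 * Translates `rva` to a file offset using the section table.
 * Returns 1 and stores the result in *offset on success, or 0 if no
 * section holds the RVA on disk.
 */
int rva_to_offset(const section_info *sections, size_t count,
                  uint32_t rva, uint32_t *offset)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t start = sections[i].virtual_address;
        if (rva < start)
            continue;
        uint32_t delta = rva - start;   /* distance from section start */
        if (delta < sections[i].size_of_raw_data) {
            *offset = sections[i].pointer_to_raw_data + delta;
            return 1;
        }
    }
    return 0;
}
```

With the .text values from the section table (VirtualAddress 0x00001000, PointerToRawData 0x400), and provided SizeOfRawData covers the address, the RVA 0x00008BF4 maps to Offset 0x7FF4, which matches the manual calculation.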

Below is a HEXVIEW of the IMAGE_IMPORT_DESCRIPTOR array, name strings (DLLs) and IMAGE_THUNK_DATA arrays. We are going to interpret the HEXVIEW byte by byte, so remember that x86 is a little-endian architecture, i.e. the least significant byte of each value is stored first!

PE Insider - HEXVIEW

The array of IMAGE_IMPORT_DESCRIPTOR's (within the black border) starts at Offset 0x7FF4 (as calculated above) and the first member in the first struct is the OriginalFirstThunk with the value 0x00008DB0 (first underscored value).

For an easier overview above, I have underscored the OriginalFirstThunk member in each IMAGE_IMPORT_DESCRIPTOR. Counting the elements in the IMAGE_IMPORT_DESCRIPTOR array gives us 14 descriptors. Note that the last one is a NULL descriptor that marks the end of the array. Each descriptor represents a DLL and its imported functions. This means that notepad.exe uses 13 DLLs. For instance, let's check the first descriptor and its Name member. The Name member is the fourth member in the IMAGE_IMPORT_DESCRIPTOR struct. At this location, we find the value 0x00008D0C, which is an RVA that points to the name string. Using the RVA-to-Offset translation, the RVA 0x00008D0C is translated to Offset 0x810C. This Offset is located directly after the IMAGE_IMPORT_DESCRIPTOR array above (14 descriptors of 20 bytes each, i.e. 0x118 bytes after Offset 0x7FF4), and contains the ASCII string advapi32.dll followed by two NULL bytes (the terminator and, presumably, one padding byte).
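Counting the descriptors by hand can also be done in code. The sketch below assumes the whole file has been read into a buffer and that the file offset of the descriptor array is already known (0x7FF4 in this post). The function name is hypothetical. I treat a descriptor whose 20 bytes are all zero as the terminating NULL descriptor.

```c
#include <stddef.h>
#include <stdint.h>

/* sizeof(IMAGE_IMPORT_DESCRIPTOR): five DWORDs. */
#define DESCRIPTOR_SIZE 20u

/*
 * Counts the import descriptors starting at file offset `offset`,
 * not including the terminating NULL descriptor. Stops early if the
 * buffer ends before a NULL descriptor is found.
 */
size_t count_imported_dlls(const uint8_t *file, size_t size, size_t offset)
{
    size_t count = 0;

    while (offset <= size && size - offset >= DESCRIPTOR_SIZE) {
        const uint8_t *d = file + offset;
        int all_zero = 1;

        for (size_t i = 0; i < DESCRIPTOR_SIZE; i++) {
            if (d[i] != 0) {
                all_zero = 0;
                break;
            }
        }
        if (all_zero)
            break;                  /* NULL descriptor: end of the array */

        count++;
        offset += DESCRIPTOR_SIZE;  /* move to the next descriptor */
    }
    return count;
}
```

For the notepad.exe examined here, a call with offset 0x7FF4 should return 13, one per imported DLL.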

Well, back to the OriginalFirstThunk. As mentioned above, the first OriginalFirstThunk has the value 0x00008DB0. This is an RVA, which points to an array of IMAGE_THUNK_DATA structs. Using the RVA-to-Offset translation, the RVA 0x00008DB0 is translated to the Offset 0x81B0. At this location (see HEXVIEW above), we find an array (within the red border). Actually, this array is the Import Name Table (INT) that we were looking for in the first place. In this case, the INT (for the first DLL) contains 6 elements, where the last element is a NULL element that marks the end of the array. In other words, notepad.exe is importing 5 functions from advapi32.dll.
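Walking an INT follows the same pattern: read 32-bit values until a zero is found. A minimal sketch, again assuming a 32-bit image held in a buffer and a known file offset for the INT. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value from a byte buffer. */
uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Counts the IMAGE_THUNK_DATA32 entries in the INT that starts at file
 * offset `offset`, not including the terminating NULL entry.
 */
size_t count_int_entries(const uint8_t *file, size_t size, size_t offset)
{
    size_t count = 0;

    while (offset <= size && size - offset >= 4) {
        if (read_u32_le(file + offset) == 0)
            break;          /* NULL entry: end of this INT */
        count++;
        offset += 4;        /* each 32-bit thunk is one DWORD */
    }
    return count;
}
```

For the advapi32.dll INT at Offset 0x81B0, this should return 5.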

As you can see in the HEXVIEW above, the first element in the IMAGE_THUNK_DATA array contains the value 0x00009138. Remember that the loader must know how to deal with this value, so it checks the most significant bit, which in this case is 0, meaning that the value is an AddressOfData value. AddressOfData is an RVA, which points to an IMAGE_IMPORT_BY_NAME struct. Using the RVA-to-Offset translation, the RVA 0x00009138 corresponds to the Offset 0x8538 on disk. Let's check out what's going on here. Remember that the information at this location is an IMAGE_IMPORT_BY_NAME struct.

PE Insider - HEXVIEW

From the HEXVIEW at Offset 0x8538, we find the word 0x0268 (underscored), which is the Hint member. Then follows the name (ASCII string) of the imported function, in this case RegQueryValueExW, followed by two NULL bytes (the terminator and, presumably, one padding byte). Then follows another IMAGE_IMPORT_BY_NAME struct, with a hint and an ASCII string, and so on.
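Reading one IMAGE_IMPORT_BY_NAME struct from the file can be sketched as follows. The function name and signature are my own assumptions. The code reads the 16-bit little-endian hint and then checks that the name is NULL-terminated inside the buffer before handing out a pointer to it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Reads the IMAGE_IMPORT_BY_NAME struct at file offset `offset`.
 * On success, stores the hint in *hint, points *name at the ASCII
 * function name inside `file`, and returns 1. Returns 0 if the struct
 * does not fit in the buffer or the name is not terminated.
 */
int read_import_by_name(const uint8_t *file, size_t size, size_t offset,
                        uint16_t *hint, const char **name)
{
    /* Need at least the 2-byte hint plus one byte of name. */
    if (offset > size || size - offset < 3)
        return 0;

    const uint8_t *p = file + offset;

    /* Make sure a terminating NULL byte exists before the buffer ends. */
    if (memchr(p + 2, 0, size - offset - 2) == NULL)
        return 0;

    *hint = (uint16_t)(p[0] | (p[1] << 8));   /* little-endian WORD */
    *name = (const char *)(p + 2);
    return 1;
}
```

At Offset 0x8538 in the notepad.exe examined here, this should yield the hint 0x0268 and the name RegQueryValueExW.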

So finally we know where the INT is located, and how to read it. The loader uses the INT to fill the IAT. For instance, the loader sees that notepad.exe is importing 5 functions (by name, and not by ordinal) from advapi32.dll, looks up their virtual addresses, and writes them to notepad.exe's IAT.

You are welcome to leave comments, complaints or questions!
