I went to Black Hat over Wednesday and Thursday. The presentation most people wanted to see (including me) was Joanna Rutkowska breaking the Vista x64 driver signing that I hate so much. I wanted to see what trick she’d found. I was let down, however, when she presented her technique.
Her trick was to allocate a bunch of memory so that the kernel pages itself and its drivers out, use raw disk access to overwrite the pagefile, then do an uncommon operation that causes her desired code to execute in kernel mode. Sound familiar? This is the same thing I proposed as a reason why x64 driver signing is pointless when I whined about the real reason for driver signing. I’d already thought of it, so it was nothing new to me, like most of the rest of the conference.
Obviously, Joanna thought of it long before I did, so there is nobody to blame for it. Except, I guess, Microsoft.
As we all know, the x86 ISA has a lot of redundant instructions (i.e. instructions with the same semantics but different opcodes). Sometimes this is unavoidable, sometimes it looks like bad design. But with SSE it gets really weird. Let’s say we want to perform xmm0 <- xmm0 & xmm1 (i.e. bitwise AND). Not an uncommon operation, but we have 3 different ways to achieve this:
- andps xmm0, xmm1 (0f 54 c1)
- andpd xmm0, xmm1 (66 0f 54 c1)
- pand xmm0, xmm1 (66 0f db c1)
(Note that andpd/pand are SSE2 instructions)
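To make the redundancy concrete, here is a rough Python sketch that picks the three encodings above apart. It assumes the simple layout `[66] 0F opcode ModRM` that these register-to-register forms use; the helper names (`split_encoding`, `decode_modrm`, `describe`) are made up for illustration and this is nowhere near a real disassembler.

```python
# Encodings copied from the list above (register-to-register forms only).
ENCODINGS = {
    "andps": bytes.fromhex("0f54c1"),
    "andpd": bytes.fromhex("660f54c1"),
    "pand": bytes.fromhex("660fdbc1"),
}


def split_encoding(code):
    """Split into (has_66_prefix, opcode, modrm), assuming [66] 0F op modrm."""
    has_prefix = code[0] == 0x66
    rest = code[1:] if has_prefix else code
    if len(rest) != 3 or rest[0] != 0x0F:
        raise ValueError("layout not handled by this sketch")
    return has_prefix, rest[1], rest[2]


def decode_modrm(modrm):
    """Return the (mod, reg, rm) fields of a ModRM byte."""
    return modrm >> 6, (modrm >> 3) & 7, modrm & 7


def describe(name):
    """Summarize one of the three encodings as a small dict."""
    has_prefix, opcode, modrm = split_encoding(ENCODINGS[name])
    mod, reg, rm = decode_modrm(modrm)
    return {
        "prefix_66": has_prefix,
        "opcode": opcode,
        "mod": mod,            # 3 means both operands are registers
        "dst": f"xmm{reg}",    # reg field selects the destination
        "src": f"xmm{rm}",     # rm field selects the source
    }


if __name__ == "__main__":
    for name in ENCODINGS:
        print(name, describe(name))
```

All three share the ModRM byte c1 (xmm0 as destination, xmm1 as source). andpd is just andps with a 0x66 prefix in front, and pand differs from andpd only in the opcode byte.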
Regarding the result in xmm0 these are really the same instructions. Now, why did Intel do this? First we’re going to inspect andps/andpd. Looking at the optimization manuals we get a hint: The ps/pd mark the target register to contain singles or doubles, so they should match the actual data you are operating on.
It looks like the processor internally handles the floats in some “unpacked” structure and the ps/pd is a sort of hint whether it has to repack the number again. Or something like that; at any rate, this is only an optimization issue. But that’s stupid: if the processor already knows the internal format, one “andp” instruction would be sufficient, since the processor could perform andps or andpd anyway, depending on which would be faster in the situation. Or look at the MMX case: there we have no pandb, pandw, pandd, pandq etc. The same applies to “movapd/movdqa memory, xmm”: damn, it’s the processor that knows better than me how to achieve this the fastest way.
Finally, let’s look at pand. After Intel recognized that MMX is a complete mess, they opened the MMX instructions for the xmm registers (0x66 prefix). And now? We have a third way to do the AND… And it somehow looks like they never had SSE2 in mind when they designed the SSE1 instructions.
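The claim that all three instructions leave the same bits in xmm0 can be modeled in a few lines of Python. This is only a model of a 128-bit bitwise AND viewed three ways (four 32-bit lanes, two 64-bit lanes, one 128-bit integer); it does not execute any SSE instruction, and the function names are invented for the sketch.

```python
import struct


def and_lanes(a, b, fmt):
    """AND two 16-byte values lane by lane; fmt is a struct format for the lanes."""
    lanes_a = struct.unpack(fmt, a)
    lanes_b = struct.unpack(fmt, b)
    return struct.pack(fmt, *(x & y for x, y in zip(lanes_a, lanes_b)))


def and_as_ps(a, b):
    """View the register as four 32-bit lanes (the andps view)."""
    return and_lanes(a, b, "<4I")


def and_as_pd(a, b):
    """View the register as two 64-bit lanes (the andpd view)."""
    return and_lanes(a, b, "<2Q")


def and_as_int(a, b):
    """View the register as one 128-bit integer (the pand view)."""
    value = int.from_bytes(a, "little") & int.from_bytes(b, "little")
    return value.to_bytes(16, "little")


if __name__ == "__main__":
    # Four packed singles, and a mask that clears each sign bit.
    values = struct.pack("<4f", 1.5, -2.0, 3.25, 0.0)
    mask = struct.pack("<4I", *([0x7FFFFFFF] * 4))
    results = {f(values, mask) for f in (and_as_ps, and_as_pd, and_as_int)}
    print(len(results) == 1, struct.unpack("<4f", results.pop()))
```

Whatever lane width you pick, the output bytes are identical, which is the whole point: the ps/pd/integer distinction carries no meaning for the result itself.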
In Windows Vista x64, drivers are required to be signed by someone holding a VeriSign code certificate or they won’t load. There is no way to (permanently) disable this signing even if you are Administrator. The F8 startup menu has an option to disable it, but you must select it every time you boot up. Microsoft’s claimed reason for this is that it prevents Trojans from installing kernel-mode rootkits. That is a load of crap.
I just found out the hard way that in 32-bit programs under Win64, the value of CS has changed. In Win32, the value of CS is 0x001B. In 32-bit programs under Win64, it’s 0x0023. This will probably break some programs, especially debuggers.
Why did Microsoft do this? It’s not like the value of CS is undocumented: it’s in the DDK as KGDT_R3_CODE, and I’ve seen it several times in other places on MSDN. I can’t see any reason for them to change it. The 64-bit CS didn’t replace it: the 64-bit CS is 0x0033.
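For anyone who has hardcoded 0x001B somewhere, a small Python sketch of how the three selector values break down may help. The bit layout (RPL in bits 0-1, table indicator in bit 2, descriptor index in the remaining bits) is the standard x86 segment selector format; the `KNOWN_CS` table only holds the three values mentioned above, and both names are invented for this sketch.

```python
# The three user-mode CS values mentioned above; labels are informal.
KNOWN_CS = {
    0x001B: "32-bit code on Win32",
    0x0023: "32-bit code under Win64",
    0x0033: "64-bit code on Win64",
}


def decode_selector(selector):
    """Split a 16-bit segment selector into (index, table, rpl)."""
    rpl = selector & 0x3                       # requested privilege level
    table = "LDT" if selector & 0x4 else "GDT"  # table indicator bit
    index = selector >> 3                      # descriptor index in that table
    return index, table, rpl


def classify_cs(selector):
    """Name the environment for a CS value, or say it is unknown."""
    return KNOWN_CS.get(selector, "unknown")


if __name__ == "__main__":
    for cs in sorted(KNOWN_CS):
        print(f"{cs:#06x}", decode_selector(cs), classify_cs(cs))
```

All three are ring 3 selectors into the GDT; only the descriptor index differs (3, 4 and 6). Code that compares CS against a single constant is what breaks, so checking the RPL bits or looking the value up in a table like this is the less fragile option.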
Normally I wouldn’t post 2 things in 2 days but this just really annoys me.