x86 – pagetable.com

Using the OS X 10.10 Hypervisor Framework: A Simple DOS Emulator

2015-01-04 by Michael Steil

Since version 10.10 (Yosemite), OS X contains Hypervisor.framework, which provides a thin user mode abstraction of the Intel VT features. It enables apps to use virtualization without the need for a kernel extension (KEXT) – which makes them compatible with the OS X App Store guidelines. read more

Assembly Evolution Part 1: Accessing Memory and the strange case of the Intel 4004

2011-10-13 by Michael Steil

by Julien Oster, reprinted with permission. read more

Gate A20

2011-06-14 by Michael Steil

The Intel 80376 – a Legacy-Free i386 (with a Twist!)

2010-11-16 by Michael Steil

25 years after the introduction of the 32 bit Intel i386 CPU, all Intel compatibles still start up (and wake up!) in 16 bit stone-age mode, and they have to be switched into 32/64 bit mode to be usable. read more

CPUID on all CPUs (HOWNOTTO)

2010-09-14 by Michael Steil

A while ago, an engineer from a respectable company for low-level solutions (no names without necessity!) claimed that a certain company’s new 4-way SMP system had broken CPUs or at least broken firmware that didn’t set up some CPU features correctly: While on the older 2-way system, all CPUs returned the same features (using CPUID), on the 4-way system, two of the CPUs would return bogus data. read more

Why is there no CR1 – and why are control registers such a mess anyway?

2010-07-02 by Michael Steil

If you want to enable protected mode or paging on the i386/x86_64 architecture, you use CR0, which is short for control register 0. Makes sense. These are important system settings. But if you want to switch the pagetable format, you have to change a bit in CR4 (CR1 does not exist and CR2 and CR3 don’t hold control bits), if you want to switch to 64 bit mode, you have to change a bit in an MSR, oh, and if you want to turn on single stepping, that’s actually in your FLAGS. Also, have I mentioned that CR5 through CR15 don’t exist – except for CR8, of course? read more
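To make the scattering concrete, here is a minimal C sketch of where the settings mentioned above live. The bit positions are the architecturally documented ones. The macro and function names are illustrative only, and choosing CR4.PAE as "the" pagetable format bit is an assumption about which bit the excerpt means. The helpers only compute register values; actually writing CR0, CR4 or an MSR requires privileged instructions in kernel mode.

```c
#include <stdint.h>

/* Bit masks as documented in the Intel and AMD architecture manuals. */
#define CR0_PE    (1u << 0)      /* CR0: protected mode enable */
#define CR0_PG    (1u << 31)     /* CR0: paging enable */
#define CR4_PAE   (1u << 5)      /* CR4: PAE pagetable format (assumed to be the bit meant) */
#define MSR_EFER  0xC0000080u    /* number of the MSR holding the long mode bit */
#define EFER_LME  (1u << 8)      /* EFER: long mode enable */
#define FLAGS_TF  (1u << 8)      /* FLAGS: trap flag, i.e. single stepping */

/* Hypothetical helper: the CR0 value with protected mode and paging on. */
static uint32_t cr0_with_paging(uint32_t cr0)
{
    return cr0 | CR0_PE | CR0_PG;
}

/* Hypothetical helper: the FLAGS value with single stepping on. */
static uint32_t flags_with_single_step(uint32_t flags)
{
    return flags | FLAGS_TF;
}
```

Four settings, four different places: two bits in CR0, one in CR4, one in an MSR and one in FLAGS.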

Intel VT VMCS Layout

2010-05-24 by Michael Steil

I understand that there might be a good reason for Intel to add virtualization extensions to their CPU architecture. Instead of fixing the x86 architecture to (optionally) make it Popek-Goldberg compliant and have all critical instructions trap if not run in Ring 0, they added non-root mode, a very big hammer that allows me to switch my CPU state completely to that of the guest and switches back to my original host state on a certain event in the guest. Well, it’s a great toy for people who want to play with CPU internals. read more

PCEPTPDPTE

2009-10-31 by Michael Steil

Here is a new pagetable entry. read more

A Standalone printf() for Early Bootup

2009-09-07 by Michael Steil

A while ago, I complained about operating systems with overly complicated startup code that spends too much time in assembly and does not have printf() or framebuffer access until very late. read more

Minimizing the Assembly needed for Machine Initialization

2009-08-10 by Michael Steil

In many operating systems, I have seen overly complicated startup code. Too much is done in assembly, and printf() and framebuffer access are only available very late. In the next three blog posts, I will show how this can be avoided. read more

Aggressive Tail Call Optimization

2009-08-04 by Michael Steil

In some i386/x86_64 assembly code my coworker was working on, there was a macro like this: read more

The Infinite Loop Mystery

2009-07-20 by Michael Steil

Today’s puzzle is about some code behaving horribly wrong. read more

Reverse-Engineering DOS 1.0 – Part 2: IBMBIO.COM

2009-05-12 by Michael Steil

Update: The source is available at github.com/mist64/msdos1 read more

Reverse-Engineering DOS 1.0 – Part 1: The Boot Sector

2009-05-07 by Michael Steil

Update: The source is available at github.com/mist64/msdos1 read more

The Easiest Way to Reset an i386/x86_64 System

2009-04-23 by Michael Steil

Try this in kernel mode: read more

How retiring segmentation in AMD64 long mode broke VMware

2006-11-09 by Michael Steil

UNIX, Windows NT, and all the operating systems in their class rely on virtual memory, or paging, in order to provide every process on the system a complete address space of its own. An easier way to protect processes from each other is segmentation: The 4 GB address space of a 32 bit CPU is divided into segments (consisting of a physical base address and a limit), one for each process, and every process may only access its own segment. This is what the 286 did. read more
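The base-and-limit idea can be modelled in a few lines of C. This is a simplified sketch, not the actual descriptor format of any CPU: the struct, its field widths and the function name are assumptions made for illustration, and the limit is taken to be the highest valid offset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one process's segment. */
struct segment {
    uint32_t base;   /* physical address where the segment starts */
    uint32_t limit;  /* highest offset the process may access */
};

/* Translate an offset within the segment to a physical address.
 * Returns false where real hardware would raise a fault instead. */
static bool seg_translate(const struct segment *seg, uint32_t offset,
                          uint32_t *phys)
{
    if (offset > seg->limit)
        return false;            /* access outside the process's segment */
    *phys = seg->base + offset;  /* inside: relocate by the base */
    return true;
}
```

Since every access goes through this check, a process cannot name an address outside its own segment, which is the protection property the paragraph describes.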

Strange SSE3 opcodes

2006-09-03 by seppel

Intel used some strange opcodes for the SSE3 instructions. All MMX/SSE opcodes use the 0x0f prefix (former “pop cs”). They soon noticed that the 0x0f area gets full, so they used the 0x66, 0xf2, 0xf3 prefixes as modifiers. The basic rule is: read more

How to divide fast by immediates

2006-08-13 by seppel

In almost all assembly books you’ll find some nice tricks to do fast multiplications. E.g. instead of “imul eax, ebx, 3” you can do “lea eax, [ebx+ebx*2]” (ignoring flag effects). It’s pretty clear how this works. But how can we speed up, say, a division by 3? This is quite important since division is still a really slow operation. If you never thought or heard about this problem before, get pen and paper and try a little bit. It’s an interesting problem.
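If you would rather not spoil the puzzle, skip this block. One widely used answer, which may or may not be the one the full post develops, is to multiply by a precomputed fixed-point reciprocal and shift. The sketch below shows both tricks in C for unsigned 32 bit values; the function names are made up for illustration.

```c
#include <stdint.h>

/* x * 3 without a multiply: the C equivalent of "lea eax, [ebx+ebx*2]". */
static uint32_t mul3(uint32_t x)
{
    return x + (x << 1);
}

/* x / 3 without a divide: 0xAAAAAAAB is 2^33 / 3 rounded up, so the
 * 64 bit product shifted right by 33 equals the truncated quotient
 * for every 32 bit unsigned x. */
static uint32_t div3(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}
```

On x86 this turns into one widening multiply and a shift, both much cheaper than a divide instruction.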

Shift oddities

2006-08-07 by seppel

Most of the x86 instructions will automatically alter the flags depending on the result. Sometimes this is rather frustrating because you actually want to preserve the flags as long as possible, and sometimes you miss a “mov eax, ecx” which alters the flags. But at least it’s guaranteed that an instruction either sets the flags or it doesn’t touch them, independent of the actual operation… Or is it? read more

Redundant SSE instructions

2006-07-31 by seppel

As we all know, the x86 ISA has a lot of redundant instructions (i.e. instructions with the same semantics but different opcodes). Sometimes this is unavoidable, sometimes it looks like bad design. But with SSE it gets really weird. Let’s say we want to perform xmm0 <- xmm0 & xmm1 (i.e. bitwise and). Not an uncommon operation; but we have 3 different ways to achieve this:

andps xmm0, xmm1 (0f 54 c1)
andpd xmm0, xmm1 (66 0f 54 c1)
pand xmm0, xmm1 (66 0f db c1)

(Note that andpd/pand are SSE2 instructions)
Regarding the result in xmm0, these are really the same instructions. Now, why did Intel do this? First we’re going to inspect andps/andpd. Looking at the optimization manuals we get a hint: The ps/pd suffixes mark the target register to contain singles or doubles, so they should match the actual data you are operating on. read more
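The claim that all three forms give the same result can be checked from C with the SSE2 intrinsics that correspond to andps, andpd and pand. This is a sketch under two assumptions: it is built for an x86 target with SSE2, and the function name is made up for illustration. Note that an optimizing compiler may emit any of the three encodings for each intrinsic, so this verifies the semantics, not the opcode bytes.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; needs an x86 target with SSE2 */
#include <stdint.h>
#include <string.h>

/* AND two 16 byte blocks in three ways and report whether the results
 * are bit-identical. The pand result is also written to out. */
static int three_ands_agree(const uint8_t a[16], const uint8_t b[16],
                            uint8_t out[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* andps: operands treated as four singles */
    __m128i r_ps = _mm_castps_si128(
        _mm_and_ps(_mm_castsi128_ps(va), _mm_castsi128_ps(vb)));
    /* andpd: operands treated as two doubles */
    __m128i r_pd = _mm_castpd_si128(
        _mm_and_pd(_mm_castsi128_pd(va), _mm_castsi128_pd(vb)));
    /* pand: operands treated as integer data */
    __m128i r_int = _mm_and_si128(va, vb);

    uint8_t o_ps[16], o_pd[16];
    _mm_storeu_si128((__m128i *)o_ps, r_ps);
    _mm_storeu_si128((__m128i *)o_pd, r_pd);
    _mm_storeu_si128((__m128i *)out, r_int);

    return memcmp(o_ps, o_pd, 16) == 0 && memcmp(o_pd, out, 16) == 0;
}
```

The cast intrinsics only reinterpret the register contents and generate no instructions, so the three paths differ only in which AND is requested.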