Minimizing the Assembly needed for Machine Initialization

In many operating systems, I have seen overly complicated startup code. Too much is done in assembly, and printf() and framebuffer access is only available very late. In the next three blog posts, I will show how this can be avoided.

In this post, I am showing how little assembly code is needed for startup. Minimizing the assembly makes your code significantly more maintainable. Everything that really needs to be done is setting up the CPU state to support 64 bit (or 32 bit) C code running at physical addresses, and everything else, including setting up the final machine state, can be done in C with a little inline assembly. The following example switches from 16 bit (real mode) or 32 bit mode (flat protected mode) into 64 bit mode on an x86_64 CPU (NASM syntax):

PAGE_SIZE	equ	0x1000
STACK_SIZE	equ	16*1024

PML2		equ	0x1000
PML3		equ	PML2+PAGE_SIZE
PML4		equ	PML3+PAGE_SIZE

STACK_BOTTOM	equ	PML4+PAGE_SIZE
STACK_TOP	equ	STACK_BOTTOM+STACK_SIZE

CODE_START	equ	STACK_TOP

	org CODE_START

	[BITS 16]
	cli

	; clear 3 pages of pagetables
	mov edi, PML2
	xor eax, eax
	mov ecx, 3*4096/4
	rep stosd

	; set up pagetables
	mov dword [PML2], 0x87		; single 4 MB id mapping PML2
	mov dword [PML3], PML2 | 7	; single entry at PML3
	mov dword [PML4], PML3 | 7	; single entry at PML4

	; load the GDT
	lgdt [gdt_desc]

	; set PSE,  PAE
	mov eax, 0x30
	mov cr4, eax

	; long mode
	mov ecx, 0xc0000080
	rdmsr
	or eax, 0x100
	wrmsr

	; enable pagetables
	mov eax, PML4
	mov cr3, eax

	; turn on long mode and paging
	mov eax, 0x80010001
	mov cr0, eax

	jmp SEL_CS:code64

code64:
	[BITS 64]
	mov ax, SEL_DS
	mov ds, ax
	mov es, ax
	mov fs, ax
	mov gs, ax
	mov ss, ax

	mov sp, STACK_TOP
	call start

inf:	jmp inf

gdt_desc:
	dw  GDT_LEN-1
	dd  gdt
	align 8
gdt	db 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 ; 0x00 dummy
gdt_cs	db 0xff, 0xff, 0x00, 0x00, 0x00, 0x9b, 0xaf, 0x00 ; 0x08 code64
gdt_ds	db 0xff, 0xff, 0x00, 0x00, 0x00, 0x93, 0xaf, 0x00 ; 0x18 data64

GDT_LEN equ $ - gdt
SEL_CS equ gdt_cs - gdt
SEL_DS equ gdt_ds - gdt

While switching into 32 bit flat mode is trivial, switching into 64 bit mode requires setting up pagetables. This code sets up 4 MB of identity-mapped memory starting at address 0.

The code is designed to switch from 16 bit mode into 64 bit mode, but since 16 and 32 bit flat mode on i386 are assembly source compatible, you can replace “[BITS 16]” with “[BITS 32]”, and the code will switch from 32 bit to 64 bit mode. (Yes, it is possible to switch from real mode directly into 64 bit mode: osdev.org Forums)

If you use this code, be careful about the memory layout. The example leaves the first page in memory untouched (in case you need the original real mode BIOS IDT for something later), and occupies the next few pages for the pagetables and the stack. Your code should be above that, but, on a BIOS system, not be between 640 KB and 1 MB (this might be device memory and ROM), and also not above 1 MB before you have enabled the A20 gate.

While this code is enough to support 64 bit C code, this is not enough to set up the machine to support all aspects of an operating system. You probably want to set up your own GDT that has entries for 32 bit code and data too, you want to set up an IDT in order to be able to catch exceptions and interrupts, and you will need real pagetables to support virtual memory. Also, you will have to move your stack pointer once you have your final memory layout.

But it is possible to construct the overly complicated GDT, IDT and pagetable structures using readable C code, and the “lidt”, “lgdt” etc. instructions can be done in inline assembly. While this is not portable C code, it is possible to reuse large parts of the machine initialization for a 32 bit (i386) and a 64 bit (x86_64) version of the operating system, which is not as easy to get right in assembly.

In my next post, I am going to show how easy it is to get printf() working as soon as you reach C, so you don’t have to mess around with puts()- and print_hex()-like functions in early machine initialization.

7 thoughts on “Minimizing the Assembly needed for Machine Initialization”

  1. The link at line 55 is missing the quotation mark and the final angular bracket. Most of the post, that is everything between “too much is done in assembly, and” and “If you use this code”, is not visible using Chrome, Safari and Internet Explorer 8 (and most likely many other browsers).

  2. Shouldn’t you set rsp or esp once long mode is enabled? True, the high word of esp/high dword of rsp is probably initialized to zero anyway.

Leave a Reply to Christian Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.