Assembly Evolution Part 1: Accessing Memory and the strange case of the Intel 4004

by Julien Oster, reprinted with permission.

While it has become far less relevant for non-system developers to write assembly than it was a few decades ago, by now CPUs have nevertheless made it much more comfortable to do so. Today we are used to a lot of things: fancy indirect addressing modes with scale, a galore of general purpose registers instead of an accumulator and maybe one or two crippled index registers, condition codes for nearly every instruction (on ARM)…

But also the basics themselves have evolved. Let’s take a look at what past programmers had to put up with in entirely simple, everyday things. We’ll start with the most trivial: writing to memory.

Our goal is to write a single immediate value of 3 into the memory location 5. In light of paging, segmenting and bank switching, we’ll use whatever is convenient as a definition for “memory location”. Also, we’ll let the CPU decide what the word size should be. Since you only need 2 bits to represent a 3, it should fit into every CPU’s word size (except for 1 bit CPUs, which actually existed, but that’s a story for another posting). If we have the choice, we’ll just take the smallest.

We’ll work backwards, from the present to the past, to explore the wonders of direct addressing in Intel CPUs. (One precautionary warning though: I only really tested the 4004 code in an emulator, and my habits are highly tainted by current Intel CPUs. So if I made some mistake somewhere, kindly point it out and I’ll fix it!)


On a modern x86 CPU, it is of course fairly easy to write the value 3 to memory cell 5. You just do it:

mov byte [5], 3

A single instruction, simple and obvious. I cheated a bit by not using a segment prefix, nor did I set up any segment registers/selectors beforehand. But assuming a nowadays common OS environment in protected mode, you probably don’t want to fiddle with those selectors anyway.


The Intel 8085 is somewhat of a direct predecessor to the 8086, the first in the line of the excessively successful x86 processors. While the 8086 has a 16 bit data bus, the 8085 only has an 8 bit one. The address bus is already a full 16 bits wide, but the 16 bit capabilities of the instruction set are limited. Specifically, no instruction combines an immediate value with a directly specified 16 bit address (direct addresses are only accepted by branches and by a few accumulator and HL transfers such as STA and SHLD), leaving us no way to specify our memory location in the instruction that actually performs the move.

Memory is instead addressed with a pseudo register called M. This pseudo register is in reality just backed by the registers H and L paired together, each 8 bit wide, and accessing it accesses the memory location they point at (you may take a guess which register receives the High byte, and which the Low byte of the address).

Luckily, there are a few simple 16bit instructions for moving immediate values, so all in all we can write our byte with:

LXI H, 0005h ; unlucky syntax, as this actually means HL instead of just H

MVI M, 3 ; move immediate (MOV only copies between registers and M)
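
To make the role of the M pseudo register a bit more tangible, here is a minimal Python sketch of the two steps. It is a toy model and not an emulator: the class name `Toy8085` and its method names are made up for illustration, and the store is modeled on the 8085's move-immediate instruction (MVI).

```python
class Toy8085:
    """Toy model of the H/L register pair and the M pseudo register."""

    def __init__(self):
        self.h = 0                         # high byte of the address
        self.l = 0                         # low byte of the address
        self.memory = bytearray(0x10000)   # 64 KiB address space

    def lxi_h(self, imm16):
        """LXI H, imm16: load a 16 bit immediate into the H/L pair."""
        self.h = (imm16 >> 8) & 0xFF
        self.l = imm16 & 0xFF

    def hl(self):
        """Address that M refers to: H is the high byte, L the low byte."""
        return (self.h << 8) | self.l

    def mvi_m(self, imm8):
        """MVI M, imm8: store an 8 bit immediate at the address in HL."""
        self.memory[self.hl()] = imm8 & 0xFF


cpu = Toy8085()
cpu.lxi_h(0x0005)      # LXI H, 0005h
cpu.mvi_m(3)           # MVI M, 3
print(cpu.memory[5])   # prints 3
```

The point of the sketch is only that M has no storage of its own: every access goes through whatever H and L hold at that moment.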

By the way, bonus points if you are somehow able to find out just when the address in HL is available on the address bus. The same applies to the 8080 and 8008. Does the CPU copy the register pair’s content to the address bus pins only when actual memory operations take place, or are the address bus pins somehow directly connected to H and L itself? Is that even feasible? I’d really like to find out…


We continue going further back, skipping the 8080 because it was identical in that regard, and arrive at its direct predecessor instead, the Intel 8008. The 8080 and 8085 were source compatible with the 8008 (which, mind you, is not the same as binary compatible… also it may or may not have required some light automated translation), but in the downward direction we have something vital taken from us: While already using 16bit addresses (with only a 14bit address bus, resulting in 16k memory, though), the only instructions that are allowed to contain 16bit immediate values at all are jumps and branches. Consequently, we are left with no way to completely specify our destination address in one instruction!

Instead, we have to access H and L, together forming pseudo register M’s address, one at a time:

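LHI 00h ; high byte of the address into H

LLI 05h ; low byte of the address into L

LMI 03h ; store immediate 3 at the address in HL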
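
The "16bit addresses on a 14bit address bus" remark can be sketched in a few lines of Python. This is a simplified model under the assumption that the two top bits of H are simply ignored when the address is formed. The helper names are hypothetical. `store_immediate_to_m` stands in for the 8008's store-immediate-to-memory instruction (LMI).

```python
ADDRESS_MASK = 0x3FFF  # assumption: 14 address bits, i.e. 16 KiB

memory = bytearray(ADDRESS_MASK + 1)
regs = {"H": 0, "L": 0}


def m_address(h, l):
    """Combine H and L; the two top bits of H are assumed to be ignored."""
    return ((h << 8) | l) & ADDRESS_MASK


def load_immediate(reg, value):
    """Models LHI / LLI: load an 8 bit immediate into H or L."""
    regs[reg] = value & 0xFF


def store_immediate_to_m(value):
    """Models LMI: store an 8 bit immediate at the address formed by H and L."""
    memory[m_address(regs["H"], regs["L"])] = value & 0xFF


load_immediate("H", 0x00)   # LHI 00h
load_immediate("L", 0x05)   # LLI 05h
store_immediate_to_m(3)     # LMI 03h
print(memory[5])            # prints 3
```

Under that assumption, an H value of C0h together with an L value of 05h would land on the very same cell 5, since only the lower six bits of H take part in the address.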



It’s hardly possible to go back further than the Intel 4004, at least if you are only considering single chip CPUs (at the time of its conception in the early 70s, there were already famous multi-chip CPUs with comfortable orthogonal instruction sets, notably the PDPs). Indeed, it was the first widely available single chip CPU. This little thing was a 4-bit CPU with some strange quirks, which we will explore further. Overall, it bears little to no resemblance to its successor in name, the Intel 8008 (except for the internal stack, which both had–I will cover that in another posting).

But let’s just look at the code for writing a value of 3 into the memory location at 5 first:

FIM P0, 5 ; load address 05h into pair R0,R1

SRC P0   ; set address bus to contents of R0,R1

LDM 3    ; load 3 into accumulator

WRM      ; write accumulator content to memory

That looks a bit strange.

As a 4 bit CPU, the 4004 has 4 bit wide registers and addresses 4 bit nibbles as words in memory. It has only one accumulator, on which the majority of operations are performed, but sixteen index registers (R0-R15).

Those index registers are handy for accessing memory: Besides loading immediate values from ROM (FIM), an instruction exists to load data indirectly (FIN), which uses the content of register pair 0 as the ROM address to fetch from. Another instruction (JIN) performs an indirect jump instead. Other than that, you can just increment index registers, although there is the interesting “ISZ” instruction that not only increments, but also branches if the result is not 0.
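
Since ISZ is the closest thing to a loop instruction here, a small Python sketch of its behavior may help. The function names `isz` and `run_loop` are invented for this illustration. The only thing taken from the text is the semantics: increment a 4 bit register and branch if the result is not 0.

```python
def isz(registers, index):
    """Increment a 4 bit register; return True if the branch is taken (result != 0)."""
    registers[index] = (registers[index] + 1) & 0xF
    return registers[index] != 0


def run_loop(times):
    """Run a loop body `times` times (1 to 16) using an ISZ-style counter."""
    registers = [0] * 16
    # Preload the counter so that it wraps around to 0 after `times` increments.
    registers[2] = (16 - times) & 0xF
    iterations = 0
    while True:
        iterations += 1                # the loop body would go here
        if not isz(registers, 2):      # falls through once the register wraps to 0
            break
    return iterations


print(run_loop(5))   # prints 5
```

Because the register is only 4 bits wide, counting up to a wrap-around means a loop of n iterations starts with the counter at 16 - n, which also caps a single ISZ loop at 16 iterations.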

Because the 4004 uses 8 bits to address the 4 bit nibbles, every two consecutive index registers form a pair, which is then used for memory references.

Note that I explicitly said ROM above. This is because in the 4004 architecture, ROM and RAM are actually vastly different beasts, at least from the assembly programmer’s perspective. You cannot access RAM directly. It always involves an index register pair, manually sending its content to the address bus (with a strangely named instruction “SRC”, which for some reason spells out “send register control”) and then issuing another instruction which transfers from or to the accumulator.
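
The four-instruction write can be mimicked in Python to show where each piece of state lives. This is a deliberately simplified sketch: the class `Toy4004` is hypothetical, RAM is assumed to be one flat bank of 256 nibbles (bank and chip selection are ignored), and the even register of a pair is assumed to hold the high nibble of the address.

```python
class Toy4004:
    """Toy model of the register pair / SRC / accumulator dance for RAM access."""

    def __init__(self):
        self.acc = 0             # 4 bit accumulator
        self.r = [0] * 16        # sixteen 4 bit index registers R0..R15
        self.selected = 0        # RAM address latched by the last SRC
        self.ram = [0] * 256     # assumption: one flat bank of 256 nibbles

    def fim(self, pair, imm8):
        """FIM Pn, imm8: split an 8 bit immediate across a register pair."""
        self.r[2 * pair] = (imm8 >> 4) & 0xF    # even register: high nibble
        self.r[2 * pair + 1] = imm8 & 0xF       # odd register: low nibble

    def src(self, pair):
        """SRC Pn: send the pair's content out as the RAM address."""
        self.selected = (self.r[2 * pair] << 4) | self.r[2 * pair + 1]

    def ldm(self, imm4):
        """LDM imm4: load a 4 bit immediate into the accumulator."""
        self.acc = imm4 & 0xF

    def wrm(self):
        """WRM: write the accumulator to the selected RAM nibble."""
        self.ram[self.selected] = self.acc

    def rdm(self):
        """RDM: read the selected RAM nibble into the accumulator."""
        self.acc = self.ram[self.selected]


cpu = Toy4004()
cpu.fim(0, 5)        # FIM P0, 5
cpu.src(0)           # SRC P0
cpu.ldm(3)           # LDM 3
cpu.wrm()            # WRM
print(cpu.ram[5])    # prints 3
```

Note how the address survives in `selected` after SRC: further reads and writes hit the same nibble until another SRC is issued, which is the part that feels so unlike the M pseudo register of the later chips.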

Interestingly, accessing regular RAM nibbles is not your only choice among the transfer instructions. You can also read from and write to I/O ports. But the CPU does not have any I/O ports of its own; instead, they are provided by both the RAM and the ROM chips! You can also read and write “RAM status characters”, which to me look like plain regular RAM cells within another namespace. If someone knows, I’d love to hear what they were used for (and whether they behaved differently from normal RAM).

Take a look at the data sheet. Within its mere 9 pages, the instruction set is depicted on pages 4 and 5. Especially given that fairly reasonable orthogonal instruction sets appear to have been available in multi-chip CPUs, this first single-chip CPU is clearly a strange specialization towards the desk calculator it was meant for (the Busicom 141-PF). It has the aforementioned index register-centered RAM access, separate ROM (although there is a transfer instruction which strangely refers to some optional “read/write program memory”), a three level internal stack which is almost useless for general purpose programming and a lone special purpose instruction for “keyboard process” (KBP).

Original 4004 CPUs go for anything from a few to a few thousand dollars on eBay, depending on their packaging and revision. If you’d like to, you can instead play around with a virtual one in this JavaScript-based, fully fledged assembler, disassembler and emulator, or read the rescued source code of the Busicom 141-PF calculator. There are lots more schematics, data sheets and other resources on Intel’s anniversary project page.

That is, if you are brave enough.

15 thoughts on “Assembly Evolution Part 1: Accessing Memory and the strange case of the Intel 4004”

  1. “Does the CPU copy the register pair’s content to the address bus pins only when actual memory operations take place, or are the address bus pins somehow directly connected to H and L itself? Is that even feasible? I’d really like to find out…”

    Well, I guess they are not – at the very least they are muxed so that the stack pointer and instruction pointer can be used to address memory as well.

  2. You’re right, it needs to be multiplexed with the instruction pointer at least. But on the 8008, which was the one that prompted me to think about this, there is no external stack and I wonder what happens outside the instruction fetch cycles.

    The 8080 and 8085 actually have some more addressing modes, so the M pseudo-register seems a bit more like a compatibility thing, and I can imagine that there are some more layers inbetween.

    Unfortunately, I have never designed a CPU (except emulated in software), so it’s just guesswork…

  3. Hi folks,

    The article points to the 4004 datasheet where the bus protocol is provided. Although instruction execution sends out a complete 12-bit address, for data access, essentially each RAM and ROM chip has an internal address register as a way of ‘optimizing’ bandwidth on the shared 4-bit address/data bus.

    I wrote a small blog about it recently when I was trying to figure it out (can’t vouch for its complete accuracy).

    -cheers from Julz.

  4. Thanks for the article Julien,

    Just on the 8085 CPU, the MOV M,3 won’t work. It needs to be MVI M,3 to move immediate.

    Also, when I was wondering how the 4004 CPU worked I wrote a simulator on AmigaOS for my own interests which I later released to AmiNET. It requires AmigaOS 3.5+ so that severely limits the audience, but then as I said, I did write it as an experiment on the OS I was using at the time. If anyone wants to look it is at The javascript one listed above is probably more useful, though.

  5. hi

    i m Haider Ali 19 years old from Iraq

    University student in the first phase, in the engineering of computer technology

    is there difference between (engineering of computer technology)

    i have question if you please

    1- i wont to become reverse engineer what programming language should i learn

    from where could i start (with any CPU) i don’t now any thing
    about CPU architecture and CPU reverse engineering could you tell me please
    i Intel 8085/8085A is enough for beginner

    can you give me a lot of tips about that please

  6. >>>By the way, bonus points if you are somehow able to find out just when the address in HL is available on the address bus. The same applies to the 8080 and 8008. Does the CPU copy the register pair’s content to the address bus pins only when actual memory operations take place, or are the address bus pins somehow directly connected to H and L itself? Is that even feasible? I’d really like to find out…<<<

    Ok the easy one the HL does not connect to pins, least not directly. The address pins can have at any time
    (PC, SP, HL, DE, SP) register contents or the result of an immediate operation like JMP.

    The register appear at various times in the instruction cycle. The first part of a cycle is instruction fetch (usually 4 T states) and decode then the action following (additional T states in additional cycles) is the action for example STAX D would be get the instruction, followed by outputting Acc to the data bus with the contents of the DE pair on the address bus.

    FYI the "M" pseudo register is meaningless as M really stands for Memory at the address specified by the content of the HL pair. It's an oddity of the Intel mnemonics rather than actual hardware. Z80 codes express it far more cleanly.

    Also don't forget the dozen or so unofficial instructions in the 8085 (all!) and also the Z80 has a
    complete set of additional instruction Zilog never documented but all Z80s do it.

    The intel manuals (user manual for 8080/8085) details all of this and more to a greater degree.

    And yes I also designed with 8008. Not a fun part to program.

    As to odd CPUs, TI9900 (check the registers in ram and the bitwise IO), IM6100 a pdp8 work alike
    where opcode 6 (IOT) can do input, output, IO load PC or add to PC to name a few possible things.
    What that opcode group does is defined in user hardware.

