Assembly Evolution Part 1: Accessing Memory and the strange case of the Intel 4004

by Julien Oster, reprinted with permission.

While it has become far less relevant for non-system developers to write assembly than it was a few decades ago, CPUs have nevertheless made it much more comfortable to do so. Today we are used to a lot of things: fancy indirect addressing modes with scaling, a wealth of general purpose registers instead of an accumulator and maybe one or two crippled index registers, condition codes for nearly every instruction (on ARM)…

But the basics themselves have evolved, too. Let’s take a look at what past programmers had to put up with in entirely simple, everyday things. We’ll start with the most trivial: writing to memory.

Our goal is to write a single immediate value of 3 into memory location 5. In light of paging, segmentation and bank switching, we’ll use whatever is convenient as a definition for “memory location”. Also, we’ll let the CPU decide what the word size should be. Since you only need 2 bits to represent a 3, it should fit into every CPU’s word size (except for 1 bit CPUs, which actually existed, but that’s a story for another posting). If we have the choice, we’ll just take the smallest.

We’ll work backwards, from the present to the past, to explore the wonders of direct addressing in Intel CPUs. (One precautionary warning though: I only really tested the 4004 code in an emulator, and my habits are highly tainted by current Intel CPUs. So if I made some mistake somewhere, kindly point it out and I’ll fix it!)

x86

On a modern x86 CPU, it is of course fairly easy to write the value 3 to memory cell 5. You just do it:

mov byte [5], 3

A single instruction, simple and obvious. I cheated a bit by not using a segment prefix, nor did I set up any segment registers/selectors beforehand. But assuming a nowadays common OS environment in protected mode, you probably don’t want to fiddle with those selectors anyway.

8085

The Intel 8085 is somewhat of a direct predecessor to the 8086, the first in the line of the excessively successful x86 processors. While the 8086 has a 16 bit data bus, the 8085 only has an 8 bit one. The address bus is already a full 16 bits wide, but the CPU’s 16 bit capabilities are limited. Specifically, there is no instruction that stores an immediate value at a directly specified 16 bit address (direct addresses are only accepted by branches and by a few accumulator and HL transfers such as STA), leaving us no way to specify our memory location in the instruction that actually performs the move.

Memory is instead addressed with a pseudo register called M. This pseudo register is in reality just backed by the registers H and L paired together, each 8 bits wide, and accessing it accesses the memory location they point at (you may take a guess which register receives the High byte, and which the Low byte of the address).

Luckily, there are a few simple 16 bit instructions for moving immediate values, so all in all we can write our byte with:

LXI H, 0005h ; unfortunate syntax, as this actually means HL instead of just H

MVI M, 3     ; MVI (move immediate), since MOV cannot take an immediate operand
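To make the role of the M pseudo register concrete, here is a minimal Python sketch of loading HL and then storing through M. It is a toy model rather than an emulator: the class name `Toy8085` and its method names are made up for illustration, and only the H/L pair and a flat 64 KiB memory are modelled.

```python
class Toy8085:
    """Toy model of the 8085's H/L register pair and the M pseudo register."""

    def __init__(self):
        self.h = 0                      # high byte of the address
        self.l = 0                      # low byte of the address
        self.memory = bytearray(65536)  # flat 64 KiB address space

    def lxi_h(self, value16):
        """LXI H, value16: load a 16 bit immediate into the pair HL."""
        self.h = (value16 >> 8) & 0xFF
        self.l = value16 & 0xFF

    def mvi_m(self, value8):
        """MVI M, value8: store an 8 bit immediate at the address held in HL."""
        address = (self.h << 8) | self.l
        self.memory[address] = value8 & 0xFF


cpu = Toy8085()
cpu.lxi_h(0x0005)   # HL now points at memory location 5
cpu.mvi_m(3)        # the store goes wherever HL points
print(cpu.memory[5])
```

The point of the sketch is that the store itself carries no address at all; it only ever sees whatever H and L happen to contain.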

By the way, bonus points if you are somehow able to find out just when the address in HL is available on the address bus. The same applies to the 8080 and 8008. Does the CPU copy the register pair’s content to the address bus pins only when actual memory operations take place, or are the address bus pins somehow directly connected to H and L themselves? Is that even feasible? I’d really like to find out…

8008

We continue going further back, skipping the 8080 because it was identical in that regard, and arrive at its direct predecessor instead, the Intel 8008. The 8080 and 8085 were source compatible with the 8008 (which, mind you, is not the same as binary compatible… it also may or may not have required some light automated translation), but in the downward direction we have something vital taken from us: while the 8008 already uses 16 bit addresses (with only a 14 bit address bus, resulting in 16k of memory, though), the only instructions that are allowed to contain 16 bit immediate values at all are jumps and branches. Consequently, we are left with no way to completely specify our destination address in one instruction!

Instead, we have to access H and L, together forming pseudo register M’s address, one at a time:

LHI 00h

LLI 05h

LMI 3

4004

It’s hardly possible to go back further than the Intel 4004, at least if you are only considering single chip CPUs (at the time of its conception in the early 70s, there were already famous multi-chip CPUs with comfortable orthogonal instruction sets, notably the PDPs). Indeed, it was the first widely available single chip CPU. This little thing was a 4-bit CPU with some strange quirks, which we will explore further. Overall, it bears little to no resemblance to its successor in name, the Intel 8008 (except for the internal stack, which both had–I will cover that in another posting).

But let’s just look at the code for writing a value of 3 into the memory location at 5 first:

FIM P0, 5 ; load address 05h into pair R0,R1

SRC P0   ; set address bus to contents of R0,R1

LDM 3    ; load 3 into accumulator

WRM      ; write accumulator content to memory

That looks a bit strange.

As a 4 bit CPU, the 4004 has 4 bit wide registers and addresses 4 bit nibbles as words in memory. It has only one accumulator, on which the majority of operations are performed, but sixteen index registers (R0-R15).

Those index registers are handy for accessing memory: besides loading immediate values directly from ROM, an instruction exists to load data indirectly, with the ROM address taken from an index register pair. Another instruction performs an indirect jump instead. Other than that, you can just increment index registers, although there is the interesting “ISZ” instruction that not only increments, but also branches if the result is not 0.
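The behaviour of ISZ is easiest to see with a counting loop. The following Python sketch only models the semantics just described (a 4 bit register that wraps around, plus a branch decision); the function names `isz` and `count_iterations` are illustrative assumptions, not 4004 code.

```python
def isz(register):
    """Model ISZ: increment a 4 bit register and report whether to branch.

    Returns the new register value and True if the branch is taken,
    i.e. if the incremented value is not 0.
    """
    register = (register + 1) & 0xF   # 4 bit registers wrap from 15 to 0
    return register, register != 0


def count_iterations(start):
    """Count how often a loop body runs when the loop ends with ISZ."""
    register = start
    iterations = 0
    while True:
        iterations += 1               # the loop body runs once
        register, branch = isz(register)
        if not branch:                # result was 0: fall through, loop ends
            return iterations


print(count_iterations(12))   # register goes 12 -> 13 -> 14 -> 15 -> 0
```

In this model, preloading the register with 16 minus the desired count gives a loop of exactly that length, which is how a single instruction can serve as both counter and loop branch.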

Because the 4004 uses 8 bits to address the 4 bit nibbles, every two consecutive index registers form a pair, which is then used for memory references.

Note that I explicitly said ROM above. This is because in the 4004 architecture, ROM and RAM are actually vastly different beasts, at least from the assembly programmer’s perspective. You cannot access RAM directly. Doing so always involves an index register pair, manually sending its content to the address bus (with the strangely named instruction “SRC”, which for some reason spells out “send register control”) and then issuing another instruction which transfers from or to the accumulator.
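The two-step RAM access can be modelled in a few lines of Python. This is a sketch under loud assumptions: the class `Toy4004` and its method names are invented for illustration, data RAM is flattened into a single list of 256 nibbles (the real chips split the 8 address bits into chip, register and character fields), and bank selection is ignored entirely.

```python
class Toy4004:
    """Toy model of 4004 data RAM access through SRC and WRM/RDM."""

    def __init__(self):
        self.acc = 0             # 4 bit accumulator
        self.regs = [0] * 16     # index registers R0..R15, 4 bits each
        self.selected = 0        # address latched by the last SRC
        self.ram = [0] * 256     # assumption: flat array of 256 nibbles

    def fim(self, pair, value8):
        """FIM Pn, value8: load an 8 bit immediate into a register pair."""
        self.regs[2 * pair] = (value8 >> 4) & 0xF   # even register: high nibble
        self.regs[2 * pair + 1] = value8 & 0xF      # odd register: low nibble

    def src(self, pair):
        """SRC Pn: send the pair's contents out as the RAM address."""
        self.selected = (self.regs[2 * pair] << 4) | self.regs[2 * pair + 1]

    def ldm(self, value4):
        """LDM value4: load a 4 bit immediate into the accumulator."""
        self.acc = value4 & 0xF

    def wrm(self):
        """WRM: write the accumulator to the selected RAM nibble."""
        self.ram[self.selected] = self.acc

    def rdm(self):
        """RDM: read the selected RAM nibble into the accumulator."""
        self.acc = self.ram[self.selected]


cpu = Toy4004()
cpu.fim(0, 5)   # FIM P0, 5
cpu.src(0)      # SRC P0
cpu.ldm(3)      # LDM 3
cpu.wrm()       # WRM
print(cpu.ram[5])
```

Note that neither `wrm` nor `rdm` takes an address: in this model, as in the description above, the address is whatever the most recent SRC sent out.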

Interestingly, accessing regular RAM nibbles is not your only choice among the transfer instructions. You can also transfer from and to I/O ports. But the CPU does not have any I/O ports of its own; instead, they are available on both the RAM and the ROM chips! You can also read and write “RAM status characters”, which to me look like plain regular RAM cells within another namespace. If someone knows, I’d love to hear what they were used for (and whether they maybe behaved differently from normal RAM).

Take a look at the data sheet. Within its mere 9 pages, the instruction set is depicted on pages 4 and 5. Especially given that fairly reasonable orthogonal instruction sets appear to have been available in multi-chip CPUs, this first single-chip CPU is clearly a strange specialization towards the desk calculator it was meant for (the Busicom 141-PF). It has the aforementioned index register-centered RAM access, separate ROM (although there is a transfer instruction which strangely refers to some optional “read/write program memory”), a three level internal stack which is almost useless for general purpose programming and a lone special purpose instruction for “keyboard process” (KBP).

Original 4004 CPUs go for anything from a few to a few thousand dollars on eBay, depending on their packaging and revision. If you’d like to, you can instead play around with a virtual one in this JavaScript-based, fully fledged assembler, disassembler and emulator, or read the rescued source code of the Busicom 141-PF calculator. There are lots more schematics, data sheets and other resources on the Intel 4004 anniversary project page.

That is, if you are brave enough.


18 Responses to “Assembly Evolution Part 1: Accessing Memory and the strange case of the Intel 4004”

  1. Johan Thelin says:

    “Does the CPU copy the register pair’s content to the address bus pins only when actual memory operations take place, or are the address bus pins somehow directly connected to H and L itself? Is that even feasible? I’d really like to find out…”

    Well, I guess they are not – at the very least they are muxed so that the stack pointer and instruction pointer can be used to address memory as well.

  2. Julien Oster says:

    You’re right, it needs to be multiplexed with the instruction pointer at least. But on the 8008, which was the one that prompted me to think about this, there is no external stack and I wonder what happens outside the instruction fetch cycles.

    The 8080 and 8085 actually have some more addressing modes, so the M pseudo-register seems a bit more like a compatibility thing, and I can imagine that there are some more layers inbetween.

    Unfortunately, I have never designed a CPU (except emulated in software), so it’s just guesswork…

  3. Damien Guard says:

    At least the Z80 clears up the 8080 mnemonics to show what it really is:

    ld hl, 0005h
    ld (hl), 3

    [)amien

  4. Hi folks,

    The article points to the 4004 datasheet where the bus protocol is provided. Although instruction execution sends out a complete 12-bit address, for data access, essentially each RAM and ROM chip has an internal address register as a way of ‘optimizing’ bandwidth on the shared 4-bit address/data bus.

    I wrote a small blog about it recently when I was trying to figure it out (can’t vouch for its complete accuracy).

    http://oneweekwonder.blogspot.com/2011/09/accessing-ram-on-intel-4004.html

    -cheers from Julz.

  5. [...] pagetable] Filed under: random — by johngineer, posted November 7, 2011 at 9:58 am Comments [...]

  6. Jason Stead says:

    Thanks for the article Julien,

    Just on the 8085 CPU, the MOV M,3 won’t work. It needs to be MVI M,3 to move immediate.

    Also, when I was wondering how the 4004 CPU worked I wrote a simulator on AmigaOS for my own interests which I later released to AmiNET. It requires AmigaOS 3.5+ so that severely limits the audience, but then as I said, I did write it as an experiment on the OS I was using at the time. If anyone wants to look it is at http://aminet.net/search?query=SIM4004. The javascript one listed above is probably more useful, though.

  7. why? says:

    CPU’s well suited for such comparisons are also the RCA 1802 and Fairlight F8.

  8. Peter Gordon says:

    Dear Santa,

    For christmas, I would like some new posts on pagetable.com

    Thanks,
    Pete

  9. Peter Gordon says:

    Oh, and another fun and quirky early processor to look at is the Signetics 2650.

  10. MiaM says:

    Re: why?

    I assume you mean Fairchild F8 and not Fairlight F8?

  11. John Doe says:

    Myria Myria where are you?

  12. why? says:

    yes, Fairchild not Fairlight of course !

  13. [...] deepest possible level, without turning to physics. The site covers topics such as quirks of the first ever CPU, the Intel 4004, copying disks on the C64 – quickly, using branch delay slots, and much, [...]

  14. Code2Free says:

    hi

    i m Haider Ali 19 years old from Iraq

    University student in the first phase, in the engineering of computer technology

    is there difference between (engineering of computer technology)

    i have question if you please

    1- i wont to become reverse engineer what programming language should i learn

    from where could i start (with any CPU) i don’t now any thing
    about CPU architecture and CPU reverse engineering could you tell me please
    i Intel 8085/8085A is enough for beginner

    can you give me a lot of tips about that please
