lsb = least significant bit | msb = most significant bit | |
Bit positions within a byte are 7 6 5 4 3 2 1 0 | ||
Bit 0 is the lsb. | Bit 7 is the msb. | |
A = the accumulator. | X = register X. | Y = register Y. |
P = process status register. | PC = program counter | PCL = low byte of PC. |
PCH = high byte of PC. | SP = stack pointer. | ALU = arithmetic and logic unit. |
AR = address register. | ARL = low byte of AR. | ARH = high byte of AR. |
Process status flags: | ||
N = negative (bit 7). | V = overflow (bit 6). | B = break (bit 4). |
D = BCD (bit 3). | I = interrupt (bit 2). | |
Z = zero (bit 1). | C = carry (bit 0). |
The 6502 and the Z80 8-bit microprocessors have retained their popularity with personal computer manufacturers for many years. Their popularity is likely to remain until the approaching 16-bit revolution is established. Both the 6502 and the Z80 have good and bad features which are fairly equally distributed. The Z80 has sometimes been praised as the more powerful of the two but, in the absence of a satisfactory definition of 'power', this praise has little substance. If by 'power' we mean execution speed then neither type is superior. Some types of program execute faster on the Z80; others execute faster on the 6502. Because of this, it is not wise to pay too much attention to 'benchmark' tests. Comparison tests for computers have about the same reliability index as intelligence tests for humans: they tend to test the tester more than the testee.
The Z80 has a powerful marketing advantage because of its downward compatibility with the Intel 8080. The widely used disk operating system CP/M, for which an enormous amount of commercial software has been written, is based on the 8080 instruction set, so any microcomputer which runs on the 6502 could be said to be disadvantaged in this respect. The justification for discussing the Z80 at all in this book (which is supposedly oriented towards the 6502!) is that one of the Second Processor options runs on the Z80. There will be a choice of another 6502, a Z80 or the new 16-bit 16032. Because of this, it is considered reasonable to include occasional references to all three microprocessors.
The Motorola 6800 was the ancestor of the 6502. Apart from the indirect addressing modes, which are unique to the 6502, the two are similar in many respects.
Fig. 2.1. Position of the 6502 in relation to the external bus devices.
It is possible to delve straight into machine code programming without troubling very much about the technical details of the 6502. Indeed, the introductory book Discovering BBC Micro Machine Code plunged straight into a program on page 8. This book is the sequel, intended to fill some of the gaps left in both the software and hardware treatment. It pays dividends in the long run if the internal behaviour of a microprocessor is understood. It can also be interesting for its own sake.
It is better to begin by reviewing the microprocessor in relation to the other main components of the system, as shown in Fig. 2.1. The operating system and BASIC language ROMs each have a capacity of 16K bytes (type number 23128). They are connected across the address and data buses. Note that the address bus is shown split down the middle because it is important always to bear in mind that a 4-hex digit address code is handled by the microprocessor in two halves, the lines A0 to A7 (low-byte) and lines A8 to A15 (high-byte).
The RAM complement is not so straightforward because of the reduced packing density within the chip. Each RAM chip in the BBC machine (type number 4816) stores only 16K bits (not bytes) so it is necessary to use sixteen of these chips to form a storage system of 32K bytes. (The Electron uses four of the new 64K bit RAMs to produce the total 32K bytes.) Another factor contributing to complexity is the 'dynamic' nature of the memory. The correct title for this class of read/write memory is DRAM, the 'D' prefix standing for dynamic. Due to the need for reducing current consumption and maximising packing density, each bit is stored within the inter-electrode capacity of MOS transistors (see Appendix A). The stored information, however, is a transient affair, leaking away in a few milliseconds. Consequently, each stored bit must be periodically recharged in order to compensate for the leakage. This process, called 'refreshing', is inherent in the hardware design and is not the responsibility of the programmer. However, the refresh cycle takes up extra time. DRAMs are therefore a compromise in which access time is sacrificed in order to increase packing density and reduce cost. It is worth mentioning that the BBC and Electron systems are not alone in employing DRAMs. Nearly all microcomputers have used, and still use, them. The alternative would be to use static RAMs but the cost would be prohibitive and they would occupy a greater space on circuit boards. Having noted this, DRAMs will still be referred to as RAMs: the distinction is academic. Note that, unlike the ROM chips, the feed to the data bus is bidirectional.
6502 systems are memory-mapped so it is not surprising that the keyboard, screen display and the input/output interfaces are strung across the address and data buses as if they were memory chips. In the case of the screen display, the dotted line on the figure indicates the additional data path between the area of RAM dedicated to the screen and the display circuits. To avoid cluttering the diagram, the various signal lines forming the 'control bus' are not shown.
Figure 2.2 shows reasonable, but by no means complete, details of the paths between the various registers. Such paths within the microprocessor are often called highways because they ramify over the chip area to provide a kind of long distance communication.
Fig. 2.2. 6502 registers and highways
The chaos is only apparent. Control lines (not shown) operate the input and output gates of each separate register, ensuring that only one pair is allowed access to the highway at any one time. For example, during the machine code instruction TAX, only register A output gate and register X input gate are open to the data highway, allowing the contents of A to be transferred to X.
The majority of instructions we give to microprocessors are data transfers, either between internal registers or between registers and the external RAM, ROM or peripherals. Some instructions, such as ADC (add with carry) perform arithmetical operations on the data but the data still has to be fetched from somewhere else. Even a simple instruction like INX (increment contents of X) involves a transfer because the X register is not equipped for altering itself. Instead, the contents of X must be transferred along the highway to the arithmetic section and subsequently returned.
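The gating idea can be sketched as a toy model in Python. This is only an illustration under stated assumptions: the `registers` dictionary, `transfer`, `alu_increment` and `inx` are hypothetical names, and the flag updates that real TAX and INX instructions perform are deliberately left out.

```python
# Toy model of the internal data highway; all names here are illustrative only.
registers = {"A": 0x3C, "X": 0x00, "Y": 0x00}


def transfer(source, destination):
    """Open one output gate and one input gate, as for TAX (A -> X)."""
    highway = registers[source]        # source output gate drives the highway
    registers[destination] = highway   # destination input gate latches the value
    return highway


def alu_increment(value):
    """Stand-in for the arithmetic section: add 1 and wrap to 8 bits."""
    return (value + 1) & 0xFF


def inx():
    """INX: X travels to the arithmetic section and the result returns to X."""
    registers["X"] = alu_increment(registers["X"])


transfer("A", "X")   # models TAX
inx()                # models INX
print(hex(registers["A"]), hex(registers["X"]))
```

Note that the source register keeps its contents after a transfer; only the destination changes.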
It is assumed that many readers will already be aware of the various registers and their functions but, for the sake of continuity, a brief description follows, together with the standard abbreviations subsequently used in all references. A distinction is made between the directly programmable registers and the others which, although playing a vital role, remain in the background, unseen by the programmer.
The accumulator (A) has a supreme role. It is the only register capable of performing arithmetic processing. This is evident from Fig. 2.2 which shows that, in addition to the usual connection to the highway, there is a direct and exclusive link to the Arithmetic and Logic Unit (ALU). It is involved in transfers to and from memory and acts as interim data storage during arithmetic and logic operations. For example, during a simple addition of two numbers (ADC), the first number must pass to the accumulator and is then 'entered' to a holding register within the ALU. The second number then enters A, the addition is carried out and the result sent back to A. Those used to scientific calculators in the Hewlett-Packard range will recognise the inherent Reverse Polish (RP) action.
It is worth digressing a little to explain RP. A Polish mathematician proposed a new method of expressing arithmetic, the essence of which was placing the operator (+, -, × etc.) after, instead of in between, variables. For example, instead of writing A + B to indicate addition, he proposed that it should be written AB+. Because his name was quite unpronounceable (and almost unspellable in English) his system has become known simply as Reverse Polish Notation (RPN or simply RP). The influence of von Neumann on the evolution of the computer was mentioned in Chapter 1. He suggested that the arithmetic system of digital computers would operate most efficiently if based on RPN. Thus the ALU of the BBC machine, in common with nearly all other computers, requires the two variables first; the add operator is then activated and the result passed to A, replacing the previous contents.
The dominance of the accumulator over other registers is evident from the instruction set of the 6502. However, the fact that only one accumulator is present gives ammunition for the protagonists of the rival Z80 which boasts eight accumulator-type registers. A single accumulator does tend to be restrictive in organising efficient machine code.
Like the accumulator, the X register and the Y register (subsequently referred to as X and Y) are both 8 bits wide. They have three primary uses in programming.
The resultant is interpreted as the address of the required data. This idea was pioneered by a team at Manchester University and, at the time, represented a huge step forward in computer science. They called the index register the 'B box', presumably to differentiate it from the accumulator A. Previous to this, altering the operand address in loops was cumbersome. It involved loading the operand from inside the program, incrementing it and then storing it back in the original position. In other words, it was necessary to alter the program in order to modify an address. Indexed addressing is so much cleaner to work with and certainly less error-prone. Most of the indexable instructions in the 6502 allow a choice of using either X or Y for indexing. Although indexed addressing is later dealt with in detail, there is no harm in a little anticipation for the benefit of those who are new to the idea. So, consider an example in which register X contains 30, and we write LDA 100,X.
The simple instruction LDA 130, however, would have the equivalent effect. Both would load the contents of address 130 into A. The advantage of the indexed over the simpler form will be apparent when organising loops invoking action on consecutive addresses.
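The equivalence of LDA 100,X (with X holding 30) and LDA 130 can be checked with a short Python sketch. The memory contents and the helper names `lda_absolute` and `lda_indexed` are assumptions made for illustration; the addresses are decimal, as in the text.

```python
memory = bytearray(65536)                     # 64K address space, initially zero
memory[130:135] = bytes([7, 8, 9, 10, 11])    # hypothetical table at 130 to 134


def lda_absolute(address):
    """LDA address: load from the operand address itself."""
    return memory[address]


def lda_indexed(base, index):
    """LDA base,X: the index register is added to the operand to form the address."""
    effective_address = (base + index) & 0xFFFF
    return memory[effective_address]


# LDA 100,X with X = 30 reads the same byte as LDA 130.
same = lda_indexed(100, 30) == lda_absolute(130)

# In a loop over consecutive addresses only the index changes;
# the instruction and its operand stay fixed.
total = sum(lda_indexed(100, x) for x in range(30, 35))
print(same, total)
```

The loop is where the indexed form pays off: the program itself is never modified, only the contents of the index register.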
This discussion should help to explain why the address bus, as well as the data bus, has access to the ALU. This should be understandable now it is recognised that the index register contents have to be added to the operand. After all, address modification by indexing produces a computed address and only the ALU can truly compute.
If we define a register as an internal memory location for holding or processing data, then the process status register (P) is not a register at all. It is, in fact, a collection of isolated flip-flops (see Appendix A), each capable of storing one bit. Each bit is called a 'flag' because it conveys certain information in yes/no form, either for the benefit of the machine or the programmer. After most instructions, the relevant flags are updated, depending on the result. There is no connection, either in the hardware or software aspects, between different flags. In spite of this, it is convenient and conventional to refer to it as a register. It is important for the programmer to understand the exact significance of each flag, under what conditions it is set or reset, which flags are under the control of the microprocessor and which are directly programmable.
The N bit: If this is 1, the last result contained a 1 in the bit 7 position. The N bit is often misleadingly called the 'sign bit' because two's complement arithmetic recognises bit 7 as the sign rather than magnitude. If the number is unsigned binary, the N flag merely indicates the state of bit 7. It is automatically set or reset and is not directly programmable. BMI (branch if minus) and BPL (branch if plus) are the relevant branch instructions conditional on the state of the N bit. Most instructions leave the N bit updated as part of the execution routine, the notable exceptions being STA, STX, STY, TXS and all branch and jump instructions (see Appendix C for complete coverage). LSR is unique in that the N bit is always reset to 0, irrespective of the result.
The V bit: If this bit is 1, it indicates that the last arithmetic instruction caused two's complement overflow due to the result being outside the capacity of a single byte. It can be tested by the conditional branch instructions, BVS or BVC. It is of no significance to the programmer when using unsigned binary because bit 7 of the result represents magnitude rather than sign. In this case, it can be ignored. However, the V bit also plays a major role in the BIT test instruction, assuming the same state as bit 6 of the data being tested.
It is possible to directly clear the V bit to 0 by the instruction CLV although there is no corresponding instruction to directly set it to 1. Only the instructions ADC, SBC, BIT, PLP, RTI and CLV affect the V bit.
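A minimal Python sketch may help to show when V is set. It assumes binary (not BCD) mode and uses the common rule that overflow occurs when both operands have the same bit 7 and the result has a different one; the function names are hypothetical.

```python
def adc_signed_overflow(a, operand, carry_in=0):
    """Binary-mode ADC on 8-bit values; returns (result, V)."""
    result = (a + operand + carry_in) & 0xFF
    # Overflow: both inputs share a sign (bit 7) and the result's sign differs.
    v = 1 if (~(a ^ operand) & (a ^ result) & 0x80) else 0
    return result, v


def bit_test(a, data):
    """BIT: N and V copy bits 7 and 6 of the data; Z reflects A AND data."""
    n = (data >> 7) & 1
    v = (data >> 6) & 1
    z = 1 if (a & data) == 0 else 0
    return n, v, z


# &50 + &50 = &A0: two positive numbers appear to give a negative result.
print(adc_signed_overflow(0x50, 0x50))
# &50 + &10 = &60: the result stays within the signed range.
print(adc_signed_overflow(0x50, 0x10))
```

In the first case V is 1 because +80 plus +80 exceeds +127; in the second it is 0.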
The B bit: This is set to 1 when a BRK instruction is encountered. Its significance is limited almost entirely to interrupt sequences. It cannot be directly programmed.
The D bit: The 6502 can perform arithmetic on straightforward binary numbers or on BCD (Binary Coded Decimal) numbers. The programmer decides this by the use of either SED (set decimal) which makes D=1 or CLD (clear decimal) which makes D=0. The arithmetic mode currently in use remains until the D bit is altered. The default mode is D=0. The instructions which affect the D bit are CLD, SED, PLP, and RTI.
The I bit: This is called the interrupt mask bit or the interrupt inhibit. It is inspected by the microprocessor when an interrupt request is received from a peripheral source. If it is 1, the request is not granted. It can be directly set to 1 by SEI (set interrupt) or cleared to 0 by CLI (clear interrupt). These instructions are vital when designing the software for peripheral interfaces, most of which will be interrupt-driven. The instructions which affect the I bit are BRK, CLI, SEI, PLP and RTI.
The Z bit: This is the zero bit, and is set to 1 when a result is 0. This is worth emphasising strongly because it is often interpreted back to front. If a result is non-zero, the Z bit goes to 0. It can be tested by the branch instructions BEQ (branch if equal to zero) or BNE (branch if not equal to zero). There are no instructions which can directly set or clear it, but most instructions affect the Z bit as a consequence of their result. The exceptions include TXS, STA, STX, STY and the branch and jump instructions.
The C bit: This is the carry bit, and is set to 1 when a carry out from the msb is detected. Instead of the bit 'dropping on to the floor' it is popped into the C bit. It can also be thought of as the ninth bit, particularly in shift and rotate instructions. It can be tested by the branch instructions BCS (branch if carry set) or BCC (branch if carry clear). It can also be directly programmed by SEC, which sets C to 1, or CLC, which clears C to 0. Instructions which affect the C bit are ADC, SBC, ASL, LSR, ROL, ROR, SEC, CLC, PLP, RTI, CMP, CPX and CPY.
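The 'ninth bit' view of the carry can be illustrated with a small Python sketch, assuming binary mode. The helper names `rol` and `add_16bit` are hypothetical; the second function shows the usual reason for catching the carry rather than losing it, namely multi-byte arithmetic.

```python
def rol(value, carry):
    """Rotate left through carry: C behaves as a ninth bit."""
    nine_bits = (value << 1) | carry      # old carry enters bit 0
    new_carry = (nine_bits >> 8) & 1      # old bit 7 leaves into C
    return nine_bits & 0xFF, new_carry


def add_16bit(low_a, high_a, low_b, high_b):
    """Two-byte addition: carry out of the low byte feeds the high byte."""
    low_sum = low_a + low_b               # CLC followed by ADC on the low bytes
    carry = 1 if low_sum > 0xFF else 0
    high_sum = high_a + high_b + carry    # ADC on the high bytes
    final_carry = 1 if high_sum > 0xFF else 0
    return low_sum & 0xFF, high_sum & 0xFF, final_carry


print(rol(0x80, 0))                       # bit 7 moves out into the carry
print(add_16bit(0xFF, 0x00, 0x01, 0x00))  # &00FF + &0001
```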
It is clear from the above that the process status register flags have a profound effect on program behaviour. The majority of errors encountered, particularly when setting the terminating conditions for loop exit, are due to misinterpreting the behaviour of the flag bits. Unless you are already confident in this area you would do well to reread the above treatment several times.
The stack pointer (SP) is an 8-bit register, dedicated to the automatic control of a special area in page one of RAM designated the 'stack'. Its function is that of an address generator. It is impossible to describe the stack pointer fully without describing the stack itself. Because the stack is so important in its own right, discussion of its anatomy will be postponed. It is sufficient at this point to grasp the following essentials:
Fig. 2.3. The stack and stack pointer
In any microprocessor, some of the most important registers remain transparent (or at least translucent) to the programmer. That is to say, instructions are not provided which make direct reference to them. In fact, the more important a register, the less likely the programmer is to be allowed direct access. In the 6502, the unseen registers (refer back to Fig. 2.2) are the Program Counter (PC), the two address registers ADL and ADH, and the Instruction Register (IR).
The program counter enjoys the honour of being the only 16-bit register in the 6502. If there is an established register hierarchy, then PC is the undisputed candidate for first place, so its function deserves strong emphasis:
The contents of the Program Counter are always the address of the next instruction byte to be executed.
The 16-bit length allows reference to any address in the entire 64K range.
Once a stored program is commanded to 'start execution', the following automatic sequence begins:
The sequence continues indefinitely, sweeping through the program bytes like a scythe until halted legitimately or an illegal code is reached. The sequence makes no distinction whatsoever between program and data. It is up to the programmer to arrange the instruction bytes in consecutive address order and organise either a break (BRK) or an orderly return to the operating system loop. If the PC is allowed to reach data bytes it will interpret these as instructions which the 6502 will either attempt to execute or crash in despair.
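The relentless nature of the sequence can be sketched in Python. This is a deliberately tiny model under stated assumptions: it simply fetches the byte PC points to, increments PC and executes, it recognises only four real 6502 op-codes (&A9 LDA immediate, &AA TAX, &E8 INX, &00 BRK), and it treats BRK as a plain halt. The function name `run` and the program bytes are made up for illustration.

```python
def run(memory, start):
    """Fetch, increment PC, execute; stop at BRK. Supports only four op-codes."""
    pc, a, x = start, 0, 0
    while True:
        opcode = memory[pc]              # fetch the byte PC points to
        pc = (pc + 1) & 0xFFFF           # PC now points at the next byte
        if opcode == 0xA9:               # LDA #immediate: operand byte follows
            a = memory[pc]
            pc = (pc + 1) & 0xFFFF
        elif opcode == 0xAA:             # TAX
            x = a
        elif opcode == 0xE8:             # INX
            x = (x + 1) & 0xFF
        elif opcode == 0x00:             # BRK: treated as a halt in this sketch
            return pc, a, x
        else:                            # a data byte mistaken for an instruction
            raise ValueError(f"unknown op-code {opcode:#04x}")


memory = bytearray(65536)
memory[0x2000:0x2005] = bytes([0xA9, 0x05, 0xAA, 0xE8, 0x00])
final_pc, final_a, final_x = run(memory, 0x2000)
print(hex(final_pc), final_a, final_x)
```

If the BRK byte were missing, the loop would carry on into whatever bytes followed, which is exactly the hazard described above.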
It is all very well describing the sequence but how does PC know where the program starts? When entering a program under the direction of the assembler, there is no problem. It is simply a case of knowing the starting address of a program and assigning this to the reserved variable P%. In other words, P% appears to the user of the assembler as PC. But this convenience is by courtesy of the software built into the operating system ROM. As a matter of fact, the actual mechanism of loading the PC gives rise to a disturbing question which strikes at the root of stored program sequence control. This is the question: How is it possible to load PC with the starting address of the program unless there is already a program capable of performing the load action?
This is a chicken and egg situation because we can't fall back on the 'operating system'. The operating system is also a program so how was this loaded originally? There have been, and still are, various solutions to the problem although we are only concerned here with the method inherent in the 6502 microprocessor. When the reset line (RES) is momentarily grounded (usually arranged to coincide with the closing of the power-on switch) the following series of events takes place:
From the above, it is evident that the writers of the operating system must ensure that the correct starting address is in &FFFC and &FFFD. It is equally evident that they must be in ROM (RAM can only be loaded with data by a program which already exists). Note that the concept of a vectored address allows the system programmer complete freedom to position the program anywhere. It would have been easier, of course, for microprocessor designers to lay down a mandatory starting address, say, 'all programs must start at address &0000'. This would allow PC to be initialised by a simple zero reset. However, the vectored address approach is flexible and we should remember that infinite flexibility has always been the goal of computer scientists. There are three vectored addresses in the 6502 and, for completeness, these are shown in the following table:
6502 vectored addresses | |
Vector address | Function |
&FFFA and &FFFB | Non-maskable interrupt |
&FFFC and &FFFD | Start-up/reset |
&FFFE and &FFFF | Interrupt request |
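The vectored start-up can be sketched in a few lines of Python. The reset vector location (&FFFC and &FFFD) is as stated in the text, with the low byte stored first; the start address &E123 is a made-up value, and `reset` and `rom_image` are hypothetical names.

```python
RESET_VECTOR = 0xFFFC


def reset(memory):
    """Return the PC value loaded from the reset vector (low byte first)."""
    pcl = memory[RESET_VECTOR]        # low byte of the start address
    pch = memory[RESET_VECTOR + 1]    # high byte of the start address
    return (pch << 8) | pcl


rom_image = bytearray(65536)
rom_image[0xFFFC] = 0x23   # low byte of a made-up start address
rom_image[0xFFFD] = 0xE1   # high byte of the same made-up address
pc = reset(rom_image)
print(hex(pc))
```

Changing the two vector bytes moves the start of the program anywhere in the 64K range without any change to the microprocessor itself, which is the flexibility the text refers to.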
Although PC ensures that instructions are normally accessed and executed in consecutive address order, there are times when the sequence must be broken. When a jump or conditional branch instruction is encountered, the current contents of PC are altered drastically. In the case of an absolute jump, the entire contents of PC are replaced by the instruction operand. Branch instructions, however, use relative addressing rather than absolute. The operand is in the nature of an offset which is added to, rather than replacing, the existing contents of PC. Since the offset is in two's complement binary (allowing positive or negative numbers) it is still possible to branch forward or backward.
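The arithmetic of a relative branch can be sketched in Python. The sketch assumes the usual 6502 convention that the offset is added to PC after both bytes of the branch instruction have been fetched; the function name and the addresses are illustrative only.

```python
def branch_target(pc_after_branch, offset_byte):
    """Add a two's complement 8-bit offset to PC and wrap to 16 bits."""
    # Interpret the byte as a signed number in the range -128 to +127.
    offset = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    return (pc_after_branch + offset) & 0xFFFF


# A branch at &2000 occupies &2000 and &2001, so PC is &2002 when the offset is added.
print(hex(branch_target(0x2002, 0x05)))   # forward branch
print(hex(branch_target(0x2002, 0xFB)))   # &FB is -5: backward branch
```

Because the offset is a single signed byte, a branch can reach only 127 bytes forward or 128 bytes backward from the instruction that follows it.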
The first byte of all machine code instructions is the operation code (abbreviated to 'op-code'). The code, which is different for every instruction and addressing mode, carries two vital pieces of information:
On receipt of the code from memory (known as the FETCH phase) it is routed via the highways to IR where it is held pending execution. If the decoding reveals that the instruction requires no further operand bytes (such as TXA, TAX etc.), the instruction sequence enters the EXECUTE phase. If, on the other hand, decoding reveals that one or more operand bytes must follow, the sequence remains in the FETCH phase until the complete instruction has been received from memory.
The data bus carries information downwards from the microprocessor when writing to memory and upwards to the microprocessor when reading from memory. Because of this, the data register (DR) operates as a bidirectional holding register, controlled by the R/W line. You will remember, from earlier discussions, that when R/W is in the high state (logic 1) DR is switched to the READ direction, and when it is in the low state (logic 0) to the WRITE direction. The power levels on the raw bus are weak and external buffers may be needed to boost the power. A full 64K of memory with additional peripheral loads could lead to degradation of logic levels (see Appendix A).
Whilst on the subject of the data bus, it is convenient to discuss the effect of data-jamming. It is essential that all memory and peripheral devices connected directly to the data bus are equipped with 'tristate' outputs. That is to say, when the devices are in the disabled state, their connections to the data bus should be electrically impotent. Tristate devices ensure this by effectively open-circuiting the outputs during the disabled state.
A 4-hex digit address describes a 16-bit logic pattern on the address wires A0 to A15. The address information can originate from several possible sources. It could originate from A, the output of the ALU or even the data bus. From whatever source, it will eventually be routed along the highway, ending up in the address register. This register is split into two halves, each contributing a byte to the two-byte address. The low order byte (A0 to A7) is held in ARL and the high order byte (A8 to A15) in ARH. As discussed earlier, the high byte determines the page address and the low byte the address on the page. The individual lines on the address bus are direct outputs of the registers. They are, of course, always outputs so the R/W control line is not involved. It should also be noted that, unlike the data bus, devices connected to the address bus need not be tristate. This is because the address bus is always an output from the microprocessor intended to feed only the address decode circuits of memory or peripheral devices. Only the address registers can supply the bus so there is no possibility of data jamming by alternative logic voltage sources.
The term 'microprogram' has nothing to do with programs written for a microcomputer. In fact, microprograms are those which are buried inside the silicon of the microprocessor chip itself! It may surprise some readers that every instruction in the repertoire (about 200 in the 6502) requires its own special microprogram. A simple machine code instruction like LDA &72 is simple only from the viewpoint of the human intellect. In contrast, logic circuits (which are baffled if required to answer any question with other than yes or no) require considerable assistance in dealing with LDA &72. They need micro-instructions, fed one at a time, in order to open and close the appropriate register gates and activate the control lines. These micro-instructions must be given in the correct sequential order for every individual instruction. Since a sequential set of instructions is, by definition, a program, it becomes evident that the earlier statement is justified: every instruction needs its own microprogram.
We do not propose to examine in detail each of these microprograms. This would take more space than this book allows. However, it is interesting to examine a possible microprogram for the simple instruction mentioned previously: LDA &72 will LoaD A with the data stored at address &72 on page 0 hex. This instruction consists of two bytes, which we will assume are residing at addresses &2E34, &2E35. The microprogram will first have to fetch these two bytes from memory.
The FETCH phase:
PC, having just dealt with the last byte of the preceding instruction, will already have been incremented to &2E34. A typical sequence would be:
The complete instruction is now lodged in the microprocessor registers, ending the FETCH phase. The EXECUTE phase now begins.
The EXECUTE phase:
(7) The operand (&72) in DR is passed, via the highway, to ADL. ADH is cleared to zero (because it is a page 0 address).
(8) The memory is read, and the data at address &72 is passed to DR.
(9) The contents of DR are passed to A.
The instruction has now been executed with the PC left pointing to the address of the first byte of the next instruction.
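The whole sequence can be traced with a small Python model. Two assumptions should be stated clearly: the ordering of the FETCH steps below is a plausible guess rather than a transcription of the book's list, and the data byte &99 stored at &72 is a made-up value. The op-code &A5 is the real zero-page form of LDA; the dictionary keys mirror the register names used in the text.

```python
def lda_zero_page(memory, cpu):
    """Trace one LDA zero-page instruction through FETCH and EXECUTE."""
    # FETCH (assumed ordering): op-code byte goes to IR, operand byte stays in DR.
    cpu["ADH"], cpu["ADL"] = cpu["PC"] >> 8, cpu["PC"] & 0xFF
    cpu["DR"] = memory[(cpu["ADH"] << 8) | cpu["ADL"]]
    cpu["IR"] = cpu["DR"]
    cpu["PC"] = (cpu["PC"] + 1) & 0xFFFF
    cpu["ADH"], cpu["ADL"] = cpu["PC"] >> 8, cpu["PC"] & 0xFF
    cpu["DR"] = memory[(cpu["ADH"] << 8) | cpu["ADL"]]
    cpu["PC"] = (cpu["PC"] + 1) & 0xFFFF
    # EXECUTE, following steps (7) to (9) of the text.
    cpu["ADL"], cpu["ADH"] = cpu["DR"], 0x00             # (7) operand to ADL, ADH cleared
    cpu["DR"] = memory[(cpu["ADH"] << 8) | cpu["ADL"]]   # (8) read the addressed data
    cpu["A"] = cpu["DR"]                                 # (9) DR passed to A
    return cpu


memory = bytearray(65536)
memory[0x2E34] = 0xA5      # LDA zero page op-code
memory[0x2E35] = 0x72      # operand: address &72 on page 0
memory[0x0072] = 0x99      # made-up data byte to be loaded
cpu = {"PC": 0x2E34, "IR": 0, "DR": 0, "ADL": 0, "ADH": 0, "A": 0}
lda_zero_page(memory, cpu)
print({name: hex(value) for name, value in cpu.items()})
```

Even this simplified trace needs ten register transfers for a two-byte instruction, which gives some idea of the work hidden inside the chip.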
The instruction chosen in the example was particularly simple and yet the microprogram was quite involved. It is left to the imagination to visualise the microprogram for ADC (&72),Y (post-indexed indirect addressing).
Microprogrammers are a specialist breed and usually employed on the design staff of the chip manufacturer. It is fortunate that the brief outline above on microprogramming was included as a topic of interest only. The normal machine code programmer takes each complete instruction for granted and is oblivious to the existence of the internal microprogram steps. If we call machine code a low-level language, then microprogramming is at ground-level!
Figure 2.2 shows the decode matrix. Its function is to accept the op-code held in IR, decode it, and finally output a pattern of bits on the various gate and timing controls. This pattern will be different for every step in the microprogram. If this function is analysed carefully, we may come to the conclusion that the decode matrix will behave like a miniature computer with a number of fixed programs inside. We can relate IR to the 'program counter'. The op-code is only the starting address of the relevant microprogram. The 'words' read out from the ROM are the bit patterns supplying the various register gates and controls. These patterns will vary for each step of the microprogram. The gate controls are all hard wired to the various registers. This wiring is omitted from Fig. 2.2 to prevent an already complex diagram from becoming incomprehensible.
It is not always appreciated that the clock pulses, which in the 6502 are running at 2 MHz (0.5 µs period), are split up within the decode matrix. Several sub-pulses are formed, each sub-pulse initiating one step of the microprogram. Within the matrix, the clock pulses are merely the 'low-frequency' envelope of the sub-pulses.
Addition, subtraction and logical instructions will obviously be the responsibility of the ALU. However, in the interests of versatility, nearly all data is made to pass through the ALU irrespective of the particular instruction. For example, data can pass through the ALU without change by adding zero. This may seem time-wasting but is justifiable from the viewpoint of the wider system. For example, address modification by indexing involves adding the contents of X or Y to the operand so the ALU is directly involved.
The 6502 is incapable of multiplication, division or exponential operations. It is not alone in this respect. It is very rare to find 8-bit microprocessors capable of performing any arithmetic instructions other than addition. Even subtraction is achieved by the roundabout way of adding the complement.
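Subtraction by adding the complement can be sketched in Python, assuming binary mode. The function name `sbc` is borrowed from the 6502 mnemonic; the carry is expected to be 1 on entry (no borrow), which is why SEC normally precedes a subtraction.

```python
def sbc(a, operand, carry_in=1):
    """Subtract by adding the one's complement of the operand plus the carry."""
    complement = operand ^ 0xFF               # invert all eight bits
    total = a + complement + carry_in         # the adder does the rest
    carry_out = 1 if total > 0xFF else 0      # C = 1 means no borrow occurred
    return total & 0xFF, carry_out


print(sbc(0x50, 0x20))   # &50 - &20
print(sbc(0x20, 0x50))   # &20 - &50 needs a borrow
```

With the carry set, the one's complement plus 1 is the two's complement, so the adder effectively computes A + (-operand).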
Before criticising these limitations, it should be remembered that the microprocessor was designed with the primary objective of controlling electrically operated devices and a primitive instruction repertoire was quite sufficient for the purpose. It was never intended to be the brain of a general purpose computer. However, if a machine can add, it is fairly easy to write subroutines which can multiply, divide and handle exponentials. Users of BASIC, or indeed most other high level languages, are unaware of the primitive capabilities of the microprocessor although they have to pay for it by reduced execution speed. Software solutions are always much slower than hardware implementation.
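As an illustration of building multiplication from addition, here is a Python sketch of the classic shift-and-add method for two 8-bit numbers giving a 16-bit product. It is one possible approach, not necessarily the routine used in any particular ROM, and the function name is hypothetical.

```python
def multiply_8bit(multiplicand, multiplier):
    """Shift-and-add multiplication using only add, shift and bit tests."""
    product = 0
    shifted = multiplicand
    for _ in range(8):                        # one pass per multiplier bit
        if multiplier & 1:                    # lsb set: add the shifted multiplicand
            product = (product + shifted) & 0xFFFF
        shifted = (shifted << 1) & 0xFFFF     # next bit is worth twice as much
        multiplier >>= 1                      # move the next bit into the lsb
    return product


print(multiply_8bit(13, 11))
```

Eight passes of test, add and shift replace what a hardware multiplier would do in a single instruction, which is the speed penalty mentioned above.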
The new breed of 16-bit microprocessors are virtually second generation products, many of which include instructions that perform direct multiplication and division at an impressive speed.
The design of an ALU is based on a parallel binary adder which can be considered as the prototype. With this as a basic building block, it is a relatively simple exercise in logic to arrange gates for implementing exclusive-or (EOR), logical anding (AND) and the inclusive-or (ORA) functions. Finally, it would only require 'function select' inputs to complete the transformation. Four of these, driven by the output word from the control matrix, could activate any one of 16 functions.
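The idea of 'function select' inputs can be sketched in Python. The select codes below are entirely hypothetical and are not the 6502's internal encoding; only five of the sixteen possible codes are given a meaning, just to show how a 4-bit word can choose between operations built around the same adder.

```python
# Hypothetical 4-bit function select codes (not the 6502's real encoding).
ALU_FUNCTIONS = {
    0b0000: lambda a, b: a + b,   # add
    0b0001: lambda a, b: a & b,   # AND
    0b0010: lambda a, b: a | b,   # ORA (inclusive-or)
    0b0011: lambda a, b: a ^ b,   # EOR (exclusive-or)
    0b0100: lambda a, b: a,       # pass A through unchanged
}


def alu(select, a, b):
    """Apply the operation chosen by the four select lines; result is 8 bits."""
    if not 0 <= select <= 0b1111:
        raise ValueError("select must fit in four bits")
    operation = ALU_FUNCTIONS.get(select)
    if operation is None:
        raise NotImplementedError("select code not defined in this sketch")
    return operation(a, b) & 0xFF


print(hex(alu(0b0011, 0xF0, 0x3C)))   # EOR of &F0 and &3C
```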
The details of all machine code instructions are given in the relevant chapter but it is convenient at this stage to introduce the BRK instruction. Interrupts are normally the prerogative of peripheral devices but BRK is software initiated. Superficially, it just stops the computer but a dig beneath the surface reveals some interesting side-effects. The instruction takes 7 clock cycles to complete the following steps:
The motive behind this seeming complexity is to aid the writing of software error traps during program development. It is common practice to put BRK at strategic 'bug-hazard' points. This would be useless if the sole function of break was to kill all program flow completely. However, it will be seen from the above that a convenient loophole is prepared. Providing a routine is written, with the start address residing in the Break Vector at &FFFE/&FFFF, control is automatically diverted to the routine rather than stopping dead. The routine must establish, by pulling P back from the stack, that the B bit was set as a result of a true BRK rather than a genuine peripheral interrupt.
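The B-bit test can be sketched in Python. The model assumes the conventional 6502 behaviour: the return address and P are pushed on to the stack in page one, the copy of P on the stack has B set only for BRK, I is set, and PC is loaded from &FFFE/&FFFF. The handler address and all function names are made up, and details such as the unused bit 5 of P are ignored.

```python
STACK_PAGE = 0x0100
B_FLAG = 0x10              # bit 4 of P
I_FLAG = 0x04              # bit 2 of P
IRQ_BRK_VECTOR = 0xFFFE


def push(memory, cpu, value):
    """Store a byte at the stack pointer position, then move SP down."""
    memory[STACK_PAGE + cpu["SP"]] = value
    cpu["SP"] = (cpu["SP"] - 1) & 0xFF


def enter_interrupt(memory, cpu, return_address, is_brk):
    """Stack PCH, PCL and P, set I, then load PC from the shared vector."""
    push(memory, cpu, return_address >> 8)
    push(memory, cpu, return_address & 0xFF)
    if is_brk:
        stacked_p = cpu["P"] | B_FLAG             # BRK: B set in the stacked copy
    else:
        stacked_p = cpu["P"] & ~B_FLAG & 0xFF     # peripheral interrupt: B clear
    push(memory, cpu, stacked_p)
    cpu["P"] |= I_FLAG
    cpu["PC"] = memory[IRQ_BRK_VECTOR] | (memory[IRQ_BRK_VECTOR + 1] << 8)


def caused_by_brk(memory, cpu):
    """Inspect the stacked copy of P (the last byte pushed) for the B bit."""
    stacked_p = memory[STACK_PAGE + ((cpu["SP"] + 1) & 0xFF)]
    return bool(stacked_p & B_FLAG)
```

A handler would call something like `caused_by_brk` on entry and branch to the error trap only when it returns true; otherwise the interrupt came from a peripheral and should be serviced normally.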